Re: [Wikidata] Speed of indexing new Wikidata lexemes

2020-07-03 Thread fn

Oh, dear. I forgot my own ticket :D Thanks!

On 03/07/2020 12:09, Nicolas VIGNERON wrote:

Hi,

I notice the same issue every day. I've seen other people saying the same 
thing.


This ticket https://phabricator.wikimedia.org/T240328 reports this 
problem (or a related one at least).


Cheers,
~nicolas

On Fri, 3 Jul 2020 at 11:50, <f...@imm.dtu.dk> wrote:


Dear Wikidata people,


I regularly experience that Wikidata lexemes, forms (and perhaps senses?) 
are indexed so slowly that newly entered lexemes and forms cannot be 
referenced as property values.

I am wondering whether other editors face the same issue? Maybe I am the 
only one bothered by it. I often use the compound and usage example 
properties, where this is an issue.

Labels from newly entered Q-items also suffer lag in indexing, but in
this case the Q-identifier can be used for the lookup.

I am wondering whether there is a "secret" way of referencing a newly 
entered lexeme in the same way as Q-items. I have tried, e.g., L2133, 
Lexeme:L2133, and https://www.wikidata.org/wiki/Lexeme:L2133, but none of 
them works for lookup.

I believe the issue has been a particular problem since the wb_terms 
table was dropped? Apparently there was also an issue in 2018, 
https://phabricator.wikimedia.org/T196896, which Stas seems to have 
solved at that point.

I am wondering whether this is worth a Phabricator ticket?


Finn Årup Nielsen
https://people.compute.dtu.dk/faan/


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

[Wikidata] Speed of indexing new Wikidata lexemes

2020-07-03 Thread fn

Dear Wikidata people,


I regularly experience that Wikidata lexemes, forms (and perhaps senses?) 
are indexed so slowly that newly entered lexemes and forms cannot be 
referenced as property values.


I am wondering whether other editors face the same issue? Maybe I am the 
only one bothered by it. I often use the compound and usage example 
properties, where this is an issue.


Labels from newly entered Q-items also suffer lag in indexing, but in 
this case the Q-identifier can be used for the lookup.


I am wondering whether there is a "secret" way of referencing a newly 
entered lexeme in the same way as Q-items. I have tried, e.g., L2133, 
Lexeme:L2133, and https://www.wikidata.org/wiki/Lexeme:L2133, but none of 
them works for lookup.
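
As a partial workaround for checking (not a fix for the lookup fields 
themselves), here is a rough Python sketch using the requests library: 
wbgetentities can fetch the entity immediately after creation, while 
wbsearchentities shows whether the search index has caught up. L2133 and 
the Danish search term are only examples.

import requests

API = "https://www.wikidata.org/w/api.php"

def entity_exists(entity_id):
    # wbgetentities returns the entity right after creation,
    # independently of the search index
    r = requests.get(API, params={"action": "wbgetentities",
                                  "ids": entity_id, "format": "json"})
    entity = r.json().get("entities", {}).get(entity_id, {})
    return "missing" not in entity

def search_has_caught_up(term, entity_id, language="da"):
    # wbsearchentities is roughly what the lookup fields use; if the new
    # lexeme does not show up here yet, the statement editor will not find it
    r = requests.get(API, params={"action": "wbsearchentities",
                                  "search": term, "type": "lexeme",
                                  "language": language, "format": "json"})
    return any(hit.get("id") == entity_id
               for hit in r.json().get("search", []))

print(entity_exists("L2133"), search_has_caught_up("eksempel", "L2133"))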


I believe the issue has been a particular problem since the wb_terms 
table was dropped? Apparently there was also an issue in 2018, 
https://phabricator.wikimedia.org/T196896, which Stas seems to have 
solved at that point.


I am wondering whether this is worth a Phabricator ticket?


Finn Årup Nielsen
https://people.compute.dtu.dk/faan/


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Status of Wikidata Query Service

2020-02-11 Thread fn


I am sorry to bring more problems to the table, but the indexing of 
lexemes in the "ordinary" Elasticsearch-based search is now also often 
slow. Q-items are also indexed slowly, but there you can at least type 
the Q-number into the edit field and it will look up the item. For 
L-numbers, I have not found a way to type them in, so one has to wait 
several minutes before L-items are indexed.


An example use case is the entry of "fordømme" and "dømme" where one 
links to the other by P5238, see 
https://www.wikidata.org/wiki/Lexeme:L245454. As is apparent from the 
edit histories, I waited over 10 minutes for the indexing before I could 
link the two lexemes.



best regards
Finn Årup Nielsen

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Status of Wikidata Query Service

2020-02-07 Thread fn


Better update granularity is probably worthwhile and may be a good priority.

It is (still) unclear to me as a tool writer whether I can do anything. 
For instance, it is not clear to me whether the parallel SPARQL queries 
that come when a user visits a Scholia page are important for the load on 
WDQS (not likely) or whether they are minuscule (likely).


As far as I understand from http://ceur-ws.org/Vol-2073/article-03.pdf, 
much of the query load comes via Magnus. I presume another big chunk is 
from the genewiki people.


If robotic queries are sources of problems, then tool writers/users can 
do something. But fixing issues would require the WMF to tell us whether 
it really is a problem and what the problems are.



best regards
Finn

On 07/02/2020 14:32, Guillaume Lederrey wrote:

Hello all!

First of all, my apologies for the long silence. We need to do better in 
terms of communication. I'll try my best to send a monthly update from 
now on. Keep me honest, remind me if I fail.


First, we had a security incident at the end of December, which forced 
us to move from our Kafka based update stream back to the RecentChanges 
poller. The details are still private, but you will be able to get the 
full story soon on phabricator [1]. The RecentChange poller is less 
efficient and this is leading to high update lag again (just when we 
thought we had things slightly under control). We tried to mitigate this 
by improving the parallelism in the updater [2], which helped a bit, but 
not as much as we need.


Another attempt to get update lag under control is to apply back 
pressure on edits, by adding the WDQS update lag to the Wikidata maxlag 
[6]. This is obviously less than ideal (at least as long as WDQS updates 
are lagging as often as they are), but it does allow the service to 
recover from time to time. We probably need to iterate on this, provide 
better granularity, and differentiate better between operations that have 
an impact on update lag and those which don't.
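
For bot and tool authors, respecting that back pressure looks roughly like 
the sketch below (Python with the requests library; the threshold and retry 
values are only examples): pass maxlag with every request and back off 
whenever the API rejects the request with a maxlag error.

import time
import requests

API = "https://www.wikidata.org/w/api.php"

def call_api(params, maxlag=5, retries=5):
    # With maxlag set, the API refuses the request (instead of executing it)
    # whenever the reported lag exceeds the threshold
    params = dict(params, maxlag=maxlag, format="json")
    for _ in range(retries):
        r = requests.get(API, params=params)
        data = r.json()
        if data.get("error", {}).get("code") != "maxlag":
            return data
        # back off; the Retry-After header suggests how long to wait
        time.sleep(int(r.headers.get("Retry-After", 5)))
    raise RuntimeError("lag stayed above maxlag, giving up for now")

# example read; edit actions would go through the same wrapper
call_api({"action": "wbgetentities", "ids": "Q42"})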


On the slightly better news side, we now have a much better 
understanding of the update process and of its shortcomings. The current 
process does a full diff between each updated entity and what we have in 
blazegraph. Even if a single triple needs to change, we still read tons 
of data from Blazegraph. While this approach is simple and robust, it is 
obviously not efficient. We need to rewrite the updater to take a more 
event streaming / reactive approach, and only work on the actual 
changes. This is a big chunk of work, almost a complete rewrite of the 
updater, and we need a new solution to stream changes with guaranteed 
ordering (something that our kafka queues don't offer). This is where we 
are focusing our energy at the moment, this looks like the best option 
to improve the situation in the medium term. This change will probably 
have some functional impacts [3].


Some misc things:

We have done some work to get better metrics and better understanding of 
what's going on. From collecting more metrics during the update [4] to 
loading RDF dumps into Hadoop for further analysis [5] and better 
logging of SPARQL requests. We are not focusing on this analysis until 
we are in a more stable situation regarding update lag.


We have a new team member working on WDQS. He is still ramping up, but 
we should have a bit more capacity from now on.


Some longer term thoughts:

Keeping all of Wikidata in a single graph is most probably not going to 
work long term. We have not found examples of public SPARQL endpoints 
with > 10 B triples and there is probably a good reason for that. We 
will probably need to split the graphs at some point. We don't know how 
yet (that's why we loaded the dumps into Hadoop, that might give us some 
more insight). We might expose a subgraph with only truthy statements. 
Or have language specific graphs, with only language specific labels. Or 
something completely different.
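
To make that difference concrete, here is a small sketch (Python with the 
requests library; Q42 / P69 are only an example) contrasting the truthy 
wdt: triples, which a truthy-only subgraph would keep, with the full 
p:/ps:/pq: statement nodes carrying qualifiers and references, which it 
would drop.

import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# truthy subgraph: one triple per best-rank statement, no qualifiers/references
TRUTHY = "SELECT ?value WHERE { wd:Q42 wdt:P69 ?value }"

# full statement graph: statement nodes with qualifiers such as start time
FULL = """
SELECT ?value ?start WHERE {
  wd:Q42 p:P69 ?stmt .
  ?stmt ps:P69 ?value .
  OPTIONAL { ?stmt pq:P580 ?start . }
}
"""

for query in (TRUTHY, FULL):
    r = requests.get(ENDPOINT, params={"query": query, "format": "json"})
    print(len(r.json()["results"]["bindings"]), "rows")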


Keeping WDQS / Wikidata as open as they are at the moment might not be 
possible in the long term. We need to think about if / how we want to 
implement some form of authentication and quotas, potentially increasing 
quotas for some use cases but keeping them strict for others. Again, we 
don't know what this will look like, but we're thinking about it.


What you can do to help:

Again, we're not sure. Of course, reducing the load (both in terms of 
edits on Wikidata and of reads on WDQS) will help. But not using those 
services makes them useless.


We suspect that some use cases are more expensive than others (a single 
property change to a large entity will require a comparatively insane 
amount of work to update it on the WDQS side). We'd like to have real 
data on the cost of various operations, but we only have guesses at this 
point.


If you've read this far, thanks a lot for your engagement!

   Have fun!

       Guillaume




[1] https://phabricator.wikimedia.org/T241410
[2] https://phabricator.wikimedia.org/T238045
[3] 

Re: [Wikidata] Weekly Summary #397 (Wikidata events)

2020-01-07 Thread fn



On 06/01/2020 17:30, Léa Lacroix wrote:


  Events

  * New: you can add Wikidata-related events in the calendar of Wikimedia Space
  * Upcoming: next Wikidata office hour


Perhaps you already know this: there is also a bit of event information 
in Wikidata, and you can get an overview of coming deadlines and events 
in Scholia: https://tools.wmflabs.org/scholia/event/


Currently, it shows that Wiki Workshop 2020 has a submission deadline on 
2020-01-17.


The individual events each have their own page; e.g., the Wiki Workshop 
2020 is here: https://tools.wmflabs.org/scholia/event/Q75538824


Last year's Wikimedia Hackathon is here: 
https://tools.wmflabs.org/scholia/event/Q44062313


On these pages, there are related events (past and future) based on 
time/location and people.



/Finn
https://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [ANN] WDumper - Generate customized Wikidata RDF dumps

2019-12-11 Thread fn

Hi Benno,


Thanks for the contribution.

Does your tool work for lexemes and other lexicographic data? When I 
view "Filter entities", I do not see the ability to set properties 
such as dct:language and ontolex:sense.



best regards
Finn Årup Nielsen
https://people.compute.dtu.dk/faan/

On 11/12/2019 15:08, Benno Fünfstück wrote:

Hi everyone,

I am happy to announce a new tool I've been working on for the last few 
months, WDumper.

The tool is available at https://tools.wmflabs.org/wdumps/.

The idea is to provide a user interface to easily generate RDF dumps for 
subsets of the data contained in Wikidata.
As an example, the tool can generate dumps with only English labels or 
for a subset of the properties.


The tool is based on Wikidata Toolkit and processes the original JSON 
dumps provided by Wikidata.

When you submit a request to create a dump, it will be added to a queue.
The queue is processed at regular intervals (the maximum wait time in 
the queue is 1h).


You can view a list of created dumps on 
https://tools.wmflabs.org/wdumps/dumps.
The generated dump can either be downloaded directly or uploaded to 
Zenodo for archival, which also generates a DOI for easy referencing in 
scientific publications.


I want to thank Prof. Dr. Markus Krötzsch for the original idea for this 
tool and for his support during its development.
If you have any questions, feel free to ask them by mail or create an 
issue on the GitHub page: https://github.com/bennofs/wdumper. The 
current version does not have a lot of features yet, so ideas for 
extending the tool with additional filters or options that you'd like to 
use are valuable feedback as well.


Also a small word of caution: while I did of course test the tool, the 
Wikidata data model is quite complex. Since the tool is new, bugs are 
more likely, so always apply a sanity check to the results.

If you find bugs, please tell me or create an issue on GitHub.

Regards,
Benno Fünfstück


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata Query Service update lag

2019-11-14 Thread fn


Besides waiting for the new updater, it may be useful to tell us what 
we as users can do too. It is unclear to me what the problem is. For 
instance, at one point I was worried that the many parallel requests to 
the SPARQL endpoint that we make in Scholia are a problem. As far as I 
understand, they are not a problem at all. Another issue could be the way 
that we use Magnus Manske's QuickStatements and approve bots for 
high-frequency editing. Perhaps a better overview of, and constraints on, 
large-scale editing could be discussed?


Yet another thought is the large discrepancy between the Virginia and 
Texas data centers, as I could see on Grafana [1]. As far as I understand, 
the hardware (and software) are the same, so why is there this large 
difference? Rather than editing or Blazegraph, could it be some form of 
network issue?



[1] 
https://grafana.wikimedia.org/d/00489/wikidata-query-service?panelId=8=1=now-7d=now


/Finn



On 14/11/2019 10:50, Guillaume Lederrey wrote:

Hello all!

As you've probably noticed, the update lag on the public WDQS endpoint 
[1] is not doing well [2], with lag climbing to > 12h for some servers. 
We are tracking this on phabricator [3], subscribe to that task if you 
want to stay informed.


To be perfectly honest, we don't have a good short term solution. The 
graph database that we are using at the moment (Blazegraph [4]) does not 
easily support sharding, so even throwing hardware at the problem isn't 
really an option.


We are working on a few medium term improvements:

* A dedicated updater service in Blazegraph, which should help increase 
the update throughput [5]. Fingers crossed, this should be ready for 
initial deployment and testing by next week (no promise, we're doing the 
best we can).
* Some improvement in the parallelism of the updater [6]. This has just 
been identified. While it will probably also provide some improvement in 
throughput, we haven't actually started working on that and we don't 
have any numbers at this point.


Longer term:

We are hiring a new team member to work on WDQS. It will take some time 
to get this person up to speed, but we should have more capacity to 
address the deeper issues of WDQS by January.


The 2 main points we want to address are:

* Finding a triple store that scales better than our current solution.
* Better understand what are the use cases on WDQS and see if we can 
provide a technical solution that is better suited. Our intuition is 
that some of the use cases that require synchronous (or quasi 
synchronous) updates would be better implemented outside of a triple 
store. Honestly, we have no idea yet if this makes sense and what those 
alternate solutions might be.


Thanks a lot for your patience during this tough time!

    Guillaume


[1] https://query.wikidata.org/
[2] 
https://grafana.wikimedia.org/d/00489/wikidata-query-service?orgId=1=1571131796906=1573723796906_name=wdqs=8

[3] https://phabricator.wikimedia.org/T238229
[4] https://blazegraph.com/
[5] https://phabricator.wikimedia.org/T212826
[6] https://phabricator.wikimedia.org/T238045

--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] References to newspaper articles behind paywalls like newspapers.com

2019-11-12 Thread fn
I see a gradual shift from open news articles to articles behind 
paywalls, which I guess must be a natural adjustment of the income model 
as newspapers transition from paper to the Internet. Good articles, 
suitable as sources in Wikipedia and Wikidata, will more and more likely 
be behind paywalls. We need to get used to that, and it has also been the 
case for books - to some degree - for a long time.


The Internet Archive, tools like the one Andy Mabbett refers to, the 
'archive-url' parameter of (the English) Wikipedia's cite news template, 
as well as Wikidata's P1065, may be a good solution, but we should also 
keep in mind that it might not be sustainable copyright-wise in the long 
term, cf. discussions around controlled lending and the Society of Authors' 
concerns 
https://publishingperspectives.com/2019/01/copyright-battle-internet-archives-open-library-authors-guild-society-of-authors/ 
(not currently behind a paywall :)


/Finn


On 12/11/2019 08:17, Gerard Meijssen wrote:

Hoi,
It depends. When you approach it from a scientific point of view, you 
write your paper and are personally responsible for what you write. The 
bias of your paper may include the effort you take to verify sources. In 
Wikipedia it is NOT your paper and it is NOT only your responsibility. 
At the opposite end are the papers in the language you do not understand 
and consequently not accepted as a source, there are the papers, books 
that have not been


So a Wikipedia is biased because of the limiting of sources, and as it is 
NOT a personal responsibility, there is also the question of whether we 
should accept the bias that limiting brings.

Thanks,
        GerardM

On Tue, 12 Nov 2019 at 08:03, Andra Waagmeester wrote:


I don't know if I agree with "just citing" the newspaper article. Why not
push for resolvable citations if we have the technology? There is
not much value in a citation if you can't access the source to
verify it, don't you think?

On Tue, Nov 12, 2019 at 6:38 AM Wynand van der Walt
<wynli...@gmail.com> wrote:

I like what Andy is doing. As a librarian, however, an
alternative would be to only cite the newspaper without the URL,
as this analog citation would be valid - meaning just cite the
newspaper article.

Regards,

Wynand van der Walt
Head Librarian: Technical Services
Rhodes University Library



On Mon, Nov 11, 2019 at 7:28 PM Andy Mabbett
<a...@pigsonthewing.org.uk> wrote:

On Mon, 11 Nov 2019 at 16:44, PWN
<pariswritersn...@gmail.com> wrote:

 > I’m constantly encountering newspaper articles that have disappeared
 > from Google and are no longer viewable or even discoverable via Google.
 > They are often the sole reference url for statements, yet are behind a
 > paywall - notably newspapers.com.
 > How is the community handling the paywalling of historical newspaper
 > resources?

Whenever I cite something on Wikidata, or Wikipedia, I
submit a copy
to the Internet Archive's Wayback Machine using the add-on for
Firefox:

https://github.com/jonathanmccann/archive-url-firefox-addon

-- 
Andy Mabbett

@pigsonthewing
http://pigsonthewing.org.uk

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Important, Critical issues related to Wikidata

2019-08-21 Thread fn



On 21/08/2019 13:20, Houcemeddine A. Turki wrote:

Dear Ms.,
I thank you for your efforts. I tried to contact Ms. Léa Lacroix and Ms. 
Lydia. However, I failed due to my participation in many sessions. I saw 
Ms. Lydia when returning to the hotel on the third day, but I had to go 
to the old town. I had nine points to raise:


1. Instance of and Subclass of are not well defined for users although 
they are quite different. I ask if these two properties can be merged as 
is-a. This will be easier to process by users and developers.


I would say that in most Semantic Web applications there is a distinction 
between class and instance. I wonder if you can give examples of the 
problem. I see that diseases are instances of "disease", which may be 
contested. Are you aware of WikiProject Ontology?


https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology
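
As a rough illustration of why the distinction carries information, here is 
a hedged sketch (Python with the requests library; Q12136 "disease" is only 
an example): it lists items that are declared instances of disease but are 
themselves used as classes - exactly the kind of modelling a merged "is-a" 
would hide.

import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT DISTINCT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q12136 .        # item is an instance of disease
  ?other wdt:P31 ?item .           # yet something else is an instance of the item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

r = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"})
for row in r.json()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"])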


2. There is a large misuse of Wikidata properties. I ask if it will be 
possible to add description logics to Wikidata properties so that such 
matters will not happen.


We have property suggestions and ShEx, which might help. There is also 
the SQID web service, https://tools.wmflabs.org/sqid, which can help.



3. There is a lack of several labels, statements and references in 
Wikidata. I ask if it will be interesting for Wikimedia Deutschland to 
work with us on using citation indexes to enrich and verify the 
information provided by Wikidata.


I wonder if you could be more concrete here.

It sounds more like an editor issue?


4. Legal issues related to data integration into Wikidata still exist. 
There was a session in Wikimania about this. However, this should be 
ameliorated.


In which way should it be ameliorated? Should we leave CC0?


5. Some statements are more important than other ones. For example, 
medical signs that rarely exist or that are not specific should be given 
less weight than important signs. Using fuzzy logic is important here.


That has also bothered me. To a certain extent qualifiers may be used.


6. Several CC-0 Online Lexicons already exist. I ask if there is a 
possibility to speed up integrating them to LexData.


Could you give some pointers to this? For modern Danish, I know 
Q66371001, which is CC0, but not much more. I am under the impression 
that Basque has been set up automatically, presumably from an existing 
resource.



7. Wikidata property constraints should be ameliorated. For example, 
medical signs of a disease should be associated to the corresponding 
statuses of the disease.


I suppose one could create items for stages.


8. Wikidata Labels can be used to translate Mediawiki messages using the 
principle of Reasonator.


Perhaps to a certain extent. Currently translatewiki is used, which has a 
review option. I am not sure that Wikidata can be used to translate 
messages such as "Write a helpful note for {{GENDER:$1|$1}} and future 
reviewers."


I am not sure what "the principle of Reasonator" is.


9. In a Wikidata statement, the object can be a triple. For example, X 
Known for (X Co-founder of Y). I ask if this fact can become supported 
by Wikidata.


Qualifiers make that possible to a certain extent. Perhaps new 
properties are necessary.



best regards
Finn



I ask if we can have an online meeting to discuss these nine points
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci

+21629499418

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Registration is open for #FORCE2019 in Edinburgh

2019-07-03 Thread fn
There is a bit more context in Scholia for FORCE2019: 
https://tools.wmflabs.org/scholia/event/Q65010059


/Finn

On 03/07/2019 14:28, Jérémie Roquet wrote:

Hi Violeta,

Thank you for your email.

On Wed, 3 Jul 2019 at 13:22, Violeta Ilik wrote:

You can now sign up to join us for the next installment of the popular, 
solutions-focused FORCE11 meeting – in Edinburgh October 16-17, with workshops 
on the 15th.


May I suggest, like several of us already have last year, to add some
context as to how this conference is relevant to people on the
Wikidata mailing list?

Thanks a lot!



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-20 Thread fn
Is that with SSD hard disks? Isn't the bottleneck the I/O traffic to the 
hard disks? (I suppose you are not loading into RAM?) What was your 
hardware configuration?


best regards
Finn
http://people.compute.dtu.dk/faan/

On 20/06/2019 14:37, Adam Sanchez wrote:

For your information

a) It took 10.2 days to load the Wikidata RDF dump
(wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
The bigdata.jnl file turned out to be 1.3T.

Server technical features

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):16
On-line CPU(s) list:   0-15
Thread(s) per core:2
Core(s) per socket:8
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 79
Model name:Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:  1
CPU MHz:   1200.476
CPU max MHz:   3000.
CPU min MHz:   1200.
BogoMIPS:  4197.65
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  20480K
RAM: 128G

b) It took 43 hours to load the Wikidata RDF dump
(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
07.20.3230.
I had to patch Virtuoso because it gave the following error each
time I loaded the RDF data:

09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
RDF type and a non-geometry content

The virtuoso.db file turned out to be 340G.

Server technical features

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):12
On-line CPU(s) list:   0-11
Thread(s) per core:2
Core(s) per socket:6
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 63
Model name:Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping:  2
CPU MHz:   1199.920
CPU max MHz:   3800.
CPU min MHz:   1200.
BogoMIPS:  6984.39
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  15360K
NUMA node0 CPU(s): 0-11
RAM: 128G

Best,


On Tue, 4 Jun 2019 at 16:37, Vi to wrote:


V4 has 8 cores instead of 6.

But well, it's a server grade config on purpose!

Vito

On Tue, 4 Jun 2019 at 16:32, Guillaume Lederrey wrote:


On Tue, Jun 4, 2019 at 3:14 PM Vi to  wrote:


AFAIR it's a double Xeon E5-2620 v3.
With modern CPUs frequency is not so significant.


Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
2.10GHz (so v4 instead of v3, but the difference is probably minimal).


Vito

On Tue, 4 Jun 2019 at 13:00, Adam Sanchez wrote:


Thanks Guillaume!
One more question: what is the CPU frequency (GHz)?

On Tue, 4 Jun 2019 at 12:25, Guillaume Lederrey wrote:


On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez  wrote:


Hello,

Does somebody know the minimal hardware requirements (disk size and
RAM) for loading wikidata dump in Blazegraph?


The actual hardware requirements will depend on your use case. But for
comparison, our production servers are:

* 16 cores (hyper threaded, 32 threads)
* 128G RAM
* 1.5T of SSD storage


The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
The bigdata.jnl file, which stores all the triple data in Blazegraph,
is 478G and still growing.
I had a 1T disk but it is almost full now.


The current size of our jnl file in production is ~670G.

Hope that helps!

 Guillaume


Thanks,

Adam

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

[Wikidata] Gender statistics on the Danish Wikipedia

2019-03-05 Thread fn

Dear any Wikidata Query Service expert,


In connection with an editathon, I have made statistics of the number of 
women and men on the Danish Wikipedia. I have used WDQS for that and the 
query is listed below:


SELECT ?count ?gender ?genderLabel
WITH {
  SELECT ?gender (COUNT(*) AS ?count) WHERE {
    ?item wdt:P31 wd:Q5 .
    ?item wdt:P21 ?gender .
    ?article schema:about ?item .
    ?article schema:isPartOf <https://da.wikipedia.org/> .
  }
  GROUP BY ?gender
} AS %results
WHERE {
  INCLUDE %results
  SERVICE wikibase:label { bd:serviceParam wikibase:language "da,en". }
}
ORDER BY DESC(?count)
LIMIT 25

http://tinyurl.com/y8twboe5

As the statistics could potentially create some discussion (and already 
seem to have), I am wondering whether there are some experts who could 
peer review the SPARQL query and tell me if there are any issues. I hope 
I have not made a blunder...


The minor issues I can think of are:

- Missing gender in Wikidata. We have around 360 of these.

- People on the Danish Wikipedia not on Wikidata. Probably tens-ish or 
hundreds-ish!?


- People not being humans. The gendered items I sampled were all 
fictional humans.



We previously reached 17.2% females. Now we are below 17% due to a 
mass import of Japanese football players, as far as we can see.



best regards
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Gender gap statistics

2019-01-22 Thread fn

(sorry for cross-posting on wiki-research and wikidata)


For an event, I am trying to find statistics about the gender gap. At 
one point there was a nice website with some information. Now it seems 
to be gone.


https://denelezh.dicare.org/gender-gap.php

redirects to https://wdcm.wmflabs.org/WDCM_BiasesDashboard/ but I get 
"502 Bad Gateway"



For http://whgi.wmflabs.org/gender-by-date-of-birth.html I see change 
statistics on a web page. (When I view this page, it seems as if the CSS 
is missing.)


I have been trying the Wikidata Query Service. I have a few results here: 
https://www.wikidata.org/wiki/User:Fnielsen/Gender


The bad news is that more complex SPARQL queries time out, e.g., persons 
across Wikipedias and genders - I can only do it for Wikisource and 
Wikiquote.


For instance, I find the ratio of female biographies on the Danish 
Wikipedia to be 17.2% compared to the total number of biographies. It 
would be interesting to know how this number compares with other 
Wikipedias. If I include Wikipedia as a parameter in the SPARQL query it 
times out. (Writing a script could possibly solve it).
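
A rough sketch of such a script (Python with the requests library; the 
language codes are only examples, and the largest Wikipedias may still 
time out):

import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?gender (COUNT(*) AS ?count) WHERE {{
  ?item wdt:P31 wd:Q5 ; wdt:P21 ?gender .
  ?article schema:about ?item ;
           schema:isPartOf <https://{code}.wikipedia.org/> .
}}
GROUP BY ?gender
"""

# one query per Wikipedia keeps each request smaller than the combined query
for code in ["da", "sv", "no", "fi", "nl"]:  # example language codes
    r = requests.get(ENDPOINT, params={"query": QUERY.format(code=code),
                                       "format": "json"})
    counts = {b["gender"]["value"].rsplit("/", 1)[-1]: int(b["count"]["value"])
              for b in r.json()["results"]["bindings"]}
    female = counts.get("Q6581072", 0)  # Q6581072 = female
    total = sum(counts.values())
    if total:
        print(code, "%.1f%% female of %d gendered biographies"
              % (100.0 * female / total, total))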


For WHGI, I see some CSV files. For instance, 
https://figshare.com/articles/Wikidata_Human_Gender_Indicators/3100903 
reports 9447 and 51774 for women and men, respectively. My SPARQL 
query does not give these values...


Do we have updated statistics on contributor gender, particularly for 
the Danish Wikipedia? I know we have some papers on gender and Wikipedia, 
see https://tools.wmflabs.org/scholia/topic/Q17002416
"Gender Markers in Wikipedia Usernames" displays 4.6% or 1.2% for 
females, while "The Wikipedia Gender Gap Revisited: Characterizing 
Survey Response Bias with Propensity Score Estimation" estimates up to 
23%. The old UNU survey 
https://web.archive.org/web/20110728182835/http://www.wikipediastudy.org/docs/Wikipedia_Overview_15March2010-FINAL.pdf 
seems not to do gender statistics per Wikipedia.


https://stats.wikimedia.org seems not to have any gender statistics.


--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata as software metadata repository

2018-12-19 Thread fn

Hi Amirouche,

Maybe you want to talk to Katherine Thornton. You can read her paper here:

Modeling the Domain of Digital Preservation in Wikidata
https://tools.wmflabs.org/scholia/work/Q41533080
https://ipres2017.jp/wp-content/uploads/7.pdf

You can see some of the software in Wikidata in this Scholia listing:
https://tools.wmflabs.org/scholia/use/

I do not think we have a guix property. You can suggest one here:
https://www.wikidata.org/wiki/Wikidata:Property_proposal

I do not know where the notability level is for software, but for big 
programs, items for individual versions exist, see, e.g., STATA 13.0

https://tools.wmflabs.org/scholia/use/Q32106849

/Finn

On 12/19/18 11:38 PM, Amirouche Boubekki wrote:

Hello,

I am investigating with several people over the rainbow in the GNU project 
as part of guix [0].


Our goal is to make our packages easier to discover by our users via 
full-text search or structured queries.


Questions:

a) I see Arch and Debian have properties. What would it take to have a 
guix property?


b) Is there already a group of people working together to put in place a 
list of requirements for software entities to be considered good in the 
sense of Wikidata?

c) What level of notability is required for a piece of software to be included in Wikidata?

Thanks in advance!

[0] http://gnu.org/s/guix is both a package manager and an Operating System


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Running bots on Wikidata lexemes

2018-12-04 Thread fn
I recall reading in an announcement for Wikidata lexemes that we should 
not (yet) run (large) import jobs for Wikidata lexemes. I cannot 
immediately find that message.


I am wondering what the attitude of the Wikidata developers and users 
is with respect to import jobs for Wikidata lexemes?


Specifically, I am thinking of importing DanNet, a Danish lexical 
resource in RDF, for which we recently gained a property for one type of 
its data: https://www.wikidata.org/wiki/Property:P6140



Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2018-10-04 Thread fn


Denny's project is very interesting.

We already have Wikidata and Magnus Manske's autodesc which can create 
paragraph-length natural language for some types of items.


Example:

https://tools.wmflabs.org/autodesc/?q=Q18618629==long=text==html_infobox=yes_template=

"""Denny Vrandečić is a Croatia researcher, programmer, and computer 
scientist.

He was born on February 27, 1978 in Stuttgart.
He studied at Karlsruhe Institute of Technology from October 2004 until 
June 2010, University of Stuttgart from September 1998 until February 
2004, University of Stuttgart from September 1997 until February 2004, 
and Geschwister-Scholl-Gymnasium. He worked for Google from October 
2013, for Wikimedia Deutschland from March 2012 until September 2013, 
and for Karlsruhe Institute of Technology from 2004 until 2012."""


Currently he seems to support English, French and Dutch.

I think Magnus Manske would accept pull requests for other languages at 
https://bitbucket.org/magnusmanske/reasonator/src/9c58fadb7b72a791142fc158aebe38d9a4b98d92/public_html/auto_long_desc.js?at=master=file-view-default


So how would we go beyond Magnus? Would the Wikidata representation 
suffice? I have seen Q50827579 and Q28819478 for Wikidata-to-language 
generation, but I am not aware of running applications, and are they 
better than Magnus' hard-coded approach?


I have been experimenting a bit in the other direction. Ordia can go from 
natural language to Wikidata lexemes (for a single Danish example):


>>> from ordia.base import Base
>>> base = Base()
>>> base.words_to_form_ids('der kom en soldat marcherende henad landevejen'.split(), language='da')
[['L3064-F1'], ['L3065-F3', 'L3065-F6'], ['L2022-F1', 'L3073-F3'], ['L3074-F1'], ['L3075-F5'], ['L3215-F1'], ['L3216-F2']]


Writing the encyclopedic text in "Wikidata-lexemesh" could perhaps ease 
translation, particularly after 18 October when senses are planned to be 
enabled.


/Finn


On 09/29/2018 08:42 PM, Pine W wrote:
Forwarding because this (ambitious!) proposal may be of interest to 
people on other lists. I'm not endorsing the proposal at this time, but 
I'm curious about it.


Pine
( https://meta.wikimedia.org/wiki/User:Pine )


-- Forwarded message -
From: Denny Vrandečić <vrande...@gmail.com>
Date: Sat, Sep 29, 2018 at 6:32 PM
Subject: [Wikimedia-l] Wikipedia in an abstract language
To: Wikimedia Mailing List



Semantic Web languages allow to express ontologies and knowledge bases in a
way meant to be particularly amenable to the Web. Ontologies formalize the
shared understanding of a domain. But the most expressive and widespread
languages that we know of are human natural languages, and the largest
knowledge base we have is the wealth of text written in human languages.

We look for a path to bridge the gap between knowledge representation
languages such as OWL and human natural languages such as English. We
propose a project to simultaneously expose that gap, allow to collaborate
on closing it, make progress widely visible, and is highly attractive and
valuable in its own right: a Wikipedia written in an abstract language to
be rendered into any natural language on request. This would make current
Wikipedia editors about 100x more productive, and increase the content of
Wikipedia by 10x. For billions of users this will unlock knowledge they
currently do not have access to.

My first talk on this topic will be on October 10, 2018, 16:45-17:00, at
the Asilomar in Monterey, CA during the Blue Sky track of ISWC. My second,
longer talk on the topic will be at the DL workshop in Tempe, AZ, October
27-29. Comments are very welcome as I prepare the slides and the talk.

Link to the paper: http://simia.net/download/abstractwikipedia.pdf

Cheers,
Denny
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: wikimedi...@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Register for FORCE2018 by this Friday and save $100

2018-09-11 Thread fn
I am a bit late on this, but FORCE2018 has two Wikidata people speaking 
(as Violeta's link shows): Dario Taraborelli and Daniel Mietchen, see 
also Scholia's profile: https://tools.wmflabs.org/scholia/event/Q56579271


I have never attended the conference, but I suppose it is most relevant 
to people in research and research-supporting roles, such as research 
librarians - like WikiCite people.



/Finn

On 09/04/2018 02:28 PM, Violeta Ilik wrote:

Thank you for addressing this issue.
Yes, members of the wikidata community have participated in Force11 
events in the past and also this year: http://sched.co/F7u7
Members of the Force11 actually visited the Wikidata headquarters last 
year in Berlin when our 2017 conference was held there. See that program 
for reference:

https://docs.google.com/spreadsheets/d/1DH8FfDOGOqzYpCjVJK4aZcVq0vk4d-77fx7MUHRLV_Q/edit#gid=719985695


A friendly suggestion: please look up the event, in this case Force11 
[https://www.force11.org/meetings/force2018], before sending emails that 
contain suggestions to blacklist people.


Violeta



On Tue, Sep 4, 2018 at 8:03 AM, Léa Lacroix wrote:


Hello all,

Lydia and I, as moderators of this list, are filtering the
announcements about conferences, keeping only the ones that have a
link to Wikidata or could interest the broad community. Whether or not
one has to pay for the conference is not taken into account.
In this case, the conference includes open data among its topics, and
some members of the WikiCite group have attended in the past;
that's why we thought it would be interesting for the Wikidata
community.

However, we don't tolerate spam, multiple-reminders for the same
event, and we try to reduce the noise as much as possible for the
~1200 subscribers of this mailing-list.

Thanks for your understanding,
Léa

On 30 August 2018 at 18:57, Violeta Ilik <ilik.viol...@gmail.com> wrote:

Are you talking about the email I sent to register for the
Force11 conference?

Violeta


On Thursday, August 30, 2018, Jérémie Roquet
<jroq...@arkanosis.net> wrote:

2018-08-30 16:43 GMT+02:00 Pine W <wiki.p...@gmail.com>:
 > I'm going to ask the opinion of the Wikidata list moderators here. This
 > email appears to be a solicitation to pay for attendance to an event, which
 > I would consider to be a junk email and would treat accordingly, including
 > by blacklisting the sender. Do the Wikidata list moderators agree?

I didn't want to be the first to raise what I wasn't sure was an
issue, but I feel the same.

Best regards,

-- 
Jérémie


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata





-- 
Léa Lacroix

Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de 

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin,
tax number 27/029/42207.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata SPARQL query logs available

2018-08-23 Thread fn



I was wondering why our research section was number 8. Then I recalled 
our dashboard running from 
"http://people.compute.dtu.dk/faan/cognitivesystemswikidata1.html". It 
updates roughly every 3 minutes, all day long...


/Finn

On 08/23/2018 09:57 PM, Daniel Mietchen wrote:

I just ran Max' one-liner over one of the dump files, and it worked
smoothly. Not sure where the best place would be to store such things,
so I simply put it in my sandbox for now:
https://www.wikidata.org/w/index.php?title=User:Daniel_Mietchen/sandbox=732396160
.
d.
On Tue, Aug 7, 2018 at 6:06 PM David Cuenca Tudela  wrote:


If someone could post the 10 (or 50!) most popular items, I would really 
appreciate it :-)

Cheers,
Micru

On Tue, Aug 7, 2018 at 5:59 PM Maximilian Marx  
wrote:



Hi,

On Tue, 7 Aug 2018 17:37:34 +0200, Markus Kroetzsch 
 said:

If you want a sorted list of "most popular" items, this is a bit more
work and would require at least some Python script, or some less
obvious combination of sed (extracting all URLs of entities), and
sort.


   zgrep -Eoe '%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ[1-9][0-9]+%3E' dump.gz | cut -d 'Q' -f 2 | cut -d '%' -f 1 | sort | uniq -c | sort -nr

should do the trick.

Best,

Maximilian
--
Dipl.-Math. Maximilian Marx
Knowledge-Based Systems Group
Faculty of Computer Science
TU Dresden
+49 351 463 43510
https://kbs.inf.tu-dresden.de/max

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Etiamsi omnes, ego non
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] WikiData use and users

2018-06-01 Thread fn

Hi Heather,

"Hay's tools" can list the ones that appear on the Toolforge server and 
indexed:


https://tools.wmflabs.org/hay/directory/#/keyword/wikidata

It currently lists 110 tools.


/Finn


On 06/01/2018 08:38 AM, Heather Ford wrote:

Hi there,

I'm doing some research on WikiData and wondering whether there is a 
list of projects/sites that either a) make use of WikiData to power 
their projects or that b) WikiData extracts data from in order to 
populate items. I can see some projects listed in External Tools [1] but 
can't seem to find lists of projects beyond this.


Can anyone help?

Many thanks.

Best,
Heather.

[1] https://www.wikidata.org/wiki/Wikidata:Tools/External_tools

Dr Heather Ford
Senior Lecturer, School of Arts & Media, University of New South Wales
w: hblog.org / EthnographyMatters.net / t: @hfordsa






___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

2017-11-29 Thread fn
Dario's points 1-3 are fine by me. Rather than P361 (part of), I think - 
like James Heald - that P972 (catalog) would be better. It is also used 
for artworks; for instance,

https://www.wikidata.org/wiki/Q44015154 P972 
https://www.wikidata.org/wiki/Q42661788


I see P972 and the associated identifier P528 as a kind of lightweight 
external identifier.


If the external dataset is available in RDF, then I suppose 
skos:exactMatch (P2888) can be used.


For some items available in other databases I have used "external data 
available at" P1325. But the semantics of that require some data at the 
other end.


---
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/



On 11/25/2017 02:16 PM, James Heald wrote:
Like others in this thread, I would caution *against* overloading P31 
"instance of" if possible.


When a somewhat similar issue came up, re how to mark artists that were 
of interest to the "Black Lunch Table" project 
(https://www.wikidata.org/wiki/Q28781198), which works on coverage of 
visual artists of the African diaspora, the solution adopted (after quite 
a vigorous debate at Project Chat) was to use property P972 "catalog" with 
the value Q28781198 to mark artists that were of interest to the project.


A similar approach could be used here, if a project has a list of works 
of interest, that it would be valuable to record inclusion in.


Best regards,

    James.


On 25/11/2017 04:42, John Erling Blad wrote:

Implicit heterogeneous unordered containers where members see a
homogeneous parent. The member properties should be transitive to avoid
the maintenance burden, like a "tracking property", and also to make the
parent item manageable.

I can't see anything that needs any kind of special structure at the
entity level. Not even sure whether we need a new container for this;
claims are already unordered containers.

On Sat, Nov 25, 2017 at 1:25 AM, Andy Mabbett 
wrote:


On 24 November 2017 at 23:30, Dario Taraborelli
 wrote:


I'd like to propose a fairly simple solution and hear your feedback on
whether it makes sense to implement it as is or with some 
modifications.


create a Wikidata class called "Wikidata item collection" [Q-X]


This sounds like Wikimedia categories, as used on Wikipedia and
Wikimedia Commons.

--
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wordnet synset ID

2017-08-21 Thread fn


I have written an email to Roberto Navigli of BabelNet and asked him 
about BabelNet and Wikidata, particularly about mass-upload.



Finn Årup Nielsen
http://people.compute.dtu.dk/faan/


On 08/21/2017 04:56 PM, Denny Vrandečić wrote:
I think we could ask either Yago or BabelNet or both whether they would 
be receptive to release their mappings under a CC0 license, so it can be 
integrated into Wikidata. What I wonder is, if they do that, whether we 
wanted to have that data or not.


On Mon, Aug 21, 2017 at 7:18 AM Peter F. Patel-Schneider wrote:


One problem with BabelNet is that its licence is restrictive, being
the Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
license.  Downloading BabelNet is even more restrictive, requiring also
working at a research institution.

Yago,
http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/,
which has the less restrictive license Attribution 3.0 Unported (CC BY 3.0),
has links between Wikipedia categories and Wordnet. Unfortunately, it does
not carry these links through to regular Wikipedia pages. I've been toying
with making this last connection, which would be easy for those categories
that are linked to a Wikipedia page.

Peter F. Patel-Schneider
Nuance Communications

PS:  Strangely the Yago logo has a non-commercial license.  I don't
know why
this was done.

On 08/15/2017 10:32 AM, Finn Aarup Nielsen wrote:
 >
 > I do not think we have a Wiktionary-wordnet link.
 >
 > But I forgot to write we have a BabelNet Wikidata property,
 > https://www.wikidata.org/wiki/Property:P2581. This property has
been very
 > little used: http://tinyurl.com/y8npwsm5
 >
 > There might be a Wikimedia-Wordnet indirect link through BabelNet
 >
 > /Finn
 >
 >
 > On 08/15/2017 07:22 PM, Denny Vrandečić wrote:
 >> That's a great question, I have no idea what the answer will
turn out to be.
 >>
 >> Is there any current link between Wiktionary and WordNet? Or
WordNet and
 >> Wikipedia?
 >>
 >>
 >> On Tue, Aug 15, 2017 at 10:14 AM  >> wrote:
 >>
 >>
 >>
 >> I have proposed a Wordnet synset property here:
 >>
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Wordnet_synset_ID
 >>
 >> The property has been discussed here on the mailing list
more than a
 >> year ago, but apparently never got to the point of a property
 >> suggestion:
 >>
https://lists.wikimedia.org/pipermail/wikidata/2016-April/008517.html
 >>
 >> I am wondering how the potential property fits in with the new
 >> development of the Wiktionary-Wikidata link. As far as I see
the senses,
 >> for instance, at
http://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L15
 >> link to wikidata-lexeme Q-items, which I suppose is Wikidata
Q items
 >> once the new development is put into the production system.
So with my
 >> understanding linking Wikidata Q-items to Wordnet synsets is
correct. Is
 >> my understanding correct?
 >>
 >>
 >> Finn Årup Nielsen
 >> http://people.compute.dtu.dk/faan/
 >>
 >>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wordnet synset ID

2017-08-15 Thread fn



I have proposed a Wordnet synset property here: 
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Wordnet_synset_ID


The property has been discussed here on the mailing list more than a 
year ago, but apparently never got to the point of a property 
suggestion: 
https://lists.wikimedia.org/pipermail/wikidata/2016-April/008517.html


I am wondering how the potential property fits in with the new
development of the Wiktionary-Wikidata link. As far as I can see, the 
senses, for instance at 
http://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L15, link to 
wikidata-lexeme Q-items, which I suppose will be Wikidata Q-items once 
the new development is put into the production system. So, as I 
understand it, linking Wikidata Q-items to Wordnet synsets is correct. Is 
my understanding correct?



Finn Årup Nielsen
http://people.compute.dtu.dk/faan/


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Tool for consuming left-over data from import

2017-08-04 Thread fn

Dear André,


Great work you have done.

I am wondering whether you are aware of the issues around the Danish 
dataset and the clean up apparently required.


As far as I can determine the German Wikipedia has had a number of 
articles on Danish dolmens and they are also available on Wikidata. As 
far as I can see these items have not been linked with the new Swedish 
additions.


For instance, "Dolmen von Tornby" https://www.wikidata.org/wiki/Q1269335 
has no Danish ID but is probably one of these items: 
https://www.wikidata.org/wiki/Q30240926 and 
https://www.wikidata.org/wiki/Q30240928 or 
https://www.wikidata.org/wiki/Q30114892 or 
https://www.wikidata.org/wiki/Q30114893 which the Alicia bot has added.


There are quite a lot of Danish dolmens on the German Wikipedia 
https://de.wikipedia.org/wiki/Kategorie:Gro%C3%9Fsteingrab_in_D%C3%A4nemark


I am sorry to present you with yet another problem. Perhaps the items 
can be matched by the geo-coordinate.



best regards
Finn




On 08/04/2017 04:57 PM, André Costa wrote:

Hi all!

As part of the Connected Open Heritage project Wikimedia Sverige have 
been migrating Wiki Loves Monuments datasets from Wikipedias to Wikidata.


In the course of doing this we keep a note of the data which we fail to 
migrate. For each of these left-over bits we know which item and which 
property it belongs to as well as the source field and language from the 
Wikipedia list.  An example would e.g. be a "type of building" field 
where we could not match the text to an item on Wikidata but know that 
the target property is P31.


We have created dumps of these (such as 
https://tools.wmflabs.org/coh/_total_se-ship_new.json, don't worry this 
one is tiny) but are now looking for an easy way for users to consume them.


Does anyone know of a tool which could do this today? The Wikidata game 
only allows (AFAIK) for yes/no/skip, whereas here you would want 
something like /invalid/skip. And if not, are there any 
tools which, with a bit of forking, could be made to do it?


We have only published a few dumps but there are more to come. I would 
also imagine that this, or a similar, format could be useful for other 
imports/template harvests where some fields are more easily handled by 
humans.


Any thoughts and suggestions are welcome.
Cheers,
André
André Costa | Senior Developer, Wikimedia Sverige | andre.co...@wikimedia.se | +46 (0)733-964574


Support free knowledge - become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikimedia Education] Coursework involving Wikidata?

2017-06-06 Thread fn


Hi Daniel,


A couple of years ago (2014) I included Wikidata as part of some 
information about the Semantic Web. The slides are here:


http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6148/pdf

They are somewhat dated, e.g., WDQS is not mentioned.


/Finn



On 06/04/2017 11:35 AM, Gerard Meijssen wrote:

Hoi,
Do you give attention to the fact that Wikidata is multilingual?
Thanks,
  GerardM

On 4 June 2017 at 09:20, Vojtěch Dostál wrote:


Hello Shani,

We also experimented with teaching Wikidata skills this year with a
class of social media students at Charles University in Prague.
The first 1.5-hour class introduced Wikipedia, the 2nd was about Wikidata,
and the 3rd was a hands-on course in the Wikidata Query Service. The
students completed the course with a basic understanding of how data
are mined from WD and what types of questions can be answered by WD.

If I had a 4th lesson, I'd also explain PetScan and/or QuickStatements.

Vojtech

On 4 Jun 2017 at 5:23, user "Shani Evenstein"
> wrote:

I've been teaching WD as part of my 2 academic courses for the past two
years. I had one 1.5-hour session dedicated to it, where I introduce ways
of contributing and ways of using the data. Their task is usually to add
info regarding the Wikipedia articles they wrote to WD. Students usually
really like the WD Games. They also like things like Histropedia, the
timeline tool.
I know Andrew Lee has also tried the latter this year and created a
session where his students created timelines to explore something.
Other than that, I'm unaware of other efforts to teach with WD, but since
I'm also working on developing a separate elective about it, I'd love to
see what you come up with. :-)

Shani..



On 4 Jun 2017 03:00, "Daniel Mietchen"
>
wrote:

 > Hi,
 >
 > I am preparing an elective course on Wikidata as part of a summer
 > school (some bare-bone background at
 > https://www.wikidata.org/wiki/User:Daniel_Mietchen/FSCI_2017 )
 > and am looking for examples of previous or ongoing coursework
 > involving Wikidata.
 >
 > Thanks for any pointers,
 >
 > Daniel
 >
 > ___
 > Wikidata mailing list
 > Wikidata@lists.wikimedia.org

 > https://lists.wikimedia.org/mailman/listinfo/wikidata

 >
___
Education mailing list
educat...@lists.wikimedia.org 
https://lists.wikimedia.org/mailman/listinfo/education







___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Get "subject links" via Wikidata API

2017-04-12 Thread fn



On 04/12/2017 05:57 PM, Magnus Manske wrote:

Just say "wd:Q12345" (the author) instead of "?author" ?


Yes, that is what we do all over in Scholia, e.g.,
https://tools.wmflabs.org/scholia/author/Q13520818
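
For the author behind that page, substituting wd:Q13520818 for ?author 
gives a concrete query; a minimal sketch of sending it to the query 
service, roughly the kind of request Scholia issues, is:

import requests

# Works with Q13520818 as author (P50); the label service adds readable titles.
QUERY = """
SELECT ?work ?workLabel WHERE {
  ?work wdt:P50 wd:Q13520818 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

response = requests.get("https://query.wikidata.org/sparql",
                        params={"query": QUERY, "format": "json"})
for row in response.json()["results"]["bindings"]:
    print(row["work"]["value"], row["workLabel"]["value"])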


The backlinks thing works, but is tedious. You'll need to load the items
via action=wbgetentities to check if that link actually means "author",
or some other property.


We got a question from a reviewer asking why we used SPARQL in Scholia 
and not just the MediaWiki API. My initial thought was that it was not 
possible with the MediaWiki API, but then I thought of list=backlinks 
followed by (as Magnus points out) action=wbgetentities.
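
A rough sketch of that route (continuation handling and error checking 
omitted; the backlinks are filtered down to items that actually carry a 
P50 statement pointing at the author):

import requests

API = "https://www.wikidata.org/w/api.php"
AUTHOR = "Q13520818"

# Step 1: all item pages (namespace 0) that link to the author item.
backlinks = requests.get(API, params={
    "action": "query", "list": "backlinks", "bltitle": AUTHOR,
    "blnamespace": 0, "bllimit": "max", "format": "json",
}).json()["query"]["backlinks"]

# Step 2: load the linking items (50 at a time) and keep those where the
# link really is an author (P50) statement.
for start in range(0, len(backlinks), 50):
    ids = "|".join(b["title"] for b in backlinks[start:start + 50])
    entities = requests.get(API, params={
        "action": "wbgetentities", "ids": ids,
        "props": "claims", "format": "json",
    }).json()["entities"]
    for qid, entity in entities.items():
        for claim in entity.get("claims", {}).get("P50", []):
            value = claim["mainsnak"].get("datavalue", {}).get("value", {})
            if value.get("id") == AUTHOR:
                print(qid)
                break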


I was afraid that somewhere hidden in the MediaWiki API there would be 
query functionality that lets you get Wikidata property-filtered 
backlinks, but since Magnus doesn't point to one, I am pretty sure now 
that no such functionality exists. :)



/Finn


On Wed, Apr 12, 2017 at 4:52 PM >
wrote:


To get the works that a person has written I would use SPARQL with
something like "SELECT * WHERE { ?work wdt:P50 ?author }".

I could also get the authors of a work via Wikidata MediaWiki API.

My question is whether it is possible to get the works of an author
given the author. With my knowledge of the API, I would say it is not
possible, except if you do something like "Special:WhatLinksHere"
(list=backlinks) and process/filter all the results.


Finn Årup Nielsen
http://people.compute.dtu.dk/faan/




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Get "subject links" via Wikidata API

2017-04-12 Thread fn


To get the works that a person has written I would use SPARQL with 
something like "SELECT * WHERE { ?work wdt:P50 ?author }".


I could also get the authors of a work via Wikidata MediaWiki API.

My question is whether it is possible to get the works of an author 
given the author. With my knowledge of the API, I would say it is not 
possible, except if you do something like "Special:WhatLinksHere" 
(list=backlinks) and process/filter all the results.



Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] New Wikidata accounts can't edit labels?

2017-04-12 Thread fn


Experienced Wikimedians who supervise hackathons (and are not admins) 
may want to request an "Account creator" role.


I don't see that it is possible to get that role on Wikidata 
(https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions),
so I suppose that one needs to inquire on the Wikipedia of one's 
preferred language.


/Finn


On 04/12/2017 09:32 AM, James Hare wrote:

Alternatively, if someone is an admin or otherwise has account creator
rights, they can create accounts for others with an email address.

On Apr 12, 2017, at 3:20 AM, David Cuenca Tudela > wrote:


Hi Daniel,
The usual procedure is explained here:
https://meta.wikimedia.org/wiki/Mass_account_creation
https://www.mediawiki.org/wiki/Help:Mass_account_creation
https://meta.wikimedia.org/wiki/Grants:Learning_patterns/Six-account_limit

Normally, to request a temporary lift of the IP cap, they should file a
task as explained on that page. There are some quick workarounds, like
creating accounts with smartphones.

Cheers,
Micru



On Tue, Apr 11, 2017 at 11:26 PM, Daniel Mietchen
> wrote:

Dear all,
I was just pinged by a Wikidata hackathon in Suriname
(cf. https://www.spangmakandra.com/big-data-seminar-suriname )
that they can't edit Wikidata any more - see also
https://twitter.com/twitferry/status/851907389087502338 .
We are musing that this may be due to an IP ban, since more than six
new accounts were registered from the same IP (186.179.xxx.xx).
Can anyone help sort this out quickly, so that the event can move on?
Thanks,
Daniel






--
Etiamsi omnes, ego non






___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] What kind of bot for wiktionary in wikidata needs?

2017-03-01 Thread fn



Hi,


It is my understanding that Wikidata for Wiktionary requires new data 
structures, or at least new namespaces (L, F and S), and that is what is 
holding people back.


What could be interesting to have would be a prototype (not necessarily 
built with MediaWiki+Wikibase) to see if the suggested scheme is ok.
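
To make the idea concrete, below is a rough sketch of what a single lexeme 
entity could look like in such a prototype. The field names are my own 
guesses based on the announcement, not the actual proposal.

# Rough sketch of one lexeme with the Lexeme/Form/Sense split from the
# announcement; ids, keys and feature items are illustrative only.
lexeme = {
    "id": "L99",                      # hypothetical lexeme id
    "lemma": {"da": "bil"},           # Danish "bil" (car)
    "language": "Q9035",              # Danish
    "lexicalCategory": "Q1084",       # noun
    "forms": [
        {
            "id": "L99-F1",
            "representations": {"da": "bilen"},   # definite singular
            "grammaticalFeatures": [],            # would hold feature Q-ids
        },
    ],
    "senses": [
        {
            "id": "L99-S1",
            "glosses": {"da": "motorkøretøj", "en": "car"},
            "statements": {},         # ordinary statements could go here
        },
    ],
}

# A prototype could then exercise the scheme, e.g. list all forms:
for form in lexeme["forms"]:
    print(form["id"], form["representations"]["da"])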




Finn Årup Nielsen



On 03/01/2017 10:16 PM, Amirouche wrote:

Héllo,


I have been lurking around for some months now. I stumbled upon the
Wiktionary in Wikidata project via, for instance, this PDF:
https://upload.wikimedia.org/wikipedia/commons/6/60/Wikidata_for_Wiktionary_announcement.pdf



Now I'd like to help. For that I want to build a bot to achieve that goal.


My understanding is that a proof of concept of page 11 of the above PDF
would be good. But I have never really done any site scraping. Is there
any abstraction that helps in this regard?


My setup:


- homegrown RDF-like database with Wikidata loaded from JSON dumps, with
miniKanren querying

- GNU Guile

- soon enough dumps from https://en.wiktionary.org/api/


Tx!




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata