Re: [Wikidata] Virtuoso hosted Wikidata Instance

2019-08-14 Thread Jérémie Roquet
Hi!

On Wed, Aug 14, 2019 at 01:10, Kingsley Idehen wrote:
> We have loaded Wikidata into a Virtuoso instance accessible via SPARQL [1]. 
> One benefit is helping to understand Wikidata using our Faceted Browsing 
> Interface for Entity Relationship Types [2][3].

That's great news, thanks!

> Feedback always welcome too :)

So, I've eagerly tried a very simple SPARQL query with a huge result
set, the complete version of which¹ I've known for several years to
time out on both the official Blazegraph instance and a personal
Blazegraph instance with supposedly all time limits removed:

  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>

  SELECT ?person WHERE {
    ?person wdt:P31 wd:Q5
  }

… and while the Virtuoso instance manages to answer pretty quickly, it
seems that it's cutting the result set at 100k triples. Is that the
expected behavior? If so, I suggest you show it in the UI, because
apart from the improbably round number of triples, it's not obvious
that the result set is incomplete (in this case, the LDF endpoint
tells us that there should be around 5.4M triples²).
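
For reference, here's a sketch of the complete version, as I'd send it
to the official endpoint (from memory, so the exact query may have
differed; "en" is just an example language, and WDQS pre-declares the
wd:, wdt:, wikibase: and bd: prefixes):

  curl -G https://query.wikidata.org/sparql \
    -H 'Accept: text/csv' \
    --data-urlencode query='
      SELECT ?person ?personLabel WHERE {
        ?person wdt:P31 wd:Q5 .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
      }'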

Thanks again!

¹ i.e. using the wikibase:label service
² https://query.wikidata.org/bigdata/ldf?subject=&predicate=wdt%3AP31&object=wd%3AQ5

-- 
Jérémie


Re: [Wikidata] Registration is open for #FORCE2019 in Edinburgh

2019-07-03 Thread Jérémie Roquet
Hi Violeta,

Thank you for your email.

On Wed, Jul 3, 2019 at 13:22, Violeta Ilik wrote:
> You can now sign up to join us for the next installment of the popular, 
> solutions-focused FORCE11 meeting – in Edinburgh October 16-17, with 
> workshops on the 15th.

May I suggest, as several of us already did last year, adding some
context as to how this conference is relevant to people on the
Wikidata mailing list?

Thanks a lot!

-- 
Jérémie


Re: [Wikidata] Register for FORCE2018 by this Friday and save $100

2018-09-04 Thread Jérémie Roquet
Hi Violeta,

Thanks for the information and sorry for my previous email which
wasn't very sensitive.

May I suggest that, in the future, you send the program in the very
first email? There's definitely a chance I'd get interested in an event
if the program looks great, and I assume that's true for some other
people on this list too. On the contrary, discounts right in the
mail subject and links to click are 100% certain to raise eyebrows.
I've re-read the initial email in that thread, and I'd certainly have
the same reaction I had the first time I read it,
because it says absolutely nothing about Wikidata and its content is
very abstract.

Léa, thanks a lot for the explanations and the upstream work to filter
incoming emails. It might be too early to discuss this topic given
the low volume of such threads, but maybe at some point we should have
different mailing lists for events and for Wikidata itself. I'm
personally very interested in reading your weekly summaries and
in-depth discussions about Wikidata, but I'm not interested at all in
conferences, and I've already had to unsubscribe from a bunch of
mailing lists (including wiki-research-l) only because of the volume
of calls for papers and other registration deadlines.

Thanks and best regards,

-- 
Jérémie


Re: [Wikidata] Register for FORCE2018 by this Friday and save $100

2018-08-30 Thread Jérémie Roquet
2018-08-30 16:43 GMT+02:00 Pine W :
> I'm going to ask the opinion of the Wikidata list moderators here. This
> email appears to be a solicitation to pay for attendance at an event, which
> I would consider to be a junk email and would treat accordingly including by
> blacklisting the sender. Do the Wikidata list moderators agree?

I didn't want to be the first to raise what I wasn't sure was an
issue, but I feel the same.

Best regards,

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-11-07 Thread Jérémie Roquet
2017-11-07 15:32 GMT+01:00 Ghislain ATEMEZING :
> In the meantime, what about making chunks of 3.5 billion triples (or any size
> less than 4 billion) and a script to convert the dataset? Would that be possible?

That seems possible to me, but I wonder whether cutting the dataset
into independent chunks isn't a bigger undertaking than making
HDT handle bigger datasets (I'm not saying it is, I've really no
idea).

Best regards,

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-11-07 Thread Jérémie Roquet
2017-11-07 17:09 GMT+01:00 Laura Morales :
> How many triples does wikidata have? The old dump from rdfhdt seems to have 
> about 2 billion, which means wikidata doubled the number of triples in less 
> than a year?

A naive grep | wc -l on the latest turtle dump gives me an estimate of
4.65 billion triples.
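
For the record, the naive estimate was obtained with something like the
following (dump file name assumed; it relies on the dump's formatting
putting roughly one triple per line, so it's only an approximation):

  # Count the lines that end a statement or continue one (".", ";" or ",").
  zcat latest-all.ttl.gz | grep -E '[.;,][[:space:]]*$' | wc -l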

Looking at https://tools.wmflabs.org/wikidata-todo/stats.php it seems
that Wikidata is indeed more than twice as big as it was only six months ago.

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-11-07 Thread Jérémie Roquet
Hi everyone,

I'm afraid the current implementation of HDT is not ready to handle
more than 4 billion triples, as it is limited to 32-bit indexes (and
2³² is only about 4.29 billion). I've
opened an issue upstream: https://github.com/rdfhdt/hdt-cpp/issues/135

Until this is addressed, don't waste your time trying to convert the
entire Wikidata to HDT: it can't work.

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-10-31 Thread Jérémie Roquet
2017-10-31 21:27 GMT+01:00 Laura Morales :
>> I've just loaded the provided hdt file on a big machine (32 GiB wasn't
>> enough to build the index but ten times this is more than enough)
> Could you please share a bit about your setup? Do you have a machine with 
> 320GB of RAM?

It's a machine with 378 GiB of RAM and 64 threads running Scientific
Linux 7.2, that we use mainly for benchmarks.

Building the index was really all about memory, because the CPUs
actually have lower per-thread performance (2.30 GHz vs 3.5 GHz) than
those of my regular workstation, which was unable to build it.

> Could you please also try to convert wikidata.ttl to hdt using "rdf2hdt"? I'd 
> be interested to read your results on this too.

As I'm also looking for up-to-date results, I plan to do it with the
latest turtle dump as soon as I have a time slot for it; I'll let you
know about the outcome.
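
From memory, the conversion itself boils down to the following (file
names assumed; IIRC rdf2hdt infers the input format from the extension,
and it needs a lot of RAM for an input this size):

  # Convert the turtle dump to an HDT file using hdt-cpp's rdf2hdt.
  rdf2hdt latest-all.ttl wikidata-latest-all.hdt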

>> I'll try to run a few queries to see how it behaves.
>
> I don't think there is a command-line tool to parse SPARQL queries, so you 
> probably have to set up a Fuseki endpoint which uses HDT as a data source.

You're right. The limited query language of hdtSearch is closer to
grep than to SPARQL.

Thank you for pointing out Fuseki, I'll have a look at it.

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-10-31 Thread Jérémie Roquet
2017-10-31 14:56 GMT+01:00 Laura Morales :
> 1. I have downloaded it and I'm trying to use it, but the HDT tools (e.g.
> query) require building an index before I can use the HDT file. I've tried to
> create the index, but I ran out of memory again (even though the index is
> smaller than the .hdt file itself). So any Wikidata dump should contain both
> the .hdt file and the .hdt.index file unless there is another way to generate
> the index on commodity hardware

I've just loaded the provided hdt file on a big machine (32 GiB wasn't
enough to build the index but ten times this is more than enough), so
here are a few interesting metrics:
 - the index alone is ~14 GiB uncompressed, ~9 GiB gzipped and
~6.5 GiB xzipped;
 - once loaded in hdtSearch, Wikidata uses ~36 GiB of virtual memory;
 - right after index generation, this includes ~16 GiB of anonymous
memory (with no memory pressure, that's ~26 GiB resident)…
 - …but after a reload, the index is memory-mapped as well, so it only
includes ~400 MiB of anonymous memory (and a mere ~1.2 GiB resident).
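
If you want to reproduce this with hdt-cpp, the index is generated as a
side effect of the first load, something like the following (file name
assumed):

  # The first run builds the .hdt.index file next to the .hdt file, which is
  # slow and RAM-hungry; later runs just memory-map both files, hence the
  # fast reload.
  hdtSearch wikidata-20170313-all-BETA.hdt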

Looks like a good candidate for commodity hardware, indeed. It loads
in less than one second on a 32 GiB machine. I'll try to run a few
queries to see how it behaves.

FWIW, my use case is very similar to yours, as I'd like to run queries
that are too long for the public SPARQL endpoint and can't dedicate a
powerful machine to this full time (Blazegraph runs fine with 32 GiB,
though; it just takes a while to index, and updating is not as fast as
the changes happening on wikidata.org).

-- 
Jérémie


Re: [Wikidata] How to get direct link to image

2017-10-30 Thread Jérémie Roquet
2017-10-30 18:16 GMT+01:00 Laura Morales :
> - wikidata entry: https://www.wikidata.org/wiki/Q161234
> - "logo image" property pointing to: 
> https://commons.wikimedia.org/wiki/File:0_A.D._logo.png
>
> However... that's a HTML page... How do I get a reference to the .png file? 
> In this case 
> https://upload.wikimedia.org/wikipedia/commons/1/1c/0_A.D._logo.png

That's not exactly straightforward, given that you have to make a query
for every single file (or a batch of up to 50 of them), but you can get
the URL through the API. In this case:

https://commons.wikimedia.org/w/api.php?action=query&format=json&prop=imageinfo&iiprop=url&titles=File%3A0_A.D._logo.png
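
From a script, that's something like the following (jq is just one way
to extract the URL; up to 50 titles can be batched by joining them with
"|"):

  # Ask the Commons API for the direct URL(s) of the given file(s).
  curl -s 'https://commons.wikimedia.org/w/api.php?action=query&format=json&prop=imageinfo&iiprop=url&titles=File%3A0_A.D._logo.png' \
    | jq -r '.query.pages[].imageinfo[].url'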

Best regards,

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Jérémie Roquet
2017-10-27 18:56 GMT+02:00 Jérémie Roquet <jroq...@arkanosis.net>:
> 2017-10-27 18:51 GMT+02:00 Luigi Assom <itsawesome@gmail.com>:
>> I found and share this resource:
>> http://www.rdfhdt.org/datasets/
>>
>> there is also Wikidata dump in HDT
>
> The link to the Wikidata dump seems dead, unfortunately :'(

Javier D. Fernández of the HDT team was very quick to fix the link :-)

One can contact them for support either on their forum or by email¹,
as they are willing to help the Wikidata community make use of HDT.

Best regards,

¹ http://www.rdfhdt.org/team/

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Jérémie Roquet
2017-10-27 18:56 GMT+02:00 Jérémie Roquet <jroq...@arkanosis.net>:
> 2017-10-27 18:51 GMT+02:00 Luigi Assom <itsawesome@gmail.com>:
>> I found and share this resource:
>> http://www.rdfhdt.org/datasets/
>>
>> there is also Wikidata dump in HDT
>
> The link to the Wikidata dump seems dead, unfortunately :'(

… but there's a file on the server:
http://gaia.infor.uva.es/hdt/wikidata-20170313-all-BETA.hdt.gz (i.e.
the link was missing the “.gz”)

-- 
Jérémie


Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Jérémie Roquet
2017-10-27 18:51 GMT+02:00 Luigi Assom :
> I found and share this resource:
> http://www.rdfhdt.org/datasets/
>
> there is also Wikidata dump in HDT

The link to the Wikidata dump seems dead, unfortunately :'(

-- 
Jérémie


Re: [Wikidata-l] P:238, 239, 240

2013-03-07 Thread Jérémie Roquet
Hi,

2013/3/7 Sven svenmangu...@gmail.com:
> I just created three properties but my Internet connection died before I
> could update the list of properties page, and my smartphone (which I'm using
> to write this) is terrible for editing. Could someone make sure that the list
> page is updated?

It seems to me that P238 and P239 are duplicates of P229 and P230, aren't they?

I've added P240 to the list.

Best regards,

-- 
Jérémie (Arkanosis)


Re: [Wikidata-l] Publicity for Wikidata and Wikivoyage

2012-12-05 Thread Jérémie Roquet
Hi,

2012/12/6 Erik Moeller e...@wikimedia.org:
> I'm curious what folks here think about making some noise about the
> two most recent additions to the Wikimedia family around January or
> February. Wikivoyage is still finishing up the beta phase (image
> transfers, logo import etc.) this month, and Wikidata isn't live as a
> repository yet -- but we could set a target date that would work for
> both projects.
>
> <snip>
>
> Thoughts?

Great idea, but isn't it a bit premature for Wikidata? Right now we
only have interwiki links (and an awesome community :) ), which I fear
isn't enough for newcomers to understand how critical this new project
is. Once phase 2 is live, however, this would make sense without any
doubt.

Best regards,

-- 
Jérémie


Re: [Wikidata-l] Pseudo-empty items

2012-11-14 Thread Jérémie Roquet
Hi Bináris,

2012/11/14 Bináris wikipo...@gmail.com:
> have a look at
> https://www.wikidata.org/w/index.php?title=Q49382&rcid=449383&setlang=hu
> This item has a German label with no link and nothing else.
> My question is not if it is useful or not (this is up to the community) but
> why do I see it completely empty when I set a language other than German?
> Why do I not see descriptions in every language, only in my chosen one?
> Why can I edit the description only in my chosen language?
> Or do I do something wrong?

See ¹ and ².

Best regards,

¹ https://www.wikidata.org/wiki/Wikidata:Project_chat#Fallback_languages
² https://www.wikidata.org/wiki/MediaWiki:Gadget-labelLister.js

-- 
Jérémie


Re: [Wikidata-l] wikidata.org is live (with some caveats)

2012-10-30 Thread Jérémie Roquet
2012/10/30 Amir E. Aharoni amir.ahar...@mail.huji.ac.il:
> 2012/10/30 emijrp emi...@gmail.com:
>> Cool, nice work.
>>
>> SUL is not enabled?
>
> It is, we just discussed it on IRC :)
>
> Log out, then log in again to some other existing project (like
> https://ca.wikisource.org ) and then to https://www.wikidata.org , and
> it should work.

Check that you're using the same protocol: if you're logged in with
HTTPS and following a link using HTTP, HTTPSEverywhere will not help
(yet) :-)

Best regards,

-- 
Jérémie


Re: [Wikidata-l] Request for comments: syntax for including data on client wikis (aka how to make infoboxes)

2012-05-22 Thread Jérémie Roquet
Hi Daniel, everyone,

2012/5/22 Daniel Kinzler daniel.kinz...@wikimedia.de:
> I have created a first preliminary draft of how data items from the Wikidata
> repository may be accessed and rendered on the client wiki, e.g. to make
> infoboxes.
>
> https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax

Great!

> It would be great if you could have a look and let us know about any
> unclarities, omissions, or other flaws - and of course about your ideas of how
> to do this.
>
> Getting this right is an important part of implementing phase 2 of the Wikidata
> project, and so I feel it's important to start drafting and discussing early.
> Having a powerful but not overly complex way to create infoboxes etc. from
> Wikidata items is very important for the acceptance of Wikidata on the client
> wikis, I believe.

I've some questions about the design proposed in this draft. I'm not
sure these are actual issues, but I prefer to be sure ;-)

i. It seems to me that the proposed design implies that any access to
data is done through a template transclusion, which could be fine for
the given example (i.e. infoboxes, though I'll raise another issue
below, see ii.), but AFAIU forbids direct use of data in an article. Is
this a desirable limitation? Or did I miss something?

ii. I understand that only one item can be included at once (either by
using the data_item attribute or the article links). What if for some
reason we want a template that accesses several items? Is it
reasonable to assume that there will always be an item that links to
every needed item for a given template?

iii. Articles on current wikis use templates named (e.g.) “{{Infobox…”.
Shouldn't it be a prerequisite for templates using Wikidata to (be
able to) keep the same name, so that:
 - we don't have to run bots on every single article only to prepend
“#data-template:”, but just have to update the template? (Arguably, we
would have to edit the articles anyway to remove attributes that are
handled by Wikidata; just to be sure.)
 - people don't massively reject the transition to Wikidata because of
the /visible/ syntax change.

iv. Per i., ii. and iii., wouldn't it be desirable to have some syntax
to access an item without relying on template transclusion? This would
enable us to:
 - use data in an article without having to write a template first (solves i.);
 - write templates that can get as many items as they need, either
from the transcluding page or /by themselves/ (solves ii.);
 - update the existing templates to Wikidata without having to edit
the articles and without visible changes from the template user's
perspective — except for what is handled by Wikidata, of course
(solves iii.).

v. What about Lua? :)

Best regards,

-- 
Jérémie
