Re: [Wikidata] Wikidata HDT dump

2018-10-02 Thread Laura Morales
> You shouldn't have to keep anything in RAM to HDT-ize something as you could > make the dictionary by sorting on disk and also do the joins to look up > everything against the dictionary by sorting. Yes but somebody has to write the code for it :) My understanding is that they keep everything
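
A minimal sketch of that on-disk approach, assuming a generic stream of RDF term strings rather than the actual rdfhdt code: sort fixed-size runs of terms to temporary files, then merge the runs to assign dictionary IDs without ever holding the whole dictionary in RAM.

    # Sketch only: build a term dictionary by external sorting, as suggested above.
    # Assumes `terms` is any iterable of RDF term strings; this is not the rdfhdt code.
    import heapq, itertools, os, tempfile

    def sorted_runs(terms, run_size=1_000_000):
        """Write sorted, de-duplicated runs of terms to temp files; return the paths."""
        paths, it = [], iter(terms)
        while True:
            chunk = sorted(set(itertools.islice(it, run_size)))
            if not chunk:
                return paths
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as f:
                f.write("\n".join(chunk) + "\n")
            paths.append(path)

    def build_dictionary(terms, out_path):
        """Merge the sorted runs and assign increasing integer IDs to unique terms."""
        runs = sorted_runs(terms)
        files = [open(p) for p in runs]
        last, next_id = None, 0
        with open(out_path, "w") as out:
            for term in heapq.merge(*((line.rstrip("\n") for line in f) for f in files)):
                if term != last:
                    out.write(f"{next_id}\t{term}\n")
                    next_id, last = next_id + 1, term
        for f in files:
            f.close()
        for p in runs:
            os.remove(p)

The same sort-and-merge trick can then replace the terms inside the triples with their IDs, which is the "joins by sorting" step mentioned in the quote.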

Re: [Wikidata] Wikidata HDT dump

2018-10-02 Thread Laura Morales
> 100 GB "with an optimized code" could be enough to produce an HDT like that. The current software definitely cannot handle wikidata with 100GB. It was tried before and it failed. I'm glad to see that new code will be released to handle large files. After skimming that paper it looks like they

Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Laura Morales
> a new dump of Wikidata in HDT (with index) is > available[http://www.rdfhdt.org/datasets/]. Thank you very much! Keep it up! Out of curiosity, what computer did you use for this? IIRC it required >512GB of RAM to function. > You will see how Wikidata has become huge compared to other

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-18 Thread Laura Morales
lue. We also made an infobox mockup for the Eiffel Tower for our grant proposal with a sync button next to the Infobox property: https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSync#Prototype_with_more_focus

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-14 Thread Laura Morales
> I was more expecting technical questions here, but it seems there is interest > in how the economics work. However, this part is not easy to write for me. I'd personally like to test a demo of the Databus. I'd also like to see a complete list of all the graphs that are available.

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Laura Morales
Is this a question for Sebastian, or are you talking on behalf of the project?     Sent: Tuesday, May 08, 2018 at 5:10 PM From: "Thad Guidry" <thadgui...@gmail.com> To: "Discussion list for the Wikidata project" <wikidata@lists.wikimedia.org> Cc: "Laura Mor

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Laura Morales
the Wikidata project" <wikidata@lists.wikimedia.org>, "Laura Morales" <laure...@mail.com> Subject: Re: [Wikidata] DBpedia Databus (alpha version) Hi Laura,   I don't understand, is this just another project built on DBPedia, or a project to replace DBPedia entirely?   a valid quest

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Laura Morales
I don't understand, is this just another project built on DBPedia, or a project to replace DBPedia entirely? Are you a DBPedia maintainer?     Sent: Tuesday, May 08, 2018 at 1:29 PM From: "Sebastian Hellmann" To: "Discussion list for the Wikidata project."

Re: [Wikidata] Wikidata HDT dump

2017-12-12 Thread Laura Morales
* T H A N K Y O U * > On 7 Nov I created an HDT file based on the then current download link > from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz Thank you very very much Wouter!! This is great! Out of curiosity, could you please share some info about the machine that

Re: [Wikidata] RDF: All vs Truthy

2017-12-03 Thread Laura Morales
> If you want to know when, why, where, etc, you have to > check the qualified "full" statements. All these qualifiers are encoded as additional triples in "all", correct?
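
For illustration, a sketch of what those additional triples look like when queried: the truthy dump only carries the direct wdt: statements, while the full ("all") model reifies each statement (p:/ps:) so the pq: qualifiers can hang off it. The example item and properties below (Berlin's population with its "point in time" qualifier) and the SPARQLWrapper client are assumptions added here, not part of the original mail.

    # Sketch: querying the reified ("all") statement model, which is where qualifiers live.
    from SPARQLWrapper import SPARQLWrapper, JSON

    query = """
    SELECT ?population ?pointInTime WHERE {
      wd:Q64 p:P1082 ?stmt .                      # statement node (only in "all")
      ?stmt ps:P1082 ?population .                # the statement's main value
      OPTIONAL { ?stmt pq:P585 ?pointInTime . }   # qualifier: point in time
    }
    """

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["population"]["value"], row.get("pointInTime", {}).get("value"))

(The wd:/p:/ps:/pq: prefixes are predefined on the public query service, so they are omitted here.)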

[Wikidata] RDF: All vs Truthy

2017-12-02 Thread Laura Morales
Can somebody please explain (in simple terms) what's the difference between "all" and "truthy" RDF dumps? I've read the explanation available on the wiki [1] but I still don't get it. If I'm just a user of the data, because I want to retrieve information about a particular item and link items

Re: [Wikidata] Wikipedia page from wikidata ID

2017-11-12 Thread Laura Morales
> I am not sure where you are trying to do this and how but  > https://www.wikidata.org/wiki/Special:GoToLinkedPage > might be useful. You can call it with an item ID and a wiki code in the URL > and it will redirect you to the article on
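
As a small illustration of that URL pattern (the helper below is hypothetical; only the Special page and the wiki-code/item-ID parameters come from the reply above):

    # Hypothetical helper: build a Special:GoToLinkedPage URL from a wiki code and an item ID.
    def goto_linked_page(wiki_code: str, item_id: str) -> str:
        return f"https://www.wikidata.org/wiki/Special:GoToLinkedPage/{wiki_code}/{item_id}"

    # Redirects to the English Wikipedia article for Q80 (Tim Berners-Lee).
    print(goto_linked_page("enwiki", "Q80"))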

Re: [Wikidata] Wikipedia page from wikidata ID

2017-11-11 Thread Laura Morales
> schema:about connects Wikidata item with Wikipedias, e.g., > > Wikidata Query Service: "SELECT * WHERE { ?page schema:about wd:Q80 }" > > The triple is also available directly from the MediaWiki entity: > > https://www.wikidata.org/entity/Q80.nt Thank you! I was looking for "outgoing" links
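
A runnable version of the query quoted above; the schema:isPartOf filter (to restrict the result to English Wikipedia) and the SPARQLWrapper client are additions for illustration, not part of the original mail.

    # Sketch: resolve a Wikidata item to its Wikipedia page(s) via schema:about.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery("""
        SELECT ?page WHERE {
          ?page schema:about wd:Q80 ;
                schema:isPartOf <https://en.wikipedia.org/> .   # drop this line to get all wikis
        }
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["page"]["value"])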

[Wikidata] Wikipedia page from wikidata ID

2017-11-11 Thread Laura Morales
How can I get the Wikipedia URL of a wikibase:Item ID? Searching online I could only find how to do this using the Mediawiki API, but I was wondering if I can extract/generate URLs from the wikidata graph itself. Thanks.

Re: [Wikidata] Wikidata HDT dump

2017-11-07 Thread Laura Morales
> drops `a wikibase:Item` and `a wikibase:Statement` types off topic but... why drop `a wikibase:Item`? Without this it seems impossible to retrieve a list of items.

Re: [Wikidata] Wikidata HDT dump

2017-11-07 Thread Laura Morales
How many triples does wikidata have? The old dump from rdfhdt seems to have about 2 billion, which means wikidata doubled the number of triples in less than a year?     Sent: Tuesday, November 07, 2017 at 3:24 PM From: "Jérémie Roquet" To: "Discussion list for the

Re: [Wikidata] Wikidata HDT dump

2017-11-03 Thread Laura Morales
> I’ve created a Phabricator task (https://phabricator.wikimedia.org/T179681) > for providing a HDT dump, let’s see if someone else (ideally from the ops > team) responds to it. (I’m not familiar with the systems we currently use for > the dumps, so I can’t say if they have enough resources for

Re: [Wikidata] Wikidata HDT dump

2017-11-03 Thread Laura Morales
> Thank you for this feedback, Laura.  > Is the hdt index you got available somewhere on the cloud? Unfortunately it's not. It was a private link that was temporarily shared with me by email. I guess I could re-upload the file somewhere else myself, but my uplink is really slow (1Mbps).

Re: [Wikidata] Wikidata HDT dump

2017-11-03 Thread Laura Morales
Hello list, a very kind person from this list has generated the .hdt.index file for me, using the 1-year-old wikidata HDT file available at the rdfhdt website. So I was finally able to set up a working local endpoint using HDT+Fuseki. Setup was easy, launch time (for Fuseki) also was quick (a

Re: [Wikidata] Wikidata HDT dump

2017-11-02 Thread Laura Morales
> There is also a command line tool called hdtsparql in the hdt-java distribution that allows exactly this. It used to support only SELECT queries, but I've enhanced it to support CONSTRUCT, DESCRIBE and ASK queries too. There are some limitations, for example only CSV output is supported for

Re: [Wikidata] Wikidata HDT dump

2017-11-01 Thread Laura Morales
> I am currently downloading the latest ttl file. On a 250gig ram machine. I > will see if that is sufficient to run the conversion Otherwise we have > another busy one with  around 310 gig. Thank you! > For querying I use the Jena query engine. I have created a module called > HDTQuery

Re: [Wikidata] Wikidata HDT dump

2017-11-01 Thread Laura Morales
> It's a machine with 378 GiB of RAM and 64 threads running Scientific > Linux 7.2, that we use mainly for benchmarks. > > Building the index was really all about memory because the CPUs have > actually a lower per-thread performance (2.30 GHz vs 3.5 GHz) compared > to those of my regular

Re: [Wikidata] Wikidata HDT dump

2017-10-31 Thread Laura Morales
> I've just loaded the provided hdt file on a big machine (32 GiB wasn't enough to build the index but ten times this is more than enough) Could you please share a bit about your setup? Do you have a machine with 320GB of RAM? Could you please also try to convert wikidata.ttl to hdt using

Re: [Wikidata] Wikidata HDT dump

2017-10-31 Thread Laura Morales
> @Laura: I suspect Wouter wants to know if he "ignores" the previous errors > and proposes a rather incomplete dump (just for you) or waits for Stas' > feedback. OK. I wonder, though, if it would be possible to set up a regular HDT dump alongside the existing regular dumps. Looking at the dumps

Re: [Wikidata] Wikidata HDT dump

2017-10-31 Thread Laura Morales
@Wouter > Thanks for the pointer!  I'm downloading from > https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz now. Any luck so far?

Re: [Wikidata] How to get direct link to image

2017-10-30 Thread Laura Morales
> You can also use the Wikimedia Commons API made by Magnus: https://tools.wmflabs.org/magnus-toolserver/commonsapi.php > It will also give you metadata about the image (so you'll be able to cite > the author of the image when you reuse it). Is the same metadata also available in the Turtle/HDT

[Wikidata] How to get direct link to image

2017-10-30 Thread Laura Morales
- wikidata entry: https://www.wikidata.org/wiki/Q161234 - "logo image" property pointing to: https://commons.wikimedia.org/wiki/File:0_A.D._logo.png However... that's an HTML page... How do I get a reference to the .png file? In this case
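
One way to get at the file itself is the Special:FilePath redirect on Commons; the helper below is a sketch added for illustration (the Wikidata RDF dumps use Special:FilePath URLs of this form for image properties).

    # Sketch: turn a Commons "File:" page name into a URL that resolves to the media file
    # itself, via the Special:FilePath redirect.
    from urllib.parse import quote

    def commons_file_url(file_page: str) -> str:
        name = file_page.split("File:", 1)[-1].replace(" ", "_")
        return "https://commons.wikimedia.org/wiki/Special:FilePath/" + quote(name)

    print(commons_file_url("File:0_A.D._logo.png"))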

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> KBox is an alternative to other existing architectures for publishing KB such > as SPARQL endpoints (e.g. LDFragments, Virtuoso), and Dump files. > I should add that you can do federated query with KBox as easily as you > can do with SPARQL endpoints. OK, but I still fail to see what is

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> @Laura : you mean this list http://lov.okfn.org/lov.nq.gz ? > I can download it !! > > Which one ? Please send me the URL and I can fix it !! Yes you can download it, but the nq file is broken. It doesn't validate because some URIs contain white spaces, and some triples have an empty

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> No, the idea is that each organization will have its own KNS, so users can > add the KNS that they want.  How would this compare with a traditional SPARQL endpoint + "federated queries", or with "linked fragments"?

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> Thanks to report that. I remember one issue that I added here > https://github.com/pyvandenbussche/lov/issues/66 Yup, still broken! I've tried just now.

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> Also, for avoiding your users to re-create the models, you can pre-load > "models" from LOV catalog. The LOV RDF dump, on the other hand, is broken. Or at least it still was the last time I checked. And I don't mean broken in the sense of Wikidata, that is with some wrong types; I mean broken in the sense that it doesn't

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> @Wouter: As Stas said, you might report that error. I don't agree with Laura > who tried to underestimate that "syntax error". It's also about quality ;) Don't get me wrong, I am all in favor of data quality! :) So if this can be fixed, it's better! The thing is that I've seen so many

Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Laura Morales
> The first part of the Turtle data stream seems to contain syntax errors for > some of the XSD decimal literals.  The first one appears on line 13,291: > > Notice that scientific notation is not allowed in the lexical form of > decimals according to XML > Schema Part 2: >
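
For reference, a small check of the rule being quoted: the xsd:decimal lexical form allows an optional sign, digits and one decimal point, but no exponent, so a value such as 1.2E-7 has to be written out in full or retyped as xsd:double. The regex below is a simplification of the pattern in XML Schema Part 2, added here for illustration.

    # Sketch: flag literals whose lexical form is not a valid xsd:decimal.
    import re

    XSD_DECIMAL = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")

    for lex in ["1.2E-7", "0.00000012", "-42", "3.5"]:
        ok = bool(XSD_DECIMAL.match(lex))
        print(lex, "valid xsd:decimal" if ok else "invalid (use xsd:double or write it out)")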

Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Laura Morales
> is it possible to store a weighted adjacency matrix as an HDT instead of an > RDF? > > Something like a list of entities for each entity, or even better a list of > tuples for each entity. > So that a tuple could be generalised with properties. Sorry, I don't know this; you would have to ask

Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Laura Morales
> You can mount the jnl file directly to blazegraph so loading and indexing is > not needed anymore. How much larger would this be compared to the Turtle file?

Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Laura Morales
647674624 On Fri, Oct 27, 2017 at 5:08 PM, Laura Morales <laure...@mail.com> wrote: > Hello everyone, > > I'd like to ask if Wikidata could please offer a HDT [1] dump along with the > already available Turtle dump [2]. HDT is a binary format to store RDF data, > which is

Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Laura Morales
> Would it be an idea if HDT remains unfeasible to place the journal file of > blazegraph online? > Yes, people need to use blazegraph if they want to access the files and query > it but it could be an extra next to turtle dump? How would a blazegraph journal file be better than a Turtle dump?

[Wikidata] Wikidata HDT dump

2017-10-27 Thread Laura Morales
Hello everyone, I'd like to ask if Wikidata could please offer an HDT [1] dump along with the already available Turtle dump [2]. HDT is a binary format to store RDF data, which is pretty useful because it can be queried from the command line, it can be used as a Jena/Fuseki source, and it also uses
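
To give a flavour of the "queried from the command line" point, a sketch using the Python HDT bindings (the pyHDT "hdt" package); the file name is hypothetical and the exact API should be checked against the pyHDT documentation.

    # Sketch: pattern matching against an HDT file without loading a triple store.
    from hdt import HDTDocument

    doc = HDTDocument("wikidata.hdt")  # memory-maps the .hdt (and its .hdt.index) file

    # Empty strings act as wildcards; this lists triples whose subject is wd:Q80.
    triples, cardinality = doc.search_triples("http://www.wikidata.org/entity/Q80", "", "")
    print("matching triples:", cardinality)
    for s, p, o in triples:
        print(s, p, o)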

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-26 Thread Laura Morales
Adding 2.2 million German organisations to Wikidata Laura,   Talk to OpenCorporates and ask those questions yourself. Get involved ! :)   -Thad +ThadGuidry[https://plus.google.com/+ThadGuidry]    On Wed, Oct 25, 2017 at 3:22 AM Laura Morales <laure...@mail.com>