> You shouldn't have to keep anything in RAM to HDT-ize something as you could
> make the dictionary by sorting on disk and also do the joins to look up
> everything against the dictionary by sorting.
Yes but somebody has to write the code for it :)
My understanding is that they keep everything in RAM.
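The sort-on-disk idea from the quote can be sketched with a toy external merge sort. Everything below (function name, chunk size, sample terms) is illustrative and not the actual HDT tooling; real inputs would be streamed rather than held in a list:

```python
import heapq
import os
import tempfile

def external_sort_terms(terms, chunk_size=2):
    """Build a sorted, deduplicated term dictionary with bounded memory:
    sort small chunks to temporary files, then k-way merge them.
    (Toy version of the sort-on-disk idea behind an HDT dictionary.)"""
    files = []
    for i in range(0, len(terms), chunk_size):
        f = tempfile.NamedTemporaryFile("w+", delete=False)
        f.write("\n".join(sorted(terms[i:i + chunk_size])) + "\n")
        f.close()
        files.append(open(f.name))
    merged = heapq.merge(*((line.rstrip("\n") for line in f) for f in files))
    dictionary, last = {}, None
    for term in merged:
        if term != last:  # deduplicate while assigning sequential IDs
            dictionary[term] = len(dictionary) + 1
            last = term
    for f in files:
        f.close()
        os.unlink(f.name)
    return dictionary

terms = ["wd:Q80", "rdfs:label", "wd:Q2", "wd:Q80", "schema:about"]
d = external_sort_terms(terms)
```

The same merge works no matter how many chunk files there are, which is why the approach scales to inputs far larger than RAM.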
> 100 GB "with an optimized code" could be enough to produce an HDT like that.
The current software definitely cannot handle wikidata with 100GB. It was tried
before and it failed.
I'm glad to see that new code will be released to handle large files. After
skimming that paper it looks like they
> a new dump of Wikidata in HDT (with index) is
> available: http://www.rdfhdt.org/datasets/
Thank you very much! Keep it up!
Out of curiosity, what computer did you use for this? IIRC it required >512GB
of RAM to function.
> You will see how Wikidata has become huge compared to other
We also made an infobox mockup for the Eiffel Tower for our grant proposal with
a sync button next to the Infobox property:
https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSync#Prototype_with_more_focus
> I was more expecting technical questions here, but it seems there is interest
> in how the economics work. However, this part is not easy to write for me.
I'd personally like to test a demo of the Databus. I'd also like to see a
complete list of all the graphs that are available.
Is this a question for Sebastian, or are you talking on behalf of the project?
Sent: Tuesday, May 08, 2018 at 5:10 PM
From: "Thad Guidry" <thadgui...@gmail.com>
To: "Discussion list for the Wikidata project" <wikidata@lists.wikimedia.org>
Cc: "Laura Morales" <laure...@mail.com>
Subject: Re: [Wikidata] DBpedia Databus (alpha version)
Hi Laura,
> I don't understand, is this just another project built on DBPedia, or a
> project to replace DBPedia entirely?
A valid question.
I don't understand, is this just another project built on DBPedia, or a project
to replace DBPedia entirely? Are you a DBPedia maintainer?
Sent: Tuesday, May 08, 2018 at 1:29 PM
From: "Sebastian Hellmann"
To: "Discussion list for the Wikidata project."
* T H A N K Y O U *
> On 7 Nov I created an HDT file based on the then current download link
> from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
Thank you very very much Wouter!! This is great!
Out of curiosity, could you please share some info about the machine that you used for the conversion?
> If you want to know when, why, where, etc, you have to
> check the qualified "full" statements.
All these qualifiers are encoded as additional triples in "all", correct?
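The quoted distinction can be sketched as data. The triples below are illustrative: the wdt:/p:/ps:/pq: prefixes follow the Wikidata RDF scheme, but the statement-node name and qualifier value are invented for the example:

```python
# One Wikidata claim, written the way the two dumps encode it.
item, prop, value = "wd:Q80", "P106", "wd:Q82594"

# "truthy": one direct triple, qualifiers dropped
truthy = [(item, f"wdt:{prop}", value)]

# "all": the claim goes through a statement node, so qualifiers
# (when, where, ...) can attach to it as additional triples
stmt = "wds:Q80-example-statement"  # hypothetical statement id
full = [
    (item, f"p:{prop}", stmt),        # item -> statement
    (stmt, f"ps:{prop}", value),      # statement -> value
    (stmt, "pq:P580", "1991-01-01"),  # qualifier: start time (invented)
]
```

So yes: every qualifier in "all" is one more triple hanging off the statement node, while "truthy" collapses the whole claim to a single triple.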
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
Can somebody please explain (in simple terms) what's the difference between
"all" and "truthy" RDF dumps? I've read the explanation available on the wiki
[1] but I still don't get it.
If I'm just a user of the data, because I want to retrieve information about a
particular item and link items
> I am not sure where you are trying to do this and how but
> https://www.wikidata.org/wiki/Special:GoToLinkedPage
> might be useful. You can call it with an item ID and a wiki code in the URL
> and it will redirect you to the article on
> schema:about connects Wikidata item with Wikipedias, e.g.,
>
> Wikidata Query Service: "SELECT * WHERE { ?page schema:about wd:Q80 }"
>
> The triple is also available directly from the MediaWiki entity:
>
> https://www.wikidata.org/entity/Q80.nt
Thank you! I was looking for "outgoing" links
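For reference, the two access paths quoted above can be captured as small URL/query builders; a sketch (the function names are ours):

```python
def entity_ntriples_url(qid):
    """Concept URI with the .nt suffix, which serves the item's triples
    directly (as in the quoted Q80 example)."""
    return f"https://www.wikidata.org/entity/{qid}.nt"

def sitelink_query(qid):
    """SPARQL for the Wikidata Query Service listing every page linked
    to the item via schema:about."""
    return f"SELECT * WHERE {{ ?page schema:about wd:{qid} }}"
```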
How can I get the Wikipedia URL of a wikibase:Item ID? Searching online I could
only find how to do this using the Mediawiki API, but I was wondering if I can
extract/generate URLs from the wikidata graph itself.
Thanks.
> drops `a wikibase:Item` and `a wikibase:Statement` types
off topic but... why drop `a wikibase:Item`? Without this it seems impossible
to retrieve a list of items.
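One conceivable workaround, sketched below under the assumption that subjects still use the entity namespace, is to enumerate items by URI shape rather than by rdf:type. This is a heuristic, not a substitute for the dropped triples:

```python
import re

WD = "http://www.wikidata.org/entity/"

def item_ids(triples):
    """Collect subjects in the entity namespace whose local name looks
    like an item id (Q followed by digits); properties (P...) are in
    the same namespace and get filtered out."""
    ids = set()
    for s, _p, _o in triples:
        if s.startswith(WD) and re.fullmatch(r"Q\d+", s[len(WD):]):
            ids.add(s[len(WD):])
    return sorted(ids)

triples = [
    (WD + "Q80", "rdfs:label", '"Tim Berners-Lee"'),
    (WD + "P31", "rdfs:label", '"instance of"'),  # a property, not an item
    (WD + "Q2", "rdfs:label", '"Earth"'),
]
```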
How many triples does wikidata have? The old dump from rdfhdt seems to have
about 2 billion, which means wikidata doubled the number of triples in less
than a year?
Sent: Tuesday, November 07, 2017 at 3:24 PM
From: "Jérémie Roquet"
To: "Discussion list for the Wikidata project"
> I’ve created a Phabricator task (https://phabricator.wikimedia.org/T179681)
> for providing a HDT dump, let’s see if someone else (ideally from the ops
> team) responds to it. (I’m not familiar with the systems we currently use for
> the dumps, so I can’t say if they have enough resources for
> Thank you for this feedback, Laura.
> Is the hdt index you got available somewhere on the cloud?
Unfortunately it's not. It was a private link that was temporarily shared with
me by email. I guess I could re-upload the file somewhere else myself, but my
uplink is really slow (1Mbps).
Hello list,
a very kind person from this list has generated the .hdt.index file for me,
using the 1-year old wikidata HDT file available at the rdfhdt website. So I
was finally able to setup a working local endpoint using HDT+Fuseki. Set up was
easy, launch time (for Fuseki) also was quick (a
> There is also a command line tool called hdtsparql in the hdt-java
> distribution that allows exactly this. It used to support only SELECT
> queries, but I've enhanced it to support CONSTRUCT, DESCRIBE and ASK
> queries too. There are some limitations, for example only CSV output is
> supported for
> I am currently downloading the latest ttl file, on a 250gig ram machine. I
> will see if that is sufficient to run the conversion. Otherwise we have
> another busy one with around 310 gig.
Thank you!
> For querying I use the Jena query engine. I have created a module called
> HDTQuery
> It's a machine with 378 GiB of RAM and 64 threads running Scientific
> Linux 7.2, that we use mainly for benchmarks.
>
> Building the index was really all about memory because the CPUs have
> actually a lower per-thread performance (2.30 GHz vs 3.5 GHz) compared
> to those of my regular
> I've just loaded the provided hdt file on a big machine (32 GiB wasn't
> enough to build the index but ten times this is more than enough)
Could you please share a bit about your setup? Do you have a machine with 320GB
of RAM?
Could you please also try to convert wikidata.ttl to hdt using
> @Laura: I suspect Wouter wants to know if he "ignores" the previous errors
> and proposes a rather incomplete dump (just for you) or waits for Stas'
> feedback.
OK. I wonder though, if it would be possible to set up a regular HDT dump
alongside the already regular dumps. Looking at the dumps
@Wouter
> Thanks for the pointer! I'm downloading from
> https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz now.
Any luck so far?
> You can also use the Wikimedia Commons API made by Magnus:
> https://tools.wmflabs.org/magnus-toolserver/commonsapi.php
> It will also give you metadata about the image (so you'll be able to cite
> the author of the image when you reuse it).
Is the same metadata also available in the Turtle/HDT dumps?
- wikidata entry: https://www.wikidata.org/wiki/Q161234
- "logo image" property pointing to:
https://commons.wikimedia.org/wiki/File:0_A.D._logo.png
However... that's an HTML page... How do I get a reference to the .png file in
this case?
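One way to get at the raw file is MediaWiki's Special:FilePath redirect, which resolves a file name to the actual media file. A sketch that rewrites a File: page URL (the helper name is ours):

```python
from urllib.parse import quote

def commons_file_url(file_page_url):
    """Turn a Commons File: page URL into a Special:FilePath URL,
    which HTTP-redirects to the raw media file."""
    name = file_page_url.rsplit("File:", 1)[1]
    return ("https://commons.wikimedia.org/wiki/Special:FilePath/"
            + quote(name))

url = commons_file_url("https://commons.wikimedia.org/wiki/File:0_A.D._logo.png")
```

Fetching the returned URL and following the redirect yields the .png itself.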
> KBox is an alternative to other existing architectures for publishing KB such
> as SPARQL endpoints (e.g. LDFragments, Virtuoso), and Dump files.
> I should add that you can do federated queries with KBox as easily as you
> can with SPARQL endpoints.
OK, but I still fail to see what is
> @Laura : you mean this list http://lov.okfn.org/lov.nq.gz ?
> I can download it !!
>
> Which one ? Please send me the URL and I can fix it !!
Yes you can download it, but the nq file is broken. It doesn't validate because
some URIs contain white spaces, and some triples have an empty
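A whitespace-in-IRI check like the one described can be sketched in a few lines; the regex approach is a simplification of the full N-Quads grammar, for illustration only:

```python
import re

IRI = re.compile(r"<([^>]*)>")

def broken_iris(line):
    """Return the IRIs in an N-Quads line that contain whitespace,
    which the N-Quads grammar forbids inside <...>."""
    return [iri for iri in IRI.findall(line) if re.search(r"\s", iri)]

good = '<http://example.org/a> <http://example.org/p> "x" .'
bad = '<http://example.org/a b> <http://example.org/p> "x" .'
```

Running such a filter over the dump would list the offending lines for a bug report.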
> No, the idea is that each organization will have its own KNS, so users can
> add the KNS that they want.
How would this compare with a traditional SPARQL endpoint + "federated
queries", or with "linked fragments"?
> Thanks for reporting that. I remember one issue that I added here
> https://github.com/pyvandenbussche/lov/issues/66
Yup, still broken! I've tried just now.
> Also, to avoid your users having to re-create the models, you can pre-load
> "models" from the LOV catalog.
The LOV RDF dump is broken instead. Or at least it still was the last time I
checked. And I don't mean broken in the sense of Wikidata, that is with some
wrong types; I mean broken as in it doesn't
> @Wouter: As Stas said, you might report that error. I don't agree with Laura
> who tried to underestimate that "syntax error". It's also about quality ;)
Don't get me wrong, I am all in favor of data quality! :) So if this can be
fixed, it's better! The thing is that I've seen so many
> The first part of the Turtle data stream seems to contain syntax errors for
> some of the XSD decimal literals. The first one appears on line 13,291:
>
> Notice that scientific notation is not allowed in the lexical form of
> decimals according to XML Schema Part 2:
>
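Literals like that can be repaired mechanically; a sketch that rewrites scientific notation into the plain lexical form xsd:decimal requires (the helper name is ours):

```python
from decimal import Decimal

def to_decimal_lexical(literal):
    """Rewrite a numeric literal in scientific notation into a plain
    fixed-point string with no exponent, as xsd:decimal requires."""
    return format(Decimal(literal), "f")
```

For example, `to_decimal_lexical("1.2E-5")` gives `"0.000012"`, which is a valid xsd:decimal lexical form.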
> is it possible to store a weighted adjacency matrix as an HDT instead of an
> RDF?
>
> Something like a list of entities for each entity, or even better a list of
> tuples for each entity.
> So that a tuple could be generalised with properties.
Sorry I don't know this, you would have to ask
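For what it's worth, one common modelling pattern is to reify each nonzero matrix entry as an edge node, so the weight (or any other tuple member) becomes an ordinary property. A sketch with made-up ex: predicate names; the resulting triples could then be serialized and converted to HDT like any other RDF:

```python
def matrix_to_triples(nodes, weights):
    """Encode a weighted adjacency matrix as RDF-style triples by
    introducing one intermediate edge node per nonzero entry, so the
    weight can attach to the edge as a plain property."""
    triples = []
    for i, src in enumerate(nodes):
        for j, dst in enumerate(nodes):
            w = weights[i][j]
            if w:
                edge = f"ex:edge_{src}_{dst}"
                triples += [
                    (edge, "ex:source", f"ex:{src}"),
                    (edge, "ex:target", f"ex:{dst}"),
                    (edge, "ex:weight", str(w)),
                ]
    return triples

t = matrix_to_triples(["a", "b"], [[0, 2], [0, 0]])
```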
> You can mount the jnl file directly to blazegraph so loading and indexing is
> not needed anymore.
How much larger would this be compared to the Turtle file?
On Fri, Oct 27, 2017 at 5:08 PM, Laura Morales <laure...@mail.com> wrote:
> Hello everyone,
>
> I'd like to ask if Wikidata could please offer a HDT [1] dump along with the
> already available Turtle dump [2]. HDT is a binary format to store RDF data,
> which is
> Would it be an idea, if HDT remains unfeasible, to place the Blazegraph
> journal file online?
> Yes, people would need to use Blazegraph to access and query the files, but
> it could be an extra next to the Turtle dump?
How would a blazegraph journal file be better than a Turtle dump?
Hello everyone,
I'd like to ask if Wikidata could please offer a HDT [1] dump along with the
already available Turtle dump [2]. HDT is a binary format to store RDF data,
which is pretty useful because it can be queried from command line, it can be
used as a Jena/Fuseki source, and it also takes much less disk space.
Adding 2.2 million German organisations to Wikidata
Laura,
Talk to OpenCorporates and ask those questions yourself.
Get involved ! :)
-Thad
+ThadGuidry (https://plus.google.com/+ThadGuidry)
On Wed, Oct 25, 2017 at 3:22 AM Laura Morales <laure...@mail.com> wrote: