First, this is bad practice:

<http://people/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#Region>
"New York" .

You should write instead:

<http://people/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#Region>
dbpedia:New_York .

that is,
<http://dbpedia.org/resource/New_York>

possibly with another object property like
http://xmlns.com/foaf/0.1/based_near
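
Put together, the suggestion above could look like the following Turtle sketch. The prefix names (vcard:, foaf:, dbpedia:) are shorthand chosen here for the namespaces already mentioned, and stating both properties on the same person is only an illustration, not a requirement.

```turtle
@prefix vcard:   <http://www.w3.org/2001/vcard-rdf/3.0#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

# The object is a resource (an IRI), not the string "New York".
<http://people/JohnSmith>
    vcard:Region    dbpedia:New_York ;
    foaf:based_near dbpedia:New_York .
```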

I understand that you have a database of vCard data, but one must keep in
mind that the Semantic Web is all about creating links; filling in strings is
secondary.



And then there is no trouble with strings at all :)
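
To illustrate why the strings stop being a concern: once the region is a resource, its name needs to be stated only once, on that resource, however many people link to it. A minimal sketch, assuming an rdfs:label on the DBpedia resource and a second, hypothetical person <http://people/JaneDoe> that is not in the original data:

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard:   <http://www.w3.org/2001/vcard-rdf/3.0#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

# The human-readable name is stated once, on the region resource itself.
dbpedia:New_York rdfs:label "New York"@en .

# Any number of people can then point at the same resource.
<http://people/JohnSmith> vcard:Region dbpedia:New_York .
<http://people/JaneDoe>   vcard:Region dbpedia:New_York .   # hypothetical second person
```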

Jean-Marc Vanel


On Fri, 15 Feb 2019 at 13:02, Ekaterina Danilova wrote:

> Hello,
> I would like to ask how TDB2 and Fuseki manage large amounts of string data
> (especially repeated data) and what the best practices are. Do they
> optimize it somehow, or is it on us to make improvements?
>
> For example, we have a TDB2 store which we access via Fuseki, and an example
> named graph like this:
> [http://people/JohnSmith, http://www.w3.org/2001/vcard-rdf/3.0#Region, "New York"]
> [http://people/JohnSmith, http://www.w3.org/2001/vcard-rdf/3.0#Other, "long long string"]
> [http://people/JohnSmith, http://www.w3.org/2001/vcard-rdf/3.0#NAME, "JOHN SMITH"]
>
> So, we have a person, JohnSmith, with a name and 2 further properties,
> "Region" and "Other". One of them is the short string "New York"; the other
> is a long string.
> Assume we have 100 000 more people and many of them have the same "Region"
> and "Other" properties. What would be the best approach to storing such
> data?
>
> I created 10 000 more named graphs of people with different names but the
> same other properties and tested the performance.
> First I checked 10 000 cases of reading graphs like this, and the
> average time was around 4.4 ms (no matter how long the strings are).
>
> The other option I considered is making "New York" a resource, storing it in
> a "cities" named graph, and doing the same with "long long string". The
> idea is to store the actual string only once. I tested reading the
> graphs again on 10 000 cases and didn't notice any change in performance:
> the average load time was still 4.4 ms when we had resource URIs instead of
> "New York" and "long long string".
> However, to get the full data, we need to add the actual resources to our
> original JohnSmith graph, which adds overhead since we have to fetch 2 more
> named graphs. So it causes a quite predictable drop in performance.
>
> So, according to my tests, the first case (the one described in the graph
> example) performed best, but it feels like we are storing too much
> redundant information. I therefore wanted to ask for your opinions on this
> approach and learn whether the TDB store applies some internal optimization
> to the data.
>
