Re: Splitting data into graphs vs datasets

Daan Reid Thu, 22 Mar 2018 03:36:13 -0700

I would say that using separate datasets is a good idea if you have sets of graphs that just don't belong together. The dataset as an organisational, abstract container is an excellent idea, in my opinion.


Regards,


Daan

On 22-03-18 11:22, Mikael Pesonen wrote:

OK, it seems that using many datasets is not a good idea. I had no bias and am not having any issues with speed; I just wanted to see what the best way to go is.
On 21.3.2018 20:48, ajs6f wrote:
Those sure are good reasons for using named graphs. But what about using different datasets too?
Consider that you may not be seeing such reasons because it may not actually be as good an idea.
Here's another reason to prefer graphs: there is a standard management HTTP API for named graphs, the SPARQL Graph Store Protocol. There is no equivalent for datasets, so each product rolls its own. That's not good for flexibility if you have to move between products.
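As a rough illustration of the Graph Store Protocol point above, here is a minimal Python sketch of how a client might address a named graph. The endpoint URL (a Fuseki-style `/ds/data` service on localhost) and the graph IRI are assumptions made for the example, and the helper names `gsp_url` and `gsp_put` are hypothetical. The request is only built, never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def gsp_url(endpoint, graph_iri=None):
    """Return the Graph Store Protocol URL for one graph.

    A named graph is addressed with ?graph=<percent-encoded IRI>;
    the default graph is addressed with ?default.
    """
    if graph_iri is None:
        return endpoint + "?default"
    return endpoint + "?graph=" + quote(graph_iri, safe="")


def gsp_put(endpoint, graph_iri, turtle_text):
    """Build (but do not send) a PUT that replaces the graph's contents."""
    return Request(
        gsp_url(endpoint, graph_iri),
        data=turtle_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )


# Assumed values, for illustration only.
ENDPOINT = "http://localhost:3030/ds/data"
GRAPH = "http://example.org/graphs/imports"

request = gsp_put(ENDPOINT, GRAPH, "<http://example.org/s> <http://example.org/p> 1 .")
print(request.get_method(), request.full_url)
```

Because the graph is identified purely by a URL parameter, the same client code should work against any store that implements the protocol, which is the portability argument being made here.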
As for performance, that will depend radically on the implementation. Jena TIM, for example, uses hashing for its indexes, so the difference between having a lot of quads in a dataset and a few isn't likely to be that much. Other impls will vary.
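To make the hashing remark concrete, here is a toy Python sketch of a hash-indexed quad store. It is only an illustration of the general idea and makes no claim about how Jena TIM is actually implemented; the class name `ToyQuadStore` and its methods are hypothetical.

```python
from collections import defaultdict


class ToyQuadStore:
    """Toy quad store indexed by (graph, subject) in a hash map."""

    def __init__(self):
        # Maps (graph, subject) -> set of (predicate, object) pairs.
        self._index = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        self._index[(graph, subject)].add((predicate, obj))

    def match(self, graph, subject):
        # A single hash lookup: quads stored in other graphs are never scanned.
        return self._index.get((graph, subject), set())


store = ToyQuadStore()
store.add("g1", "alice", "knows", "bob")
store.add("g2", "alice", "knows", "carol")

# Fill a third graph with many quads that the lookup below never touches.
for i in range(10000):
    store.add("g3", "s%d" % i, "p", "o")

print(store.match("g1", "alice"))
```

With an index like this, the cost of a lookup keyed on graph and subject does not grow with the number of unrelated quads, which is why splitting data into more datasets may not buy much.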
Are you sure that performance is going to be improved by separating out datasets? (I.e. is that the measured bottleneck?) Are you now having problems with queries accidentally querying data they shouldn't see, and can your queries be rewritten to fix that (which might also improve performance)? (Jena has a permissions framework that can secure information down to the individual triple.)
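One way such a rewrite could look is to wrap the query pattern in a SPARQL GRAPH clause so that it only matches inside one named graph. The Python sketch below just builds the query text; the helper name `scoped_select` and the graph IRI are assumptions made for the example.

```python
def scoped_select(graph_iri, pattern, variables="*"):
    """Build a SELECT query whose pattern is matched only in one named graph."""
    return (
        "SELECT " + variables + "\n"
        "WHERE {\n"
        "  GRAPH <" + graph_iri + "> { " + pattern + " }\n"
        "}"
    )


# Assumed graph IRI, for illustration only.
query = scoped_select("http://example.org/graphs/imports", "?s ?p ?o")
print(query)
```

A query scoped this way cannot match quads in other graphs of the same dataset, and it gives the store a narrower target to search.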
ajs6f
On Mar 21, 2018, at 6:35 AM, Mikael Pesonen <[email protected]> wrote:
Those sure are good reasons for using named graphs. But what about using different datasets too?
Btw, I couldn't find info on how to run many datasets with Fuseki. Is it just one dataset per Fuseki process, with the -loc parameter for fuseki-server.jar?
Br

On 20.3.2018 14:22, Martynas Jusevičius wrote:
Provenance. With named graphs, it's easier to track where data came from: who imported it, when, etc.
You can also have meta-graphs about other graphs.

Also editing and updating data. You can load named graph contents (of smallish size) in an editor, make changes and then store a new version in the same graph. You probably would not want to do this with a large default graph.
On Tue, Mar 20, 2018 at 1:16 PM, Mikael Pesonen <[email protected]> wrote:
Hi,

I'm using Fuseki GSP, and so far have put all data into one default
dataset and am using graphs to split it.

If I'm right, there would be benefits to using more than one dataset:
- better performance - each query is done inside a dataset, so less data = faster query
- protection of data - can't "accidentally" query data from other datasets
Downsides:
- combining data from various datasets is a heavier task

Is this correct? Any other things that should be considered?

Thank you

--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and
Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: [email protected]
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND