Re: InsertPic_(12-07(12-07-21-26-31)

2018-12-16 Thread Vincent Ventresque

Are you sure that named graphs have better performance?


I'm not a specialist, and I'd like to know other users' opinions on that 
question. I think it depends both on the structure of your data and on the 
queries you run.
My use case was a dataset of about 168 M triples, including about 10 M 
titles (triples like ?s dcterms:title "some words"), queried with 
FILTER(REGEX()):

-- a query like this took a long time (1 min or more):

SELECT * WHERE {

?edition dcterms:title ?title .
FILTER(REGEX(?title, $word, 'i')) .

?edition rdarelationships:expressionManifested ?expression .

?expression bnf-roles:r70 ?author .

?author foaf:familyName $name

}


-- whereas a query like this one takes about 1 sec :

SELECT * WHERE {

GRAPH <:titles> {
?edition dcterms:title ?title .
FILTER(REGEX(?title, $word, 'i'))
}

?edition rdarelationships:expressionManifested ?expression .

?expression bnf-roles:r70 ?author .

?author foaf:familyName $name

}


 
So, depending on your data, it might be more efficient to use named graphs, e. g. :


SELECT  ?rel (count (?rel) as ?co)
where {
GRAPH <:names> { ?object MKG:English_name 'Pyrilamine' } # <-- HERE: named graph for names
?RelAttr owl:annotatedTarget ?object ;
 owl:annotatedSource ?subject ;
 owl:annotatedProperty ?rel ;
 MKG:pyear '1967' .
}
group by ?rel
limit 10
 


---


Then how to build named graphs?


There are several ways, here are 3 methods :

#1) when uploading files in Fuseki web interface, specify the graph URI for the 
file

#2) use tdbloader : java -Xms4096m -Xmx4096m -cp ./fuseki-server.jar 
tdb.tdbloader --graph=$namedGraph --tdb=$configFile $f

#3) use SPARQL INSERT + DELETE

#1 & #2 are fast, but all the triples of a file go into the same graph, so 
you may have to modify your files first.

#3 is slower, but you don't have to modify your files. See my question on 
StackOverflow: 
https://stackoverflow.com/questions/48500404/sparql-offset-whithout-order-by-to-get-all-results-of-a-query
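
For method #3, here is a minimal sketch of what such an update could look like, written as a small Java helper that only builds the SPARQL 1.1 Update text. It is not necessarily the approach from the StackOverflow question: the class name MoveToGraphUpdate, the example graph IRI and the batch size are illustrative assumptions, and it assumes the default graph is not configured as the union of the named graphs. Because each run deletes the triples it moves, re-running the same update should pick up the next batch without needing OFFSET.

```java
final class MoveToGraphUpdate {

    /**
     * Builds a SPARQL 1.1 Update that moves up to batchSize triples with the
     * given predicate from the default graph into the given named graph.
     * All three parameters are illustrative; nothing here is Jena-specific.
     */
    static String build(String predicateIri, String graphIri, int batchSize) {
        String pattern = "?s <" + predicateIri + "> ?o";
        return String.join("\n",
            // Remove the matched triples from the default graph ...
            "DELETE { " + pattern + " }",
            // ... and re-insert them into the named graph.
            "INSERT { GRAPH <" + graphIri + "> { " + pattern + " } }",
            "WHERE {",
            // The sub-select caps how many triples one update touches.
            "  { SELECT ?s ?o WHERE { " + pattern + " } LIMIT " + batchSize + " }",
            "}");
    }

    public static void main(String[] args) {
        // Example: move dcterms:title triples into a hypothetical titles graph.
        System.out.println(build(
            "http://purl.org/dc/terms/title",
            "http://example.org/graphs/titles",
            100000));
    }
}
```

The generated text would be sent repeatedly to the dataset's update endpoint until no matching triples remain in the default graph.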


 


On 07/12/2018 at 18:16, HYP wrote:

OK, I will explain my project below.


The KG schema is composed of a set of semantic types, like disease or drug, and 
a set of relations, like treated_by(disease, drug).
Each relation instance, like treated_by(disease_1, drug_1), has an 
annotation property 'year', which means that the triple occurred in that year.


My query has two steps. First, query the triples related to some drug, 
like Pyrilamine, group them by relation type, and count them. Second, query 
the related nodes for one relation type.


The first-step query looks like this:


SELECT  ?rel (count (?rel) as ?co)
where {
 ?object MKG:English_name 'Pyrilamine' .
 ?RelAttr owl:annotatedTarget ?object ;
  owl:annotatedSource ?subject ;
  owl:annotatedProperty ?rel ;
  MKG:pyear '1967' .
 }
group by ?rel
limit 10


On 12/8/2018 01:00,ajs6f wrote:
Let's slow down here a bit.

We can't give you any reasonable advice until you tell us _much_ more about 
your work. What is the data like? What kinds of queries are you doing? How are 
you running them? What do you expect to happen?

Please give us a great deal more context.

ajs6f

On Dec 7, 2018, at 11:45 AM, HYP  wrote:





I stored the 1.4B triples in two steps. First, I made 886 RDF files, each of 
which contains 1615837 triples. Then I uploaded them into TDB using Fuseki.
This is a huge job. Are you sure that named graphs have better performance?
If so, how do I build named graphs?


On 12/7/2018 23:48,Vincent Ventresque wrote:
Do you mean -Xms = 64G ?

N.B. : with 1.4 B triples, you should have better performance using
named graphs.


On 07/12/2018 at 16:37, 胡云苹 wrote:
My memory is 64G and my setting is no upper limit.
On 12/7/2018 23:34, Vincent Ventresque wrote:

Hello

How do you run Fuseki? You can increase the Java memory limit with
Java options:

java -Xms4096m -Xmx4096m -jar fuseki-server.jar

(where 4096m = 4 GB, but it could be 8192m or more)

N.B.: I'm not a specialist; I don't know whether -Xms and -Xmx must be
the same.

If I remember correctly, the memory limit is 1.2 GB when you run
'./fuseki start' or './fuseki-server'.

Vincent
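
To check which heap limit the JVM actually picked up, here is a small sketch using only the standard Runtime API (the class name HeapCheck is illustrative, not part of Fuseki). It should be run with the same -Xms/-Xmx options as the server. As noted elsewhere in this thread, TDB uses memory-mapped files, which live outside the Java heap, so the heap size is not the only memory that matters.

```java
final class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: upper bound; totalMemory: currently reserved; freeMemory: unused part of the reserved heap.
        System.out.printf("max heap   : %d MiB%n", maxHeapMiB());
        System.out.printf("total heap : %d MiB%n", rt.totalMemory() / MIB);
        System.out.printf("free heap  : %d MiB%n", rt.freeMemory() / MIB);
    }
}
```

For example, running it as `java -Xms4096m -Xmx4096m HeapCheck.java` (Java 11 or later) should report a maximum close to 4096 MiB; the exact figure can differ slightly by JVM and garbage collector.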




On 07/12/2018 at 16:23, 胡云苹 wrote:

Dear jena,

I have built a graph with 1.4 billion triples and stored it as a
dataset in TDB through the Fuseki upload system. Now, when I try to run
SPARQL queries, they are very slow.

For example, when I run the following SPARQL query in Fuseki,
it takes 50 seconds. How can I improve the speed?











Re: sparql 1.4 billion triples

2018-12-16 Thread Dick Murray
Be very careful using vmtouch, especially if you call -dl, as you could very
easily and quickly kill a system. I've used this tool on cloud VMs to
mitigate cycle times (think DBAN, due to the public nature of the hardware).
It's a fast way to an irked OS thrashing around.

Dick



Re: sparql 1.4 billion triples

2018-12-16 Thread Siddhesh Rane
I'll be happy to document this. I think the FAQ would be a good place.

I actually looked further into this and found that the vmtouch
functionality is provided by the JDK itself.
The java.nio.MappedByteBuffer#load method brings file pages into memory [1].
It works similarly to vmtouch, i.e. it reads a byte from each page
to cause a page fault and load that page into memory [2].

[1]
https://docs.oracle.com/javase/8/docs/api/java/nio/MappedByteBuffer.html#load--

[2]
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/nio/MappedByteBuffer.java#l156
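
A minimal sketch of that idea, using only JDK classes. The class name PageCacheWarmer and the 1 GiB chunk size are illustrative assumptions, and which database files are worth warming is left to the caller. MappedByteBuffer#load is documented as best effort, so this does not guarantee that the pages stay resident, and on a machine with little free RAM it can cause the kind of memory pressure warned about elsewhere in this thread.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class PageCacheWarmer {

    // One mapping cannot exceed Integer.MAX_VALUE bytes, so large files are mapped in chunks.
    private static final long CHUNK = 1L << 30; // 1 GiB per mapping (arbitrary choice)

    /** Maps the file read-only, chunk by chunk, and asks the OS to load each chunk. Returns the bytes mapped. */
    static long warm(Path file) throws IOException {
        long mapped = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                buffer.load(); // best effort: touches the pages so the OS brings them into memory
                mapped += len;
            }
        }
        return mapped;
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the paths of the files to warm as arguments.
        for (String arg : args) {
            Path path = Paths.get(arg);
            System.out.printf("%s: %d bytes mapped and loaded%n", path, warm(path));
        }
    }
}
```

Unlike vmtouch, this sketch has no option to lock pages in memory; it only gives the page cache a head start.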




Re: Build by program a Dataset that is both textual and spatial

2018-12-16 Thread Jean-Marc Vanel
Sorry, I sent a bad link to the list; my assembler file is actually this one:
https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl





Re: sparql 1.4 billion triples

2018-12-16 Thread ajs6f
This seems to be a Linux-only technique that relies on installing and 
maintaining vmtouch, correct?

It doesn't seem that we could support that as a general solution, but would you 
be interested in writing up the essentials for somewhere in the Jena docs? 
I'll admit I'm not sure where it would best go, but it might 
be very helpful to users who can take advantage of it.

ajs6f

> On Dec 16, 2018, at 6:11 AM, Siddhesh Rane  wrote:
> 
> An in-memory database has the following limitations:
> 
> 1) Time to create the database. Not a problem if you have a dedicated
> machine which runs 24/7 where you load data once and the process never
> exits. But a huge waste of time if you get hardware during certain time
> slots and you have to load data from the start.
> 
> 2) An in-memory database is all or nothing. If your dataset can't fit in RAM,
> you are out of luck. I had tried using this but many times it would go OOM.
> With vmtouch, you can load an index partially, up to however much free RAM
> is available. Something is better than nothing.
> 
> Vmtouch is not doing anything magical. TDB already uses mmap. When run on
> its own, Linux will bring most of the index into RAM. But think about the
> time it will take for that to happen. If one query takes 50 seconds (I've
> seen it go to 500-1000s as well), then in 1 hour you would have run just 72
> queries. If instead your speed was 1s/query you would have executed 3600
> queries, and that would bring more of the index into RAM for future queries
> to run fast as well. So it's also the rate of speedup that matters.
> With vmtouch, you vmtouch at the beginning and it gives you a fast head
> start, and then it's your program maintaining the cache.
> 
> Regards,
> Siddhesh
> 
> 
> On Sat, 15 Dec 2018, 9:15 pm ajs6f wrote:
>> What is the advantage to doing that as opposed to using Jena's built-in
>> in-memory dataset?
>> 
>> ajs6f
>> 
>>> On Dec 15, 2018, at 3:04 AM, Siddhesh Rane  wrote:
>>> 
>>> Bring the entire database in RAM.
>>> Use "vmtouch "
>>> Get vmtouch from https://hoytech.com/vmtouch/
>>> 
>>> I had used jena for 150M triples and my performance findings are
>> documented
>>> at
>>> 
>> https://lists.apache.org/thread.html/254968eee3cd04370eafa2f9cc586e238f8a7034cf9ab4cbde3dc8e9@%3Cusers.jena.apache.org%3E
>>> 
>>> Regards,
>>> Siddhesh
>>> 
>>> On Fri, 7 Dec 2018, 8:23 pm y...@zju.edu.cn wrote:
>>> 
>>>> Dear jena,
>>>> I have built a graph with 1.4 billion triples and store it as a data set
>>>> in TDB  through Fuseki upload system.
>>>> Now, I try to make some sparql search, the speed is very slow.
>>>> 
>>>> For example, when I make the sqarql in Fuseki in the following, it takes
>>>> 50 seconds.
>>>> How can I improve the speed?
>>>> --
>>>> Best wishes!
>>>> 
>>>> 
>>>> 胡云苹
>>>> 浙江大学控制科学与工程学院
>>>> 浙江省杭州市浙大路38号浙大玉泉校区CSC研究所
>>>> Institute of Cyber-Systems and Control, College of Control Science and
>>>> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
>>>> Email : y...@zju.edu.cn ;hyphy...@163.com
>>>> 
>>>> 
>> 
>> 



Re: Build by program a Dataset that is both textual and spatial

2018-12-16 Thread Marco Neumann
You are missing the text index in the assembler.


Re: Build by program a Dataset that is both textual and spatial

2018-12-16 Thread Jean-Marc Vanel
Yes indeed,
exactly with this assembler file:
https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial.assembler.ttl

And when the Jena-based application is started with this same assembler
file, the spatial queries work, as said above in this thread.


On Sun, 16 Dec 2018 at 12:18, Marco Neumann wrote:

> did you create the text index with jena.textindexer
> --desc=//config.ttl?
>
>
> On Sun, Dec 16, 2018 at 8:48 AM Jean-Marc Vanel 
> wrote:
>
> > I fixed a stupid error in text index URI, resulting from pasting, in new
> > file jena.spatial+text.assembler.ttl:
> >
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl
> >
> > Now, it reads:
> > :spatial_dataset rdf:type spatial:SpatialDataset ;
> > rdf:type text:TextDataset ;
> > spatial:dataset   <#dataset> ;
> > spatial:index <#indexLucene> ;
> > text:dataset   <#dataset> ;
> > text:index <#indexLuceneText> ;
> > .
> >
> > But still it says "Failed to find the text index" .
> >
> >
> > On Sun, 16 Dec 2018 at 09:34, Jean-Marc Vanel wrote:
> >
> > > So I tried a new  file jena.spatial+text.assembler.ttl with separate
> > > Lucene indices for spatial and text:
> > >
> > >
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl
> > >
> > > It defines a Dataset with both Dataset types and both Lucene indices:
> > >
> > > :spatial_dataset rdf:type spatial:SpatialDataset ;
> > > rdf:type text:TextDataset ;
> > > spatial:dataset   <#dataset> ;
> > > spatial:index <#indexLucene> ;
> > > text:dataset   <#dataset> ;
> > > text:index <#indexLucene> ;
> > > .
> > >
> > > Alas , when querying , it says "Failed to find the text index" :
> > >
> > > WARN  org.apache.jena.query.text.TextQueryPF - Failed to find the text
> > > index : tried context and as a text-enabled dataset
> > > WARN  org.apache.jena.query.text.TextQueryPF - No text index - no text
> > > search performed
> > >
> > > Is it because rdf:type spatial:SpatialDataset is asserted first in
> > > the assembler file?
> > > So, defining a hybrid Dataset works no better by TTL specification
> > > than by JVM code specification.
> > > I definitely need more experts' advice.
> > >
> > >
> > >
> > > On Sat, 15 Dec 2018 at 21:10, ajs6f wrote:
> > >
> > >> > On Dec 15, 2018, at 9:59 AM, Marco Neumann wrote:
> > >> >
> > >> >> *Question*: does it make sense to have a single Lucene index for
> > >> >> text and space?
> > >> >> Is it possible at all? If yes, is it good in terms of disk space
> > >> >> and performance?
> > >> >> Like this:
> > >> >> <#indexLucene> a text:TextIndexLucene ;
> > >> >>   a spatial:SpatialIndexLucene ;
> > >> >>   # etc ...
> > >>
> > >> I'm sure that would be ideal for many cases, but I'm not at all sure that
> > >> the same index can answer queries of both kinds. Perhaps we can combine
> > >> fields from both, but are the relationships between tuple and index record
> > >> the same in both cases?
> > >>
> > >> Otherwise, I know we had a conversation at some point in the past on one
> > >> of the lists about trying to factor out commonalities between jena-spatial
> > >> and jena-text, but it didn't go very far at that time and I don't know what
> > >> the intervening years have done to make it more or less feasible. There
> > >> have been many changes to jena-text in that time and the new spatial module
> > >> is a whole new story. I'd put a link here but searching lists.apache.org
> > >> hasn't brought it up for me.
> > >>
> > >> ajs6f
> > >>
> > >>

-- 
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me#subject

Re: sparql 1.4 billion triples

2018-12-16 Thread Jean-Marc Vanel
yphu,
you didn't share your query.
Maybe the query has some questionable features.

Did you try a simple but useful query, like getting the first 10
foaf:Person ?


On Fri, 7 Dec 2018 at 15:53, y...@zju.edu.cn wrote:

> Dear jena,
> I have built a graph with 1.4 billion triples and store it as a data set
> in TDB  through Fuseki upload system.
> Now, I try to make some sparql search, the speed is very slow.
>
> For example, when I make the sqarql in Fuseki in the following, it takes
> 50 seconds.
> How can I improve the speed?
> --
> Best wishes!
>
>
> 胡云苹
> 浙江大学控制科学与工程学院
> 浙江省杭州市浙大路38号浙大玉泉校区CSC研究所
> Institute of Cyber-Systems and Control, College of Control Science and
> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
> Email : y...@zju.edu.cn ;hyphy...@163.com
>
>

-- 
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me#subject

Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
 Chroniques jardin



Re: Build by program a Dataset that is both textual and spatial

2018-12-16 Thread Marco Neumann
Did you create the text index with jena.textindexer
--desc=//config.ttl?


On Sun, Dec 16, 2018 at 8:48 AM Jean-Marc Vanel 
wrote:

> I fixed a stupid error in text index URI, resulting from pasting, in new
> file jena.spatial+text.assembler.ttl:
>
> https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl
>
> Now, it reads:
> :spatial_dataset rdf:type spatial:SpatialDataset ;
> rdf:type text:TextDataset ;
> spatial:dataset   <#dataset> ;
> spatial:index <#indexLucene> ;
> text:dataset   <#dataset> ;
> text:index <#indexLuceneText> ;
> .
>
> But still it says "Failed to find the text index" .
>
>
> On Sun, 16 Dec 2018 at 09:34, Jean-Marc Vanel wrote:
>
> > So I tried a new  file jena.spatial+text.assembler.ttl with separate
> > Lucene indices for spatial and text:
> >
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl
> >
> > It defines a Dataset with both Dataset types and both Lucene indices:
> >
> > :spatial_dataset rdf:type spatial:SpatialDataset ;
> > rdf:type text:TextDataset ;
> > spatial:dataset   <#dataset> ;
> > spatial:index <#indexLucene> ;
> > text:dataset   <#dataset> ;
> > text:index <#indexLucene> ;
> > .
> >
> > Alas , when querying , it says "Failed to find the text index" :
> >
> > WARN  org.apache.jena.query.text.TextQueryPF - Failed to find the text
> > index : tried context and as a text-enabled dataset
> > WARN  org.apache.jena.query.text.TextQueryPF - No text index - no text
> > search performed
> >
> > Is it because rdf:type spatial:SpatialDataset is asserted first in
> > assembler file?
> > So, defining a hybrid Dataset does not work anymore by TTL specification
> > than by JVM code specification.
> > I definitely need more experts' advice .
> >
> >
> >
> > On Sat, 15 Dec 2018 at 21:10, ajs6f wrote:
> >
> >> > On Dec 15, 2018, at 9:59 AM, Marco Neumann wrote:
> >> >
> >> >> *Question*: does it make sense to have a single Lucene index for
> >> >> text and space? Is it possible at all? If yes, is it good in terms
> >> >> of disk space and performance?
> >> >> Like this:
> >> >> <#indexLucene> a text:TextIndexLucene ;
> >> >>   a spatial:SpatialIndexLucene ;
> >> >>   # etc ...
> >>
> >> I'm sure that would be ideal for many cases, but I'm not at all sure
> >> that the same index can answer queries of both kinds. Perhaps we can
> >> combine fields from both, but are the relationships between tuple and
> >> index record the same in both cases?
> >>
> >> Otherwise, I know we had a conversation at some point in the past on
> >> one of the lists about trying to factor out commonalities between
> >> jena-spatial and jena-text, but it didn't go very far at that time and
> >> I don't know what the intervening years have done to make it more or
> >> less feasible. There have been many changes to jena-text in that time
> >> and the new spatial module is a whole new story. I'd put a link here
> >> but searching lists.apache.org hasn't brought it up for me.
> >>
> >> ajs6f
> >>
> >>


-- 


---
Marco Neumann
KONA


Re: sparql 1.4 billion triples

2018-12-16 Thread Siddhesh Rane
An in-memory database has the following limitations:

1) Time to create the database. This is not a problem if you have a
dedicated machine that runs 24/7, where you load the data once and the
process never exits. But it is a huge waste of time if you only get hardware
during certain time slots and have to load the data from scratch each time.

2) An in-memory database is all or nothing. If your dataset doesn't fit in
RAM, you are out of luck. I tried using one, but many times it ran out of
memory (OOM). With vmtouch, you can load an index partially, up to the
amount of free RAM available. Something is better than nothing.

vmtouch is not doing anything magical. TDB already uses mmap. Left on its
own, Linux will eventually bring most of the index into RAM. But think about
the time it will take for that to happen. If one query takes 50 seconds
(I've seen it go to 500-1000 s as well), then in 1 hour you would have run
just 72 queries. If instead your speed was 1 s/query, you would have executed
3600 queries, and that would bring more of the index into RAM so that future
queries run fast as well. So it's also the rate of speedup that matters.
With vmtouch, you run vmtouch at the beginning, which gives you a head
start, and from then on it's your program that keeps the cache warm.

Regards,
Siddhesh


On Sat, 15 Dec 2018, 9:15 pm ajs6f wrote:

> What is the advantage to doing that as opposed to using Jena's built-in
> in-memory dataset?
>
> ajs6f
>
> > On Dec 15, 2018, at 3:04 AM, Siddhesh Rane  wrote:
> >
> > Bring the entire database into RAM.
> > Use "vmtouch "
> > Get vmtouch from https://hoytech.com/vmtouch/
> >
> > I used Jena for 150M triples, and my performance findings are documented at
> > https://lists.apache.org/thread.html/254968eee3cd04370eafa2f9cc586e238f8a7034cf9ab4cbde3dc8e9@%3Cusers.jena.apache.org%3E
> >
> > Regards,
> > Siddhesh
> >
> > On Fri, 7 Dec 2018, 8:23 pm y...@zju.edu.cn wrote:
> >
> >> Dear jena,
> >> I have built a graph with 1.4 billion triples and stored it as a dataset
> >> in TDB through the Fuseki upload system.
> >> Now, when I run SPARQL queries, they are very slow.
> >>
> >> For example, when I run the following SPARQL query in Fuseki, it takes
> >> 50 seconds.
> >> How can I improve the speed?
> >> --
> >> Best wishes!
> >>
> >>
> >> 胡云苹
> >> 浙江大学控制科学与工程学院
> >> 浙江省杭州市浙大路38号浙大玉泉校区CSC研究所
> >> Institute of Cyber-Systems and Control, College of Control Science and
> >> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
> >> Email : y...@zju.edu.cn ;hyphy...@163.com
> >>
> >>
>
>


Re: Build by program a Dataset that is both textual and spatial

2018-12-16 Thread Jean-Marc Vanel
I fixed a stupid copy-paste error in the text index URI in the new
file jena.spatial+text.assembler.ttl:
https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl

Now, it reads:
:spatial_dataset rdf:type spatial:SpatialDataset ;
    rdf:type text:TextDataset ;
    spatial:dataset <#dataset> ;
    spatial:index <#indexLucene> ;
    text:dataset <#dataset> ;
    text:index <#indexLuceneText> ;
    .

But it still says "Failed to find the text index".


On Sun, 16 Dec 2018 at 09:34, Jean-Marc Vanel wrote:

> So I tried a new  file jena.spatial+text.assembler.ttl with separate
> Lucene indices for spatial and text:
>
> https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl
>
> It defines a Dataset with both Dataset types and both Lucene indices:
>
> :spatial_dataset rdf:type spatial:SpatialDataset ;
> rdf:type text:TextDataset ;
> spatial:dataset   <#dataset> ;
> spatial:index <#indexLucene> ;
> text:dataset   <#dataset> ;
> text:index <#indexLucene> ;
> .
>
> Alas , when querying , it says "Failed to find the text index" :
>
> WARN  org.apache.jena.query.text.TextQueryPF - Failed to find the text
> index : tried context and as a text-enabled dataset
> WARN  org.apache.jena.query.text.TextQueryPF - No text index - no text
> search performed
>
> Is it because rdf:type spatial:SpatialDataset is asserted first in
> assembler file?
> So, defining a hybrid Dataset does not work anymore by TTL specification
> than by JVM code specification.
> I definitely need more experts' advice .
>
>
>
> On Sat, 15 Dec 2018 at 21:10, ajs6f wrote:
>
>> > On Dec 15, 2018, at 9:59 AM, Marco Neumann wrote:
>> >
>> >> *Question*: does it make sense to have a single Lucene index for
>> >> text and space? Is it possible at all? If yes, is it good in terms
>> >> of disk space and performance?
>> >> Like this:
>> >> <#indexLucene> a text:TextIndexLucene ;
>> >>   a spatial:SpatialIndexLucene ;
>> >>   # etc ...
>>
>> I'm sure that would be ideal for many cases, but I'm not at all sure that
>> the same index can answer queries of both kinds. Perhaps we can combine
>> fields from both, but are the relationships between tuple and index record
>> the same in both cases?
>>
>> Otherwise, I know we had a conversation at some point in the past on one
>> of the lists about trying to factor out commonalities between jena-spatial
>> and jena-text, but it didn't go very far at that time and I don't know what
>> the intervening years have done to make it more or less feasible. There
>> have been many changes to jena-text in that time and the new spatial module
>> is a whole new story. I'd put a link here but searching lists.apache.org
>> hasn't brought it up for me.
>>
>> ajs6f
>>
>>


-- 
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me#subject

Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
 Chroniques jardin



Re: Build by program a Dataset that is both textual and spatial

2018-12-16 Thread Jean-Marc Vanel
So I tried a new file, jena.spatial+text.assembler.ttl, with separate Lucene
indices for spatial and text:
https://github.com/jmvanel/semantic_forms/blob/master/scala/jena.spatial%2Btext.assembler.ttl

It defines a Dataset with both Dataset types and both Lucene indices:

:spatial_dataset rdf:type spatial:SpatialDataset ;
    rdf:type text:TextDataset ;
    spatial:dataset <#dataset> ;
    spatial:index <#indexLucene> ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .

Alas, when querying, it says "Failed to find the text index":

WARN  org.apache.jena.query.text.TextQueryPF - Failed to find the text index : tried context and as a text-enabled dataset
WARN  org.apache.jena.query.text.TextQueryPF - No text index - no text search performed

Is it because rdf:type spatial:SpatialDataset is asserted first in the
assembler file?
So defining a hybrid Dataset works no better through a TTL assembler
specification than through JVM code.
I definitely need more expert advice.
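
For reference, the kind of query such a hybrid dataset would need to answer could look like the sketch below. It is only an illustration: the actual query was not posted, and the indexed property (rdfs:label), the search term and the coordinates are assumptions. The text:query property function is the one evaluated by TextQueryPF, which is why the warnings above appear when no text index is found.

```sparql
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text:    <http://jena.apache.org/text#>
PREFIX spatial: <http://jena.apache.org/spatial#>

# Resources whose rdfs:label matches 'garden' (text index)
# and that lie within 10 km of the given point (spatial index).
# The property, search term and coordinates are illustrative only.
SELECT ?s
WHERE {
  ?s text:query (rdfs:label 'garden') .
  ?s spatial:nearby (48.85 2.35 10 'km') .
}
LIMIT 10
```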



On Sat, 15 Dec 2018 at 21:10, ajs6f wrote:

> > On Dec 15, 2018, at 9:59 AM, Marco Neumann wrote:
> >
> >> *Question*: does it make sense to have a single Lucene index for text
> >> and space? Is it possible at all? If yes, is it good in terms of disk
> >> space and performance?
> >> Like this:
> >> <#indexLucene> a text:TextIndexLucene ;
> >>   a spatial:SpatialIndexLucene ;
> >>   # etc ...
>
> I'm sure that would be ideal for many cases, but I'm not at all sure that
> the same index can answer queries of both kinds. Perhaps we can combine
> fields from both, but are the relationships between tuple and index record
> the same in both cases?
>
> Otherwise, I know we had a conversation at some point in the past on one
> of the lists about trying to factor out commonalities between jena-spatial
> and jena-text, but it didn't go very far at that time and I don't know what
> the intervening years have done to make it more or less feasible. There
> have been many changes to jena-text in that time and the new spatial module
> is a whole new story. I'd put a link here but searching lists.apache.org
> hasn't brought it up for me.
>
> ajs6f
>
>

-- 
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me#subject

Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
 Chroniques jardin