Re: SPARQL limit doesn't work

2022-10-20 Thread Mikael Pesonen



I had to reset all Jena data because the server ran out of memory during 
a DROP GRAPH. Now, with clean data, paging works. I'll let you know if the 
problem recurs.


On 20/10/2022 9.37, Lorenz Buehmann wrote:


On 19.10.22 13:44, Mikael Pesonen wrote:




On 19/10/2022 10.18, Lorenz Buehmann wrote:
Honestly - probably because of lack of knowledge - I don't see how 
that can happen with the text index. You have a single triple 
pattern that is querying the Lucene index for the given pattern and 
returns by default at most 10 000 documents.



text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )

translates to


( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.


I still don't get how a query with limit 1000 returning 560 then 
doesn't return 100 if using limit 100


Currently, I find your results quite counterintuitive, but I still 
have a lot to learn when using RDF, SPARQL and Jena.



Can you share some data please to reproduce?
Unfortunately I can't share the data. Of course, when I have time, I 
could create a similar dummy index.


What happens for a single property only? 


What does this mean?
You're querying two properties, i.e. two fields in the Lucene query. 
What if you just use skos:prefLabel?


Pagination should work the way you're doing it: the Lucene query is 
internally executed once and then cached, so later requests should 
reuse the same Lucene document hits.


On 19.10.22 08:21, Mikael Pesonen wrote:


Hi,

Yes, the same SELECT run as the only query returns exactly the LIMIT number of results.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, 
the default limit of the Lucene text query is at most 10 000 
documents - and I don't think that the outer LIMIT would make it 
to the Lucene request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 
100 ~75 results. How do I page results correctly?






--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND



Re: Re: SPARQL limit doesn't work

2022-10-19 Thread Lorenz Buehmann



On 19.10.22 13:44, Mikael Pesonen wrote:




On 19/10/2022 10.18, Lorenz Buehmann wrote:
Honestly - probably because of lack of knowledge - I don't see how 
that can happen with the text index. You have a single triple pattern 
that is querying the Lucene index for the given pattern and returns 
by default at most 10 000 documents.



text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )

translates to


( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.


I still don't get how a query with limit 1000 returning 560 then 
doesn't return 100 if using limit 100


Currently, I find your results quite counterintuitive, but I still 
have a lot to learn when using RDF, SPARQL and Jena.



Can you share some data please to reproduce?
Unfortunately I can't share the data. Of course, when I have time, I 
could create a similar dummy index.


What happens for a single property only? 


What does this mean?
You're querying two properties, i.e. two fields in the Lucene query. What 
if you just use skos:prefLabel?


Pagination should work the way you're doing it: the Lucene query is 
internally executed once and then cached, so later requests should 
reuse the same Lucene document hits.


On 19.10.22 08:21, Mikael Pesonen wrote:


Hi,

Yes, the same SELECT run as the only query returns exactly the LIMIT number of results.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, 
the default limit of the Lucene text query is at most 10 000 
documents - and I don't think that the outer LIMIT would make it to 
the Lucene request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 
100 ~75 results. How do I page results correctly?






Re: SPARQL limit doesn't work

2022-10-19 Thread Mikael Pesonen





On 19/10/2022 10.18, Lorenz Buehmann wrote:
Honestly - probably because of lack of knowledge - I don't see how 
that can happen with the text index. You have a single triple pattern 
that is querying the Lucene index for the given pattern and returns by 
default at most 10 000 documents.



text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )

translates to


( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.


I still don't get how a query with limit 1000 returning 560 then 
doesn't return 100 if using limit 100


Currently, I find your results quite counterintuitive, but I still 
have a lot to learn when using RDF, SPARQL and Jena.



Can you share some data please to reproduce?
Unfortunately I can't share the data. Of course, when I have time, I 
could create a similar dummy index.


What happens for a single property only? 


What does this mean?

Pagination should work the way you're doing it: the Lucene query is 
internally executed once and then cached, so later requests should 
reuse the same Lucene document hits.


On 19.10.22 08:21, Mikael Pesonen wrote:


Hi,

Yes, the same SELECT run as the only query returns exactly the LIMIT number of results.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, 
the default limit of the Lucene text query is at most 10 000 
documents - and I don't think that the outer LIMIT would make it to 
the Lucene request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 
100 ~75 results. How do I page results correctly?







Re: Re: SPARQL limit doesn't work

2022-10-19 Thread Lorenz Buehmann
Honestly - probably because of lack of knowledge - I don't see how that 
can happen with the text index. You have a single triple pattern that is 
querying the Lucene index for the given pattern and returns by default 
at most 10 000 documents.



text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )

translates to


( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.
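
To illustrate the point about one Lucene document per triple, here is a minimal Python sketch. It is not Jena code and not a diagnosis of the report in this thread; the hit list, the `ex:` subjects, and the helper names `top_hits` and `distinct_subjects` are made up for illustration. It only shows that if a limit is applied to raw hits and the hits are later reduced to distinct subjects, the final count can be smaller than the limit.

```python
# Hypothetical Lucene hits: one entry per indexed triple,
# given as (subject, field, score). All values are invented.
hits = [
    ("ex:c1", "prefLabel", 3.0),
    ("ex:c1", "altLabel", 2.5),
    ("ex:c2", "prefLabel", 2.0),
    ("ex:c3", "altLabel", 1.5),
    ("ex:c2", "altLabel", 1.0),
]


def top_hits(hits, limit):
    """Return the `limit` highest-scoring hits (duplicated subjects included)."""
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)
    return ranked[:limit]


def distinct_subjects(hits):
    """Keep the first (best-ranked) score seen for each subject."""
    best = {}
    for subject, _field, score in hits:
        if subject not in best:
            best[subject] = score
    return list(best.items())


page = top_hits(hits, 4)            # limit applied to raw hits
subjects = distinct_subjects(page)  # later reduced to distinct subjects
print(len(page), len(subjects))
```

With these invented hits, four raw hits collapse to three distinct subjects, because `ex:c1` matched on both fields.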


I still don't get how a query with limit 1000 returning 560 then doesn't 
return 100 if using limit 100


Currently, I find your results quite counterintuitive, but I still have 
a lot to learn when using RDF, SPARQL and Jena.



Can you share some data please to reproduce?

What happens for a single property only? Pagination should work the way 
you're doing it: the Lucene query is internally executed once and then 
cached, so later requests should reuse the same Lucene document hits.


On 19.10.22 08:21, Mikael Pesonen wrote:


Hi,

Yes, the same SELECT run as the only query returns exactly the LIMIT number of results.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, the 
default limit of the Lucene text query is at most 10 000 documents - 
and I don't think that the outer LIMIT would make it to the Lucene 
request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
"lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 100 
~75 results. How do I page results correctly?




Re: SPARQL limit doesn't work

2022-10-18 Thread Mikael Pesonen



Hi,

Yes, the same SELECT run as the only query returns exactly the LIMIT number of results.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, the 
default limit of the Lucene text query is at most 10 000 documents - 
and I don't think that the outer LIMIT would make it to the Lucene 
request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
"lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 100 
~75 results. How do I page results correctly?





Re: SPARQL limit doesn't work

2022-10-18 Thread Lorenz Buehmann
Did you get those results when running only this subquery? AFAIK, the 
default limit of the Lucene text query is 10 000 documents, and I don't 
think that the outer LIMIT is passed down to the Lucene request.
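
One way to keep the two limits separate is sketched below in Python. It is a hedged illustration, not a confirmed fix for this thread: the helper `build_page_query` and its parameters are hypothetical, and placing an integer limit inside the `text:query` argument list is an assumption based on the jena-text documentation that should be checked against the Jena version in use. The sketch groups by `?s` so duplicate hits for one subject collapse before paging, and adds `?s` as a tie-breaker so the order is stable across pages.

```python
def build_page_query(phrase, page, page_size, text_limit=10000):
    """Build a paged SPARQL query string (hypothetical helper).

    phrase     -- raw Lucene query text, e.g. '"xx yy"' for a phrase query
    page       -- zero-based page number
    page_size  -- rows per page (outer SPARQL LIMIT)
    text_limit -- assumed integer limit argument for text:query
    """
    offset = page * page_size
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?s (MAX(?sc) AS ?score) WHERE {\n"
        f'  (?s ?sc) text:query (skos:prefLabel skos:altLabel "{escaped}" {text_limit} "lang:en") .\n'
        "}\n"
        "GROUP BY ?s\n"
        "ORDER BY DESC(?score) ?s\n"
        f"OFFSET {offset} LIMIT {page_size}"
    )


# Third page of 100 rows for the phrase used in this thread.
query = build_page_query('"xx yy"', page=2, page_size=100)
print(query)
```

The outer OFFSET and LIMIT then page over distinct subjects, while the inner limit only bounds how many Lucene documents are fetched.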



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
"lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 100 
~75 results. How do I page results correctly?


SPARQL limit doesn't work

2022-10-18 Thread Mikael Pesonen



I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
"lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 100 ~75 
results. How do I page results correctly?