Re: Getting Symmetric Concise Bounded Description with Fuseki

2018-03-03 Thread James Anderson
good morning;

> On 2018-03-03, at 17:18, Andy Seaborne  wrote:
> 
> ...
> 
> DESCRIBE isn't claiming to be CBD - it does not add the reifications for 
> example. It is the bNode closure and you're right - it's in the forward 
> direction only.

as described, the result is a cbd. the cbd definition describes how to include 
information from reified statements, but does not require it to be included, 
just as it includes provisions for limiting extent.
the request is for a “symmetric concise bounded description”.

> 
>Andy
> 
> On 03/03/18 13:03, Reto Gmür wrote:
>> Hi all,
>> I've noticed that a DESCRIBE query returns the Concise Bounded Description 
>> (CBD) of a resource, i.e. it expands forward properties to the next 
>> non-blank node. Is it possible to configure FUSEKI to also follow properties 
>> backwards when answering a DESCRIBE query? Or is there another query that 
>> would return that? By using the non-standard possibility to query for 
>> blank-nodes I could get the full resource context with multiple queries but 
>> I would prefer to do this with a single query and not to use any non-standard 
>> SPARQL feature.
>> Cheers,
>> Reto



Re: Getting Symmetric Concise Bounded Description with Fuseki

2018-03-03 Thread Andy Seaborne

Hi Reto,

Only by adding Java code, which can be dynamically loaded, to change the 
describe handler.


A different approach is to build up a CONSTRUCT successively, 
introspecting on the data each time.


start with

CONSTRUCT WHERE { ?x ?p ?subject }

then see which properties have blank node subjects and expand to:

CONSTRUCT WHERE { ?x ?p ?subject .
  ?o1 :someProperty1 ?x .
  ?o2 :someProperty2 ?x . }

adding to the pattern to expand it out. That way, it re-locates the 
blank nodes each time.


DESCRIBE isn't claiming to be CBD - it does not add the reifications for 
example. It is the bNode closure and you're right - it's in the forward 
direction only.
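
To illustrate the difference between a forward-only blank node closure and one that also follows properties backwards, here is a minimal sketch in plain Python. It is not Jena code and not a describe handler: triples are plain tuples, the BNode type and all "ex:" names are made up for the example, and reifications are ignored, so it only approximates a symmetric CBD.

```python
from collections import namedtuple

# Hypothetical stand-in for a blank node; IRIs and literals are plain strings.
BNode = namedtuple("BNode", "label")

def _closure(triples, start, forward):
    """Follow triples away from `start` in one direction only,
    continuing through blank nodes and stopping at non-blank nodes."""
    found, seen, todo = set(), set(), [start]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            near, far = (s, o) if forward else (o, s)
            if near == node:
                found.add((s, p, o))
                if isinstance(far, BNode):
                    todo.append(far)
    return found

def forward_description(triples, resource):
    """Forward direction only (resource as subject)."""
    return _closure(triples, resource, True)

def symmetric_description(triples, resource):
    """Forward closure plus the backward closure (resource as object)."""
    return _closure(triples, resource, True) | _closure(triples, resource, False)

# Made-up example data.
b1, b2 = BNode("b1"), BNode("b2")
triples = [
    ("ex:a", "ex:p", b1),
    (b1, "ex:q", "ex:c"),
    (b2, "ex:r", "ex:a"),
    ("ex:d", "ex:s", b2),
    ("ex:c", "ex:t", "ex:e"),
    ("ex:x", "ex:y", "ex:z"),
]
forward_only = forward_description(triples, "ex:a")
both_ways = symmetric_description(triples, "ex:a")
```

In this data the forward-only result stops at ex:c, while the symmetric result also picks up the two triples that reach ex:a through the blank node b2.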


Andy

On 03/03/18 13:03, Reto Gmür wrote:

Hi all,

I've noticed that a DESCRIBE query returns the Concise Bounded Description 
(CBD) of a resource, i.e. it expands forward properties to the next non-blank 
node. Is it possible to configure FUSEKI to also follow properties backwards 
when answering a DESCRIBE query? Or is there another query that would return 
that? By using the non-standard possibility to query for blank-nodes I could 
get the full resource context with multiple queries but I would prefer to do 
this with a single query and not to use any non-standard SPARQL feature.

Cheers,
Reto



Re: Getting Symmetric Concise Bounded Description with Fuseki

2018-03-03 Thread ajs6f
You can replace Jena's DESCRIBE behavior with whatever you like:

https://jena.apache.org/documentation/query/extension.html#describe-handlers

ajs6f

> On Mar 3, 2018, at 8:03 AM, Reto Gmür  wrote:
> 
> Hi all,
> 
> I've noticed that a DESCRIBE query returns the Concise Bounded Description 
> (CBD) of a resource, i.e. it expands forward properties to the next non-blank 
> node. Is it possible to configure FUSEKI to also follow properties backwards 
> when answering a DESCRIBE query? Or is there another query that would return 
> that? By using the non-standard possibility to query for blank-nodes I could 
> get the full resource context with multiple queries but I would prefer to do 
> this with a single query and not to use any non-standard SPARQL feature.
> 
> Cheers,
> Reto



Re: Streaming CONSTRUCT/INSERTs in TDB

2018-03-03 Thread Andy Seaborne

Hi Adrian,

Do you have an example of such an update?

Some cases can't stream, but it is possible some cases aren't streaming 
when they could.


Or the whole transaction is quite large which is where TDB2 comes in.

On 02/03/18 09:21, Adrian Gschwend wrote:

Hi group,

I had some discussions with Andy before outside the list. I have a
pipeline that creates a bunch of new triples via CONSTRUCT or INSERT
queries. The patterns are in my opinion pretty easy to stream, if I do
not do an insert but just a count, the query executes fine in TDB. As
soon as I do constructs on top of that it runs forever, takes CPU like
crazy and eventually runs out of memory.


It sounds like the heap size isn't big enough.  Sometimes it's just a 
matter of increasing it by a small amount. Java exhibits this behaviour 
(running for a long time at high CPU) when the heap is nearly full.



Now my guess is that it tries to create everything in-memory which is
probably simply too much. From some googling I found:

https://issues.apache.org/jira/browse/JENA-329

https://issues.apache.org/jira/browse/JENA-205

if I get that correctly there is streaming for this kind of queries but
I can't figure out if I can use that in tdb cli interfaces like tdbupdate.

Is there a way to use that from cli? My pipelines are run in batches so
I don't really care about speed, I just care that the queries finish.




regards

Adrian



Re: Inline Values and XSD Time Series

2018-03-03 Thread Andy Seaborne



On 01/03/18 16:57, Marco Neumann wrote:

I'd like to see having jena /tdb as powerful as possible in the future but
also don't mind to delegate to an external index for now to attain faster
data access. e.g. the jena spatial extension gives me roughly 10x faster
data access for my kind of queries over similar FILTER based range queries.


Useful data point.


and yes there should indeed  be a decent audience for improved time series
data performance in jena as well. there might even be room for
standardization later on.

enjoy the snow,


I was - it started melting :-(

(not that Bristol gets much snow but we had some this year).


Marco


On Thu, Mar 1, 2018 at 5:36 PM, Andy Seaborne  wrote:




On 01/03/18 12:46, Marco Neumann wrote:


a query could look like this



PREFIX spatial:
PREFIX rdfs: 

Select *
WHERE{
?s spatial:dateRange(2011 2012-03).
?s rdfs:label ?slabel.
FILTER(regex(?slabel,"Andy Seaborne","i"))
}




Does that mean everything in one index, or ways to make that query faster?
Both make sense.

Find all

?x :atTime ?v . FILTER ( ?v in some datetime range)

which is about making triple patterns faster when there is a FILTER as
well.

If the triple access to the data can start in the right place and stop in the
right place (a range query), then it will be faster than the current approach
of accessing all values.

That's all doable with the current data on disk (caveat: details!). It helps
widely but isn't optimal.  (And leaves the hard question of how to do two
discriminating selections/filters: in parallel and merge? do text and check
in time? other way round?)
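
The "start in the right place, stop in the right place" idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not how TDB is implemented: the index is a sorted list of made-up (timestamp, subject) pairs, and the timestamps are same-format ISO strings without timezones so that string order matches time order.

```python
from bisect import bisect_left

# Hypothetical index: (timestamp, subject) pairs kept in sorted order.
index = sorted([
    ("2010-06-01T00:00:00", "ex:e1"),
    ("2011-01-15T09:30:00", "ex:e2"),
    ("2011-11-02T12:00:00", "ex:e3"),
    ("2012-03-20T08:00:00", "ex:e4"),
    ("2013-07-04T00:00:00", "ex:e5"),
])

def filter_scan(index, lo, hi):
    """Visit every entry and test it, like a FILTER applied to all values."""
    return [subj for ts, subj in index if lo <= ts < hi]

def range_scan(index, lo, hi):
    """Binary-search for the first entry >= lo and the first entry >= hi,
    then read only the slice in between (half-open range [lo, hi))."""
    # A 1-tuple (lo,) sorts before every (lo, subject) pair.
    start = bisect_left(index, (lo,))
    stop = bisect_left(index, (hi,))
    return [subj for _, subj in index[start:stop]]

lo, hi = "2011-01-01T00:00:00", "2012-04-01T00:00:00"
scanned = filter_scan(index, lo, hi)
ranged = range_scan(index, lo, hi)
```

Both functions return the same subjects; the range scan only touches the entries inside the range plus the two binary searches.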


A new index that answers the whole query, or precalculated results for that
query, is separate storage. More complex for the end user but it could be
very powerful.

 Andy





On Thu, Mar 1, 2018 at 1:27 PM, Marco Neumann 
wrote:

https://lucidworks.com/2016/02/13/solrs-daterangefield-perform/


On Thu, Mar 1, 2018 at 1:22 PM, Andy Seaborne  wrote:




On 28/02/18 17:53, Marco Neumann wrote:



thank you, it's less than I hoped for




Concrete example?



but certainly more than what I

can ask for Andy :)

In short I'd like to get the xsd:dateTime scan out of the sparql
filter and perform a more efficient range via a date index similar to
the jena spatial implementation.

I am going to take a look at DateRangeField  and see how it performs
relative to a standard sparql filter range query.

best,
Marco


On Tue, Feb 27, 2018 at 5:21 PM, Andy Seaborne 
wrote:




On 27/02/18 11:41, Marco Neumann wrote:




Hi Andy, (I presume you wrote the following below) could you please
elaborate on the significance of this contribution in TDB?





Hi Marco,

For certain XSD datatypes, the value is stored in the NodeId (64 bits, minus
the datatype indicator - 56 bits for TDB1, up to 62 bits for TDB2 for
xsd:doubles) itself. It is faster to get the node back out of the database.





If the value does not fit in the bits available, the long form is used. In the
long form, the NodeId is a pointer into the node table and the node is stored
as the lexical form+datatype (TDB1: in text; TDB2: in binary / RDF Thrift).
This applies to strings and URIs.
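
As a rough illustration of the inline/long-form split described above, here is a small Python sketch. The layout is an assumption made for the example (an 8-bit tag in the top bits, a 56-bit payload, made-up tag values, integers only); it is not TDB's actual encoding.

```python
VALUE_BITS = 56                      # payload width assumed for this sketch
VALUE_MASK = (1 << VALUE_BITS) - 1
TAG_POINTER = 0x00                   # hypothetical tag: payload is a node table index
TAG_INLINE_INT = 0x01                # hypothetical tag: payload is the integer itself

def encode(value, node_table):
    """Return a 64-bit id. Small integers are stored inline; anything else
    is appended to the node table and referenced by its position."""
    if isinstance(value, int) and -(1 << (VALUE_BITS - 1)) <= value < (1 << (VALUE_BITS - 1)):
        # Two's-complement payload truncated to 56 bits.
        return (TAG_INLINE_INT << VALUE_BITS) | (value & VALUE_MASK)
    node_table.append(value)
    return (TAG_POINTER << VALUE_BITS) | (len(node_table) - 1)

def decode(node_id, node_table):
    """Inline ids decode without touching the node table."""
    tag = node_id >> VALUE_BITS
    payload = node_id & VALUE_MASK
    if tag == TAG_INLINE_INT:
        # Sign-extend the 56-bit payload.
        if payload >= 1 << (VALUE_BITS - 1):
            payload -= 1 << VALUE_BITS
        return payload
    return node_table[payload]

node_table = []
ids = [encode(v, node_table) for v in (42, -7, 1 << 60, "http://example/long")]
decoded = [decode(i, node_table) for i in ids]
```

Only the value that is too large for 56 bits and the string end up in the node table; the two small integers are recovered from the id alone.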




"The xsd:dateTime and xsd:date ranges cover about 8000 years from
year zero with a precision down to 1 millisecond. Timezone information is
retained to an accuracy of 15 minutes with special timezones for Z and for no
explicit timezone."





That's the limit for xsd:dateTime in 56 bits.




https://jena.apache.org/documentation/tdb/architecture.html#inline-values





does this give us enhanced temporal access methods via TDB that are
exposed as property functions in SPARQL?





What exactly are you looking for here? Range queries or a database you can
view at a point in time? ("Temporal database" can mean either.)

You get the same SPARQL filter capabilities but the inline form is faster
(measurable and by quite a lot) because it does not go to the node table.
Despite caching of the node table, it is still faster to get nodes out of
the DB from the inline form (and I'd like to go faster still).


Point-on-database.

Not possible in TDB1.
Possible (but not exposed) in TDB2.  TDB2 never forgets!

In particular I'd be interested in range queries on xsd:dateTime here
and the possible use of DateRangeField (SOLR) along jena-spatial.





Range queries - it would be possible to start in the right place for a
range
scan because the values are in sorted order under this 

Getting Symmetric Concise Bounded Description with Fuseki

2018-03-03 Thread Reto Gmür
Hi all,

I've noticed that a DESCRIBE query returns the Concise Bounded Description 
(CBD) of a resource, i.e. it expands forward properties to the next non-blank 
node. Is it possible to configure FUSEKI to also follow properties backwards 
when answering a DESCRIBE query? Or is there another query that would return 
that? By using the non-standard possibility to query for blank-nodes I could 
get the full resource context with multiple queries but I would prefer to do 
this with a single query and not to use any non-standard SPARQL feature.

Cheers,
Reto