Re: Re: Semantics of SERVICE w.r.t. slicing

2022-06-04 Thread Claus Stadler

Hi Andy,


> Are you going to be making improvements to query 
transformation/optimization as part of your work on the enhanced SERVICE 
handling on the active PR?


To summarize the PR (https://github.com/apache/jena/issues/1314) for 
readers here: it's about (a) improving the extension system for custom 
service executors and


(b) creating a plugin that allows for bulk retrieval and caching with 
SERVICE.


Actually I am trying to avoid touching transformation/optimization, but 
as part of my work on SERVICE extensions I added a little 
'correlate' option. Together with a 'self' flag for referring back to 
the active dataset, this allows for doing:



# For each department fetch 5 employees

SELECT * {
  ?d a :Department .
  SERVICE <correlate:self> {   # self could also be a URI such as urn:x-arq:self
    SELECT ?e { ?d :hasEmployee ?e } LIMIT 5
  }
}


Actually the variable ?d in the SERVICE clause has a different scope, 
but if 'correlate' is seen, my plugin just applies 
Rename.reverseVarRename on the OpService.


This could be restricted to only the variables that join with the input 
binding. This means the scope of (some of) the variables in the SERVICE 
clause is lost and a naive substitution with the input bindings becomes 
possible.


For example the following query


SELECT * {
  BIND(<urn:example> AS ?s)
  SERVICE <correlate:> {   # self is implied if no other URL is mentioned
    SELECT ?x ?y {   # Important not to project ?s, otherwise VarFinder
                     # will prevent the OpJoin->OpSequence optimization
      { BIND(?s AS ?x) } UNION { BIND(?s AS ?y) }
    }
  }
}


Yields:

-
| s | x | y |
=
|  |  |   |
|  |   |  |
-


For completeness, without 'correlate' one gets:

SELECT * {
  BIND( AS ?s)
  { SELECT ?x ?y { { BIND(?s AS ?x) } UNION { BIND(?s AS ?y) } } }
}
-
| s | x | y |
=
|  |   |   |
|  |   |   |



So far, it was possible to trick Jena into optimizing OpJoin into 
OpSequence as long as there were no joining variables.


The need for the extra projection of ?x ?y (and not ?s) is not super 
nice, but it used to be a good tradeoff for not having to touch optimizers 
and not having this feature escalate into the core of ARQ.


I guess with my recent (bug) report I shot myself somewhat in the foot 
now :D



Because I am not sure if it's still possible to write a query 
syntactically in a way such that OpJoin turns into OpSequence if 
LIMIT/OFFSET appears in the service clause!


Consequently, it's actually the optimizer that would have to be aware of 
the 'correlate' flag on service clauses and base its decision on it.


It just turns out that the SPARQL 1.1 SERVICE syntax is the easiest way 
to have a syntax for it until, hopefully, SPARQL 1.2 standardizes it 
(corresponding issue: https://github.com/w3c/sparql-12/issues/100)



Andy recently also raised the option to extend the ARQ parser with 
custom syntax 'SERVICE <http://my.endpoint/sparql> ARGS "cache" { ... }':


https://github.com/apache/jena/pull/1315#issuecomment-1146350174


Something along these lines would be very powerful when fleshed out, but 
from my side I think for this work it's not necessary to add custom 
syntax (yet).


But of course the larger picture is how to extend SERVICE with e.g. 
HTTP options and other custom options.


(I think there was some discussion on the sparql 1.2 issue tracker but I 
can't find it right now).



Cheers,

Claus




On 03.06.22 22:41, Andy Seaborne wrote:

JENA-2332 and PR 1364.

    Andy

https://issues.apache.org/jira/browse/JENA-2332

https://github.com/apache/jena/pull/1364

On 03/06/2022 18:29, Andy Seaborne wrote:

Probably a bug then.

Are you going to be making improvements to query 
tranformation/optimization as part of your work on the enhanced 
SERVICE handling on the active PR?


 Andy


Re: Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Claus Stadler

Hi again,


I think the point was missed; what I was actually after is that in the 
following query a "join" is optimized into a "sequence", and I wonder 
whether this is the correct behavior if a LIMIT/OFFSET is present.


So running the following query with optimize enabled/disabled gives 
different results:


SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x } LIMIT 1 }
}


➜  bin ./arq --query service-query.rq

(sequence
  (service <https://dbpedia.org/sparql>
    (slice _ 5
      (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
  (service <https://dbpedia.org/sparql>
    (slice _ 1
      (bgp (triple ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x)))))


------------------------------------------------------------------------------
| s                                                   | x                     |
==============================================================================
| <http://dbpedia.org/resource/Aarti_Mukherjee>       | "Aarti Mukherjee"@en  |
| <http://dbpedia.org/resource/Abatte_Barihun>        | "Abatte Barihun"@en   |
| <http://dbpedia.org/resource/Abby_Abadi>            | "Abby Abadi"@en       |
| <http://dbpedia.org/resource/Abd_al_Malik_(rapper)> | "Abd al Malik"@de     |
| <http://dbpedia.org/resource/Abdul_Wahid_Khan>      | "Abdul Wahid Khan"@en |
------------------------------------------------------------------------------


./arq --explain --optimize=no --query service-query.rq

(join
  (service <https://dbpedia.org/sparql>
    (slice _ 5
      (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
  (service <https://dbpedia.org/sparql>
    (slice _ 1
      (bgp (triple ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x)))))

-
| s | x |
=
-


Cheers,

Claus


On 03.06.22 10:22, Andy Seaborne wrote:



On 02/06/2022 21:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub 
query with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { BIND(?s AS ?x) }
}


The query plan for that is:

(join
  (service <https://dbpedia.org/sparql>
    (slice _ 5
      (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
  (service <https://dbpedia.org/sparql>
    (extend ((?x ?s))
      (table unit))))

which has not had any optimization applied. ARQ checks scopes before 
doing any transformation.


Change BIND(?s AS ?x) to BIND(?s1 AS ?x)

and it will have (join) replaced by (sequence)

---
| s   | x |
===
| <http://dbpedia.org/resource/Aarti_Mukherjee>   |   |
| <http://dbpedia.org/resource/Abatte_Barihun>    |   |
| <http://dbpedia.org/resource/Abby_Abadi>    |   |
| <http://dbpedia.org/resource/Abd_al_Malik_(rapper)> |   |
| <http://dbpedia.org/resource/Abdul_Wahid_Khan>  |   |
---

LIMIT 1 is a no-op - the second SERVICE always evals to one row of no 
columns. Which makes the second SERVICE the join identity and the 
result is the first SERVICE.


Column ?x is only in the display because it is in "SELECT *"

Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service. The result is semantically equivalent to bottom-up evaluation 
(without result set limits) - just much faster.


SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x }
}


The main point:

However, the following query with ARQ interestingly yields one 
binding for every musical artist - which contradicts the bottom-up 
paradigm:


SELECT * {
   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a 
<http://dbpedia.

Semantics of SERVICE w.r.t. slicing

2022-06-02 Thread Claus Stadler

Hi,

I noticed some interesting results when using SERVICE with a sub query 
with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { BIND(?s AS ?x) }
}


Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service. The result is semantically equivalent to bottom-up evaluation 
(without result set limits) - just much faster.


SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x }
}
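Conceptually, for each binding of ?s produced by the first SERVICE, the engine sends a substituted lookup to the endpoint. As an illustration (this substituted form is a sketch of the idea, not literal wire traffic), the lookup for one of the artists above would be:

```sparql
SELECT * {
  <http://dbpedia.org/resource/Aarti_Mukherjee> <http://www.w3.org/2000/01/rdf-schema#label> ?x
}
```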


The main point:

However, the following query with ARQ interestingly yields one binding 
for every musical artist - which contradicts the bottom-up paradigm:


SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x } LIMIT 1 }
}


<http://dbpedia.org/resource/Aarti_Mukherjee> "Aarti Mukherjee"@en
<http://dbpedia.org/resource/Abatte_Barihun> "Abatte Barihun"@en
... 3 more results ...


With bottom-up semantics, the second service clause would only fetch a 
single binding, so in the unlikely event that it happens to join with a 
musical artist I'd expect at most one binding in the overall result set.

Now I wonder whether this is a bug or a feature.

I know that Jena's VarFinder is used to decide whether to perform a 
bottom-up evaluation using OpJoin or a correlated join using OpSequence, 
which results in the different outcomes.


The SPARQL spec doesn't say much about the semantics of SERVICE 
(https://www.w3.org/TR/sparql11-query/#sparqlAlgebraEval)


So I wonder which behavior is expected when using SERVICE with SLICE'd 
queries.



Cheers,

Claus


--
Dipl. Inf. Claus Stadler
Institute of Applied Informatics (InfAI) / University of Leipzig
Workpage & WebID: http://aksw.org/ClausStadler



Streaming JSON RowSets (JENA-2302)

2022-03-09 Thread Claus Stadler

Dear all,


I want to inform you of an active PR for making RowSets over 
application/sparql-results+json streaming.



JIRA: https://issues.apache.org/jira/projects/JENA/issues/JENA-2302

PR: https://github.com/apache/jena/pull/1218


As JSON is nowadays the default content type used in Jena for SPARQL 
results, this PR is aimed at easing work with large SPARQL result 
sets by having streaming work out of the box.


The implementation used by Jena so far loaded JSON SPARQL result sets 
into memory first.



The JSON format itself allows for repeated keys (where the last one 
takes precedence), and keys may appear in any order. These things 
introduce a certain variety in how SPARQL result sets can be represented, 
and that needs to be handled correctly by the implementation.
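For illustration, here is a hand-written fragment (not taken from the PR) in which "results" precedes "head" - a streaming parser has to cope with this ordering even though it cannot know the projected variables until "head" arrives:

```json
{
  "results": {
    "bindings": [
      { "s": { "type": "uri", "value": "http://example.org/a" } }
    ]
  },
  "head": { "vars": [ "s" ] }
}
```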



While the new implementation already succeeds on all existing Jena 
tests, there is still the risk of breaking existing implementations that 
rely on certain behavior of the non-streaming approach.



Therefore, if you think this change might (negatively) affect you then 
please provide feedback on the proposed PR.



Best regards,

Claus Stadler

--
Dipl. Inf. Claus Stadler
Institute of Applied Informatics (InfAI) / University of Leipzig
Workpage & WebID: http://aksw.org/ClausStadler



Re: Can RDF star support be deactivated?

2020-09-17 Thread Claus Stadler

Hi Andy,


In the meantime I have upgraded my code and it turned out that it was 
essentially the newly introduced visitor method for ElementFind that was 
missing in some of my transformers - but that was easy to 'solve' for 
now by just raising an UnsupportedOperationException.


Cheers,

Claus


On 2020/09/01 17:42:41, Andy Seaborne wrote:
> Ping?
>
> I'm not aware of any compile errors for APIs, but argument and return
> signatures can make it complicated.
>
> NodeVisitor doesn't include it (probably an omission - but a default
> method would solve that?)
>
> Andy
>
> On 28/08/2020 13:33, Andy Seaborne wrote:
> >
> > On 28/08/2020 02:12, cstad...@informatik.uni-leipzig.de wrote:
> >>
> >> +1 This is a very good point; I also have around 10 years of active
> >> code based on Jena and I was not yet able to upgrade to 3.16 because I
> >> did not find the time to resolve several compile errors which are at
> >> least partly due to changes introduced for RDF*. And even after the
> >> upgrade I would most likely run into similar issues as Holger
> >> points out.
> >
> > Hmm - where are you getting compile errors?
> >
> >> I have used the following to work around legacy issues with
> >> RDF 1.0/1.1:
> >> JenaRuntime.isRDF11 = false;
> >>
> >> This might be a good place to allow for a
> >> JenaRuntime.isRDFStar = false;
> >>
> >> Cheers,
> >> Claus
> >>
> >> Quoting Holger Knublauch:
> >>
> >>> It's good to see the recently introduced RDF* features in Jena. But
> >>> as someone with a lot of existing Jena code, this low-level change
> >>> poses a number of challenges. For example we have many places with
> >>> variations of
> >>>
> >>> if (rdfNode.isResource()) {
> >>>     if (rdfNode.isURIResource()) {
> >>>     } else {
> >>>         // Here we now assume it's a blank node, yet this is no longer
> >>>         // true and the node may also be a triple node
> >>>     }
> >>> } else {
> >>>     // Must be a literal - this hasn't changed
> >>> }
> >>>
> >>> which now need to be changed to handle rdfNode.isStmtResource() too.
> >>> And it should of course do so in a meaningful way.
> >>>
> >>> I guess properly adjusting our code base will take many months, and
> >>> it will require a lot of testing and iterating.
> >>>
> >>> In the meantime, is there a flag that we can set to deactivate RDF*
> >>> support in the parsers and SPARQL*? The page
> >>> https://jena.apache.org/documentation/rdfstar/ only states "it is
> >>> active by default in Fuseki" but doesn't show an API to do the same
> >>> programmatically.
> >>>
> >>> Could you also give some background on the implications on TDB? I
> >>> guess if such new nodes end up in a database, then this database can
> >>> no longer work with older Jena versions?
> >>>
> >>> Thanks
> >>> Holger

--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: Identify SPARQL query's type

2020-03-19 Thread Claus Stadler

Hi Marco,

I will prepare a presentation of the most important features next week; I can't 
say right now which day is best, but maybe we can arrange that on short notice 
over the weekend or on Monday via direct mail. As for contributions to Jena 
directly, I am already in contact with Andy via some recent JIRA issues and PRs 
:)


I also intend to start the discussion on contributing some relevant parts of 
our extension project to Jena directly. The reason why this did not happen so 
far is mainly that it takes significantly more effort to polish code for 
such a big community project and to ensure a good level of stability - but 
some parts are stable and probably of more general interest :)


Cheers,

Claus


On 19.03.20 10:37, Marco Neumann wrote:

thank you Claus, there is obviously much more in the Jena-extensions
(SmartDataAnalytics / jena-sparql-api).

if you want to contribute your work to the Jena project you will have to
follow up with Andy directly. But I am not sure this is necessary at the
moment since you already provide the code in the public domain conveniently
as an extension / add-on to the Jena project, which I think is great as is
for now. Over time we might want to learn from your work and add aspects to
the overall core Jena project I would think.

It would be great if we could schedule a zoom session in order to give us
an overview of the "SmartDataAnalytics / jena-sparql-api" extensions

could you prepare such a presentation in the coming days?

best,
Marco



On Wed, Mar 18, 2020 at 3:34 PM Claus Stadler <
cstad...@informatik.uni-leipzig.de> wrote:


Hi,


The SparqlStmt API built against Jena 3.14.0 is now available on Maven
Central [1] in case one wants to give it a try (example in [2]) and give
feedback on whether one thinks it would be a useful contribution to Jena
directly - and what changes would be necessary if so.



<dependency>
  <groupId>org.aksw.jena-sparql-api</groupId>
  <artifactId>jena-sparql-api-stmt</artifactId>
  <version>3.14.0-1</version>
</dependency>




[1]
https://search.maven.org/artifact/org.aksw.jena-sparql-api/jena-sparql-api-stmt/3.14.0-1/jar

[2]
https://github.com/SmartDataAnalytics/jena-sparql-api/blob/def0d3bdf0f4396fbf1ef0715f9697e9bb255029/jena-sparql-api-stmt/src/test/java/org/aksw/jena_sparql_api/stmt/TestSparqlStmtUtils.java#L54


Cheers,

Claus



On 18.03.20 16:04, Andy Seaborne wrote:

Note that parsing the string as a query aborts early as soon as it finds

an update keyword so the cost of parsing isn't very large.

 Andy

On 18/03/2020 11:58, Marco Neumann wrote:

is there some utility function here in the code base now already to do
this, or do I still need to roll my own here?

On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne  wrote:


On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:

Hi,

I would like to know if Jena offers a way to detect the type of an

unknow SPARQL request ?Starting from the query string.

At the moment the only way I succed to code it without "basic parsing"

of the query ( sort of thing I prefer avoid, manually parsing string

with

short function often create errors )

looks like this :

[...]
  String queryString = "a query string, may be a select or an

update";

   try{
   Query select = QueryFactory.create(queryString);
   Service.process_select_query(select);//do some work with

the select

   }
   catch(QueryException e){
   UpdateRequest update =

UpdateFactory.create(queryString);

   Service.process_update_query(update);//do some work with

the update

   }
   catch(ProcessException e){
   //handle this exception
   }

[...]

So is it possible ? Or not ?

Not currently.

You could use a regexp to spot the SELECT/CONSTRUCT/DESCRIBE/ASK

keyword

coming after BASE/PREFIXES/Comments.

      Andy
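Andy's regexp suggestion can be sketched as follows. This is a hypothetical helper, not part of Jena; the class and method names are made up, and the comment handling is deliberately naive (a '#' inside an IRI would be mistaken for a comment start):

```java
import java.util.regex.Pattern;

public class SparqlStmtKind {
    // Optional run of BASE/PREFIX declarations, then a query-form keyword.
    static final Pattern QUERY_KEYWORD = Pattern.compile(
        "^(?:\\s*(?:BASE\\s*<[^>]*>|PREFIX\\s+\\S+\\s*<[^>]*>)\\s*)*"
        + "\\s*(SELECT|CONSTRUCT|DESCRIBE|ASK)\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns true if the statement looks like a query (rather than an update).
    static boolean isQuery(String stmt) {
        String noComments = stmt.replaceAll("(?m)#.*$", ""); // strip '#' comments (naive)
        return QUERY_KEYWORD.matcher(noComments).find();
    }

    public static void main(String[] args) {
        System.out.println(isQuery("PREFIX ex: <http://ex.org/> SELECT * { ?s ?p ?o }")); // true
        System.out.println(isQuery("INSERT DATA { <urn:a> <urn:b> <urn:c> }"));           // false
    }
}
```

A real implementation would of course hand the string to the parser; the regexp only answers the query-vs-update question cheaply.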



--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: Identify SPARQL query's type

2020-03-18 Thread Claus Stadler



My current solution builds on the existing Jena infrastructure - but it does 
not modify it.

My use cases for uniform SPARQL statement parsing included parsing query strings 
against a given prefix mapping and also cleaning queries - i.e. removing unused 
prefixes.

I have some test cases in [1] of what it looks like:

PrefixMapping pm = RDFDataMgr.loadModel("rdf-prefixes/wikidata.jsonld");
String queryStr = "INSERT {\n" +
        "  ?s wdt:P279 wd:Q7725634 .\n" +
        "}\n" +
        "WHERE {\n" +
        "  ?s rdfs:label ?desc\n" +
        "  FILTER (LANG(?desc) = \"en\") .\n" +
        "}\n";

// A SparqlStmtParser currently inherits from Function<String, SparqlStmt>
SparqlStmtParser parser = SparqlStmtParserImpl.create(pm);
SparqlStmt stmt = parser.apply(queryStr); // 'parse' would probably be a nicer name than the generic 'apply'
SparqlStmtUtils.optimizePrefixes(stmt);
UpdateRequest updateRequest = stmt.getUpdateRequest();


I know it's not very well documented, but I hope you can get the gist of the 
idea :)


Cheers,

Claus

[1] 
https://github.com/SmartDataAnalytics/jena-sparql-api/blob/def0d3bdf0f4396fbf1ef0715f9697e9bb255029/jena-sparql-api-stmt/src/test/java/org/aksw/jena_sparql_api/stmt/TestSparqlStmtUtils.java#L54




On 18.03.20 14:25, Marco Neumann wrote:

thank you Claus and Martynas, both very good ideas here. it's a function we
should move into Jena.

let's look at this in a bit more detail now, I currently envision this to
be a factory method of org.apache.jena.query.Query returning boolean like

.isSelect()
.isAsk()
.isDescribe()
.isUpdate()

Claus your solution would extend the following?

org.apache.jena.sparql.lang.ParserSPARQL11.perform(ParserSPARQL11.java:100)

how is fuseki implementing this during query parsing at the moment?




On Wed, Mar 18, 2020 at 1:00 PM Martynas Jusevičius 
wrote:


I always wondered why there is no class hierarchy for SPARQL commands,
similarly to SP vocabulary [1]. Something like

Command
   Query
 Describe
 Construct
 Select
 Ask
   Update
 ...

So that one could check command type doing instanceof Update or
instance of Select instead of query.isSelectType() etc.

[1] https://github.com/spinrdf/spinrdf/blob/master/etc/sp.ttl



On Wed, Mar 18, 2020 at 12:58 PM Marco Neumann 
wrote:

is there some utility function here in the code base now already to do
this, or do I still need to roll my own here?

On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne  wrote:


On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:

Hi,

I would like to know if Jena offers a way to detect the type of an

unknow SPARQL request ?Starting from the query string.

At the moment the only way I succed to code it without "basic

parsing"

of the query ( sort of thing I prefer avoid, manually parsing string

with

short function often create errors )

looks like this :

[...]
 String queryString = "a query string, may be a select or an

update";

  try{
  Query select = QueryFactory.create(queryString);
  Service.process_select_query(select);//do some work with

the select

  }
  catch(QueryException e){
  UpdateRequest update =

UpdateFactory.create(queryString);

  Service.process_update_query(update);//do some work with

the update

  }
  catch(ProcessException e){
  //handle this exception
  }

[...]

So is it possible ? Or not ?

Not currently.

You could use a regexp to spot the SELECT/CONSTRUCT/DESCRIBE/ASK

keyword

coming after BASE/PREFIXES/Comments.

 Andy



--


---
Marco Neumann
KONA



--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: Identify SPARQL query's type

2020-03-18 Thread Claus Stadler

Hi,


I created an abstraction for this a while ago - essentially there is an 
interface 'SparqlStmt' with implementations that wrap an UpdateRequest or a 
Query. Correspondingly, there is a SparqlStmtParser interface whose 
implementation makes use of a SparqlQueryParser and SparqlUpdateParser.

If the statement cannot be parsed, it attempts to get the location of the 
parse error (line / col) and decide on the statement type on that basis.


It's on Maven Central - but I see it's not there for Jena 3.14.0; I could do 
that today.

https://search.maven.org/artifact/org.aksw.jena-sparql-api/jena-sparql-api-stmt/3.13.1-1/jar


I would also gladly contribute it to Jena directly :)


Cheers,

Claus



https://github.com/SmartDataAnalytics/jena-sparql-api/blob/fa5fe33b0e7ac80586cdb3522aa5e0d75718db26/jena-sparql-api-stmt/src/main/java/org/aksw/jena_sparql_api/stmt/SparqlStmtParserImpl.java#L57




On 18.03.20 12:58, Marco Neumann wrote:

is there some utility function here in the code base now already to do
this, or do I still need to roll my own here?

On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne  wrote:


On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:

Hi,

I would like to know if Jena offers a way to detect the type of an

unknow SPARQL request ?Starting from the query string.

At the moment the only way I succed to code it without "basic parsing"

of the query ( sort of thing I prefer avoid, manually parsing string with
short function often create errors )

looks like this :

[...]
 String queryString = "a query string, may be a select or an

update";

  try{
  Query select = QueryFactory.create(queryString);
  Service.process_select_query(select);//do some work with

the select

  }
  catch(QueryException e){
  UpdateRequest update = UpdateFactory.create(queryString);
  Service.process_update_query(update);//do some work with

the update

  }
  catch(ProcessException e){
  //handle this exception
  }

[...]

So is it possible ? Or not ?

Not currently.

You could use a regexp to spot the SELECT/CONSTRUCT/DESCRIBE/ASK keyword
coming after BASE/PREFIXES/Comments.

 Andy



--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: CSV to rdf

2019-02-26 Thread Claus Stadler

Hi all,


We have created SparqlIntegrate, examples at: 
https://github.com/SmartDataAnalytics/SparqlIntegrate/tree/develop/doc


It uses Jena's plugin system to register SPARQL functions and property 
functions (*) for CSV, XML and JSON processing.

So it's another SPARQL-based approach.


(*) https://jena.apache.org/documentation/query/library-propfunc.html


Cheers,

Claus


On 14.02.19 15:19, Martynas Jusevičius wrote:

https://github.com/AtomGraph/CSV2RDF

Not XSLT but SPARQL-based, yet the transformation concept is similar.

I'm the author :)

On Thu, Feb 14, 2019 at 2:59 PM elio hbeich  wrote:

Dear all

Do you have any suggestion about tools or XSLT that can transform CSV to
RDF

Thank you in advance,
Elio HBEICH


jena-arq: FROM / FROM NAMED clauses of SPARQL queries over in-memory Dataset are ignored

2015-12-09 Thread Claus Stadler

Hi,

It appears that the FROM / FROM NAMED clauses of SPARQL queries are ignored 
when executed over a Dataset.
In the example below, I would expect the first result set to yield the content 
of the file, whereas I expect the second one to be empty as the specified named 
graph does not exist; yet, I get the exact opposite.

Are there some magic switches or algebra transformations that can be applied 
programmatically for changing the behavior?

Similar queries on Virtuoso on http://dbpedia.org/sparql work to my expectation.

Tested with jena-arq 2.13.0 and 3.0.0
A related issue might be this one, but it appears a fix was only provided for 
Fuseki: https://issues.apache.org/jira/browse/JENA-1004

Cheers,
Claus

Code:
public class TestDatasetQuery {
    @Test
    public void test() throws IOException {
        Dataset ds = DatasetFactory.createMem();
        RDFDataMgr.read(ds, new ClassPathResource("test-person.nq").getInputStream(), Lang.NQUADS);

        String graphName = ds.listNames().next();
        Node s = ds.getNamedModel(graphName).listSubjects().toSet().iterator().next().asNode();
        System.out.println("Got subject: " + s + " in graph " + graphName);

        {
            // Should yield some solutions - but actually doesn't
            QueryExecution qe = QueryExecutionFactory.create(
                "SELECT * FROM <" + graphName + "> { ?s ?p ?o }", ds);
            ResultSet rs = qe.execSelect();
            System.out.println(ResultSetFormatter.asText(rs));
        }

        {
            // Should not return anything, as the named graph does not exist;
            // yet, the original data is returned
            QueryExecution qe = QueryExecutionFactory.create(
                "SELECT * FROM NAMED <http://foobar> { GRAPH ?g { ?s ?p ?o } }", ds);
            ResultSet rs = qe.execSelect();
            System.out.println(ResultSetFormatter.asText(rs));
        }
    }
}

File: test-person.nq
<http://ex.org/JohnDoe> <http://ex.org/label> "John 
Doe"^^<http://www.w3.org/2001/XMLSchema#string> <http://ex.org/graph/> .
<http://ex.org/JohnDoe> <http://ex.org/age> 
"20"^^<http://www.w3.org/2001/XMLSchema#int> <http://ex.org/graph/> .

Output:

Got subject: http://ex.org/JohnDoe in graph http://ex.org/graph/
-
| s | p | o |
=
-

--------------------------------------------------------------------------------------------------------------------------
| s                       | p                     | o                                            | g                      |
==========================================================================================================================
| <http://ex.org/JohnDoe> | <http://ex.org/age>   | "20"^^<http://www.w3.org/2001/XMLSchema#int> | <http://ex.org/graph/> |
| <http://ex.org/JohnDoe> | <http://ex.org/label> | "John Doe"                                   | <http://ex.org/graph/> |
--------------------------------------------------------------------------------------------------------------------------


--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: Sparql To SQL

2014-03-30 Thread Claus Stadler

There is also Sparqlify [1], which we use for exposing OpenStreetMap as Linked Data in the 
LinkedGeoData project [2] (and for dumping the RDF for downloads) [3].
Unfortunately, R2RML support is not yet totally stable; we wrote our mappings [4] in what we 
refer to as the Sparqlification Mapping Language (SML) [5] (a release should be 
ready soon).
In a nutshell, the relation of R2RML to SML is a bit akin to (the SPARQL 
elements of) SPIN to SPARQL; for a pretty comprehensive comparison of examples 
(the R2RML test suite) see [6].

Cheers,
Claus

[1] https://github.com/AKSW/Sparqlify (Instructions for installing Sparqlify 
from the debian package are included there)
[2] LinkedGeoData: http://linkedgeodata.org
[3] http://downloads.linkedgeodata.org/releases
[4] 
https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml
[5] http://sml.aksw.org
[6] http://sml.aksw.org/comparison/sml-r2rml-loc-table.html

On 31.03.2014 01:29, Martynas Jusevičius wrote:

There are also commercial options, e.g. this:
http://www.revelytix.com/content/spyder

On Sun, Mar 30, 2014 at 10:15 PM, Kamalraj Jairam
kjai...@semanticsoftware.com.au wrote:

Hello Paul,

This is the information I have:

1) R2RML mapping for existing DB

2) I have some federated queries which hit the triple store containing the data 
which has been loaded from the DB

Loading data from the DB (100 tables, millions of rows) to the TS takes a long 
time. I need a framework to convert SPARQL to SQL which will hit the tables and 
return an RDF result set.

D2RQ is at version 0.8.1 and hasn't changed much. Not sure whether D2RQ is the 
right approach.

Thanks
Kamalraj




--
Kjairam
Sent with Airmail


On 31 March 2014 at 6:59:20 am, Paul Tyson 
(phty...@sbcglobal.net) wrote:

On Sun, 2014-03-30 at 09:37 +, Kamalraj Jairam wrote:

Hello All,

Whats the best way to convert sparql to SQL using R2RML mappings

and convert resultset from DB to RDF?


What are the givens? Do you have existing SPARQL text written against
some RDF produced by some existing R2RML mappings? Or do you have some
SPARQL and some SQL and you want to fill in the R2RML and produce some
RDF?

In any case, it is an interesting scenario that will arise more often as
companies expand their use of RDF while relying mostly on SQL.

There is a higher abstraction that should be explored and possibly
exploited to provide a general pattern for working through these
situations. Since SQL, SPARQL, and R2RML are rule languages compatible
with relational algebra (RA), it should be possible to derive a common
set of RA predicates and classes to create an RDF vocabulary. This
vocabulary can then be used to write queries as production rules in a
generic standard rule language, such as RIF (Rule Interchange Format).
The RIF source can be translated to SQL, SPARQL, or R2RML for execution
in the target system.

Going from RIF to SQL, SPARQL, or R2RML is always going to be easier
than starting from SQL or SPARQL and going to some other format.
Theoretically you should be able to partially translate R2RML to RIF
automatically (SQL embedded in the R2RML will still be opaque). But I
don't know what tools could be used to translate SQL and SPARQL texts to
generic production rules using an RA vocabulary.

The ultimate payoff from this approach is that it will be possible to
link all of your data relations and operations with meaningful business
terminology and processes. It will enable greater visibility and control
of all data operations, and put important elements of business logic in
transparent rules (e.g. RIF) instead of arcane notations such as SQL or
SPARQL (or worse, procedural code).

Regards,
--Paul



--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: How can I implement pagination for ResultSet

2013-10-23 Thread Claus Stadler

Hi,

For the pagination issue combined with client side caching (and adding 
delay for not overloading the endpoint), I have written this utility lib 
some time ago:


https://github.com/AKSW/jena-sparql-api

It currently targets the previous Jena 2.10.0 release, but will be upgraded soon.

As for

 2) Get the size of ResultSet. but there are no way to get the total 
number that ResultSet contains.


A cheap way of getting the count is to run a query of the form SELECT 
(COUNT(*) AS ?c) { { your original query here } }. Personally though, I prefer 
modifying the projection on Jena's Query object, as I dislike string hacks.
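
The string-wrapping variant of this count trick can be sketched as follows. This is a minimal illustration by the editor, not code from the thread; the class and method names are made up, and in practice one would manipulate Jena's Query object instead of concatenating strings:

```java
public class CountQueryExample {

    /** Wraps a SELECT query so the endpoint returns only the row count. */
    public static String wrapAsCountQuery(String originalQuery) {
        // The inner query runs unchanged; the outer projection counts its rows.
        return "SELECT (COUNT(*) AS ?c) { { " + originalQuery + " } }";
    }

    public static void main(String[] args) {
        System.out.println(wrapAsCountQuery("SELECT ?s ?p ?o { ?s ?p ?o }"));
    }
}
```

Note that this naive wrapping breaks if the original query already carries a LIMIT/OFFSET or solution modifiers that must stay inside the subselect, which is exactly why working on the parsed Query object is preferable.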



Best,
Claus


On 22.10.2013 19:21, Wang Dongsheng wrote:

By the way, I want to ask you two questions accordingly,

For 1), can I get the total number first? Because it is better to
calculate in advance how many pages there are.
For 2), if the ResultSet is very large, will transferring it to a List
break easily?

On Tue, Oct 22, 2013 at 6:31 PM, Wang Dongsheng dswang2...@gmail.com wrote:

Hi Samuel,
  Thanks a lot, It's very cool~ :)

Sincere~
Wang



On Tue, Oct 22, 2013 at 4:22 PM, Samuel Croset samuel.cro...@gmail.com wrote:

Hi,

For 1) you can use the OFFSET and LIMIT constructs
For 2) You can use:  ResultSetFormatter.toList()

See this answer
http://answers.semanticweb.com/questions/9456/jena-pagination-for-sparql for
more details.
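
The OFFSET/LIMIT approach can be sketched like this. A minimal illustration by the editor (the class and method names are made up); only the query-text construction is shown, with no Jena dependency:

```java
public class PaginationExample {

    /** Builds the query text for one page of results (pages are 0-based). */
    public static String pageQuery(String baseQuery, int pageSize, int pageIndex) {
        // OFFSET skips the rows of all previous pages; LIMIT caps this page.
        return baseQuery + " OFFSET " + ((long) pageIndex * pageSize)
                + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        // ORDER BY keeps the row order stable across page requests.
        String base = "SELECT ?s { ?s ?p ?o } ORDER BY ?s";
        for (int page = 0; page < 3; page++) {
            System.out.println(pageQuery(base, 100, page));
        }
    }
}
```

Without an ORDER BY, an endpoint is free to return rows in a different order on each request, so pages may overlap or miss rows.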

Cheers,

Samuel


On Tue, Oct 22, 2013 at 8:24 AM, Wang Dongsheng dswang2...@gmail.com wrote:


Hi, all
I want to implement pagination for ResultSet of Sparql.
I guess there are generally two ways:

1. Through a specific SPARQL construct, but I don't know how to write the
query.
2. By getting the size of the ResultSet, but there seems to be no way to
get the total number of rows a ResultSet contains.

I prefer to implement the second way, but the API documentation seems to be unavailable.

Has anyone tried this kind of thing? Or is it not suitable to paginate
the result?

Thanks in advance~




--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Re: Future of Jena SDB

2013-06-07 Thread Claus Stadler

Hi,

Just a quick note, as I am the developer of Sparqlify [1], which is pretty 
much a general SQL-to-RDF mapping layer (or at least a SPARQL-to-SQL 
rewriter) based on Jena (it also has some dependencies on SDB).
From my experience: although I had, and still have, some dependencies on 
SDB, I had to e.g. duplicate the SqlExpr hierarchy (see [2]) because 
I needed each SqlExpr to provide its datatype.


The old master branch of Sparqlify already had initial support for 
rewriting spatial predicates to SQL (only ST_Intersects and ST_DWithin), 
but we are currently enhancing the system to allow one to expose 
essentially any (or at least most) SQL functions as SPARQL functions.


[1] https://github.com/AKSW/Sparqlify
[2] 
https://github.com/AKSW/Sparqlify/tree/master/sparqlify-core/src/main/java/org/aksw/sparqlify/algebra/sql/exprs2


Cheers,
Claus


On 06/07/2013 11:54 AM, Olivier Rossel wrote:

Could SDB be useful when dealing with GeoSPARQL and your backend is
(something like) Postgresql+Postgis?
(just a question, this is not one of my needs at the moment).


On Fri, Jun 7, 2013 at 11:29 AM, Andy Seaborne a...@apache.org wrote:


SDB is a Jena storage module that uses SQL databases for RDF storage. See
[1] for documentation. It uses a custom database schema to store RDF; it is
not a general SQL-to-RDF mapping layer.

The supported databases are: Oracle, Microsoft SQL Server, DB2,
PostgreSQL, MySQL, Apache Derby, H2, HSQLDB.  Only Derby and HSQLDB are
tested in the development build process.

Both Oracle and IBM corporations provide commercial RDF solutions using
Jena that are completely unrelated to SDB.

TDB is faster, more scalable and better supported than SDB but there can
be reasons why an SQL-backed solution is appropriate.

There is no active development or maintenance of SDB from within the
committer team; no committers use SDB and it imposes a cost to the team to
generate separate releases.  We're not receiving patches contributed to
JIRA items for bugs.

We are proposing:

1/ moving it into the main build so it will be part of the main
distribution with limited testing.

2/ marking it as under review / maintenance only.

It will not be treated as something that can block a release, nor for any
significant length of time, stop development builds.

It may be pulled from the main build, and from a release, at very short
notice.

If moved out, the source code will still be available but no binaries
(releases or development builds) will be produced.

What would change SDB's status is care and attention. There are ways to
enhance it, for example, pushing the work of filters into the SQL database,
where possible, to improve query performance.

 Andy

[1] http://jena.apache.org/documentation/sdb/index.html




--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260



Request for a QueryExecutionFactory interface

2013-06-07 Thread Claus Stadler

Hi,

Would it be possible to add a QueryExecutionFactory (QEF) *interface* to 
Jena?
The com.hp.hpl.jena.query.QueryExecutionFactory class has lots of static 
factory methods, but I think it would be very useful if Jena itself 
provided such an interface (in a different package, under a different 
name, or both), because implementations based on Jena could then rely on 
that interface (see below and [1]) in a (quasi-)standard way, and other 
projects could provide fancy implementations.


public interface QueryExecutionFactory
    extends QueryExecutionFactoryString, QueryExecutionFactoryQuery
{
    /**
     * Some id identifying the SPARQL service, such as a name given to
     * a Jena Model or the URL of a remote service.
     */
    String getId();

    /**
     * Some string identifying the state of this execution factory, such as
     * the selected graphs or, for query federation, the configured endpoints
     * and their respective graphs. Used for caching.
     */
    String getState();
}


The reason I ask is that I created [2], which uses this architecture to 
transparently add delay, caching and pagination to a QEF - i.e. you can 
pose an ordinary SPARQL query against DBpedia, and [2] will take care of 
retrieving the *complete* result, caching each page so that pagination 
can resume from the cache should something go wrong.
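
The decorator idea behind this can be sketched as follows. This is an editor's simplification under stated assumptions: the one-method interface and all class names here are hypothetical stand-ins (the real jena-sparql-api factory returns a QueryExecution, not a String), shown only to illustrate how delay, caching or pagination wrappers would stack:

```java
public class QefDecoratorExample {

    // Hypothetical minimal factory interface, mirroring the proposal above.
    public interface QueryExecutionFactory {
        String createExecution(String queryString);
    }

    /** Decorator that pauses between requests to avoid overloading an endpoint. */
    public static class DelayingQef implements QueryExecutionFactory {
        private final QueryExecutionFactory delegate;
        private final long delayMs;
        private long lastRequest = 0;

        public DelayingQef(QueryExecutionFactory delegate, long delayMs) {
            this.delegate = delegate;
            this.delayMs = delayMs;
        }

        @Override
        public String createExecution(String queryString) {
            // Sleep just long enough that requests are at least delayMs apart.
            long wait = lastRequest + delayMs - System.currentTimeMillis();
            if (wait > 0) {
                try {
                    Thread.sleep(wait);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            lastRequest = System.currentTimeMillis();
            return delegate.createExecution(queryString);
        }
    }

    public static void main(String[] args) {
        QueryExecutionFactory base = q -> "executed: " + q;
        QueryExecutionFactory delayed = new DelayingQef(base, 50);
        System.out.println(delayed.createExecution("SELECT * { ?s ?p ?o }"));
    }
}
```

A caching or paginating wrapper would follow the same pattern, which is why a shared interface lets application code stay oblivious to how many decorators are stacked underneath.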


For example, someone might provide a parallel pagination component, or a 
query federation system such as FedX could be wrapped with this 
interface as well; application developers would then not have to rely on 
a specific implementation.


Cheers,
Claus

[1] 
https://github.com/AKSW/jena-sparql-api/blob/master/jena-sparql-api-core/src/main/java/org/aksw/jena_sparql_api/core/QueryExecutionFactory.java

[2] https://github.com/AKSW/jena-sparql-api

--
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260