Re: [DRAFT] Jena board report for July 2015

2015-07-03 Thread Rob Vesse
+1 LGTM

On 03/07/2015 09:40, Andy Seaborne a...@apache.org wrote:

Report from the Apache Jena project [Andy Seaborne]

## Description:
A framework for developing Semantic Web and Linked Data applications in
Java.

## Activity:

The project is working towards a major release.

The most significant user-visible change is converting to org.apache.jena
package names everywhere, replacing the historical names for older code
areas.

The project has also seen new contributors, who are helping to clean the
code up, adding new functionality and generally discussing the code.

## Issues:

There are no issues requiring board attention at this time.

## PMC/Committership changes:

  - Currently 13 committers and 11 PMC members in the project.
  - Osma Suominen was added to the PMC on Thu Jun 25 2015
  - Osma Suominen was added as a committer on Thu Jun 25 2015

## Releases:

  - Last release was 2.13.0 on Fri Mar 13 2015

## Mailing list activity:

  - us...@jena.apache.org:
     - 586 subscribers (up 16 in the last 3 months)
     - 485 emails sent to list (663 in previous quarter)

  - dev@jena.apache.org:
     - 149 subscribers (down 3 in the last 3 months)
     - 1480 emails sent to list (1148 in previous quarter)


## JIRA activity:

  - 67 JIRA tickets created in the last 3 months
  - 62 JIRA tickets closed/resolved in the last 3 months






[GitHub] jena pull request: Removing out-of-date comment and empty @Overrid...

2015-07-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/83




Re: Query parameterization.

2015-07-03 Thread Andy Seaborne

On 01/07/15 05:27, Holger Knublauch wrote:

Hi Andy,

this looks great, and is just in time for the ongoing discussions in the
SHACL group. I apologize in advance for not having the bandwidth yet to
try this out from your branch, but this topic will definitely bubble up
in the priorities soon...

I have not fully understood how the semantics of this are different from
the setInitialBinding feature that we currently use in SPIN, and which
seems to do a pretty good job. However, having a facility to do further
pre-processing in advance may improve performance and provide a more
formal definition of what setInitialBinding is doing. I am personally
not enthusiastic about approaches based on text-substitution, so working
on the parsed syntax tree looks good to me. There are some (rare) cases
where text-substitution would be more powerful, e.g. dynamic path
properties


If you can insert compound syntax, then injection attacks need to be 
considered.



and some solution modifiers, but as you say no approach is
perfect.


Better done on the algebra?  Especially around the SELECT clause, as it is
several modifiers in a tangle.


(See recent OpAsQuery discussion and changes)



Questions:

- would this also pre-bind variables inside of nested SELECTs?


Yes.  It's a choice: with some analysis of the inner projection as it
passes through, it could leave them alone.



- I assume this can handle blank nodes (e.g. rdf:Lists) as bindings?


Probably! (it's tricky and needs more testing)
...
Yes - the replacement with bnodes-are-variables in SPARQL is done during 
parsing and this is post parse (different to all string based approaches).


If the substituted query is turned into a string, it will become a bnode
in SPARQL, which is a variable when reparsed.  The printing code
(specifically NodeToLabelMapBNode.asString) handles it and would need a
tweak.


The _:label form would be better but needs implementing.


- What about bound(?var) and ?var is pre-bound?


?var in bound(?var) is replaced (as ?var in all expressions).  This is 
syntax.


Andy



Thanks
Holger


On 6/28/15 8:08 PM, Andy Seaborne wrote:

(info / discussion / ...)

In working on JENA-963 (OpAsQuery; reworked handling of SPARQL
modifiers for GROUP BY), it was easier/better to add the code I had
for rewriting syntax by transformation, much like the algebra is
rewritten by the optimizer.  The use case is rewriting the output of
OpAsQuery to remove unnecessary nesting of levels of {} which arise
during translation for the safety of the translation.

Hence putting in package oaj.sparql.syntax.syntaxtransform, a general
framework for rewriting syntax, like we have for the SPARQL+ algebra.

It is also capable of being a parameterized query system (PQ).  We
already have ParameterizedSparqlString (PSS), so how do they compare?

Work-in-progress:

https://github.com/afs/jena-workspace/blob/master/src/main/java/syntaxtransform/ParameterizedQuery.java


PQ is a rewrite of a Query object (the template) with a map of
variables to constants. That is, it works on the syntax tree after
parsing and produces a syntax tree.

PSS is a builder with substitution. It builds a string, carefully
(injection attacks) and is neutral as to what it is working with -
query or update or something weird.
http://jena.apache.org/documentation/query/parameterized-sparql-strings.html


Summary:

PQ is only for replacement of a variable in a template.
PSS is a builder that can do that as part of building.

PQ covers cases PSS doesn't - neither is perfect.

PSS works with INSERT DATA.
PQ would use the INSERT { ... } WHERE {} form.

Details:

PSS:
  Can build query, update strings and fragments
  Supports JDBC style positional parameters (a '?')
These must be bound to get a valid query.
Can generate illegal syntax.
  Tests the type of the injected value (string, iri, double etc).
  Has corner cases
 Looks for ?x as a string so ...
   This is not a ?x as a variable
   http://example/foo?x=123
   SELECT ?x
   ns:local\?x (a legal local part)
  Protects against injection by checking.
  Works on INSERT DATA.
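
To make the "?x as a string" corner cases concrete, here is a minimal sketch of a deliberately naive text substitution. It is not how PSS is implemented (PSS is more careful); the class and method names are hypothetical and only show why matching on text alone is fragile.

```java
final class NaiveSubstitution {
    // Hypothetical helper: replaces every textual "?name" with the value,
    // with no knowledge of SPARQL syntax.
    static String substitute(String query, String name, String value) {
        return query.replace("?" + name, value);
    }

    public static void main(String[] args) {
        String query =
            "SELECT ?x WHERE { <http://example/foo?x=123> <http://example/p> ?x }";
        // The "?x" inside the IRI and the "?x" in the SELECT clause are
        // rewritten too, so the result is no longer the intended query.
        System.out.println(substitute(query, "x", "\"Bristol\""));
        // prints: SELECT "Bristol" WHERE { <http://example/foo"Bristol"=123> <http://example/p> "Bristol" }
    }
}
```

Any string-based approach has to recognise and skip such positions itself, which is where the corner cases come from.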

PQ:
  Replaces SPARQL variables where identified as variables.
(no extra-syntax positional '?')
  Legal query to legal syntax query.
The query may violate scope rules (example below).
Not a query builder.
  Post parser, so no reparsing to use the query
(for large updates and queries)
  Injection is meaningless - can only inject values, not syntax.
  Can rewrite structurally: SELECT ?x => SELECT (:value AS ?x)
which is useful to record the injection variables.
  Works with INSERT {?s ?p ?o } WHERE { }
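
As a rough illustration of "replaces SPARQL variables where identified as variables", the sketch below substitutes on already-parsed terms rather than on text. The Term model is a hypothetical stand-in, not Jena's Node or Query classes, and literal escaping is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NodeSubstitution {
    // Hypothetical, minimal term model: a kind tag plus the lexical text.
    enum Kind { VAR, IRI, LITERAL }

    static final class Term {
        final Kind kind;
        final String text;
        Term(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() {
            switch (kind) {
                case VAR: return "?" + text;
                case IRI: return "<" + text + ">";
                default:  return "\"" + text + "\"";   // no escaping in this sketch
            }
        }
    }

    // Replace only the terms the parser already identified as variables.
    static List<Term> substitute(List<Term> triple, Map<String, Term> values) {
        List<Term> out = new ArrayList<>();
        for (Term t : triple) {
            boolean bound = t.kind == Kind.VAR && values.containsKey(t.text);
            out.add(bound ? values.get(t.text) : t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Term> triple = new ArrayList<>();
        triple.add(new Term(Kind.IRI, "http://example/foo?x=123"));
        triple.add(new Term(Kind.IRI, "http://example/p"));
        triple.add(new Term(Kind.VAR, "x"));
        Map<String, Term> values = new HashMap<>();
        values.put("x", new Term(Kind.LITERAL, "Bristol"));
        // The IRI containing "?x" is untouched; only the variable is replaced.
        System.out.println(substitute(triple, values));
        // prints: [<http://example/foo?x=123>, <http://example/p>, "Bristol"]
    }
}
```

Because the value is inserted as a term and never reparsed, it cannot introduce new syntax, which is the sense in which injection is meaningless here.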

PQ example:

  Query template = QueryFactory.create(.. valid query ..) ;
  Map<String, RDFNode> map = new HashMap<>() ;
  map.put("y", ResourceFactory.createPlainLiteral("Bristol")) ;
  Query query = ParameterizedQuery.setVariables(template, map) ;


A perfect system probably needs a template language which SPARQL
extended with a new template variable which is 

[jira] [Commented] (JENA-978) jena-text: store original literals

2015-07-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613316#comment-14613316
 ] 

ASF subversion and git services commented on JENA-978:
--

Commit b7eac624cfe5c95b4a7f6ecddbdfc27bd361da0a in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=b7eac62 ]

JENA-978: jena-text stored literals: initial functionality and tests for Lucene


   jena-text: store original literals
 

 Key: JENA-978
 URL: https://issues.apache.org/jira/browse/JENA-978
 Project: Apache Jena
  Issue Type: Improvement
  Components: Text
Affects Versions: Jena 2.13.0
Reporter: Osma Suominen

 As discussed on the dev list, this PR implements a feature where it's 
 possible to store the original literal values in the jena-text Lucene index 
 and to access them when querying the index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JENA-966) LazyIterator

2015-07-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613322#comment-14613322
 ] 

ASF subversion and git services commented on JENA-966:
--

Commit 66ceff0cb1b2aaddf4477993dda60a1e32026ebf in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=66ceff0 ]

JENA-966: Deprecate EarlyBindingIterator and UniqueExtendedIterator


 LazyIterator
 

 Key: JENA-966
 URL: https://issues.apache.org/jira/browse/JENA-966
 Project: Apache Jena
  Issue Type: Bug
  Components: Core
Affects Versions: Jena 3.0.0
Reporter: Claude Warren
Assignee: Claude Warren
 Fix For: Jena 3.0.0


 LazyIterator is an abstract class.  The documentation indicates that the
 create() method needs to be overridden to create an instance.  From this I
 would expect that
 new LazyIterator(){
 @Override
 public ExtendedIterator<Model> create() {
   ...
 }};
 would work; however, LazyIterator does not override
 removeNext(), andThen(), toList(), and toSet().
 I believe these should be implemented in the class.
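
 The general pattern behind this report can be sketched in plain Java. This is not Jena's LazyIterator; LazyIter and its members are hypothetical names. The point is only that every method, including convenience ones, has to go through the lazily created delegate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class LazyIter<T> implements Iterator<T> {
    private final Supplier<Iterator<T>> factory;
    private Iterator<T> delegate;              // created on first use only

    LazyIter(Supplier<Iterator<T>> factory) { this.factory = factory; }

    // Single access point: creates the underlying iterator when first needed.
    private Iterator<T> it() {
        if (delegate == null) delegate = factory.get();
        return delegate;
    }

    @Override public boolean hasNext() { return it().hasNext(); }
    @Override public T next()          { return it().next(); }
    @Override public void remove()     { it().remove(); }

    // A convenience method; it must also be routed through the delegate.
    List<T> toList() {
        List<T> out = new ArrayList<>();
        while (hasNext()) out.add(next());
        return out;
    }

    public static void main(String[] args) {
        LazyIter<String> lazy =
            new LazyIter<>(() -> Arrays.asList("a", "b").iterator());
        System.out.println(lazy.toList());
    }
}
```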





Random error named broken line for same data, same query and same code.

2015-07-03 Thread Ankur Padia
Hello everyone,

I am trying to execute a SPARQL query by making a call to a Fuseki server
running on the default port 3030. The query is as follows.

PREFIX  univ: <http://example.com#>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  ?subject ?predicate ?object
FROM <http://example.data/University3A>
WHERE
  { {   { ?subject univ:doctoralDegreeFrom ?object
  {   { ?subject rdf:type univ:FullProfessor}
UNION
  { ?subject rdf:type univ:AssistantProfessor}
UNION
  { ?subject rdf:type univ:AssociateProfessor}
UNION
  { ?subject rdf:type univ:Lecturer}
  }
}
  UNION
{ ?subject univ:mastersDegreeFrom ?object
  {   { ?subject rdf:type univ:FullProfessor}
UNION
  { ?subject rdf:type univ:AssistantProfessor}
UNION
  { ?subject rdf:type univ:AssociateProfessor}
UNION
  { ?subject rdf:type univ:Lecturer}
  }
}
  UNION
{ ?subject univ:undergraduateDegreeFrom ?object
  {   { ?subject rdf:type univ:FullProfessor}
UNION
  { ?subject rdf:type univ:AssistantProfessor}
UNION
  { ?subject rdf:type univ:AssociateProfessor}
UNION
  { ?subject rdf:type univ:Lecturer}
  }
}
  UNION
{ ?subject univ:emailAddress ?object
  {   { ?subject rdf:type univ:FullProfessor}
UNION
  { ?subject rdf:type univ:AssistantProfessor}
UNION
  { ?subject rdf:type univ:AssociateProfessor}
UNION
  { ?subject rdf:type univ:Lecturer}
  }
}
  UNION
{ ?subject univ:name ?object}
  UNION
{ ?subject univ:researchInterest ?object}
  UNION
{ ?subject univ:teacherOf ?object}
  UNION
{ ?subject univ:worksFor ?object
  {   { ?subject rdf:type univ:FullProfessor}
UNION
  { ?subject rdf:type univ:AssistantProfessor}
UNION
  { ?subject rdf:type univ:AssociateProfessor}
UNION
  { ?subject rdf:type univ:Lecturer}
  }
}
  UNION
{ ?subject univ:name ?object
  { { ?subject rdf:type univ:Course}}
}
  UNION
{ ?subject univ:name ?object
  { { ?subject rdf:type univ:Department}}
}
  UNION
{ ?subject univ:suborganizationof ?object
  { { ?subject rdf:type univ:Department}}
}
  UNION
{ ?subject univ:name ?object
  { { ?subject rdf:type univ:GraduateCourse}}
}
  UNION
{ ?subject univ:advisor ?object
  {   { ?subject rdf:type univ:GraduateStudent}
UNION
  { ?subject rdf:type univ:ResearchAssistant}
UNION
  { ?subject rdf:type univ:TeachingAssistant}
  }
}
  UNION
{ ?subject univ:emailAddress ?object
  {   { ?subject rdf:type univ:GraduateStudent}
UNION
  { ?subject rdf:type univ:ResearchAssistant}
UNION
  { ?subject rdf:type univ:TeachingAssistant}
  }
}
  UNION
{ ?subject univ:memberOf ?object
  {   { ?subject rdf:type univ:GraduateStudent}
UNION
  { ?subject rdf:type univ:ResearchAssistant}
UNION
  { ?subject rdf:type univ:TeachingAssistant}
  }
}
  UNION
{ ?subject univ:name ?object
  {   { ?subject rdf:type univ:GraduateStudent}
UNION
  { ?subject rdf:type univ:ResearchAssistant}
UNION
  { ?subject rdf:type univ:TeachingAssistant}
  }
}
  UNION
{ ?subject univ:name ?object}
  UNION
{ ?subject univ:publicationAuthor ?object}
  UNION
{ ?subject univ:suborganizationOf ?object
  { { ?subject rdf:type univ:ResearchGroup}}
}
  UNION
{ ?subject rdf:type univ:University}
}
  }

The data set being queried is synthetic data generated with the data
generator available at http://swat.cse.lehigh.edu/projects/lubm/
(UBA 1.7), with 3 universities as the command-line argument and the
namespace set to http://example.com


*Problem*: When I run the code to submit the query using
QueryEngineHTTP, I get an error of the form *broken line : some text*, and
each time I run the code against the dataset the text is different, such as
broken line (new line) : ty, or broken line (new line) : www.


I tried to look through the generated data, but I was not able to find any
stray new line in the file or any other error.

Interestingly, when the data is generated using the same UBA 1.7 script with
1 university as the argument, I get no such error for the same query (maybe
because the number of triples is about one third of that with 3 as the
argument).

*Code* used is 

[jira] [Updated] (JENA-978) jena-text: store original literals

2015-07-03 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-978:
---
Assignee: Osma Suominen  (was: Andy Seaborne)

   jena-text: store original literals
 

 Key: JENA-978
 URL: https://issues.apache.org/jira/browse/JENA-978
 Project: Apache Jena
  Issue Type: Improvement
  Components: Text
Affects Versions: Jena 2.13.0
Reporter: Osma Suominen
Assignee: Osma Suominen
 Fix For: Jena 3.0.0


 As discussed on the dev list, this PR implements a feature where it's 
 possible to store the original literal values in the jena-text Lucene index 
 and to access them when querying the index.





[jira] [Resolved] (JENA-978) jena-text: store original literals

2015-07-03 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-978.

   Resolution: Done
Fix Version/s: Jena 3.0.0

   jena-text: store original literals
 

 Key: JENA-978
 URL: https://issues.apache.org/jira/browse/JENA-978
 Project: Apache Jena
  Issue Type: Improvement
  Components: Text
Affects Versions: Jena 2.13.0
Reporter: Osma Suominen
Assignee: Osma Suominen
 Fix For: Jena 3.0.0


 As discussed on the dev list, this PR implements a feature where it's 
 possible to store the original literal values in the jena-text Lucene index 
 and to access them when querying the index.





[jira] [Commented] (JENA-982) Concurrent Modification Exception on DatasetGraphMem.clear()

2015-07-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612981#comment-14612981
 ] 

ASF subversion and git services commented on JENA-982:
--

Commit a3d7609a65606417b4109da201d2aaaf6f87aa2a in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=a3d7609 ]

JENA-982 : CME on DatasetGraphMem.clear

 Concurrent Modification Exception on DatasetGraphMem.clear()
 

 Key: JENA-982
 URL: https://issues.apache.org/jira/browse/JENA-982
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 2.13.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 3.0.0


 Report from : 
 [stackoverflow/31188209|http://stackoverflow.com/questions/31188209/jena-concurrentmodificationexception-on-datasetgraph-clear]
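
 The issue text only links to the Stack Overflow report and does not describe the cause. As a generic illustration of how a clear() can raise ConcurrentModificationException, here is a plain-Java sketch. It does not use Jena's DatasetGraphMem; the map of graph names and all identifiers are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class ClearWhileIterating {
    // Removes entries while iterating the live key view; HashMap's
    // fail-fast iterator detects the modification on the next step.
    static boolean unsafeClear(Map<String, String> graphs) {
        try {
            for (String name : graphs.keySet()) {
                graphs.remove(name);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Iterates over a snapshot of the keys, so removal is safe.
    static void safeClear(Map<String, String> graphs) {
        for (String name : new ArrayList<>(graphs.keySet())) {
            graphs.remove(name);
        }
    }

    private static Map<String, String> sample() {
        Map<String, String> m = new HashMap<>();
        m.put("g1", "a");
        m.put("g2", "b");
        m.put("g3", "c");
        return m;
    }

    public static void main(String[] args) {
        System.out.println("unsafe clear completed: " + unsafeClear(sample()));
        Map<String, String> m = sample();
        safeClear(m);
        System.out.println("entries left after safe clear: " + m.size());
    }
}
```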





Re: Query parameterization.

2015-07-03 Thread Andy Seaborne

On 01/07/15 07:17, Claude Warren wrote:

SelectBuilder sb = new SelectBuilder()
 .addVar( "*" )
 .addWhere( "?s", "?p", "?o" );
sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
"http://xmlns.com/foaf/0.1/Person" ) ) ;
Query q = sb.build();


Hi Claude,

Should that be one of
  Var.alloc( "o" )
  Var.alloc(Var.canonical("?o"))

How does it compare to the corner cases in my first message?


There is at least one injection attack:

NodeFactory.createURI of

"http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns"

Because it is string inclusion, jena-querybuilder needs to do the same
checks that ParameterizedSparqlString does for URI.  A check is needed on
literals too, but it is a different kind of test.
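
One possible shape for such a URI check is sketched below. It is not the check ParameterizedSparqlString performs. It rejects the characters that the SPARQL 1.1 grammar excludes from an IRIREF, and the class and method names are hypothetical.

```java
final class IriCheck {
    // Characters the SPARQL 1.1 IRIREF production excludes, besides
    // the control/space range U+0000..U+0020.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    static boolean isSafeIriString(String iri) {
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Only wrap the string in <...> once it has passed the check.
    static String toIriRef(String iri) {
        if (!isSafeIriString(iri)) {
            throw new IllegalArgumentException("Illegal character in IRI: " + iri);
        }
        return "<" + iri + ">";
    }

    public static void main(String[] args) {
        System.out.println(isSafeIriString("http://xmlns.com/foaf/0.1/Person"));
        System.out.println(isSafeIriString(
            "http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns"));
    }
}
```

A string that closes the IRI early with ">" and then continues with more triple pattern text fails this check because of the ">" and the spaces.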


BTW:

and how do I add

OPTIONAL {
   ?s q 123 .
   ?s v ?x .
   FILTER(?x56)
}
?

And for UNION, there seems to be a confusion because it takes a 
SelectBuilder (a subquery) but that's an SQL-ism, not SPARQL.


It seems to cause problems:

SelectBuilder sb = new SelectBuilder().addVar("*") ;
sb.addWhere("?s", "?p", "?o") ;
SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
sb1.addWhere("?s", "?p", "?o") ;
sb1.addUnion(sb1) ;
Query q1 = sb1.build() ;
String s1 = q1.toString() ;
System.out.println(s1) ;

I get stack overflow.

UNION and OPTIONAL are similar - they take graph patterns.

Andy



[DRAFT] Jena board report for July 2015

2015-07-03 Thread Andy Seaborne

Report from the Apache Jena project [Andy Seaborne]

## Description:
A framework for developing Semantic Web and Linked Data applications in
Java.

## Activity:

The project is working towards a major release.

The most significant user-visible change is converting to org.apache.jena
package names everywhere, replacing the historical names for older code
areas.


The project has also seen new contributors, who are helping to clean the
code up, adding new functionality and generally discussing the code.


## Issues:

There are no issues requiring board attention at this time.

## PMC/Committership changes:

 - Currently 13 committers and 11 PMC members in the project.
 - Osma Suominen was added to the PMC on Thu Jun 25 2015
 - Osma Suominen was added as a committer on Thu Jun 25 2015

## Releases:

 - Last release was 2.13.0 on Fri Mar 13 2015

## Mailing list activity:

 - us...@jena.apache.org:
    - 586 subscribers (up 16 in the last 3 months)
    - 485 emails sent to list (663 in previous quarter)

 - dev@jena.apache.org:
    - 149 subscribers (down 3 in the last 3 months)
    - 1480 emails sent to list (1148 in previous quarter)


## JIRA activity:

 - 67 JIRA tickets created in the last 3 months
 - 62 JIRA tickets closed/resolved in the last 3 months


[jira] [Created] (JENA-982) Concurrent Modification Exception on DatasetGraphMem.clear()

2015-07-03 Thread Andy Seaborne (JIRA)
Andy Seaborne created JENA-982:
--

 Summary: Concurrent Modification Exception on 
DatasetGraphMem.clear()
 Key: JENA-982
 URL: https://issues.apache.org/jira/browse/JENA-982
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 2.13.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 3.0.0


Report from : 
[stackoverflow/31188209|http://stackoverflow.com/questions/31188209/jena-concurrentmodificationexception-on-datasetgraph-clear]





[jira] [Closed] (JENA-982) Concurrent Modification Exception on DatasetGraphMem.clear()

2015-07-03 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-982.
--
Resolution: Fixed

 Concurrent Modification Exception on DatasetGraphMem.clear()
 

 Key: JENA-982
 URL: https://issues.apache.org/jira/browse/JENA-982
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 2.13.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 3.0.0


 Report from : 
 [stackoverflow/31188209|http://stackoverflow.com/questions/31188209/jena-concurrentmodificationexception-on-datasetgraph-clear]


