[GitHub] [jena] ajs6f commented on a change in pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
ajs6f commented on a change in pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582#discussion_r302242337
 
 

 ##
 File path: README.md
 ##
 @@ -1,4 +1,4 @@
-Jena README
+Jena README 
 
 Review comment:
   And it's gone.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [jena] xristy commented on a change in pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
xristy commented on a change in pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582#discussion_r302242284
 
 

 ##
 File path: README.md
 ##
 @@ -1,4 +1,4 @@
-Jena README
+Jena README 
 
 Review comment:
  It was unintentional. A colleague was testing some GitHub logic and 
apparently selected the Jena fork's README.md. I've committed a repaired 
README.md.




[GitHub] [jena] ajs6f commented on a change in pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
ajs6f commented on a change in pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582#discussion_r302240553
 
 

 ##
 File path: jena-text/src/test/resources/log4j.properties
 ##
 @@ -12,5 +12,7 @@ log4j.logger.org.apache.jena.arq.exec=INFO
 # Reduce test/build noise
 log4j.logger.org.apache.jena.query.text.TextIndexLucene=ERROR
 
+# log4j.logger.org.apache.jena.query.text=TRACE
 
 Review comment:
   Cool!




[GitHub] [jena] xristy commented on a change in pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
xristy commented on a change in pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582#discussion_r302239544
 
 

 ##
 File path: jena-text/src/test/resources/log4j.properties
 ##
 @@ -12,5 +12,7 @@ log4j.logger.org.apache.jena.arq.exec=INFO
 # Reduce test/build noise
 log4j.logger.org.apache.jena.query.text.TextIndexLucene=ERROR
 
+# log4j.logger.org.apache.jena.query.text=TRACE
 
 Review comment:
  No, I committed an update to log4j.properties @ 19:28Z to turn off the TRACE 
level.




[jira] [Commented] (JENA-1723) jena:text create OR's of Lucene fields

2019-07-10 Thread Code Ferret (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882386#comment-16882386 ]

Code Ferret commented on JENA-1723:
---

I've made [PR 582|https://github.com/apache/jena/pull/582] for this issue.

The PR implements support for a list of properties in a {{text:query}}, which 
are then OR'd in the Lucene query as described above:

{code}
(?s ?sc ?lit ?graph ?prop) text:query ( skos:prefLabel skos:altLabel rdfs:label 
"some query" "highlight:" )
{code}

and the ability to name a list of properties for later use, as 
described above:

{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels "some query" "highlight:" )
{code}

The PR includes detailed unit tests.
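For illustration only, here is a minimal Java sketch of the rewrite described in the issue: given the Lucene field names that a list of properties maps to, it builds one query string that ORs the same phrase across all of them, in the shape shown in the issue description. The class and method names are hypothetical, the property-to-field mapping (normally derived from the text index configuration) is assumed to have been done already, and this is not the code in PR 582.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: not the jena-text implementation.
class OrFieldsQuery {

    /**
     * Builds (f1:"phrase" OR f2:"phrase" ...) for the given Lucene fields.
     * The phrase is quoted so each field is searched for the whole phrase.
     */
    static String build(List<String> fields, String phrase) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        // Escape backslashes first, then double quotes, so the phrase stays one quoted term.
        String escaped = phrase.replace("\\", "\\\\").replace("\"", "\\\"");
        String quoted = "\"" + escaped + "\"";
        return fields.stream()
                .map(field -> field + ":" + quoted)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        String q = build(List.of("altLabel", "prefLabel", "label"), "some query");
        // prints (altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
        System.out.println(q);
    }
}
```

Building a string keeps the sketch short; a real implementation might construct the query with Lucene's own query classes instead of concatenating text.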

> jena:text create OR's of Lucene fields
> --
>
> Key: JENA-1723
> URL: https://issues.apache.org/jira/browse/JENA-1723
> Project: Apache Jena
>  Issue Type: New Feature
>  Components: Jena
>Affects Versions: Jena 3.13.0
>Reporter: Code Ferret
>Assignee: Code Ferret
>Priority: Minor
>  Labels: pull-request-available
>
> h3. Motivation:
> With the current {{jena:text}} we often find that we have query patterns such 
> as:
> {code}
> select ?foo where {
>   {
>     (?s ?sc ?lit) text:query ( rdfs:label "some query" "highlight:" ).
>   }
>   union
>   {
>     (?s ?sc ?lit) text:query ( skos:altLabel "some query" "highlight:" ).
>   }
>   union
>   {
>     (?s ?sc ?lit) text:query ( skos:prefLabel "some query" "highlight:" ).
>   }
> }
> {code}
> Such patterns arise for various sets of RDF properties, each corresponding
> to some Lucene field.
> It can be more efficient to _push_ the {{unions}} into the Lucene query by 
> rewriting the query as:
> {code}
> (altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
> {code}
> Then it's a single query with Lucene performing the {{unions}}.
> h3. Approach:
> We've implemented this by:
> 1. adding a new assembler feature in {{text:TextIndexLucene}}:
> {code}
> [] text:props (
>      text:propList [ text:propListProp ex:labels ;
>                      text:props ( skos:prefLabel skos:altLabel rdfs:label ) ]
>    ) ;
> {code}
> This allows a single _Property_ id, e.g., {{ex:labels}}, to be assigned to a 
> list of properties; and
> 2. adding some syntax to the {{TextQueryPF}}:
> {code}
> (?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels "some query" 
> "highlight:" )
> {code}
> The fifth output arg, {{?prop}}, returns the specific property that matched.
> If the input args include {{text:props}} as the first argument, then it must
> be followed by a list of at least one property before the query string.
> These properties are either the usual Lucene-indexed properties that occur
> in {{text:query}} or a property-list property such as {{ex:labels}} above.
> When a property-list property is encountered, it is expanded to the
> underlying list of indexed properties from the configuration.
> Any mix of indexed and property-list properties may follow
> {{text:props}} in the input arg list:
> {code}
> (?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels rdfs:comment 
> "some query" "highlight:" )
> {code}
> which searches over the three properties listed in {{ex:labels}} and the 
> property {{rdfs:comment}}.
> This functionality is implemented, including copious tests, and a PR can be 
> issued after a bit of code cleanup.
> h3. Discussion:
> The use of {{text:props}} in the query form isn't strictly necessary; it was 
> introduced as a way of signalling the intent to search over a list of
> properties.
> If the {{text:props}} _flag_ is removed from the implementation, then the 
> feature will simply check whether each property is a property-list property
> or an ordinary indexed property.
> With this modification, the above queries could simply be written as:
> {code}
> (?s ?sc ?lit ?graph ?prop) text:query ( ex:labels "some query" "highlight:" )
> {code}
> or
> {code}
> (?s ?sc ?lit ?graph ?prop) text:query ( ex:labels rdfs:comment "some query" 
> "highlight:" )
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
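The expansion step described in the issue (a property-list property is replaced by its member properties, while ordinary indexed properties pass through, in any mix) can be sketched in Java as follows. This is a hypothetical illustration: the map stands in for the assembler's property-list definitions, the names are not taken from the jena-text code, and dropping duplicates while keeping order is a choice made here so that a property named twice is not searched twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of property-list expansion; not the jena-text code.
class PropListExpansion {
    // list property -> member properties, e.g. "ex:labels" -> (skos:prefLabel ...)
    private final Map<String, List<String>> propLists;

    PropListExpansion(Map<String, List<String>> propLists) {
        this.propLists = propLists;
    }

    /** Expands list properties, keeps indexed ones, preserves order, drops duplicates. */
    List<String> expand(List<String> props) {
        if (props.isEmpty()) {
            throw new IllegalArgumentException("at least one property is required");
        }
        Set<String> out = new LinkedHashSet<>();
        for (String p : props) {
            List<String> members = propLists.get(p);
            if (members != null) {
                out.addAll(members);   // a property-list property: use its members
            } else {
                out.add(p);            // an ordinary indexed property
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        PropListExpansion x = new PropListExpansion(Map.of(
                "ex:labels", List.of("skos:prefLabel", "skos:altLabel", "rdfs:label")));
        // prints [skos:prefLabel, skos:altLabel, rdfs:label, rdfs:comment]
        System.out.println(x.expand(List.of("ex:labels", "rdfs:comment")));
    }
}
```

With ex:labels defined as in the issue, expanding ( ex:labels rdfs:comment ) yields the four properties the description says are searched.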


[GitHub] [jena] ajs6f commented on a change in pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
ajs6f commented on a change in pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582#discussion_r302237192
 
 

 ##
 File path: jena-text/src/test/resources/log4j.properties
 ##
 @@ -12,5 +12,7 @@ log4j.logger.org.apache.jena.arq.exec=INFO
 # Reduce test/build noise
 log4j.logger.org.apache.jena.query.text.TextIndexLucene=ERROR
 
+# log4j.logger.org.apache.jena.query.text=TRACE
 
 Review comment:
   Did you mean to leave this level of logging on for this PR?




[GitHub] [jena] ajs6f commented on a change in pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
ajs6f commented on a change in pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582#discussion_r302235636
 
 

 ##
 File path: README.md
 ##
 @@ -1,4 +1,4 @@
-Jena README
+Jena README 
 
 Review comment:
   Extra space?




[GitHub] [jena] xristy opened a new pull request #582: Jena 1723 or props

2019-07-10 Thread GitBox
xristy opened a new pull request #582: Jena 1723 or props
URL: https://github.com/apache/jena/pull/582
 
 
   PR for JENA-1723




Re: RDFStream to RDFConnection

2019-07-10 Thread ajs6f
+1 to a spot in jena-examples with a write-up on our site.

ajs6f

> On Jul 10, 2019, at 3:20 PM, Andy Seaborne  wrote:
> 
> How big is it? One file? A module, even under jena-extras, seems a tad heavy.
> 
> Stepping back from the specifics, thinking this might be one of several:
> 
> Is this more of an example of how to do something? That could be done by 
> publishing the source, still with the Apache legal framework.
> 
> We have jena-examples, package org/apache/jena/example/ and that gets into 
> the release source.
> 
> Maybe that's a way without too much ceremony.
> 
> Or more a "documentation" via the web-site
> Or the cwiki?
> 
>Andy
> 
> On 09/07/2019 10:43, Claude Warren wrote:
>> So, the question is should I go ahead and create a library of StreamRDF
>> implementations in the extras section?  I could see one to do serialization
>> over Kafka (or other queue implementations)?
>> On Mon, Jul 8, 2019 at 5:56 PM Claude Warren  wrote:
>>> The case I was trying to solve was reading a largish XML document and
>>> converting it to an RDF graph.  After a few iterations I ended up writing a
>>> custom SAX parser that calls the RDFStream triple/quad methods.  But I
>>> wanted a way to update a Fuseki server so RDFConnection seemed like the
>>> natural choice.
>>> 
>>> In some recent work for my employer I found that I like RDFConnection
>>> because the same code can work against a local dataset or a remote one.
>>> 
>>> Claude
>>> 
>>> On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:
>>> 
>>>> This "replay" buffer approach was the direction I first went in for TIM,
>>>> until turning to MVCC (speaking of MVCC, that code is probably somewhere,
>>>> since we don't squash when we merge). Looking back, one thing that helped
>>>> me move on was the potential effect of very large transactions. But in a
>>>> controlled situation like Claude's, that problem wouldn't arise.
>>>>
>>>> ajs6f
>>>>
>>>>> On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
>>>>>
>>>>> Claude,
>>>>>
>>>>> Good timing!
>>>>>
>>>>> This is what RDF Delta does, and for updates rather than just StreamRDF
>>>>> additions, though it's not to an RDFConnection - it's to a patch service.
>>>>>
>>>>> With hindsight, I wonder if that would have been better as a
>>>>> BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
>>>>> buffer and underlying DatasetGraph behave correctly (find* works and has
>>>>> the right cardinality of results). It's a bit fiddly to get it all right,
>>>>> but once it works it is a building block that has a lot of re-usability.
>>>>>
>>>>> I came across this with the SHACL work: a BufferingGraph (with
>>>>> prefixes) to give "abort" of transactions to simple graphs which aren't
>>>>> transactional.
>>>>>
>>>>> But it occurs in Fuseki with complex dataset set-ups like rules.
>>>>>
>>>>>    Andy
>>>>>
>>>>> On 08/07/2019 11:09, Claude Warren wrote:
>>>>>> I have written an RDFStream to RDFConnection with caching.  Basically, the
>>>>>> stream caches triples/quads until a limit is reached and then it writes
>>>>>> them to the RDFConnection.  At finish it writes any triples/quads in the
>>>>>> cache to the RDFConnection.
>>>>>> Internally I cache the stream in a dataset.  I write triples to the default
>>>>>> dataset and quads as appropriate.
>>>>>> I have a couple of questions:
>>>>>> 1) In this arrangement what does the "base" tell me? I currently ignore it
>>>>>> and want to make sure I haven't missed something.
>>>>>
>>>>> The parser saw a BASE statement.
>>>>>
>>>>> Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
>>>>> concatenated).
>>>>>
>>>>> It's not necessary because the data stream should have resolved IRIs in
>>>>> it so base is used in a stream.
>>>>>
>>>>>> 2) I capture all the prefix calls in a PrefixMapping that is accessible
>>>>>> from the RDFConnectionStream class.  They are not passed into the dataset
>>>>>> in any way.  I didn't see any method to do so and don't really think it is
>>>>>> needed.  Does anyone see a problem with this?
>>>>>> 3) Does anyone have a use for this class?  If so I am happy to contribute
>>>>>> it, though the next question becomes what module to put it in?  Perhaps we
>>>>>> should have an extras package for RDFStream implementations?
>>>>>> Claude
>>>>
>>> 
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> 
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>> 
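Claude's class itself is not shown in the thread. The following is a hypothetical, Jena-free Java sketch of the buffering logic he describes: items are collected until a limit is reached, handed to a sink as one batch, and any remainder is flushed at finish. In the real class the items would be the triples/quads arriving through the StreamRDF callbacks (triple, quad and finish on org.apache.jena.riot.system.StreamRDF), and the sink would write each batch to an RDFConnection, for example via loadDataset(Dataset); both are left out so the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "cache until a limit, then write; write the rest at finish".
class BatchingBuffer<T> {
    private final int limit;
    private final Consumer<List<T>> sink;      // stands in for the RDFConnection write
    private final List<T> buffer = new ArrayList<>();

    BatchingBuffer(int limit, Consumer<List<T>> sink) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        this.limit = limit;
        this.sink = sink;
    }

    /** Buffer one item (a triple or quad in the real class); flush at the limit. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= limit) flush();
    }

    /** End of stream: write whatever is still buffered. */
    void finish() {
        if (!buffer.isEmpty()) flush();
    }

    private void flush() {
        sink.accept(new ArrayList<>(buffer));  // hand over a copy, then reuse the buffer
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingBuffer<String> b = new BatchingBuffer<>(3, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) b.add("t" + i);
        b.finish();
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

The batch is copied before it is handed over, so the sink may keep it after the internal buffer is cleared.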



Re: RDFStream to RDFConnection

2019-07-10 Thread Andy Seaborne

How big is it? One file? A module, even under jena-extras, seems a tad heavy.

Stepping back from the specifics, thinking this might be one of several:

Is this more of an example of how to do something? That could be done 
by publishing the source, still with the Apache legal framework.


We have jena-examples, package org/apache/jena/example/ and that gets 
into the release source.


Maybe that's a way without too much ceremony.

Or more a "documentation" via the web-site
Or the cwiki?

Andy
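Andy's earlier suggestion in this thread, a BufferingDatasetGraph that keeps changes beside the underlying data while find* still returns the right cardinality of results, can be illustrated with a hypothetical sketch over a plain java.util.Set rather than Jena's DatasetGraph API. Pending adds and deletes are tracked separately, reads combine them with the untouched base, commit() applies them and abort() discards them, which is the "abort for non-transactional graphs" use he mentions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of a buffering overlay; not Jena's DatasetGraph API.
class BufferingSet<T> {
    private final Set<T> base;                       // underlying data, untouched until commit()
    private final Set<T> added = new HashSet<>();    // pending adds not already in base
    private final Set<T> deleted = new HashSet<>();  // pending deletes of items that are in base

    BufferingSet(Set<T> base) {
        this.base = base;
    }

    void add(T item) {
        deleted.remove(item);                 // re-adding cancels a pending delete
        if (!base.contains(item)) {
            added.add(item);                  // only record genuinely new items
        }
    }

    void delete(T item) {
        added.remove(item);                   // deleting cancels a pending add
        if (base.contains(item)) {
            deleted.add(item);                // only mask items the base really has
        }
    }

    boolean contains(T item) {
        return added.contains(item) || (base.contains(item) && !deleted.contains(item));
    }

    /** Merged view: (base minus deleted) and added are disjoint, so no duplicates. */
    List<T> find(Predicate<? super T> pattern) {
        List<T> out = new ArrayList<>();
        for (T item : base) {
            if (!deleted.contains(item) && pattern.test(item)) out.add(item);
        }
        for (T item : added) {
            if (pattern.test(item)) out.add(item);
        }
        return out;
    }

    /** Apply the buffered changes to the base, then clear the buffers. */
    void commit() {
        base.removeAll(deleted);
        base.addAll(added);
        abort();
    }

    /** Drop the buffered changes; the base is unchanged. */
    void abort() {
        added.clear();
        deleted.clear();
    }
}
```

Because an add is only recorded when the item is not already in the base, and a delete only when it is, the two halves of find() never overlap, so each match is returned exactly once. This is the "fiddly" part Andy refers to, reduced to a single set.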


Re: [jena] 01/04: Build for wider range of JDKs on Travis

2019-07-10 Thread Andy Seaborne

If it's broken by JDK9 or JDK10, it'll be broken in JDK11 :-)

And we could start checking out JDK14.

Would it be better to have these builds nightly on Jenkins?

What's your usage model for Travis-CI?

I use it to check branch work on my cloned repo, not on the master repo.

Andy

On 10/07/2019 10:05, Rob Vesse wrote:

> Yeah, though as with most past Java versions that doesn't mean people aren't 
> still using them
>
> Having a wide range of builds helps spot any breaking changes or behavioural 
> subtleties across JVM versions
>
> Rob



Re: [jena] 01/04: Build for wider range of JDKs on Travis

2019-07-10 Thread Rob Vesse
Yeah, though as with most past Java versions, that doesn't mean people aren't 
still using them.

Having a wide range of builds helps spot any breaking changes or behavioural 
subtleties across JVM versions.

Rob

On 08/07/2019, 20:30, "Andy Seaborne"  wrote:

Rob,

Aren't java9 and java10 now end-of-life?

If so, do we need them in the general Travis build (merely because each adds 
15-20 mins)?

I use Travis for branch development - we can use Jenkins to validate master.

In the same vein - what about a 13 Early Access build on ASF Jenkins?

 Andy

On 08/07/2019 10:16, rve...@apache.org wrote:
> This is an automated email from the ASF dual-hosted git repository.
> 
> rvesse pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/jena.git
> 
> commit f0bf5f317e0725fd7b375dfa83859692c79216e6
> Author: Rob Vesse 
> AuthorDate: Mon Apr 29 13:44:15 2019 +0100
> 
>  Build for wider range of JDKs on Travis
> ---
>   .travis.yml | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/.travis.yml b/.travis.yml
> index 7b39dcc..6b02e49 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -5,5 +5,8 @@ script: mvn -B clean install
>   jdk:
> - openjdk8
> - oraclejdk8
> +  - openjdk9
> +  - openjdk10
> +  - openjdk11
>   env:
> - JAVA_OPTS="-Xmx3072M -Xms512M -XX:+UseG1GC"
>
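As one small, concrete example of the behavioural differences across JVM versions that Rob mentions: the java.specification.version system property is "1.8" on Java 8 but "9", "10", "11" from Java 9 onwards (the version-string scheme changed with JEP 223), so parsing code written for one format can misbehave on the other. The helper below is a hypothetical illustration, not code from Jena.

```java
// Hypothetical helper, not from Jena: normalise both version formats to a major number.
class JavaVersion {

    /** "1.8" -> 8 (pre-JEP 223 format); "11" or "11.0.3" -> 11 (Java 9+ format). */
    static int major(String version) {
        String s = version.startsWith("1.") ? version.substring(2) : version;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot < 0 ? s : s.substring(0, dot));
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version=" + spec + " -> major " + major(spec));
    }
}
```

A test for code like this passes on whichever JDK the developer happens to use, which is why running the build on both an 8 and an 11 JDK is what catches the other branch.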