[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-14 Thread osma
Github user osma commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-102144875
  
Hi, I see that you already changed your code - impressive work!

One more suggestion - I hope it won't come too late, since you've already 
moved code from TextIndexLucene to TextIndexLuceneMultilingual like I 
suggested...

I've been thinking about how I could make use of this in my application 
(Skosmos). I don't have a need for language-specific analyzers 
(LowerCaseKeywordAnalyzer works better for the application, in all languages), 
but it could be useful to be able to target searches based on language - this 
way I could avoid some false hits from the text index and thus get faster 
queries overall. So I wonder if it would be possible to separate these two 
aspects:

1. Store language tags of literals in the Lucene index and be able to 
restrict the query to a specific language with a query parameter
2. Use different analyzers for different languages

Right now your code does both, but it's not possible to do only 1. 
Obviously 2 depends on 1.

How about adding a new option langField, similar to graphField, that 
can be configured via the assembler (or as a constructor parameter, just like 
graphField)? When set to the name of a field (the obvious choice would be 
lang), the language tags would get stored in the index, and it would be 
possible to target queries for a specific language. This would already be base 
functionality for TextIndexLucene (sorry - I know you just moved the code 
away!).

Then the MultilingualAnalyzer, implemented in TextIndexLuceneMultilingual, 
would depend on having langField set and would actually cause the analyzer to 
be dynamically selected based on language.
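
To make the split between aspects 1 and 2 concrete, here is a minimal sketch of 
aspect 1 on its own: language tags are stored next to the indexed text, and a 
query can optionally be restricted to one language. It is a toy in-memory model, 
not the jena-text or Lucene API. The class LangFieldSketch, its storeLang flag 
(standing in for a configured langField) and the search signature are all made 
up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy in-memory stand-in for a text index. Hypothetical; not the jena-text or Lucene API. */
final class LangFieldSketch {

    /** One indexed literal: subject, lexical form, and language tag ("" when not stored). */
    private static final class Entry {
        final String subject;
        final String text;
        final String lang;

        Entry(String subject, String text, String lang) {
            this.subject = subject;
            this.text = text;
            this.lang = lang;
        }
    }

    private final boolean storeLang; // assumption: true when a langField is configured
    private final List<Entry> entries = new ArrayList<>();

    LangFieldSketch(boolean storeLang) {
        this.storeLang = storeLang;
    }

    /** Indexes a literal; the language tag is kept only when storeLang is set. */
    void add(String subject, String text, String lang) {
        String stored = storeLang ? lang.toLowerCase(Locale.ROOT) : "";
        entries.add(new Entry(subject, text, stored));
    }

    /** Returns matching subjects; lang == null means "any language". */
    List<String> search(String term, String lang) {
        if (lang != null && !storeLang) {
            throw new IllegalStateException("language tags are not stored in this index");
        }
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            boolean textMatches = e.text.toLowerCase(Locale.ROOT).contains(needle);
            boolean langMatches = lang == null || e.lang.equals(lang.toLowerCase(Locale.ROOT));
            if (textMatches && langMatches) {
                hits.add(e.subject);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LangFieldSketch index = new LangFieldSketch(true);
        index.add("ex:cat", "Chat", "fr");
        index.add("ex:talk", "chat", "en");
        System.out.println(index.search("chat", null)); // both subjects
        System.out.println(index.search("chat", "en")); // only ex:talk
    }
}
```

The linear scan only shows the behaviour. Any speed-up would have to come from 
Lucene applying the language restriction inside the index itself.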


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Commons RDF

2015-05-14 Thread aj...@virginia.edu
Wearing my Jena user's hat for a moment, this would be lovely and I would be 
happy to help with it.

A project [1] on which I work persists RDF via some very complex mappings into 
and out of a JCR repository, and being able to stream it a little more 
gracefully would be a nice win for us. Those mappings are basically formed out 
of iterators and transformations, kind of a poor man's Stream API, but we're 
moving to rebuild over the real Streams API. Maybe this could be generalized 
into a more popular use case?

[1]: http://www.fedora-commons.org/

---
A. Soroka
The University of Virginia Library




Re: Commons RDF

2015-05-14 Thread Stian Soiland-Reyes
I'm also interested in making Jena parsers and serializers usable directly
from a Commons RDF perspective, without interaction with intermediate Jena
core objects. E.g., something like:

StreamTriple s = JenaCommonsRDF.read(inputStream, Lang.Turtle)

And vice versa for write.

Such a bridge should be possible on top of StreamRDF and RIOT, right?
Perhaps a worker thread is needed if there are pull vs. push issues.

Should we start a branch, or first flesh out the rough edges of such a
bridge module in the wiki?
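
As a rough illustration of the worker-thread idea, the sketch below turns a 
push-style producer (one that calls back for each item, as a StreamRDF-style 
parser would) into a pull-style Iterator by running the producer on its own 
thread and handing items over through a bounded queue. Everything here is 
hypothetical: PushToPullBridge and its pull method are invented names, plain 
strings stand in for triples, and no Jena or Commons RDF types are involved.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical helper; strings stand in for triples and no Jena or Commons RDF types are used. */
final class PushToPullBridge {

    private static final Object END = new Object(); // marks the end of the pushed sequence

    /** Runs a push-style producer on a worker thread and exposes its items as a pull-style Iterator. */
    static <T> Iterator<T> pull(Consumer<Consumer<T>> producer, int capacity) {
        final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        Thread worker = new Thread(() -> {
            try {
                producer.accept(item -> put(queue, item)); // blocks while the queue is full
            } finally {
                put(queue, END);
            }
        }, "push-to-pull-worker");
        worker.setDaemon(true); // do not keep the JVM alive if the reader stops early
        worker.start();

        return new Iterator<T>() {
            private Object buffered; // next item, END, or null if nothing has been fetched yet

            @Override
            public boolean hasNext() {
                if (buffered == null) {
                    buffered = take(queue);
                }
                return buffered != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T item = (T) buffered;
                buffered = null;
                return item;
            }
        };
    }

    private static void put(BlockingQueue<Object> queue, Object item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while handing over an item", e);
        }
    }

    private static Object take(BlockingQueue<Object> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an item", e);
        }
    }

    public static void main(String[] args) {
        Iterator<String> triples = PushToPullBridge.<String>pull(sink -> {
            sink.accept("<s> <p> <o1> .");
            sink.accept("<s> <p> <o2> .");
        }, 1);
        while (triples.hasNext()) {
            System.out.println(triples.next());
        }
    }
}
```

The sketch leaves the hard parts open: null items are not supported, a producer 
failure is not reported to the reader, and a reader that stops early leaves the 
worker blocked on the queue.
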
On 13 May 2015 15:59, Andy Seaborne a...@apache.org wrote:

 On 12/05/15 15:26, A. Soroka wrote:

 At:

 http://commonsrdf.incubator.apache.org/implementations.html

 It says Apache Jena is considering a compatibility interface that
 provides and consumes Commons RDF objects.

 I'm wondering if there have been any experiments to that end, or whether
 Jena is waiting for some resources to explore that possibility? I would be
 happy to have a go at making a simple module that just implements the
 current Commons RDF API types over jena-core in a simple way, to get things
 started.
 ---
 A. Soroka
 The University of Virginia Library


 I have some code that mocks up commonsrdf over Jena in the sense that it
 uses jena behind the RDFTermFactory; that's the easy bit.  It's limited and
 definitely not a bridge between the two APIs.  It is merely exploring the
 commonsrdf work.

 It would mess up the existing interfaces no end to add commonsrdf as
 interfaces to Model/Resource; and Graph/Triple/Node is generalized RDF so
 the type model does not fit.

 It needs a bridge module and a proper module would be good.

 ((I also have https://github.com/afs/commonsrdf-container which is even
 more minimal than the simple implementation.  Not Jena related.))

 Some other interesting projects:
   An in-memory dataset : JENA-624

 Have a specifically in-memory DatasetGraph to complement the current
 general purpose dataset.

   Bruno is working on JENA-632

 In fact, I can see commonsrdf being at the center of a new API, very Java8
 specific, that is oriented around processing RDF stream style - see the
 email from Paul Houle.

 Or take StreamRDF and add java8-stream-ness around it (maybe not directly
 changing but making it the source for java8-streams - some issues of
 pull-streams and push-stream styles here which are hard when efficiency is
 considered).


 Andy



[GitHub] jena pull request: JENA-938: Nonfunctional cleanup in various modu...

2015-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/60




[GitHub] jena pull request: JENA-938: Nonfunctional cleanup in various modu...

2015-05-14 Thread afs
Github user afs commented on the pull request:

https://github.com/apache/jena/pull/60#issuecomment-102169543
  
I've picked out as much material from the PR as I can and applied it according 
to the discussions (so: interface declarations are not removed when the base 
type already includes them, exceptions are left on @Override methods in main 
code, and unnecessary exceptions are removed more thoroughly in test code).

I have not handled jena-permissions, a module I know less about. One of the 
reasons splitting by module is helpful is that different people tend to look 
after different modules, so no one person has oversight of all the code.
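
The comment does not say why throws clauses were left on overriding methods in 
main code. One plausible reading, offered here as an assumption rather than the 
committer's stated reasoning, is that a declared but unthrown checked exception 
is still part of the method's contract: subclasses and callers may depend on 
it. A minimal sketch with invented types (nothing below comes from the Jena 
code base):

```java
import java.io.IOException;

/** Hypothetical types for illustration; none of these come from the Jena code base. */
final class ThrowsClauseSketch {

    interface TextSource {
        String read() throws IOException;
    }

    /** "Main code" override: its body throws nothing, yet the throws clause is kept. */
    static class MemorySource implements TextSource {
        @Override
        public String read() throws IOException {
            return "in-memory data";
        }
    }

    /** A subclass, possibly outside the project, that relies on the inherited throws clause. */
    static class FileBackedSource extends MemorySource {
        @Override
        public String read() throws IOException {
            throw new IOException("file missing");
        }
    }

    static String describe(MemorySource source) {
        try {
            return source.read();
        } catch (IOException e) {
            // This catch block compiles only while MemorySource.read() declares IOException.
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new MemorySource()));     // in-memory data
        System.out.println(describe(new FileBackedSource())); // failed: file missing
    }
}
```

If MemorySource.read() dropped its throws clause, both FileBackedSource and the 
catch block in describe would stop compiling. Test code has no such downstream 
users, so removing unthrown exceptions there is safer.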




[jira] [Commented] (JENA-938) Clean up dead code

2015-05-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544354#comment-14544354
 ] 

ASF subversion and git services commented on JENA-938:
--

Commit 49480190d4ab4bb1436a431a64dc5883faa4190c in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=4948019 ]

JENA-938 : Code cleaning (extract of PR #60)

 Clean up dead code
 --

 Key: JENA-938
 URL: https://issues.apache.org/jira/browse/JENA-938
 Project: Apache Jena
  Issue Type: Task
  Components: Jena
Affects Versions: Jena 3.0.0
Reporter: A. Soroka
Priority: Minor
  Labels: cleanup, jena

 This is an umbrella task to which several PRs will be attached, each 
 containing clean up for some modules in Jena. Each PR will contain only 
 non-controversial emendations, such as the removal of unused imports or 
 unthrown exceptions. Specifically disallowed are the removal of actual logic 
 or methods. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JENA-938) Clean up dead code

2015-05-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544356#comment-14544356
 ] 

ASF subversion and git services commented on JENA-938:
--

Commit a251dd9a45b28cff1422982e4307d4300f049e36 in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=a251dd9 ]

JENA-938 : Code cleaning (extract of PR #60)



[jira] [Commented] (JENA-938) Clean up dead code

2015-05-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544357#comment-14544357
 ] 

ASF subversion and git services commented on JENA-938:
--

Commit 51d83cf0894b0be7fc27ae5a606313f508c38f88 in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=51d83cf ]

JENA-938 : Code cleaning (extract of PR #60)




[jira] [Commented] (JENA-938) Clean up dead code

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544366#comment-14544366
 ] 

ASF GitHub Bot commented on JENA-938:
-

Github user afs commented on the pull request:

https://github.com/apache/jena/pull/60#issuecomment-102169543
  
I've picked out as much material from the PR as I can and applied it according 
to the discussions (so: interface declarations are not removed when the base 
type already includes them, exceptions are left on @Override methods in main 
code, and unnecessary exceptions are removed more thoroughly in test code).

I have not handled jena-permissions, a module I know less about. One of the 
reasons splitting by module is helpful is that different people tend to look 
after different modules, so no one person has oversight of all the code.




[jira] [Commented] (JENA-938) Clean up dead code

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544358#comment-14544358
 ] 

ASF GitHub Bot commented on JENA-938:
-

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/60




[jira] [Commented] (JENA-938) Clean up dead code

2015-05-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544355#comment-14544355
 ] 

ASF subversion and git services commented on JENA-938:
--

Commit 45ee91c875031c6187cd76dfeef4a7e9b9ee8d6e in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=45ee91c ]

JENA-938 : Code cleaning (extract of PR #60)



[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-14 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-102207391
  
Ok, it shouldn't be a big job. I'll take a look soon.
For the multilingual analyzer, the language must be stored anyway, since the 
analyzer depends on it (as you said in point 2). So, in that case, should a 
missing langField parameter simply be ignored, or should an exception be raised 
to flag the forgotten field?

Another point:
In the current version, I store an undef value rather than an empty one for 
unlocalized literals, because querying for an empty field is not 
straightforward in Lucene and I want to be able to search for unlocalized 
values explicitly. In your case, I don't think that will be necessary.
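
The two points above can be sketched in a few lines. This is an illustration 
under assumptions, not code from the pull request: UnlocalizedSketch and both 
method names are invented, lower-casing the tag is an arbitrary choice, and 
requireLangField shows only the "raise an exception" option.

```java
import java.util.Locale;

/** Hypothetical helpers; the names are illustrative, not taken from the pull request. */
final class UnlocalizedSketch {

    /** Sentinel stored for literals that carry no language tag. */
    static final String UNDEF = "undef";

    /** Maps a literal's language tag to the value written to the language field. */
    static String langFieldValue(String langTag) {
        if (langTag == null || langTag.isEmpty()) {
            return UNDEF; // a non-empty value can be matched by an ordinary term query
        }
        return langTag.toLowerCase(Locale.ROOT); // assumption: tags are normalised to lower case
    }

    /** The "raise an exception" option: a multilingual index refuses a missing language field. */
    static String requireLangField(boolean multilingual, String langField) {
        if (multilingual && (langField == null || langField.isEmpty())) {
            throw new IllegalArgumentException("a multilingual index needs langField to be set");
        }
        return langField;
    }

    public static void main(String[] args) {
        System.out.println(langFieldValue(""));            // undef
        System.out.println(langFieldValue("FR"));          // fr
        System.out.println(requireLangField(true, "lang")); // lang
    }
}
```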






[jira] [Comment Edited] (JENA-942) XML results from dbpedia.org can not be parsed.

2015-05-14 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542679#comment-14542679
 ] 

Andy Seaborne edited comment on JENA-942 at 5/14/15 11:16 AM:
--

This is a manifestation of Java bug 
https://bugs.openjdk.java.net/browse/JDK-8029437

Possible workarounds:

# Use a different format (see below).
# Switch to SAX-based parsing: {{SystemARQ.UseSAX = true ;}}
# Add a dependency on the Woodstox StAX parser to the application project: 
{{org.codehaus.woodstox:wstx-asl}} (versions 3.2.7 and 4.0.6 seem to work)
# [Add the Woodstox jar|http://repo1.maven.org/maven2/org/codehaus/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.jar] to the classpath.

For SPARQL results in JSON:
{noformat}
QueryEngineHTTP qexec = (QueryEngineHTTP) QueryExecutionFactory.sparqlService(uri, query);
// request JSON results
qexec.setSelectContentType(WebContent.contentTypeResultsJSON);
ResultSet results = qexec.execSelect();
{noformat}
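
Workarounds 3 and 4 rely on the JVM picking up a different StAX implementation 
from the classpath. A quick way to check which one is actually in use is to ask 
the standard javax.xml.stream factory for its class name; a name in a com.ctc.wstx 
package would suggest Woodstox. The sketch below uses only the JDK API. The class 
StaxCheck and the sample document are made up for illustration, and the sample 
is deliberately XML 1.0, so it does not reproduce the bug.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical diagnostic helper; uses only the JDK's javax.xml.stream API. */
final class StaxCheck {

    /** Class name of the StAX implementation the JVM resolves at runtime. */
    static String staxImplementation() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    /** Collects the name attribute of every variable element in a results document. */
    static List<String> variableNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "variable".equals(reader.getLocalName())) {
                        names.add(reader.getAttributeValue(null, "name"));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("StAX implementation: " + staxImplementation());
        String xml = "<?xml version=\"1.0\"?>"
            + "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
            + "<head><variable name=\"s\"/><variable name=\"p\"/></head>"
            + "<results/></sparql>";
        System.out.println(variableNames(xml)); // [s, p]
    }
}
```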



 XML results from dbpedia.org can not be parsed.
 ---

 Key: JENA-942
 URL: https://issues.apache.org/jira/browse/JENA-942
 Project: Apache Jena
  Issue Type: Bug
Reporter: Andy Seaborne

 If dbpedia.org generates XML 1.1 results, then the use of {{<variable 
 name=.../>}} or any XML tag with {{<.../>}} can cause:
 {noformat}
 com.hp.hpl.jena.sparql.resultset.ResultSetException: Failed when initializing 
 the StAX parsing engine
 at com.hp.hpl.jena.sparql.resultset.XMLInputStAX.init(XMLInputStAX.java:118)
 ...
 {noformat}


