[jira] [Commented] (JENA-1274) Support a writer-per-graph in-memory dataset

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870112#comment-15870112
 ] 

ASF GitHub Bot commented on JENA-1274:
--------------------------------------

Github user ajs6f closed the pull request at:

https://github.com/apache/jena/pull/204


> Support a writer-per-graph in-memory dataset
> --------------------------------------------
>
>                 Key: JENA-1274
>                 URL: https://issues.apache.org/jira/browse/JENA-1274
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, Jena
>            Reporter: A. Soroka
>            Assignee: A. Soroka
>            Priority: Minor
>              Labels: ldp, multithreading, named_graphs
>
> Without too much work we could support a writer-per-graph in-memory dataset. 
> The target use case here is LDP-style interaction or other RESTful 
> architectures, where it is normal for updates to occur centered on one 
> resource.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] jena issue #204: One writable graph per thread/transaction dataset

2017-02-16 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/204
  
Closing in favor of further, more general discussion of the points raised on 
this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request #204: One writable graph per thread/transaction dataset

2017-02-16 Thread ajs6f
Github user ajs6f closed the pull request at:

https://github.com/apache/jena/pull/204




Re: jena-ldp

2017-02-16 Thread A. Soroka
> On Feb 15, 2017, at 4:10 PM, Andy Seaborne wrote:
> 
> 
> Then there are a couple of comments about design and jena-ldp overall. I took 
> away from that that ThreadPerGraphDataset is in-progress - I think it would 
> be better to know it is the right design for whatever its use case is before 
> putting it into a Jena release. It gives more freedom to change without 
> worrying about incompatibility.
> 
> Once released there is an obligation which can be limiting (despite any words 
> around it to say "may change").

Well, the code as it stands is (I think) okay-- but where does it stand within 
Jena? :grin:

I'm going to close that PR and move the branch over to my GitHub clone. There 
are at least two possibilities here: evolve it towards some kind of jena-ldp, 
or expand it to try to tackle variable lock regions and a more general 
in-memory multiwriter setup. Either deserves its own thread of discussion.
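
To make the "one writer per graph" idea concrete, here is a minimal sketch of a lock region scoped to a single named graph. It is an illustration only and not the code from PR #204: the class PerGraphWriteLocks, its methods, and the use of plain string graph names are all assumptions made for this example, and it ignores transactions and readers entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical illustration: one write lock per named graph (not Jena API). */
final class PerGraphWriteLocks {
    // One lock per graph name, created lazily on first write to that graph.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the action while holding only the lock for graphName. */
    <R> R write(String graphName, Supplier<R> action) {
        ReentrantLock lock = locks.computeIfAbsent(graphName, name -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            // Always release, even if the action throws.
            lock.unlock();
        }
    }

    /** True if some thread currently holds the write lock for graphName. */
    boolean isLocked(String graphName) {
        ReentrantLock lock = locks.get(graphName);
        return lock != null && lock.isLocked();
    }
}
```

With this shape, two threads calling write("g1", ...) and write("g2", ...) would not block each other, while two writers to the same graph would be serialized. A more general multiwriter design would need lock regions that can span several graphs, which this sketch does not attempt.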

---
A. Soroka
The University of Virginia Library



[jira] [Commented] (JENA-329) Add streaming CONSTRUCT results to Fuseki

2017-02-16 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870101#comment-15870101
 ] 

A. Soroka commented on JENA-329:


+1 to "ergonomics" here (i.e. offering a menu of trade-offs between 
tuple-uniqueness and resource consumption). My only caveat is that we should 
document something like this pretty well, explaining that the new mode does 
_not_ guarantee to eliminate all duplicates.

> Add streaming CONSTRUCT results to Fuseki
> -----------------------------------------
>
>                 Key: JENA-329
>                 URL: https://issues.apache.org/jira/browse/JENA-329
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Fuseki
>            Reporter: Stephen Allen
>
> As a result of JENA-205, streaming results are now available for CONSTRUCT 
> queries.  However there can be duplicate triples in the iterator.  This task 
> is to allow Fuseki to stream back results, while at the same time performing 
> a distinct operation.
> The fix would be to modify SPARQL_Query to use 
> QueryExecution.execConstructTriples() and filter the results through a 
> DistinctDataNet as they are being streamed back to the client.
> This also requires RDFWriter implementations that can accept Iterator 
> instead of Model.





Re: Clearing up old branches

2017-02-16 Thread A. Soroka

> On Feb 15, 2017, at 4:31 PM, Andy Seaborne wrote:
> On 14/02/17 15:52, A. Soroka wrote:

>> [Fedora Commons] does offer a lot more, but it might still be scoped enough 
>> and quickly scalable enough for your immediate needs. Obviously, even though 
>> I commit for it, it doesn't do all my LDP needs because I am here talking 
>> about doing another impl! :)
> 
> What needs do you see here?

It's more that, as Osma noticed, Fedora does a great deal that isn't of use or 
interest to everyone. Within that project I tried to move forward an API 
specification that would allow for alternative (more minimal) implementations, 
but until that comes to fruition, it would be nice to have something available 
that is much, much lighter-weight, SPARQL-equipped out of the box, and 
supported by a lively community (e.g. this one! :grin:).

> The main choice is designing the URL space - making a SPARQL endpoint not 
> look like an LDP resource. You could snoop deeper into the request if you 
> want namespace overlap.  That may happen automatically as direct-named GSP 
> resources.  But then again, that may be too clever, and having no GSP 
> configured and using JAX-RS may be a lot easier to build.

One question is whether there is a one-to-one dataset <=> LDP service matching. 
I'm not sure that has to be the case, or even should be.

> Can LDP-NR be handled by web server static resource handling (and maybe a 
> filter if you want to change the HTTP header)?

Something like that. I was also thinking about bringing in something like 
Apache Camel for that purpose because you might have a lot of different 
sources for bitstreams. From Jena's POV, I think we can just leave that up to 
the integrator, yes?



---
A. Soroka
The University of Virginia Library




[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-02-16 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869749#comment-15869749
 ] 

Osma Suominen commented on JENA-1277:
-------------------------------------

[~s.ara...@geophy.com] Thanks for confirming! It's a bit strange that your 
performance only increased 4x, but I guess this depends a lot on the data and 
the query. With the data and query you submitted above, I saw a performance 
improvement of around 100x, as detailed in my comments.

In any case, we now have confirmation from the original reporter that the 
improvement was helpful. There should be no need to reopen this issue. If there 
are still similar performance problems with jena-spatial, please open a new 
issue instead.

> Spatial Queries Very Slow For Large Databases
> ---------------------------------------------
>
>                 Key: JENA-1277
>                 URL: https://issues.apache.org/jira/browse/JENA-1277
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Spatial
>    Affects Versions: Jena 3.1.1
>         Environment: Linux Ubuntu
>            Reporter: samur araujo
>            Assignee: Osma Suominen
>             Fix For: Jena 3.2.0
>
>         Attachments: spatial-assembler.ttl
>
>
> I loaded GeoNames into Jena, but the spatial queries take more than 3s to 
> execute. The query is below:
> PREFIX spatial: 
> PREFIX rdfs: 
> SELECT distinct ?place
> {
> ?place spatial:intersectBox (32.55668 -117.12865 32.56668  -117.13865) .
>   
> }
> The data can be downloaded here:
> https://drive.google.com/file/d/0B-fwYPJYT1GOYVVIZF9ROUxzclk/view?usp=sharing
> For small datasets the queries are executed in 200ms, very fast. I noticed 
> that when I access the Lucene index directly the queries are also very fast, 
> about 20ms. 
> The issue may be related to the post-processing of Lucene results.





[jira] [Commented] (JENA-329) Add streaming CONSTRUCT results to Fuseki

2017-02-16 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869732#comment-15869732
 ] 

Osma Suominen commented on JENA-329:


I implemented something like this for the hdtsparql command line tool in the 
hdt-java package. See this PR: https://github.com/rdfhdt/hdt-java/pull/43

In the implementation I used a 1000-slot LRU cache to check for duplicates, 
effectively a sliding window. In the (very limited) testing I performed, this 
seemed to do a good job of eliminating duplicates with good performance. Of 
course it won't guarantee that all duplicates are eliminated, but I agree with 
Andy above that this is a reasonable trade-off. I considered using 
DistinctDataNet as well, which in my understanding would eliminate all 
duplicates, but it would be a lot more costly in terms of resources (disk space 
and IO) for queries with large result sets.
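
For readers who want to see the shape of the sliding-window idea, here is a minimal sketch of a fixed-size LRU duplicate filter. It is not the code from the hdt-java PR: the class name SlidingWindowDistinct and its methods are hypothetical, it is generic over any item type with sensible equals/hashCode rather than tied to Jena's Triple, and it is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Best-effort duplicate filter backed by a fixed-size LRU cache (hypothetical helper). */
final class SlidingWindowDistinct<T> {
    private final Map<T, Boolean> recent;

    SlidingWindowDistinct(final int slots) {
        // accessOrder = true: entries are reordered on access, so the eldest
        // entry is always the least recently seen item.
        this.recent = new LinkedHashMap<T, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<T, Boolean> eldest) {
                return size() > slots;
            }
        };
    }

    /** Returns true if the item is not in the window (and records it), false if it is a recent duplicate. */
    boolean firstSighting(T item) {
        // put() returns null only when the key was absent.
        return recent.put(item, Boolean.TRUE) == null;
    }

    /** Drains the iterator, keeping only items that pass the window check. */
    List<T> filter(Iterator<T> items) {
        List<T> out = new ArrayList<>();
        while (items.hasNext()) {
            T item = items.next();
            if (firstSighting(item)) {
                out.add(item);
            }
        }
        return out;
    }
}
```

In a streaming writer one would call firstSighting on each item as it arrives and emit it only when the result is true; the filter method collects into a list purely for demonstration. Memory use is bounded by the slot count, and a duplicate that reappears after its first occurrence has been evicted from the window is emitted again, which is the trade-off described above.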

I could do the same for tdbquery (and/or the sparql command line tool) if 
desired. Probably Fuseki as well, though I'm not very familiar with its 
internals.



