Re: Maintenance branches
I think Claude introduced the idea of LTS releases, so I'm curious about whether he thinks that the audience for stability includes people who would use a "stable" series of the kind Osma describes, even without the Apache imprimatur. ajs6f > On Jan 24, 2017, at 2:57 PM, Andy Seaborne wrote: > > > >> On 24/01/17 12:57, Osma Suominen wrote: >> 23.01.2017, 19:31, Andy Seaborne wrote: >> >>> To expand on that: That would mean users could get source code to build >>> themselves, it would not be an "Apache release" and not in maven >>> central. For "products", the legal side of a release probably matters. >> >> Source code yes, but I think it would make sense to set up some kind of >> autobuilder for the stable branch, similar to how snapshots are built >> nightly. It shouldn't be much effort to set this up, but it would be a >> valuable service for users. > > It's not an Apache release. > > Snapshots are specifically allowed for developers, which we take to include anyone > picking up and testing. > > They are not releases. > > Products that want LTS stability will, I believe, want: > * The ASF release legal framework > * Assurance that the LTS will be around for the life of the product > * Ideally, support contracts (3rd party) > > It is likely because they don't have the technical capabilities or resources > in-house to investigate and report, let alone fix. > > The trouble really comes when a "bug fix" is a feature change. If the bug is > not some low-level thing like an NPE, one product's view of a "fix" is another > product's regression. > > (Believe me! It's happening to me right now - a SPARQL fix to comply with the > standard has caused interesting changes.) > > --- > > There are three options here: > > * Current > Advantage: bug fixes, most timely. > Disadvantage: picks up everything > > * A "last release+fixes" branch > Not a release ... unless voted on > Not long term stability (product life : years) > Some extra work > > * LTS > Long term commitment. > More work.
> > And a point about LTS - more bug reports are nice, but contributions of fixes > are much better. > > I'm not convinced that item 2 would be much used - they last only 4 or 6 > months as I understand the concept. > > Events like Jena2->Jena3 are extremely rare. Otherwise, we add features, not > remove them, and backwards compatibility is as good as a stable branch (I would > hope!). The low-cost way of carefully adding to master seems to me best unless > we have additional contributions of fixes (not just reports) or other > resourcing. > > Andy
Re: Maintenance branches
I've had some experience with protocols such as those Osma describes, and I think that they have real value (particularly for large users and sites). And as he says, they can be automated. I would be willing to help with that. I would like to learn more about Apache infrastructure. That having been said, I must also agree with Andy that, much as we might like to provide them, true LTS releases are probably beyond our strength right now. I wonder if any vendors are currently offering such a product? ajs6f > On Jan 24, 2017, at 7:57 AM, Osma Suominen wrote: > > 23.01.2017, 19:31, Andy Seaborne wrote: > >> To expand on that: That would mean users could get source code to build >> themselves, it would not be an "Apache release" and not in maven >> central. For "products", the legal side of a release probably matters. > > Source code yes, but I think it would make sense to set up some kind of > autobuilder for the stable branch, similar to how snapshots are built > nightly. It shouldn't be much effort to set this up, but it would be a > valuable service for users. > >>> Currently when a user discovers and reports a bug in Jena and it gets >>> fixed in master, the user has to choose between waiting for the next >>> release or using a snapshot, >> or cherry picking - it's a distributed version control system! > > You're right, but it takes some effort and understanding of the git tree and > how to build it. > >>> which may have other unrelated issues due >>> to ongoing development. With a stable branch, there would be a third >>> option - like the previous release, but with some bugs fixed. >> >> If we want a proper release, it's a vote - quite doable, just needs >> someone to do it. > > Yes, having more frequent releases and distributing the RM burden further are > both excellent new developments. If new releases are made frequently, there > is less need for a stable branch.
> > I'm not proposing creating such a stable branch at the moment, just pointing > out that if we want to better serve users who need a (semi-)supported > non-development version, a stable branch like this could be a solution that > wouldn't require much extra effort from the developers. > > -Osma > > -- > Osma Suominen > D.Sc. (Tech), Information Systems Specialist > National Library of Finland > P.O. Box 26 (Kaikukatu 4) > 00014 HELSINGIN YLIOPISTO > Tel. +358 50 3199529 > osma.suomi...@helsinki.fi > http://www.nationallibrary.fi
Re: Jena system initialization
> > When a class is loaded, the static initializer is not run. That happens on first use. > Ah, this is what I did not understand correctly. Okay, no problem. IMHO JenaSystem.init is a step in the right direction and a better way than > the current way initialization is done. > No argument here. What would be helpful at this point is concrete improvements/alternatives and concrete evaluation esp in OSGi. [*] I would like to do that, but I am still working on JENA-624, _very_ slowly (bits of Sunday afternoons only). Ah, the joys of all-volunteer projects, right? {grin} --- A. Soroka The University of Virginia Library On Sun, Sep 20, 2015 at 9:53 AM, Andy Seaborne wrote: > On 20/09/15 12:16, A. Soroka wrote: > >> On Sep 18, 2015, at 6:21 PM, Andy Seaborne wrote: >> >>> >>> How would people have the chance to call JenaSystem::set before those static initializers run? Or am I misunderstanding the use of the hook? >>> >>> The documentation >>> >>>/** >>> * Set the {@link JenaSubsystemRegistry}. >>> * To have any effect, this function must be called before any other >>> Jena code, >>> * and especially before calling {@code JenaSystem.init()}. >>> */ >>> >>> Touch Jena and it initialises. >>> >> >> Yes, this is what I don’t understand. If JenaSystem.init() is >> getting called in static initializers (and it is) then that means I must >> somehow >> call JenaSystem.set() in static code, too, or I don’t see how it can happen >> before JenaSystem.init(). Even then, it doesn’t seem that I can >> guarantee that my JenaSystem.set() call will precede JenaSystem.init() >> getting called.
>> > > This works with "*** LOAD" printed and running with debug on: > >JenaSubsystemRegistry r = new JenaSubsystemRegistryBasic() { >@Override >public void load() { >System.err.println("*** LOAD") ; >super.load(); >} >} ; > >// Set the sub-system registry >JenaSystem.setSubsystemRegistry(r); > >// Enable output if required >JenaSystem.DEBUG_INIT = true ; > >// Initialize Jena directly or indirectly >//JenaSystem.init() ; >ModelFactory.createDefaultModel() ; > > and ModelFactory has a static initializer. This is the first call to > "other Jena code". When a class is loaded, the static initializer is not run. That > happens on first use. > > Run the example code above uncommenting the "JenaSystem.init()" and you will > see that the call in ModelFactory returns early (the recursive > initialization problem - much discussed in the comments and a well known > Java issue.) if the explicit JenaSystem.init() is used. > > > IMHO JenaSystem.init is a step in the right direction and a better way than > the current way initialization is done. > > What would be helpful at this point is concrete improvements/alternatives > and concrete evaluation esp in OSGi. [*] > > Andy > > > http://jena.staging.apache.org/documentation/notes/system-initialization.html > > [*] JENA-913 : The OSGi integration testing in the build is broken. > > > >> --- >> A. Soroka >> The University of Virginia Library >> >> >> >
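The load-versus-first-use point above can be demonstrated outside Jena in a few lines. This is a self-contained sketch (nothing here is Jena code; all names are illustrative): `Class.forName` with `initialize=false` loads a class without running its static initializer, and the initializer then runs on first active use, such as reading a non-final static field.

```java
// Demonstrates: loading a class does not run its static initializer;
// first active use (here, reading a non-constant static field) does.
public class InitDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Lazy {
        static { LOG.append("init;"); }   // runs at initialization, not at load
        static int value = 42;            // non-final, so reading it is an active use
    }

    static String run() {
        try {
            // Load (but do not initialize) the class: the static block must not run yet.
            Class.forName(Lazy.class.getName(), false, InitDemo.class.getClassLoader());
            LOG.append("loaded;");

            int v = Lazy.value;           // first active use triggers initialization
            LOG.append("used:").append(v);
            return LOG.toString();
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());        // prints loaded;init;used:42
    }
}
```

This is why a caller who wants `JenaSystem.setSubsystemRegistry` to take effect must make that call before touching any Jena class whose static initializer reaches `JenaSystem.init()`.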
Re: Jena OSGi (was: [] 201508 Release of 23 Clerezza modules)
I've used OSGi enough to understand why Class.forName() is problematic. Some of these uses, however, seem like pretty legitimate dynamic code, for example the assembler subsystem. An OSGi solution to that need might be the OSGi service registry, but that's obviously not useful here. Some of the other uses could be replaced with the use of a plain Java ServiceLoader. I'm not sure what you mean by "Such registrations should instead be done with the java.lang.Class parameters - which can then be used directly." but I think your message was cut off? --- A. Soroka The University of Virginia Library On Sep 10, 2015, at 5:50 PM, Stian Soiland-Reyes <st...@apache.org> wrote: > Last time I looked at interdependencies there were several > Class.forName() calls around in jena-core and jena-arq > - see https://paste.apache.org/5y0W > > Class.forName() depends on the ClassLoader of the caller (by > introspecting the call stack) - but in OSGi there are multiple > ClassLoaders, think of it as one per JAR - and they can only access > packages that are declared as Imports in their META-INF/MANIFEST.MF > > > This would fall apart if the class to be called is not explicitly > included in the OSGi imports. Some of these were for instance with > jena-arq parsers and writers registering themselves by classname in > jena-core - but jena-core can't access jena-arq classes in OSGi > (Although circular imports are technically allowed in OSGi it's not > usually a good idea). > > > Now we have Jena 3, but we still have the duplication between > RDFReaderFImpl in jena-core and IO_JenaReaders in jena-arq - so this > is very much a real problem, because using riot would autoregister its > classnames in RDFReaderFImpl. Third-party callers could also be > registering - although RDFReaderFImpl is screaming "impl impl" all over > the place, so we should be free to change that. > > > Such registrations should instead be done with the java.lang.Class > parameters - which can then be used directly.
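For illustration, a ServiceLoader-based registry along the lines suggested here might look roughly like this. All names below are hypothetical, not Jena's actual API: the point is that implementations are discovered from `META-INF/services` entries (or registered by object), so no class-name string ever passes through `Class.forName()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI: each jar lists its factories in
// META-INF/services/ReaderFactory and they are discovered by ServiceLoader.
interface ReaderFactory {
    String syntax();              // e.g. "TTL"
    Object createReader();        // a real registry would return an RDFReader
}

class ReaderRegistry {
    private final Map<String, ReaderFactory> factories = new HashMap<>();

    ReaderRegistry() {
        // Discovery without Class.forName(): ServiceLoader resolves providers
        // against the appropriate ClassLoader.
        for (ReaderFactory f : ServiceLoader.load(ReaderFactory.class)) {
            register(f);
        }
    }

    // Registration by object (the caller already holds the Class), not by name.
    void register(ReaderFactory f) { factories.put(f.syntax(), f); }

    Object readerFor(String syntax) {
        ReaderFactory f = factories.get(syntax);
        if (f == null) throw new IllegalArgumentException("No reader for " + syntax);
        return f.createReader();
    }
}
```

In OSGi, ServiceLoader has its own wrinkles (the SPI Fluency/"serviceloader mediator" machinery), but it removes the caller-ClassLoader introspection problem that makes Class.forName() fragile across bundles.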
The > > > > > > > > On 10 September 2015 at 22:50, Stian Soiland-Reyes <st...@apache.org> wrote: >> On 10 September 2015 at 18:13, aj...@virginia.edu <aj...@virginia.edu> wrote: >>> If this is a matter of "just a couple of lines in the manifest file" cannot >>> a patch be created to do that in Jena itself? Are there inter-module >>> dependency issues that make that difficult? >> >> >> In theory just setting >> >> <packaging>bundle</packaging> >> >> and using the maven-bundle-plugin >> is enough to auto-generate the correct META-INF metadata for OSGi. >> This can be customized (as we do for the apache-jena-osgi/jena-osgi >> module). >> >> One complication is if the external dependencies are OSGi or not - >> httpclient is one tricky one as it has done the same as Jena and >> provided a separate wrapper httpclient-osgi (and httpcore-osgi) - >> however the way they did this with Maven means that just using it as a >> dependency would still pull in a dependency on the regular httpclient >> library. So if you are a non-OSGi user you would then see the >> httpclient classes in two JARs - which with Maven version resolution >> could easily end up in mismatched versions. >> >> Ironically httpclient-osgi does not depend on httpcore-osgi - so the >> one dependency that it truly needs isn't stated in its pom. >> >> >> In jena-osgi I therefore excluded all those deeper dependencies: >> >> See >> https://github.com/apache/jena/blob/master/apache-jena-osgi/jena-osgi/pom.xml#L165 >> >> To spare Jena-Maven-OSGi users the same issue, I similarly here >> slightly misused the provided scope for the dependencies that >> are not to be dependencies of the final jena-osgi JAR, but which are >> shaded inside. >> https://github.com/apache/jena/blob/master/apache-jena-osgi/jena-osgi/pom.xml#L107 >> >> >> If we move to <packaging>bundle</packaging> then we should get >> httpclient folks to sort out their poms upstream so we can rely on >> them in a cleaner fashion across Jena.
(or put this exclusion rule >> into jena-parent) - I don't think >> copy-pasting that big <exclusions> block around anything that directly >> or indirectly requires things like httpclient is good. >> >> Ideally they should also move to <packaging>bundle</packaging> and >> avoid *-osgi, which would of course simplify things. >> >> >> There are other potential issues such as Class.forName() which the current >> jena-osgi is narrowly cheating around by effectively making a single >> class loader for all of Jena (including, as Reto pointed out, TDB etc) >> >> >> >> >> >> -- >> Stian Soiland-Reyes >> Apache Taverna (incubating), Apache Commons RDF (incubating) >> http://orcid.org/-0001-9842-9718 > > > > -- > Stian Soiland-Reyes > Apache Taverna (incubating), Apache Commons RDF (incubating) > http://orcid.org/-0001-9842-9718
Re: [] 201508 Release of 23 Clerezza modules
If this is a matter of "just a couple of lines in the manifest file" cannot a patch be created to do that in Jena itself? Are there inter-module dependency issues that make that difficult? --- A. Soroka The University of Virginia Library On Sep 10, 2015, at 11:49 AM, Reto Gmür wrote: > On 9 Sep 2015 13:50, "Rob Vesse" wrote: >> >> This seems a little odd to me. It looks like they are placing these >> artifacts in their own group ID. However it still sets a slightly strange >> precedent if Apache Foo can release artifacts named Apache Bar even if >> they do so under their own maven coordinates >> >> Is this something they've been doing for a long time or is this a new >> thing? > > It is something which clerezza had been doing for a very long time. Apache > servicemix does the same for other projects that do not ship OSGi bundles, > see: http://mvnrepository.com/artifact/org.apache.servicemix.bundles > > Of course in an ideal world Jena would be modular and all its jars would > also be OSGi bundles, after all this is just a couple of lines in the > manifest file. > >> >> If new why couldn't they work with us to provide the fixes back to Jena? > > What clerezza is doing is not an actual fix, but rather a wrapping. Stian > did something similar. > > Reto >> >> Rob >> >> On 07/09/2015 17:35, "Andy Seaborne" wrote: >> >>> PMC, >>> >>> Clerezza is proposing redistributing modified Jena 2.13.0 binaries. >>> NOTICE and LICENSE have been changed. These would go into the Apache >>> release maven repo. >>> >>> The binaries are currently at: >>> >>> > https://repository.apache.org/content/repositories/orgapacheclerezza-1009/ >>> org/apache/clerezza/ext/org.apache.jena.jena-core/2.13.0_1/ >>> >>> (Modified version number as well - it does not make clear that 2.13.0_1 >>> is not a Jena-project release.)
>>> >>> Andy >>> >>> Forwarded Message >>> Subject: Re: [] 201508 Release of 23 Clerezza modules >>> Date: Mon, 7 Sep 2015 12:12:23 +0100 >>> From: Andy Seaborne >>> To: d...@clerezza.apache.org >>> >>> On 06/09/15 18:39, Reto Gmür wrote: On Sat, Sep 5, 2015 at 10:21 PM, Andy Seaborne wrote: > On 05/09/15 16:36, Reto Gmür wrote: > >> Hi all, >> >> This is a partial clerezza release of 23 modules bringing the >> following >> improvements: >> >> - Fixed issues preventing rdf.rdfjson and rdf.jena.sparql to expose >> their >> OSGi-DS services >> - Updated to latest version of Jersey >> - Updated Jena Version >> - Contains integration tests >> >> It contains the following artifacts that shall be released to maven >> central: >> > > Where are the convenience binaries? (I didn't see anything on > https://repository.apache.org/#stagingRepositories but may have missed > something) Enabled now. Here: > https://repository.apache.org/content/repositories/orgapacheclerezza-1009 / >>> >>> Could you have used Jena's OSGi artifact? >>> >>> The binaries have had the NOTICE and LICENSE files replaced in both jar >>> and sources.jar. These miss the necessary declarations. >>> >>> Andy >>> Cheers, Reto >>> >>> >>> >> >> >> >>
Re: 3rd party modifying Jena binaries. Re: [] 201508 Release of 23 Clerezza modules
Is it your impression that the "special OSGi spice" additions are something that Jena could reasonably adopt into normal builds? Then maybe they wouldn't feel the need to do this… --- A. Soroka The University of Virginia Library On Sep 9, 2015, at 4:06 PM, Andy Seaborne wrote: > > On 09/09/15 11:49, Rob Vesse wrote: >> This seems a little odd to me. It looks like they are placing these >> artifacts in their own group ID. However it still sets a slightly strange >> precedent if Apache Foo can release artifacts named Apache Bar even if >> they do so under their own maven coordinates > > We do something vaguely similar with Google Guava using "jena-shaded-guava". > The original Guava binaries do not include NOTICE and LICENSE files. But > then we change the class files and sources in accordance with the package > names. Maybe Clerezza should shade to under org.apache.clerezza.ext.jena. > > Clerezza artifact labelling does confuse. > > The modifications to the Jena binaries are that there is other stuff in the jars > for OSGi, and timestamps are "now" not "then". You can't tell by looking at the jars > whether there are code changes, but the related pom looks like a shade-OSGi step. > > This is not specific to Jena - there are other jars that have had the same > process applied to them. > > Removing the NOTICE and LICENSE is a problem. > > They are specific to the modules and ought to be carried over - they can have > more added, but removing the contents of another open source project's NOTICE is a > big no-no. > > Andy > >> Is this something they've been doing for a long time or is this a new >> thing? >> >> If new why couldn't they work with us to provide the fixes back to Jena? >> >> Rob >> >> On 07/09/2015 17:35, "Andy Seaborne" wrote: >> >>> PMC, >>> >>> Clerezza is proposing redistributing modified Jena 2.13.0 binaries. >>> NOTICE and LICENSE have been changed. These would go into the Apache >>> release maven repo.
>>> >>> The binaries are currently at: >>> >>> https://repository.apache.org/content/repositories/orgapacheclerezza-1009/ >>> org/apache/clerezza/ext/org.apache.jena.jena-core/2.13.0_1/ >>> >>> (Modified version number as well - it does not make clear that 2.13.0_1 >>> is not Jena-project release.) >>> >>> Andy >>> >>> Forwarded Message >>> Subject: Re: [] 201508 Release of 23 Clerezza modules >>> Date: Mon, 7 Sep 2015 12:12:23 +0100 >>> From: Andy Seaborne >>> To: d...@clerezza.apache.org >>> >>> On 06/09/15 18:39, Reto Gmür wrote: On Sat, Sep 5, 2015 at 10:21 PM, Andy Seaborne wrote: > On 05/09/15 16:36, Reto Gmür wrote: > >> Hi all, >> >> This is a partial clerezza release of 23 modules bringing the >> following >> improvements: >> >> - Fixed issues preventing rdf.rdfjson and rdf.jena.sparql to expose >> their >> OSGi-DS services >> - Updated to latest version of Jersey >> - Updated Jena Version >> - Contains integration tests >> >> It contains the following artifacts that shall be released to maven >> central: >> > > Where are the convenience binaries? (I didn't see anything on > https://repository.apache.org/#stagingRepositories but may have missed > something) Enabled now. Here: https://repository.apache.org/content/repositories/orgapacheclerezza-1009 / >>> >>> Could you have used Jena's OSGi artifact? >>> >>> The binaries have had the NOTICE and LICENSE files replaced in both jar >>> and sources.jar. These miss the necessary declarations. >>> >>> Andy >>> Cheers, Reto >>> >>> >>> >> >> >> >> >
Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"
I apologize if I am being thick here, but I don't understand how one goes about checking the potential match without some kind of covering resource against which to do that, something with a full representation of the graph. Can you elaborate on how to check the validity of the match? Thank you for taking the time to walk through this! --- A. Soroka The University of Virginia Library On Aug 31, 2015, at 10:04 AM, Claude Warren wrote: > Step 3 is about removing the false positives from the bloom filter. It does > not require an index, it requires checking the values to ensure a match.
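As described in this exchange, the verification data lives on the candidate quad itself, so no covering index is needed. A hypothetical sketch (my own illustrative code, not Claude's implementation) of the three steps: each quad carries a tiny Bloom filter of its four terms (here a 64-bit signature, one bit per term); the scan does a cheap bitwise subset test (step 2), and false positives are removed by comparing the concrete G and S values stored on the quad (step 3).

```java
import java.util.ArrayList;
import java.util.List;

// Per-quad "Bloom filter" as a 64-bit signature: one hashed bit per term.
class BloomQuads {
    static final class Quad {
        final String g, s, p, o;
        final long bits;
        Quad(String g, String s, String p, String o) {
            this.g = g; this.s = s; this.p = p; this.o = o;
            this.bits = bit(g) | bit(s) | bit(p) | bit(o);
        }
    }

    static long bit(String term) {               // hash a term to one of 64 bits
        return 1L << (term.hashCode() & 63);
    }

    // find(G,S,*,*): scan with the filter, then verify candidates exactly.
    static List<Quad> findGS(List<Quad> quads, String g, String s) {
        long pattern = bit(g) | bit(s);          // step 1: pattern filter
        List<Quad> out = new ArrayList<>();
        for (Quad q : quads) {
            if ((q.bits & pattern) != pattern) continue;     // step 2: cheap reject
            if (q.g.equals(g) && q.s.equals(s)) out.add(q);  // step 3: verify terms
        }
        return out;
    }
}
```

Step 3 is just two equality checks on the quad in hand, which is why it does not require traversing a separate "exact" index; the real cost is the step-2 scan over the whole list.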
Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"
I'm still a bit confused as to why you don't regard step 3 as being potentially very expensive. In order to verify a match, we will have to examine an "exact" index, and that (as Andy remarked) is likely to require traversal, or else we throw away all the space gains. Is this technique a way to pay a lot of time for a lot of space savings? Perhaps it is appropriate for an alternative implementation for very large datasets? --- A. Soroka The University of Virginia Library On Aug 31, 2015, at 6:48 AM, Claude Warren <cla...@xenei.com> wrote: > To answer find(G,S,*,*) with bloom filters and return an iterator you > > 1. construct a bloom filter with G and S > 2. scan the list of quads checking for matches. > 3. for each result that matches verify that it has G and S (I have done > this with an extended iterator in Jena) > > result is an iterator that returns all (G,S,*,*) quads. > > similar tests can be performed for any pattern -- same code used. > > Step 2 is the expensive one. But the bloom filter check is so efficient > that it becomes very difficult to perform search operations in less time > than it takes to scan the list. > > Claude > > On Mon, Aug 31, 2015 at 11:01 AM, Andy Seaborne <a...@apache.org> wrote: > >> On 29/08/15 14:55, Claude Warren wrote: >> >>> Something I have been thinking about >>> >>> you could replace GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. with a single >>> bloomfilter implementation. It means a 2 step process to find matches but >>> it might be fast enough and reduce the overhead significantly. >>> >>> I did an in-memory and a relational DB based version recently, but it was >>> just a quick POC. >>> >>> Claude >>> >> >> So we're talking about in-memory, where the items are java classes. A >> quad is 2 slots java overhead + 4 slots for G, S, P, O pointers. That's 48 >> bytes if the heap is >32G and 24 bytes otherwise (compressed pointers or 32 >> bit). >> >> For storage, the key test is "contains" to maintain the "set of" >> semantics.
Something to stop index traversal for each insert would be >> great, but it's still stored, and storing 1 copy, not up to 6, would be good. (Note that >> most data is unique quads.) >> >> The important retrieval operation is find(G,S,P,O) where any of those can be >> a wildcard and return (ideally as a stream) all matching quads with a >> prefix. The multiple indexes exist to find based on prefix. >> >> How would that work for, say find(G,S,*,*) with bloom filters and 1b >> quads? How does the code go from returning G,S,P1,O1 to the next G,S,P1,O2 >> without trying every value for the O slot? >> >> For a hash map based hierarchical index G->S->P->O, it's O(1) to find the >> start of the scan then datastructure iteration. A hash-based structure is not >> necessarily the best choice [*] but it's a baseline to discuss. >> >> And in memory, will a bloom filter-based system be faster? Because of >> false positives, isn't a definitive index still needed? If one is kept, >> not 6, there could be great space gains, but every quad returned is a >> top-to-bottom traversal of that index (which is now not a range index). >> >> The design should work for 1+ billion in-memory quads - that's the way the >> world is going. >> >> So each quad is reduced to a >>> single bloom filter comprising 4 items (15 bytes). >>> >> >>Andy >> >> [*] even in memory, it might be worth allocating internal ids and working >> in longs like a disk based system because it is more compact - naive >> hashmaps take a lot of space when storing small items like quads. >> tradeoffs, tradeoffs, ... >> >> >> >> >>> On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka <aj...@virginia.edu> wrote: >>> >>> Hey, folks-- >>>> >>>> There hasn't been too much feedback on my proposal for a journaling >>>> DatasetGraph: >>>> >>>> https://github.com/ajs6f/jena/tree/JournalingDatasetgraph >>>> >>>> which was and is to be a step towards JENA-624: Develop a new in-memory >>>> RDF Dataset implementation.
So I'm moving on to look at the real problem: >>>> an in-memory DatasetGraph with high concurrency, for use with modern >>>> hardware running many, many threads in large core memory. >>>> >>>> I'm beginning to sketch out rough code, and I'd like to run some design >>>> decisions past the list to get cri
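The hash-map-based hierarchical index G->S->P->O that Andy uses as a baseline in this thread can be sketched in a few lines (illustrative code, not Jena's implementation): find(G,S,*,*) is O(1) hash lookups to reach the start of the scan, and from there it is pure iteration over the matching subtree, with no per-quad verification.

```java
import java.util.*;
import java.util.stream.Stream;

// Baseline hierarchical index: a hash map chain G -> S -> P -> Set<O>.
class NestedIndex {
    private final Map<String, Map<String, Map<String, Set<String>>>> gspo = new HashMap<>();

    void add(String g, String s, String p, String o) {
        gspo.computeIfAbsent(g, k -> new HashMap<>())
            .computeIfAbsent(s, k -> new HashMap<>())
            .computeIfAbsent(p, k -> new HashSet<>())
            .add(o);
    }

    // find(G,S,*,*): O(1) to locate the S entry, then stream its P/O children.
    Stream<String[]> findGS(String g, String s) {
        Map<String, Set<String>> po =
            gspo.getOrDefault(g, Map.of()).getOrDefault(s, Map.of());
        return po.entrySet().stream()
                 .flatMap(e -> e.getValue().stream()
                                .map(o -> new String[] { g, s, e.getKey(), o }));
    }
}
```

This illustrates why multiple index orderings (GSPO, SPOG, ...) exist: each one makes a different wildcard pattern a cheap prefix lookup, at the cost of storing every quad up to six times.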
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
Thanks for the feedback! I can see how one Bloom filter could be used with an accompanying structure to replace one of the indexes, but I don't quite see how one could replace all of them-- can you elaborate? --- A. Soroka The University of Virginia Library On Aug 29, 2015, at 9:55 AM, Claude Warren cla...@xenei.com wrote: Something I have been thinking about you could replace GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. with a single bloomfilter implementation. It means a 2 step process to find matches but it might be fast enough and reduce the overhead significantly. I did an in-memory and a relational DB based version recently, but it was just a quick POC. Claude On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka aj...@virginia.edu wrote: Hey, folks-- There hasn't been too much feedback on my proposal for a journaling DatasetGraph: https://github.com/ajs6f/jena/tree/JournalingDatasetgraph which was and is to be a step towards JENA-624: Develop a new in-memory RDF Dataset implementation. So I'm moving on to look at the real problem: an in-memory DatasetGraph with high concurrency, for use with modern hardware running many, many threads in large core memory. I'm beginning to sketch out rough code, and I'd like to run some design decisions past the list to get criticism/advice/horrified warnings/whatever needs to be said. 1) All-transactional action: i.e. no non-transactional operation. This is obviously a great thing for simplifying my work, but I hope it won't be out of line with the expected uses for this stuff. 2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I figure to play to the strength of in-core-memory operation: raw speed, but obviously this is going to cost space. 3) At least for now, all commits succeed. 4) The use of persistent datastructures to avoid complex and error-prone fine-grained locking regimes. I'm using http://pcollections.org/ for now, but I am in no way committed to it nor do I claim to have thoroughly vetted it. 
It's simple but enough to get started, and that's all I need to bring the real design questions into focus. 5) Snapshot isolation. Transactions do not see commits that occur during their lifetime. Each works entirely from the state of the DatasetGraph at the start of its life. 6) Only as many as one transaction per thread, for now. Transactions are not thread-safe. These are simplifying assumptions that could be relaxed later. My current design operates as follows: At the start of a transaction, a fresh in-transaction reference is taken atomically from the AtomicReference that points to the index block. As operations are performed in the transaction, that in-transaction reference is progressed (in the sense in which any persistent datastructure is progressed) while the operations are recorded. Upon an abort, the in-transaction reference and the record are just thrown away. Upon a commit, the in-transaction reference is thrown away and the operation record is re-run against the main reference (the one that is copied at the beginning of a transaction). That rerun happens inside an atomic update (hence the use of AtomicReference). This all should avoid the need for explicit locking in Jena and should confine any blocking against the indexes to the actual duration of a commit. What do you guys think? --- A. Soroka The University of Virginia Library -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
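A minimal model of the commit scheme described above may make it concrete. This is a hypothetical sketch, with plain immutable `java.util` Sets standing in for pcollections: a transaction snapshots the shared reference at begin, evolves its own copy while recording its operations, and at commit re-runs the record against the *current* shared state inside an atomic update.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class SnapshotStore {
    private final AtomicReference<Set<String>> committed =
        new AtomicReference<>(Set.of());

    class Txn {
        private Set<String> view = committed.get();   // snapshot taken at begin
        private final List<UnaryOperator<Set<String>>> record = new ArrayList<>();

        void add(String quad) {
            UnaryOperator<Set<String>> op = base -> {
                Set<String> next = new HashSet<>(base);
                next.add(quad);
                return Set.copyOf(next);              // "persistent" immutable copy
            };
            view = op.apply(view);                    // the txn sees its own writes
            record.add(op);                           // remember the op for commit
        }

        boolean contains(String quad) { return view.contains(quad); }

        void commit() {
            // Replay the record against whatever is committed *now*, atomically.
            committed.updateAndGet(base -> {
                Set<String> s = base;
                for (UnaryOperator<Set<String>> op : record) s = op.apply(s);
                return s;
            });
        }
    }

    Txn begin() { return new Txn(); }
    Set<String> snapshot() { return committed.get(); }
}
```

Replaying the record at commit, rather than swapping in the transaction's own evolved view, is what preserves additions made by a transaction that committed in the meantime; it is also where the cost of this design concentrates, since commits serialize on that atomic update.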
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
In fact, this is why I tried (for a first try) a design with only one transaction committing at a time, which amounts to SW in terms of serializability, I thought. But I am allowing multiple writers to assemble changes in multiple transactions at the same time, and I think that is what will prevent the use of swap-into-commit. Maybe this is a bad trade? Since JENA-624 contemplates very high concurrency, is it worth doing a MR+SW design at all? But MRMW seems very hard. {grin} I had some ideas about structuring indexes in such a way as to allow for more fine-grained locking and using merge for actual MW, but as you point out, locking down to particular resources is not able to guarantee against conflicts between conceptual entities. I also had some nightmares trying to think about how to manage bnodes across multiple writers. --- A. Soroka The University of Virginia Library On Aug 28, 2015, at 6:17 AM, Andy Seaborne a...@apache.org wrote: On 27/08/15 16:53, aj...@virginia.edu wrote: Andy-- Thanks, these comments are really helpful! I've replied in-line in a few places to clarify or answer questions, or ask some of my own. {grin} --- A. Soroka The University of Virginia Library If there are multiple writers, then (1) system aborts will always be possible (conflicting updates) and (2) locking on datastructures is necessary ... or timestamps and vector clocks or some such. Right, see below. Again, there are multiple writers, but they only see themselves, and only one committer. Only one committer at a time prevents conflicts, since there is no schema to violate, but it is a brutal way to deal with the problem. And the re-run scheme of operation means it will be a very real bottleneck. 5) Snapshot isolation. Transactions do not see commits that occur during their lifetime. Each works entirely from the state of the DatasetGraph at the start of its life. But they see their own updates presumably?
Right, that's exactly the purpose of taking their own reference to the persistent datastructures at the start of the transaction. They evolve their datastructures independently. When used in a program, persistent datastructures diverge when two writes act from the same base point. Transactions do more - they are serializing all operations so there is a linear sequence of versions. This is the problem you identify below. 6) Only as many as one transaction per thread, for now. Transactions are not thread-safe. These are simplifying assumptions that could be relaxed later. TDB ended up there as well. There is, internally, a transaction object but it's held in a ThreadLocal and fetched when needed. Otherwise a lot of interfaces need a transaction parameter and it's hard to reuse other code that doesn't pass it through. That's close to what I sketched out. I have taken a second take on transactions with TDB2. This module is an independent transaction system, unlike TDB1 where it's TDB1-specific. https://github.com/afs/mantis/tree/master/dboe-transaction It needs documentation for use on its own but I have used it in another project to coordinate distributed transactions. (dboe = database operating environment) I need to study this more. Obviously, if I can take over some of your work, that would be ideal. My current design operates as follows: snipped Looks good. I don't quite understand the need to record and rerun though - isn't the power of pcollections that there can be old and new roots to the datastructures and commit is swap to the new one, abort is forget the new one. Yeah, but my worry (perhaps just my misunderstanding) is over transactions interacting badly in the presence of snapshot isolation. Let's say we did use the technique of atomic swap, and consider the following scenario: T=-1 The committed datastructures contain triples T.
T=0 Transaction 1 begins, taking a reference to the datastructures T=1 Transaction 2 begins, taking its own reference to the datastructures T=3 Transaction 1 does some updates, adding some triples T_1 to its own branch, resulting in T+T_1. T=4 Transaction 2 does some updates, adding some triples T_2 to its own branch, resulting in T+T_2. T=5 Transaction 1 commits, so that the committed triples are now T + T_1. T=6 Transaction 2 commits, so that the committed triples are now T + T_2. We lost Transaction 1's T_1 triples. I think this technique actually requires _merge_ instead of swap, either merge-into-open-transactions (after a commit) which isn't snapshot isolation or merge-into-commit (instead of swap-into-commit). But there's plenty of chance that I'm just misunderstanding this whole thing. {grin} I have not designed a transaction system over persistent datastructures before, and I welcome correction. I also need to research more about persistent datastructures with merge capability. which is why 2+ writers needs locking or aborts. The common
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
Ah, okay, I see the problem more clearly now. Thanks! It seems to me now that the best immediate road forward is to go to true MR+SW (a write lock for the dataset), since I take from your remarks that you think that would be valuable in itself. That would be straightforward. I have read a few papers that discuss doing MW by locking at the granularity of triple patterns or BGPs, but I have to admit that it will take more study before I am ready to implement something like that. {grin} --- A. Soroka The University of Virginia Library On Aug 28, 2015, at 7:42 AM, Andy Seaborne a...@apache.org wrote: On 28/08/15 12:22, aj...@virginia.edu wrote: In fact, this is why I tried (for a first try) a design with only one transaction committing at a time, which amounts to SW in terms of serializability, I thought. No :-( But I am allowing multiple writers to assemble changes in multiple transactions at the same time, and I think that is what will prevent the use of swap-into-commit. Maybe this is a bad trade? Since JENA-624 contemplates very high concurrency, is it worth doing a MR+SW design at all? But MRMW seems very hard. {grin} I had some ideas about structuring indexes in such a way as to allow for more fine-grained locking and using merge for actual MW, but as you point out, locking down to particular resources is not able to guarantee against conflicts between conceptual entities. I also had some nightmares trying to think about how to manage bnodes across multiple writers. See my example for a counterexample. It's not 2 commits at once to avoid; it is that W2 is reading a pre-W1-commit view of the world.
W1 starts and takes a start-of-transaction pointer to datastructures. W1 reads the account balance as 10.
W2 starts, ditto. W2 reads the account balance as 10.
W1 updates and commits. The account balance visible to any new reader is 15.
W2 updates and commits. The account balance visible to any new reader is 17, but it should be 22. The +5 has been lost.
Your scheme keeps the database datastructures safe, but at the data model level it can cause inconsistency and loss of changes. Either an application-level resolution of changes or something like 2-phase locking is needed, and even then there are issues of non-repeatable reads and phantom reads. https://en.wikipedia.org/wiki/Isolation_%28database_systems%29 It gets very nasty when aggregations (COUNT, SUM) happen. You can get answers that are not from any state of the data that ever existed. Andy
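The aggregation anomaly Andy describes can be shown with a toy example (illustrative only, not Jena code): a reader summing two balances with no isolation, while a writer transfers money between them, computes a total that no state of the data ever had.

```java
// Sketch: a SUM computed without isolation can reflect a state of the data
// that never existed. The interleaving is written out inline for clarity.
public class PhantomSum {
    public static int demo() {
        int[] accounts = {10, 10};          // every real state sums to 20
        int sum = 0;
        sum += accounts[0];                 // reader reads account 0: sees 10
        // writer transfers 5 from account 0 to account 1, "atomically":
        accounts[0] -= 5;
        accounts[1] += 5;                   // state is now {5, 15}, still 20
        sum += accounts[1];                 // reader reads account 1: sees 15
        return sum;                         // 25: no state ever summed to 25
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 25
    }
}
```

Every consistent state sums to 20, but the unisolated reader reports 25.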
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
Andy-- Thanks, these comments are really helpful! I've replied in-line in a few places to clarify or answer questions, or ask some of my own. {grin} --- A. Soroka The University of Virginia Library On Aug 27, 2015, at 5:35 AM, Andy Seaborne a...@apache.org wrote: 1) All-transactional action: i.e. no non-transactional operation. This is obviously a great thing for simplifying my work, but I hope it won't be out of line with the expected uses for this stuff. You could add an auto-commit feature so that any update outside a transaction has a transaction wrapper applied. Feature. I can and will. 2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I figure to play to the strength of in-core-memory operation: raw speed, but obviously this is going to cost space. There are choices :-) esp. in memory, as the datastructure kind can change on the way down. e.g. have a hash map for GS-PO and keep the PO tightly packed (a few for each S) and scan them. Very true. Ideally, I would like to offer some knobs to users to choose their own balance between speed and space. I'll step back and consider a few more designs before going further with this six-way approach. Is that going to be 6 pcollections datastructures, or all held in one datastructure (c.f. BerkeleyDB)? Right now I am looking at a setup with six independent indexes addressed through a single class, but that was just the first thing that came to mind that seemed reasonable. I am not committed to that by any means. If I step away from the six-way, the question changes. 3) At least for now, all commits succeed. 4) The use of persistent datastructures to avoid complex and error-prone fine-grained locking regimes. I'm using http://pcollections.org/ for now, but I am in no way committed to it nor do I claim to have thoroughly vetted it. It's simple but enough to get started, and that's all I need to bring the real design questions into focus.
Is a consequence that there is one truly active writer (and many readers)? Something like that. If you look at the scheme of operation below, all writers are invisible to each other and can write at will, but only one writer can commit at a time. That may very well not be enough concurrency, but it's just a starting place. If there are multiple writers, then (1) system aborts will always be possible (conflicting updates) and (2) locking on datastructures is necessary ... or timestamps and vector clocks or some such. Right, see below. Again, there are multiple writers, but they only see themselves, and only one committer. Only one committer at a time prevents conflicts, since there is no schema to violate, but it is a brutal way to deal with the problem. And the re-run scheme of operation means it will be a very real bottleneck. 5) Snapshot isolation. Transactions do not see commits that occur during their lifetime. Each works entirely from the state of the DatasetGraph at the start of its life. But they see their own updates presumably? Right, that's exactly the purpose of taking their own reference to the persistent datastructures at the start of the transaction. They evolve their datastructures independently. 6) Only as many as one transaction per thread, for now. Transactions are not thread-safe. These are simplifying assumptions that could be relaxed later. TDB ended up there as well. There is, internally, a transaction object, but it's held in a ThreadLocal and fetched when needed. Otherwise a lot of interfaces need a transaction parameter, and it's hard to reuse other code that doesn't pass it through. That's close to what I sketched out. I have taken a second take on transactions with TDB2. This module is an independent transaction system, unlike TDB1 where it's TDB1-specific. https://github.com/afs/mantis/tree/master/dboe-transaction It needs documentation for use on its own, but I have used it in another project to coordinate distributed transactions.
(dboe = database operating environment) I need to study this more. Obviously, if I can take over some of your work, that would be ideal. My current design operates as follows: snipped Looks good. I don't quite understand the need to record and rerun, though - isn't the power of pcollections that there can be old and new roots to the datastructures, and commit is swap to the new one, abort is forget the new one? Yeah, but my worry (perhaps just my misunderstanding) is over transactions interacting badly in the presence of snapshot isolation. Let's say we did use the technique of atomic swap, and consider the following scenario:
T=-1 The committed datastructures contain triples T.
T=0 Transaction 1 begins, taking a reference to the datastructures.
T=1 Transaction 2 begins, taking its own reference to the datastructures.
T=3 Transaction 1 does some updates, adding some triples T_1 to its own branch, resulting in T+T_1.
T=4 Transaction 2 does some
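The ThreadLocal arrangement Andy describes for TDB can be sketched roughly as follows. All names here are illustrative, not the actual TDB classes: the active transaction is registered per thread, so code deep in the call chain can recover it without every interface carrying a transaction parameter.

```java
// Sketch of a per-thread transaction registry, in the style described above.
// One transaction per thread; transactions are not shared across threads.
public class TxnRegistry {
    public static final class Txn {
        public final boolean write;
        Txn(boolean write) { this.write = write; }
    }

    private static final ThreadLocal<Txn> current = new ThreadLocal<>();

    public static void begin(boolean write) {
        if (current.get() != null)
            throw new IllegalStateException("one transaction per thread");
        current.set(new Txn(write));
    }

    // Any code running on this thread can fetch the transaction without
    // it having been passed down through every interface.
    public static Txn get() {
        Txn t = current.get();
        if (t == null)
            throw new IllegalStateException("not in a transaction");
        return t;
    }

    public static void end() { current.remove(); }

    public static void main(String[] args) {
        begin(true);
        System.out.println(get().write); // true
        end();
    }
}
```

The trade-off is as stated in the thread: the ThreadLocal keeps interfaces clean, at the cost of making the "one transaction per thread" assumption load-bearing.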
Re: RDFConnection
Just a thought on ergonomics: it might be nice to separate clear and delete, so instead of RDFConnection::delete either clearing or deleting a graph depending on whether it is the default graph, you have finer control and can clear a non-default graph. --- A. Soroka The University of Virginia Library On Aug 4, 2015, at 6:21 PM, Andy Seaborne a...@apache.org wrote: There's a note in the interface // Query // Maybe more query forms: querySelect(Query)? select(Query)? At the moment, the operations are the basic ones (the SPARQL protocols for query, update and GSP). There's scope to add forms on top. void execSelect(Query query, Consumer<QuerySolution> action) is one possibility. Andy On 04/08/15 16:14, aj...@virginia.edu wrote: Is this a little bit like Sesame 4's new Repository helper type? Not totally the same thing, but similar in that it's bringing a lot of convenience together around the notion of dataset? http://rdf4j.org/doc/4/programming.docbook?view#Stream_based_querying_and_transaction_handling --- A. Soroka The University of Virginia Library On Aug 2, 2015, at 3:05 PM, Andy Seaborne a...@apache.org wrote: Stephen, all, Recently on users@ there was a question about the s-* in java. That got me thinking about an interface to pull together all SPARQL operations into one application-facing place. We have jena-jdbc, and jena-client already - this is my sketch take. [1] RDFConnection Currently, it's a sketch-for-discussion; it's a bit DatasetAccessor-like + SPARQL query + SPARQL Update. And some whole-dataset-REST-ish operations (that Fuseki happens to support). It's a chance to redo things a bit. RDFConnection uses the existing SPARQL+RDF classes and abstractions in ARQ, not strings, [*] rather than putting all app-visible classes in one package. Adding an equivalent of DatabaseClient to represent one place would be good - and add the admin operations, for Fuseki at least. Also, a streaming load possibility. Comments? Specific use cases?
Andy (multi-operation transactions ... later!) [*] You can use strings as well - that's the way to get arbitrary non-standard extensions through. [1] https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/rdfconnection/RDFConnection.java
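To illustrate the kind of layering under discussion (convenience forms like select(query, action) built over one primitive operation), here is a hypothetical, much-simplified stand-in. MiniConnection and its String "solutions" are inventions for the sketch, not the proposed RDFConnection API:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch: one raw operation plus a default-method convenience form on top,
// in the spirit of the execSelect(query, action) idea in the thread.
public class MiniConnectionDemo {
    public interface MiniConnection {
        // The one primitive: run a SELECT, return all solutions.
        List<String> querySelect(String query);

        // Convenience form layered on top: apply an action per solution.
        default void select(String query, Consumer<String> action) {
            querySelect(query).forEach(action);
        }
    }

    public static int countSolutions(List<String> solutions) {
        MiniConnection conn = q -> solutions;   // toy in-memory "endpoint"
        int[] n = {0};
        conn.select("SELECT * WHERE { ?s ?p ?o }", s -> n[0]++);
        return n[0];
    }

    public static void main(String[] args) {
        System.out.println(countSolutions(List.of("a", "b", "c"))); // 3
    }
}
```

The point of the shape is that implementations only supply the primitive, and all the richer forms come for free as defaults.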
Re: Journaling DatasetGraph
Thanks for the feedback Andy. 2/ Datasets that provide support for MW cases and don't provide transactions seem rather unlikely, so maybe document what kind of DatasetGraph is being supported by DatasetGraphWithRecord, then just use the underlying lock. Okay, that's certainly simpler! And it keeps my grubby fingers out of Lock. {grin} 3/ There are two things to protect in DatasetGraphWithRecord: the underlying dataset and the transaction log for supporting abort for writers only. They can have separate mechanisms. Use the dataset lock for the DatasetGraph actions and make the transaction undo log operations be safe by other means. You mean an independent lock visible only inside DatasetGraphWithRecord? .. hmm ... the order of entries in the log may matter, so true parallel MW looks increasingly hard to deal with anyway. Document and not worry for now? My fear has been that MW means a) a log per write-transaction and connections from the transaction to a particular set of states for the indexes b) with those forward states invisible outside the transaction c) and all the nightmare fun of merging states! --- A. Soroka The University of Virginia Library On Aug 4, 2015, at 4:32 PM, Andy Seaborne a...@apache.org wrote: On 03/08/15 17:13, aj...@virginia.edu wrote: I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance. Feedback welcome! A few things occur to me: 1/ The transaction log is for supporting abort for writers only. Nothing needs to be done in DatasetGraphWithRecord for readers. DatasetGraphWithLock does what's needed.
So you don't even need to startRecording for a READ (and the 'commit clears, _end always aborts' approach is an interesting way to do it!). 2/ Datasets that provide support for MW cases and don't provide transactions seem rather unlikely, so maybe document what kind of DatasetGraph is being supported by DatasetGraphWithRecord, then just use the underlying lock. It's not just a case of using ConcurrentHashMap, say, as likely there would be multiple of them for different indexes, and that would give weird consistency issues: different parts get updated safely with respect to part of the datastructure, but it will be visibly different depending on what the reader uses. So I think MW will have additional coordination. 3/ There are two things to protect in DatasetGraphWithRecord: the underlying dataset and the transaction log for supporting abort for writers only. They can have separate mechanisms. Use the dataset lock for the DatasetGraph actions and make the transaction undo log operations be safe by other means. .. hmm ... the order of entries in the log may matter, so true parallel MW looks increasingly hard to deal with anyway. Document and not worry for now? Andy --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 5:04 PM, Andy Seaborne a...@apache.org wrote: The lock provided by the underlying dataset may matter. DatasetGraphs support critical sections. DatasetGraphWithLock uses critical sections of the underlying dataset. I gave an (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors). DatasetGraphWithRecord is relying on single-W for its own datastructures. Andy On 29/07/15 21:22, aj...@virginia.edu wrote: I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord?
That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able
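A minimal sketch of a writer-only undo log of the kind discussed in this thread (stand-in types and names; not the actual DatasetGraphWithRecord code): each add/delete pushes its inverse onto the log, commit drops the log, and abort replays it newest-first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch: a dataset wrapper whose transaction log exists only to support
// abort for writers. "Quads" are stand-in strings.
public class UndoLogDataset {
    private final Set<String> quads = new HashSet<>();
    private final Deque<Runnable> undo = new ArrayDeque<>();

    public void add(String quad) {
        // Record the inverse only if the operation actually changed anything.
        if (quads.add(quad)) undo.push(() -> quads.remove(quad));
    }

    public void delete(String quad) {
        if (quads.remove(quad)) undo.push(() -> quads.add(quad));
    }

    public void commit() { undo.clear(); }     // keep changes, drop the log

    public void abort() {                      // replay inverses, newest first
        while (!undo.isEmpty()) undo.pop().run();
    }

    public boolean contains(String quad) { return quads.contains(quad); }

    public static void main(String[] args) {
        UndoLogDataset dsg = new UndoLogDataset();
        dsg.add("q1");
        dsg.commit();
        dsg.add("q2");
        dsg.delete("q1");
        dsg.abort();   // back to the committed state: q1 present, q2 gone
        System.out.println(dsg.contains("q1") + " " + dsg.contains("q2"));
    }
}
```

Readers never touch the log, which is the point of "nothing needs to be done in DatasetGraphWithRecord for readers" above; note also that the order of undo entries matters, which is why parallel MW writers are hard here.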
Re: RDFConnection
Ah, that makes my distinction pretty meaningless! This abstraction seems meant to rub out just such differences. This does remind me of another potential nice small feature: a Stream<Triple> construct(Query query) method, maybe at first via QueryExecution::execConstructTriples. The AutoCloseable-ity of QueryExecution could pass through to the Stream's AutoCloseable-ity. With clever implementation, eventually some of the methods on Stream (e.g. filter) could get passed through to SPARQL execution. --- A. Soroka The University of Virginia Library On Aug 5, 2015, at 9:37 AM, Rob Vesse rve...@dotnetrdf.org wrote: The main complicating factor is that clear and delete are only separate operations if the storage layer stores graph names separately from graph data, which the SPARQL specification specifically does not require. For storage systems like TDB where only quads are stored, the existence of a named graph is predicated on the existence of some quads in that graph, and so delete is equivalent to clear, because if you remove all quads for a graph TDB doesn't know about that graph any more. The SPARQL specifications actually explicitly call this complication out in several places (search for empty graphs in the SPARQL 1.1 Update spec) and various SPARQL Update behaviours may differ depending on whether the storage layer records the presence of empty graphs or not. Rob On 05/08/2015 13:44, aj...@virginia.edu aj...@virginia.edu wrote: Just a thought on ergonomics: it might be nice to separate clear and delete, so instead of RDFConnection::delete either clearing or deleting a graph depending on whether it is the default graph, you have finer control and can clear a non-default graph. --- A. Soroka The University of Virginia Library On Aug 4, 2015, at 6:21 PM, Andy Seaborne a...@apache.org wrote: There's a note in the interface // Query // Maybe more query forms: querySelect(Query)? select(Query)?
At the moment, the operations are the basic ones (the SPARQL protocols for query, update and GSP). There's scope to add forms on top. void execSelect(Query query, Consumer<QuerySolution> action) is one possibility. Andy On 04/08/15 16:14, aj...@virginia.edu wrote: Is this a little bit like Sesame 4's new Repository helper type? Not totally the same thing, but similar in that it's bringing a lot of convenience together around the notion of dataset? http://rdf4j.org/doc/4/programming.docbook?view#Stream_based_querying_and_transaction_handling --- A. Soroka The University of Virginia Library On Aug 2, 2015, at 3:05 PM, Andy Seaborne a...@apache.org wrote: Stephen, all, Recently on users@ there was a question about the s-* in java. That got me thinking about an interface to pull together all SPARQL operations into one application-facing place. We have jena-jdbc, and jena-client already - this is my sketch take. [1] RDFConnection Currently, it's a sketch-for-discussion; it's a bit DatasetAccessor-like + SPARQL query + SPARQL Update. And some whole-dataset-REST-ish operations (that Fuseki happens to support). It's a chance to redo things a bit. RDFConnection uses the existing SPARQL+RDF classes and abstractions in ARQ, not strings, [*] rather than putting all app-visible classes in one package. Adding an equivalent of DatabaseClient to represent one place would be good - and add the admin operations, for Fuseki at least. Also, a streaming load possibility. Comments? Specific use cases? Andy (multi-operation transactions ... later!) [*] You can use strings as well - that's the way to get arbitrary non-standard extensions through. [1] https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/rdfconnection/RDFConnection.java
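Rob's point upthread about quads-only storage can be illustrated with a toy store (stand-in "graph|triple" strings, not TDB): once a graph's quads are gone, the graph itself is gone, so clear and delete coincide.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: in a quads-only store, "the graph exists" just means
// "some quad names it", so clearing a graph and deleting it are
// the same operation. A store with a separate graph-name registry
// could instead keep an empty graph around after a clear.
public class ClearVsDelete {
    static final Set<String> quads = new HashSet<>();

    public static void add(String graph, String triple) {
        quads.add(graph + "|" + triple);
    }

    public static void clear(String graph) {
        quads.removeIf(q -> q.startsWith(graph + "|"));
    }

    public static boolean graphExists(String graph) {
        return quads.stream().anyMatch(q -> q.startsWith(graph + "|"));
    }

    public static void main(String[] args) {
        add("g1", "s p o");
        clear("g1");                           // remove all quads of g1...
        System.out.println(graphExists("g1")); // ...and g1 no longer "exists"
    }
}
```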
Re: RDFConnection
Is this a little bit like Sesame 4's new Repository helper type? Not totally the same thing, but similar in that it's bringing a lot of convenience together around the notion of dataset? http://rdf4j.org/doc/4/programming.docbook?view#Stream_based_querying_and_transaction_handling --- A. Soroka The University of Virginia Library On Aug 2, 2015, at 3:05 PM, Andy Seaborne a...@apache.org wrote: Stephen, all, Recently on users@ there was a question about the s-* in java. That got me thinking about an interface to pull together all SPARQL operations into one application-facing place. We have jena-jdbc, and jena-client already - this is my sketch take. [1] RDFConnection Currently, it's a sketch-for-discussion; it's a bit DatasetAccessor-like + SPARQL query + SPARQL Update. And some whole-dataset-REST-ish operations (that Fuseki happens to support). It's a chance to redo things a bit. RDFConnection uses the existing SPARQL+RDF classes and abstractions in ARQ, not strings, [*] rather than putting all app-visible classes in one package. Adding an equivalent of DatabaseClient to represent one place would be good - and add the admin operations, for Fuseki at least. Also, a streaming load possibility. Comments? Specific use cases? Andy (multi-operation transactions ... later!) [*] You can use strings as well - that's the way to get arbitrary non-standard extensions through. [1] https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/rdfconnection/RDFConnection.java
Re: Journaling DatasetGraph
I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance. Feedback welcome! --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 5:04 PM, Andy Seaborne a...@apache.org wrote: The lock provided by the underlying dataset may matter. DatasetGraphs support critical sections. DatasetGraphWithLock uses critical sections of the underlying dataset. I gave an (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors). DatasetGraphWithRecord is relying on single-W for its own datastructures. Andy On 29/07/15 21:22, aj...@virginia.edu wrote: I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here?
(No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for reads). There aren't any (the rules are R-safe) - locks are always LockMRSW. [1] http://jena.apache.org/documentation/notes/concurrency-howto.html Andy
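The Lock-quality idea from this thread could look roughly like the following. All names are hypothetical (the real Jena Lock interface differs): a lock reports its own concurrency category as an enum, so a wrapper can check suitability without relying on the type hierarchy.

```java
// Hypothetical sketch: a lock self-reports its concurrency quality so a
// wrapper like DatasetGraphWithRecord could decide whether to use it or
// eclipse it, without instanceof tests against lock subclasses.
public class LockQualityDemo {
    public enum LockQuality { MRSW, MR_PLUS_SW, MUTEX }

    public interface QualifiedLock {
        LockQuality quality();
    }

    // A recording wrapper needs at most one writer at a time, so only
    // MRSW or a plain mutex is acceptable here.
    public static boolean safeForRecording(QualifiedLock lock) {
        LockQuality q = lock.quality();
        return q == LockQuality.MRSW || q == LockQuality.MUTEX;
    }

    public static void main(String[] args) {
        QualifiedLock mrsw = () -> LockQuality.MRSW;
        System.out.println(safeForRecording(mrsw)); // true
    }
}
```

This captures the "categorize itself independently of the type system's inheritance" idea: the enum is the contract, not the class.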
Re: Journaling DatasetGraph
I think I understand the problem now. Assuming I do, I see two cases: 1) The underlying dataset has locking that is _more_ restrictive than MRSW, in which case DatasetGraphWithRecord must expose that locking, lest it break the underlying impl. 2) The underlying dataset has locking that is _less_ restrictive than MRSW, in which case DatasetGraphWithRecord must eclipse that locking, lest it break DatasetGraphWithRecord's impl. So my task is to adopt some careful meaning for "more" and "less" as used above and use it to make DatasetGraphWithRecord's locking more intelligent. I do not see anything in Jena that would answer to the purpose, but maybe I am missing something. {fingers-crossed} --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 5:04 PM, Andy Seaborne a...@apache.org wrote: The lock provided by the underlying dataset may matter. DatasetGraphs support critical sections. DatasetGraphWithLock uses critical sections of the underlying dataset. I gave an (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors). DatasetGraphWithRecord is relying on single-W for its own datastructures. Andy On 29/07/15 21:22, aj...@virginia.edu wrote: I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock?
Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for reads). There aren't any (the rules are R-safe) - locks are always LockMRSW. [1] http://jena.apache.org/documentation/notes/concurrency-howto.html Andy
Re: Journaling DatasetGraph
I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for reads). There aren't any (the rules are R-safe) - locks are always LockMRSW. [1] http://jena.apache.org/documentation/notes/concurrency-howto.html Andy
Re: Journaling DatasetGraph
Thanks for the feedback, Andy! See comment in-line below. --- A. Soroka The University of Virginia Library On Jul 25, 2015, at 7:43 AM, Andy Seaborne a...@apache.org wrote: A first look - there's quite a lot to do with the release at the moment. Right, I don't expect anyone to get around to much consideration of this until that is over. Good luck! Having a separate set of functionality to the underlying DatasetGraph is good for the MRSW case and with that composition on multiple datasets, text indexes etc etc. For the MR+SW, I think the more connected nature of transactions and implementation might make it harder to have independent functionality but we'll see. I agree. That's why I did this as a wrap-around. I don't think MR+SW _can_ be done that way, but we'll see… Yes - addGraph ought to be a copy. The general dataset where the app can put together a collection of different graph types is the exception but needed for the case of some graphs being inference, maybe some not. As I wrote, I believe that my current code does this solidly and the test shows it, but I'm not sure that the impl is as efficient as possible. Suggestions welcome! One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation. An operation is a pair - the action and the data - not data. I'm not sure I understand the objection here: all classes inherit from Object and virtually all of them add functionality Object does not have and break its equality definition. I certainly understand the view on operations you're taking, but I'm proposing a different one that includes data, action (in my code, that comes in the form of type, not an enumeration, so that I can replace cases in your code with polymorphism) _and_ service type. Adding a quad to a special index might be substantially different than adding it to a dataset. e.g. 
Putting a QuadOperation into a DatasetGraph would cause problems. Because of the equality question? I _think_ I understand this objection; are you saying that logic for things like DatasetGraph::contains becomes problematic? To my mind it implies a more sophisticated type of comparison (using equivalence and not equals()) instead of a different kind of data structure. I'll try to make some corrections to show what I mean and give you something to react to. I may be wrong here, but I'd like to follow out the idea. ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType> public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>> implements ReversibleOperationRecord<OpType> { while, yes, a collection of operations could be an operation datasets don't provide such composite operations so the abstraction is not used. And the reverse of it would be recursive - each operation needs reversing. I am _not_ making the claim here that a collection of operations could be an operation. A record (in my code) is just a record. It is _not_ usable as an aggregate operation and doesn't subtype Operation. There is no use of records as operations nor any intended such use, so no problem. I'd keep log (= list of operations) as a separate concept from the operations themselves. One key operation of a ListBackedOperationRecord is clear and Operations are Or this is a naming thing, is record the log entry or the log itself? Something seems to have been eaten out of your mail (!) but anyway, a record _is_ a separate concept from operation. There is ReversibleOperationRecord and there is Operation and the only relationship between them is that Operation is a parameter type for ReversibleOperationRecord::add and part of the parameter type for ReversibleOperationRecord::consume. As far as names, I'm not sure what you mean-- ReversibleOperationRecord the type? That's a log. It contains Operations, but _is not one itself_.
Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) One difference is the notion of reversing an operation is not a feature of the operation itself, it's the way it is played back. Partially, this is efficiency (which may not matter) as it reduces the object churn but also it puts undo-playback in one place (e.g. reading and writing from storage, which might be non-heap memory, or a compacted form (or even a disk) for where large+long transactions even on in-memory lead to excessive object use. Just an idea. Yeah, I intentionally separated the two (reverse an
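Andy's "an operation is a pair - the action and the data - not data" point can be sketched as follows. Quad here is a stand-in record, not the Jena class; whether inversion lives on the operation or with playback is a separate design choice, as discussed above.

```java
// Sketch: model an operation as (action, data) rather than a subclass of
// Quad, so quad equality stays purely G/S/P/O and no operation can ever
// masquerade as a Quad inside a DatasetGraph.
public class OperationAsPair {
    public enum Action { ADD, DELETE }

    public record Quad(String g, String s, String p, String o) {}

    public record QuadOperation(Action action, Quad quad) {
        // The inverse, for undo-playback of a log of these operations.
        public QuadOperation inverse() {
            return new QuadOperation(
                action == Action.ADD ? Action.DELETE : Action.ADD, quad);
        }
    }

    public static void main(String[] args) {
        Quad q = new Quad("g", "s", "p", "o");
        QuadOperation add = new QuadOperation(Action.ADD, q);
        // Same data, different operations: the quads are equal...
        System.out.println(q.equals(add.quad()));          // true
        // ...but ADD(q) and DELETE(q) are distinct values.
        System.out.println(add.equals(add.inverse()));     // false
    }
}
```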
Re: Journaling DatasetGraph
One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation. An operation is a pair - the action and the data - not data. e.g. Putting a QuadOperation into a DatasetGraph would cause problems. Andy-- I've thought harder about this and I've realized that whether or not I can make a navel-gazing argument about correctness, the typing is obviously confusing and that's damnation enough. I'll fix this to stop extending Quad. --- A. Soroka The University of Virginia Library On Jul 25, 2015, at 7:43 AM, Andy Seaborne a...@apache.org wrote: On 23/07/15 14:18, aj...@virginia.edu wrote: After a longish conversation with Andy Seaborne, I've worked up a simple journaling DatasetGraph wrapping implementation. The idea is to use journaling to support proper aborting behavior (which I believe this code does) and to add to that a semantic for DatasetGraph::addGraph that copies tuples instead of leaving a reference to the added Graph (which I believe this code also does). Between these two behaviors, the idea is to be able to support transactionality (MRSW only) reasonably well. The idea is (if this code looks like a reasonable direction) to move onwards to an implementation that uses persistent data structures for covering indexes in order to get at least to MR+SW and eventually to attack JENA-624: Develop a new in-memory RDF Dataset implementation. Feedback / advice / criticism greedily desired and welcome! https://github.com/ajs6f/jena/tree/JournalingDatasetgraph https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph --- A. Soroka The University of Virginia Library Hi there, A first look - there's quite a lot to do with the release at the moment. 
Having a separate set of functionality to the underlying DatasetGraph is good for the MRSW case and, with that, for composition on multiple datasets, text indexes, etc. For the MR+SW, I think the more connected nature of transactions and implementation might make it harder to have independent functionality, but we'll see. https://github.com/afs/mantis/tree/master/dboe-transaction is a take on a transaction mechanism. I'm using it at the moment so I'm finding out what works ... and what does not. Yes - addGraph ought to be a copy. The general dataset where the app can put together a collection of different graph types is the exception, but needed for the case of some graphs being inference, maybe some not. One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation. An operation is a pair - the action and the data - not data. e.g. Putting a QuadOperation into a DatasetGraph would cause problems. ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType> [[ public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>> implements ReversibleOperationRecord<OpType> { ]] While, yes, a collection of operations could be an operation, datasets don't provide such composite operations, so the abstraction is not used. And the reverse of it would be recursive - each operation needs reversing. I'd keep the log (= list of operations) as a separate concept from the operations themselves. One key operation of a ListBackedOperationRecord is clear() and Operations are ... Or is this a naming thing: is the record the log entry or the log itself? Is there some specific reason as to why you override the DatasetGraphWithLock lock?
My take on this is: https://github.com/afs/jena-workspace/tree/master/src/main/java/transdsg One difference is that the notion of reversing an operation is not a feature of the operation itself; it's the way it is played back. Partially, this is efficiency (which may not matter), as it reduces object churn, but it also puts undo-playback in one place (e.g. reading and writing from storage, which might be non-heap memory, a compacted form, or even a disk) for cases where large, long transactions, even in-memory, lead to excessive object use. Just an idea. Andy
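Andy's point - that reversal lives in the playback, not in the operation object - can be sketched in plain Java. This is a minimal, hypothetical illustration (the class and method names here are invented for the sketch, not the classes on either branch, and a String stands in for a real Quad):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// An operation is a pair (action, data). The journal inverts actions in
// ONE place, during playback, rather than making each operation know how
// to reverse itself.
enum Action { ADD, DELETE }

class Op {
    final Action action;
    final String quad; // stand-in for a real Quad
    Op(Action action, String quad) { this.action = action; this.quad = quad; }
}

class Journal {
    private final Deque<Op> log = new ArrayDeque<>();

    void record(Action action, String quad) { log.push(new Op(action, quad)); }

    // Abort: replay the log newest-first, inverting each action here, centrally.
    void abort(List<String> store) {
        while (!log.isEmpty()) {
            Op op = log.pop();
            if (op.action == Action.ADD) store.remove(op.quad); // undo an add
            else store.add(op.quad);                            // undo a delete
        }
    }
}
```

Because the inversion happens in the playback loop, the log entries stay plain data, which is what lets the record be stored off-heap or on disk, per the efficiency point above.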
Re: Iter vs. ExtendedIterator
Since I'm trying to get to an understanding from which I can write a PR with some new Javadocs for these types, let me try out the following: Iter should never be used for a return type or parameter type in the public contract of a class. It is only to be used inside implementation code and it can be instantiated only to allow method-chaining as part of a calculation. ExtendedIterator should only be used as a return type or parameter type in the public contract of a class when that is specifically required by a type being implemented. Do those remarks capture facts about the two types? --- A. Soroka The University of Virginia Library On Jul 21, 2015, at 3:36 PM, Andy Seaborne a...@apache.org wrote: On 21/07/15 15:38, A. Soroka wrote: A question came up for me, as a Jena newbie, in the course of JENA-966: LazyIterator. The type ExtendedIterator in jena-core is used widely through jena-core. It features several convenient methods for use with iteration, like mapping through functions, filtering, and concatenation. The type Iter in jena-base is used widely through jena-base and jena-arq. It features many convenient methods for use with iteration, like everything ExtendedIterator does plus much more (e.g. folding, selecting, reducing…). What is the difference in use for these two types? Why are they distinct? Is there some means by which it can be made clear when to use each and why? I would be happy to write a simple class Javadoc for Iter (which currently has none at all) to let folks know when to use it, if someone will explain that to me. --- A. Soroka The University of Virginia Library Iter is used in SDB and TDB as well, where there are lots of iterators for all sorts of things. ExtendedIterator only works with ExtendedIterator. Not everything generates ExtendedIterators. Iter is for working with java.util.Iterator; it is a different style where the statics are more important than the class methods.
It does allow chaining but generally I don't think that style is very common in the code base. Andy
Journaling DatasetGraph
After a longish conversation with Andy Seaborne, I've worked up a simple journaling DatasetGraph wrapping implementation. The idea is to use journaling to support proper aborting behavior (which I believe this code does) and to add to that a semantic for DatasetGraph::addGraph that copies tuples instead of leaving a reference to the added Graph (which I believe this code also does). Between these two behaviors, the idea is to be able to support transactionality (MRSW only) reasonably well. The idea is (if this code looks like a reasonable direction) to move onwards to an implementation that uses persistent data structures for covering indexes in order to get at least to MR+SW and eventually to attack JENA-624: Develop a new in-memory RDF Dataset implementation. Feedback / advice / criticism greedily desired and welcome! https://github.com/ajs6f/jena/tree/JournalingDatasetgraph https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph --- A. Soroka The University of Virginia Library
Re: Iter vs. ExtendedIterator
Okay, so if I were writing some new code in a Jena module, and I needed to do some of the tasks for which these guys have facilities (e.g. filtering), how should I select a type to use? Should I only use ExtendedIterator's methods if the thing I already have in hand is an ExtendedIterator? Put another way, is it ever appropriate to create an ExtendedIterator in a situation in which I am not beholden to do so by interface requirements? Thanks for helping me get some understanding on this. --- A. Soroka The University of Virginia Library On Jul 21, 2015, at 3:36 PM, Andy Seaborne a...@apache.org wrote: Iter is used in SDB and TDB as well, where there are lots of iterators for all sorts of things. ExtendedIterator only works with ExtendedIterator. Not everything generates ExtendedIterators. Iter is for working with java.util.Iterator; it is a different style where the statics are more important than the class methods. It does allow chaining but generally I don't think that style is very common in the code base. Andy
Re: Fuseki and ETags
On Jun 29, 2015, at 9:33 AM, Claude Warren cla...@xenei.com wrote: If there were an ETag per dataset and a method on the dataset to force an ETag reset, would this address the index issue, in that the indexer could reset the ETag when it deemed appropriate? It might-- for that indexer. I would be concerned about setups in which another process acted against the data out of sight of Fuseki. But would the ETag be on ARQ's Dataset itself? If I understand what's going on here correctly (debatable at best), Dataset should not have any HTTP concerns mixed into it. ETag would be on something closer to Fuseki's DataService, which I do not think would normally be accessible to an indexer which is only aware of what's on disk… but this is all from my understanding of the architecture, which is pretty minimal. {grin} Maybe some kind of last-changed timestamp could reasonably go on Dataset to support this kind of function? In any case I would go with the first choice. It definitely seems like the most bang for the least buck. Is there anything that prohibits sending both an ETag and a constant Expires? I haven't looked but I recall they are not mutually exclusive. Yes, I think you are correct. I suppose a bad ETag will never be known to be such as long as it is inside the range of a still-good Expires, but that is a question for the administrator configuring Fuseki, it seems to me. There is also Cache-Control, of course, in the same field of functionality. --- A. Soroka The University of Virginia Library
Re: Fuseki and ETags
I can only speak for the use cases I actually know about. ETags would get used, because the most important web app in my concern that is potentially a client to Fuseki would be able to use them. But that is just one case. JENA-626 would be great in any regard. --- A. Soroka The University of Virginia Library On Jun 29, 2015, at 12:20 PM, Andy Seaborne a...@apache.org wrote: There is no case of external modification of the database which Fuseki is running. A disaster will occur otherwise. [Modifying externally while running requires a different approach (e.g. switching between two copies of the database ... maybe ... so many ways to corrupt a database ... ).] ETags are a quite technical solution - will any system actually use it for real, even if it is the right solution? We wouldn't want to find out that ETag support does not get used. For the SPARQL Protocol case (with query strings), it might not really get used. Has caching of requests including query strings rolled out to any degree? (a point from discussion in JENA-388). If query strings currently cause no caching by intermediaries in practice, will clients cache, which is the case of one client reissuing the same query? Possible, but is it likely? See also JENA-626 SPARQL Query Caching. That would make a difference - different client apps starting up often ask the same query to get started. Andy On 29/06/15 16:03, Claude Warren wrote: I am not familiar with how the indexing interplays with the rest of the Jena system. My assumption is, like you, that we only want the ETag in the Fuseki layer. However, to generate an ETag it seems like Fuseki will need to be able to ask the underlying dataset when the last change occurred, but then you also want to know if indexing has changed, so that results may be changed as well.
If we consider ETag generation separate from the Dataset, then the ETag generator could register as a listener to the dataset and react whenever a change occurs to the model. This doesn't solve the problem of responding to index updates. However, whatever interface the listener uses to trigger an ETag change could just as well be done by an indexer. Is there an indexer listener interface (a la Model/Graph listeners)? In this solution the ETag gets input from any registered component. I think that each registered component should have a name and a value. The ETag generator would retain the most recent value for each registered component and generate a new ETag when a value changes. So I see a class with two methods: void ETagGenerator.change( String name, String value ) and String ETagGenerator.getTag(); // to retrieve the current tag. Claude On Mon, Jun 29, 2015 at 2:50 PM, aj...@virginia.edu wrote: On Jun 29, 2015, at 9:33 AM, Claude Warren cla...@xenei.com wrote: If there were an ETag per dataset and a method on the dataset to force an ETag reset, would this address the index issue, in that the indexer could reset the ETag when it deemed appropriate? It might-- for that indexer. I would be concerned about setups in which another process acted against the data out of sight of Fuseki. But would the ETag be on ARQ's Dataset itself? If I understand what's going on here correctly (debatable at best), Dataset should not have any HTTP concerns mixed into it. ETag would be on something closer to Fuseki's DataService, which I do not think would normally be accessible to an indexer which is only aware of what's on disk… but this is all from my understanding of the architecture, which is pretty minimal. {grin} Maybe some kind of last-changed timestamp could reasonably go on Dataset to support this kind of function? In any case I would go with the first choice. It definitely seems like the most bang for the least buck.
Is there anything that prohibits sending both an ETag and a constant Expires? I haven't looked but I recall they are not mutually exclusive. Yes, I think you are correct. I suppose a bad ETag will never be known to be such as long as it is inside the range of a still-good Expires, but that is a question for the administrator configuring Fuseki, it seems to me. There is also Cache-Control, of course, in the same field of functionality. --- A. Soroka The University of Virginia Library
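Claude's proposed two-method ETagGenerator could look something like the minimal sketch below. This is purely illustrative (nothing in Fuseki has this class); in particular, the choice to render the component state directly as the tag, rather than hashing it, is an assumption made to keep the sketch short:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed ETagGenerator: each registered component (dataset,
// text indexer, ...) reports a (name, value) pair when it changes; the tag
// reflects the latest value of every component, so a change to any one of
// them yields a new ETag. TreeMap keeps the rendering order stable.
class ETagGenerator {
    private final Map<String, String> components = new TreeMap<>();

    public synchronized void change(String name, String value) {
        components.put(name, value);
    }

    // A real implementation would probably hash this; quoting the rendered
    // component state is enough here, since ETags are opaque strings.
    public synchronized String getTag() {
        return '"' + components.toString().replace('"', '\'') + '"';
    }
}
```

An indexer would simply call change("index", someVersionMarker) whenever it finished an update, and Fuseki would call getTag() when answering a conditional request.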
Re: CMS diff: Reviewing Contributions
Good point. (Speaking as someone who regularly has to be corrected about this {grin}.) --- A. Soroka The University of Virginia Library On Jun 29, 2015, at 12:57 PM, Andy Seaborne a...@apache.org wrote: Good comments - I've made some revisions to the page based on this input. It reminded me to add a request for pull requests to have commits focused on the pull requests/contribution functionality, not details of how the code has evolved up to that point (i.e. the internal history). Different audiences. Andy On 26/06/15 16:12, A. Soroka wrote: Clone URL (Committers only): https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/getting_involved%2Freviewing_contributions.mdtext A. Soroka Index: trunk/content/getting_involved/reviewing_contributions.mdtext === --- trunk/content/getting_involved/reviewing_contributions.mdtext (revision 1655891) +++ trunk/content/getting_involved/reviewing_contributions.mdtext (working copy) @@ -29,6 +29,19 @@ @author tags will not prevent a contribution being accepted but **should** be removed by the committer who integrates the contribution. +## Code style + +Jena does not have a particular formal code style specification at this time, but here are some simple tips for keeping your contribution in good order: + +- Don't create a method signature that throws checked exceptions that aren't ever actually thrown from the code in that method unless an API supertype specifies that signature. Otherwise, clients of your code will have to include unnecessary handling code. +- Don't leave unused imports in your code. Any IDE can solve that problem with one keystroke. :) +- If a type declares a supertype that isn't a required declaration, consider whether that clarifies or confuses the intent. The former is okay, the latter not so good. +- Minimize the new compiler warnings your patch creates.
If you use @SuppressWarnings to hide them, please add a comment explaining the situation or a TODO with a potential future fix that would allow removing the suppression. +- Remove unused local variables or fields or uninteresting unused private methods. If it's debugging detritus, consider replacing it with good logging code for future use, if that seems likely to become useful. +- If there is valuable code in some unused private method, add a @SuppressWarnings("unused") with an explanation of when it might become useful. If there is valuable but unused code inside a used method, consider breaking it out into a private method and adding a @SuppressWarnings("unused") and an explanation. + + + ## Contribution to Apache The Apache License states that any contribution to an Apache project is automatically considered to be contributed to the Apache foundation and thus liable for inclusion in an Apache project **unless** the contributor explicitly states otherwise.
Re: [jira] [Commented] (JENA-966) LazyIterator
Right, I updated my comment right after I made it, when I noticed the difference. I shouldn't think it matters which one to keep. LazyIterator is a little shorter to write. :) There are a number of other Iterators (noted in the comments to that ticket) that seem to be deprecate-able. E.g. SingletonIterator has equivalent Guava functionality, and UniqueExtendedIterator has its own comments suggesting that it be deprecated (New development should use UniqueFilter…). As I said in an earlier message, I will issue a reworked PR #79 with those suggestions, but I will not touch the lazy iterators. --- A. Soroka The University of Virginia Library On Jun 24, 2015, at 11:32 AM, Claude Warren cla...@xenei.com wrote: Yes. LazyIterator implements ExtendedIterator. LateBindingIterator implements Iterator. My plan -- probably won't execute until tomorrow night -- is to complete the implementation of LazyIterator for both (2.13.1 and 3.0.0) and then deprecate LateBinding in favor of Lazy, as ExtendedIterator implements Iterator. Though I could very easily be swayed to alter LateBindingIterator to implement ExtendedIterator and deprecate LazyIterator. Claude On Wed, Jun 24, 2015 at 4:13 PM, A. Soroka (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599547#comment-14599547 ] A. Soroka commented on JENA-966: Is there any difference between {{LateBindingIterator}} and {{LazyIterator}}? LazyIterator Key: JENA-966 URL: https://issues.apache.org/jira/browse/JENA-966 Project: Apache Jena Issue Type: Bug Components: Core Affects Versions: Jena 3.0.0 Reporter: Claude Warren Assignee: Claude Warren LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that new LazyIterator<Model>() { @Override public ExtendedIterator<Model> create() { ...
}}; would work. However, LazyIterator does not override: removeNext(), andThen(), toList(), and toSet(). I believe these should be implemented in the class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
Re: [jira] [Comment Edited] (JENA-966) LazyIterator
The Stream API is definitely significantly different from the Function API. Maybe you mean that Stream is significantly different from Iterator (which it surely is)? Everything you say about deprecation seems very fair to me, and that's what happened to a number of other types (e.g. in jena-core). I had a PR in to actually remove these guys under discussion. (Mentioned in the comments to this ticket.) I will rework that to only deprecate them and add advice on how to use the Java 8 idioms instead, and we can take that into your proposed other ticket. As far as Java 8 adoption, I appreciate the difficulties there. I will step out of the way and let others discuss that, because I am lucky to be in a position where the issue is not too urgent. --- A. Soroka The University of Virginia Library On Jun 22, 2015, at 4:23 PM, Claude Warren cla...@xenei.com wrote: I must have misunderstood a post from Andy then. My error. I thought there was a comment from Andy that indicated that Supplier was part of Stream and that Stream was significantly different from Function. As I said, my error and I am happy to drop that point. As for the code base, for any publicly accessible surfaces we have to consider that they may be used outside of our base in products built upon Jena. If the class was exposed, then removing it should not be cut and done. (I learned this lesson the hard way.) Naturally this does not apply to internal code where the interface remains the same and the implementation changes. Thus my suggestion to fill out the class. I also proposed that we open an Epic to discuss how to move toward the Function approach you propose. I think that we will need a two-pronged approach. Retain the current interfaces that have been publicly available while marking them as deprecated and pointing to the Function approach as the replacement. I would think that we could mark as deprecated and indicate that they will be removed in 3.1.0 (or some such).
Perhaps we should discuss how long to keep deprecated bits around before removing them. I think that if 3.x.y deprecates something, then 3.x+1.0 should be the earliest it should be removed. I also have concerns about the full-on Java 8 adoption path we are on, as there are cases where Java 8 is not available. Working for IBM I can tell you that we still support Java 6 and that the Java 6 IBM ships is patched to resolve security issues. But the fact remains that there are environments that are not at Java 8 and won't be there any time soon. Our customers are reticent to move Java versions in their large environments. I have a project where we are going to have to back-port Jena 2.13.1 to Java 6 (if possible). But that is neither here nor there with regards to the topic at hand. Claude On Mon, Jun 22, 2015 at 9:03 PM, A. Soroka (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596545#comment-14596545 ] A. Soroka edited comment on JENA-966 at 6/22/15 8:02 PM: - Just as a sidenote for anyone following this who is not familiar with the use of {{Supplier}}: {{Supplier}} is not part of the Java 8 Stream API, it's part of the Function API. It is a very simple SAM interface that is intended to hold a computation. So one might do: {code:language=java} // the only costs of the line below are an assignment and object creation Supplier<Foo> fooForLater = () -> expensiveComputation(); // do some other stuff // the line below is where we pay the cost of expensiveComputation() Foo myFoo = fooForLater.get(); {code} was (Author: ajs6f): Just as a sidenote for anyone following this who is not familiar with the use of {{Supplier}}: {{Supplier}} is not part of the Java 8 Stream API, it's part of the Function API. It is a very simple SAM interface that is intended to hold a computation.
So one might do: {code:language=java} // the only cost of the line below is an assignment Supplier<Foo> fooForLater = () -> expensiveComputation(); // do some other stuff // the line below is where we pay the cost of expensiveComputation() Foo myFoo = fooForLater.get(); {code} LazyIterator Key: JENA-966 URL: https://issues.apache.org/jira/browse/JENA-966 Project: Apache Jena Issue Type: Bug Components: Core Affects Versions: Jena 3.0.0 Reporter: Claude Warren Assignee: Claude Warren LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that new LazyIterator<Model>() { @Override public ExtendedIterator<Model> create() { ... }}; would work. However, LazyIterator does not override: removeNext(), andThen(), toList(), and
Re: [jira] [Commented] (JENA-966) LazyIterator
How about using a Java 8 Supplier<Iterator<T>>? That's pretty lazy. --- A. Soroka The University of Virginia Library On Jun 17, 2015, at 5:07 AM, Claude Warren cla...@xenei.com wrote: I wanted to use it in an application. Is there a replacement? On Wed, Jun 17, 2015 at 9:10 AM, Andy Seaborne (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589458#comment-14589458 ] Andy Seaborne commented on JENA-966: There are no uses of this class in the codebase anymore. We can remove it. LazyIterator Key: JENA-966 URL: https://issues.apache.org/jira/browse/JENA-966 Project: Apache Jena Issue Type: Bug Components: Core Affects Versions: Jena 3.0.0 Reporter: Claude Warren Assignee: Claude Warren LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that new LazyIterator<Model>() { @Override public ExtendedIterator<Model> create() { ... }}; would work. However, LazyIterator does not override: removeNext(), andThen(), toList(), and toSet(). I believe these should be implemented in the class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
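The Supplier-based laziness suggested above can be sketched in a few lines of plain Java. This is purely illustrative (the class name is invented, and it wraps plain java.util.Iterator rather than Jena's ExtendedIterator): the underlying iterator is not created until the first call touches it, which is exactly what LazyIterator's create() defers.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// The supplier is only invoked on the first hasNext()/next() call,
// so construction of the underlying iterator is deferred until needed.
class LazySupplierIterator<T> implements Iterator<T> {
    private final Supplier<Iterator<T>> source;
    private Iterator<T> delegate; // null until first use

    LazySupplierIterator(Supplier<Iterator<T>> source) { this.source = source; }

    private Iterator<T> delegate() {
        if (delegate == null) delegate = source.get();
        return delegate;
    }

    @Override public boolean hasNext() { return delegate().hasNext(); }
    @Override public T next() { return delegate().next(); }
}
```

The upshot is that the "lazy" behavior needs no abstract subclassing at all: any expression producing an iterator can be deferred by wrapping it in a lambda.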
Re: CLI libraries
In the PR I submitted a day or two ago, I added a DEPRECATED: Please use riot instead message to the help of rdfcat, but I didn't have it emit that message to stderr on all runs. That seems like a good move to me. I'll add that, and we can then decide whether to go all the way to having that be the _only_ thing rdfcat does in Jena 3. --- A. Soroka The University of Virginia Library On Jun 10, 2015, at 6:39 AM, Andy Seaborne a...@apache.org wrote: On 09/06/15 17:11, aj...@virginia.edu wrote: I don't see any actual references in the documentation to rdfcat. Perhaps it can be deprecated? Interesting question - how to deprecate a command line tool? Print a This is deprecated message to stderr? As a Jena3 step we can be faster with migration. jena.rdfcat = only a message saying use riot? But first - is riot a good enough replacement? Does it need more documentation? (probably yes, as facilities got added incrementally: --formatted=FORMAT needs to be the default output style and streaming requires intervention). Andy --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote: People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased.
Re: Trouble Building Under Eclipse
I work on other projects for which we separate the lifecycles of the main product and ancillary or supporting products (e.g. configuration for Checkstyle) and it works well so long as: 1) The sidecar artifacts are available from Maven Central or an appropriate more specific repository. This avoids any annoying double-build situations. 2) The cost of building/publishing the sidecar artifacts is low. This is because it's done less frequently and therefore less expertise develops in the community about doing it. As always in dev workflows, YMMV, but shaded Guava does seem to me like a good candidate. If the conversation about project code style picks up again (and I will be trying to move that forward in a message tomorrow) then artifacts related thereto might also be good candidates. --- A. Soroka The University of Virginia Library On Jun 10, 2015, at 5:47 AM, Andy Seaborne a...@apache.org wrote: On 09/06/15 16:26, aj...@virginia.edu wrote: Okay, now I get why we're sticking with shading in Guava, at least for now (since this seems like the kind of problem that OSGi solves and hopefully Jigsaw will solve). Are there objections to ejecting shaded Guava from the main dev effort into its own orbit? Or is there a dev cycle associated to the main one that makes sense as a home for Guava? I don't mind either way - doesn't seem like a clear-cut right or wrong. Currently, we have a single build and it produces a single consistent cut of versions (e.g. the binary distribution includes dependencies). jena-shaded-guava is the same version as the main jena version. One release vote. How often do Guava versions change? 16, 17, 18 were close together (a few months) but 18, the latest, was Aug 2014. Andy --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 3:11 PM, Andy Seaborne a...@apache.org wrote: Hadoop/Elephas is an example of a general problem with Guava.
By reputation, upgrading Guava across versions has been problematic - subtle and not-so-subtle changes of behaviour or removed code. When Jena is used as a library, the system or application in which it is used might use Guava itself - and need a specific version. But Jena uses Guava and needs a specific version with certain code in it, which might be different. We are isolating Jena's use of Guava from the system in which Jena is used. Hadoop has very strong requirements on Guava versions - it might well apply to other user applications as well. We do <exclude/> it, in the sense that the dependency-reduced-pom.xml POM of jena-shaded-guava does not mention com.google.guava:guava. Elephas picks up the Hadoop dependency. Andy On 08/06/15 14:26, aj...@virginia.edu wrote: I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right. To summarise what is happening: The POM file in the maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source which names the dependent Google Guava in a dependency. Result: a certain degree of chaos.
Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name; otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi; it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop, which as you know has an ancient Guava. It might be good to keep it out of the normal build/release cycle; then you would get the jena-guava shade from Maven Central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or voted+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name) On 4 Jun 2015 14:33, aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project
Re: TDB2
Is there some high level overview of Lizard/Mantis/TDB2 yet extant? Like the kind of thing we might see at a conference? In any event, thanks for working on this-- it's great to know that Jena will be able to cluster soon. --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 1:24 PM, Andy Seaborne a...@apache.org wrote: On 08/06/15 17:48, Marco Neumann wrote: is TDB2 going to replace TDB or is TDB2 a new cluster product? Whatever people (users, developers) want. Migrating DBs is not as easy as upgrading code. Running oaj.tdb and oaj.tdb2 side by side (TDB2 is itself 7 maven modules ATM - some can be combined as they are small and just a good idea at the time). TDB2 is not the cluster (that's Lizard). Mantis started as the separation out of the low level code needed for Lizard. Initially a validation of the reworking of transactions and data structures, a little extra work has made it viable as TDB2. Andy (oaj = org.apache.jena) Marco On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne a...@apache.org wrote: Informational announcement: TDB2 TDB2 is a reworking of TDB based on updated implementations of transactions and transactional data structures for project Lizard (a clustered SPARQL store). TDB2 has: * Arbitrary scale write-once transactions * New transaction system - can add other first class components (e.g. text indexes, cache tables) * Models work across transaction boundaries * Cleaner, simpler, more maintainable. TDB2 databases are not compatible with TDB databases. It uses a more efficient encoding for RDF terms. [1] Being a database, the new indexing and transaction code needs time to settle to bring the maturity up. I'm using that tech in Lizard development. Andy TDB2 code: https://github.com/afs/mantis/tree/master/tdb2 Lizard slides: http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard [1] An upgrade path using TDB1-style encoding is possible; it is a one-way upgrade path and not reversible [2].
TDB2 adds control files for the copy-on-write data structures that TDB1 does not understand. [2] Actually, if the encoding is compatible, what will happen is that TDB1 will see the database at the time of the upgrade. Welcome to copy-on-write immutable data structures.
Re: Trouble Building Under Eclipse
Okay, now I get why we're sticking with shading Guava, at least for now (since this seems like the kind of problem that OSGi solves and hopefully Jigsaw will solve). Are there objections to ejecting shaded Guava from the main dev effort into its own orbit? Or is there a dev cycle associated to the main one that makes sense as a home for Guava? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 3:11 PM, Andy Seaborne a...@apache.org wrote: Hadoop/Elephas is an example of a general problem with Guava. By reputation, upgrading Guava across versions has been problematic - subtle and not-so-subtle changes of behaviour or removed code. When Jena is used as a library, the system or application in which it is used might use Guava itself - and need a specific version. But Jena uses Guava and needs a specific version with certain code in it, which might be different. We are isolating Jena's use of Guava from the system in which Jena is used. Hadoop has very strong requirements on Guava versions - it might well apply to other user applications as well. We do <exclude/> in the sense that the dependency-reduced-pom.xml POM of jena-shaded-guava does not mention com.google.guava:guava. Elephas picks up the Hadoop dependency. Andy On 08/06/15 14:26, aj...@virginia.edu wrote: I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right.
To summarise what is happening: the POM file in the Maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact, with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency. Result: a certain degree of chaos. Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name, otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi, it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop which as you know has an ancient Guava. It might be good to keep it out of the normal build/release cycle, then you would get the jena-guava shade from Maven central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or vote+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name) On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace-- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A.
Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
On Jun 8, 2015, at 6:35 PM, Andy Seaborne a...@apache.org wrote: less - there is no transactionality across the contained graphs. (Model.graph transactions aren't connected to dataset transactions) Ah, glad I asked! {grin} As far as models-as-views-of-datasets: is it true that all that is needed for this is a good in-memory dataset? It would be useful for working in-memory. For example, the default union graph can be made to work efficiently, as can dataset transactions. Okay, so it's more that having a good in-memory dataset would be helpful here? I'm just trying to establish if you see the in-memory dataset improvement as _blocking_ models-as-views or just that models-as-views would be worth more and work better accompanied by a better in-memory dataset. What about datasets that are much too large for memory? Or impls of Dataset that incur network latency in operation? Or do these cases just imply the need for the right kinds of laziness in views on Datasets? Models from TDB are already views. public class GraphTDB extends GraphView … Cool. So we already have that laziness in hand in the form of GraphView. --- A. Soroka The University of Virginia Library
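The "view" idea above - a graph computed on demand from the dataset's storage rather than copied out of it - can be sketched without any Jena types. This is an illustrative toy (the class and method names are mine, not Jena's); GraphView is the real Jena version of the same pattern:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A "graph" as a lazy view over a "dataset", in the spirit of GraphTDB extends GraphView. */
public class DatasetView {
    // Storage is per named graph; the views below never copy it eagerly.
    private final Map<String, Set<String>> triplesByGraph = new HashMap<>();

    public void add(String graph, String triple) {
        triplesByGraph.computeIfAbsent(graph, g -> new TreeSet<>()).add(triple);
    }

    /** View of one named graph, computed on demand from current storage. */
    public Set<String> graphView(String graph) {
        return triplesByGraph.getOrDefault(graph, Set.of());
    }

    /** Union default graph: also derived lazily from the per-graph storage. */
    public Set<String> unionView() {
        Set<String> all = new TreeSet<>();
        triplesByGraph.values().forEach(all::addAll);
        return all;
    }
}
```

Because nothing is materialized up front, the same approach works whether the backing storage is in memory, on disk, or remote - the laziness question ajs6f raises is about how much each view operation costs, not whether views are possible.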
Re: CLI libraries
I don't see any actual references in the documentation to rdfcat. Perhaps it can be deprecated? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote: People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased.
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
So to be clear, part of the idea here is to boost the visibility of transactions, and one of the things that wants doing as part of that is to provide copy-on-add-graph semantics for the in-memory dataset so that transactionality is coherent across such a dataset. Right now it is instead a sort of patchwork of whatever forms of transactionality were available in the graphs that have been added to it, which isn't an attractive thing to advertise, and may not even really work all the time. As far as models-as-views-of-datasets: is it true that all that is needed for this is a good in-memory dataset? What about datasets that are much too large for memory? Or impls of Dataset that incur network latency in operation? Or do these cases just imply the need for the right kinds of laziness in views on Datasets? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 3:23 PM, Andy Seaborne a...@apache.org wrote: On 08/06/15 10:25, Claude Warren wrote: What exactly is this review asking? Change in strategy or change in docs? Both :-) concurrency-howto does not mention transactions except in passing. It should be more pro-transactions IMO. A possibility is that Datasets are all transactional, even if that is only DatasetGraphWithLock; no Dataset.supportsTransactions - it's always true. Remove Dataset.getLock. concurrency-howto would be for model-only use. Everything else is transactional in style. The documentation should reflect this preferred style. If we had (hi ajs6f!) an in-memory dataset as well as the general container one, and the in-memory one were transactional, copy-in for addGraph, we could make models be views of datasets always. Creating a model would have an implicit Dataset if a free-standing model. Andy On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org wrote: Andy Seaborne created JENA-957: -- Summary: Review concurrency howto in the light of transactions.
Key: JENA-957 URL: https://issues.apache.org/jira/browse/JENA-957 Project: Apache Jena Issue Type: Bug Reporter: Andy Seaborne Priority: Minor http://jena.apache.org/documentation/notes/concurrency-howto.html Include {{DatasetGraphWithLock}}. Consider if that should be the default for in-memory and general datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
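For readers unfamiliar with DatasetGraphWithLock: it provides transaction-style begin/commit semantics by wrapping the dataset in a multiple-reader/single-writer (MRSW) lock. The underlying locking pattern, shown here in plain Java with illustrative names (not Jena API), is:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** MRSW (multiple-reader/single-writer) guard - the pattern DatasetGraphWithLock applies. */
public class LockedStore {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String read(String key) {
        lock.readLock().lock();      // many readers may hold the read lock at once
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(String key, String value) {
        lock.writeLock().lock();     // a writer gets exclusive access
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This gives serialized writers and isolated readers, which is enough to make Dataset.supportsTransactions always true as Andy suggests - though without the abort/rollback that a fully transactional store such as TDB offers.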
Re: Trouble Building Under Eclipse
I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right. To summarise what is happening: the POM file in the Maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact, with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency. Result: a certain degree of chaos. Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name, otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi, it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop which as you know has an ancient Guava.
It might be good to keep it out of the normal build/release cycle, then you would get the jena-guava shade from Maven central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or vote+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name) On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace-- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A. Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
CLI libraries
In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket? --- A. Soroka The University of Virginia Library
Re: CLI libraries
Okay, that makes sense. Is the larger move (the construction of 'jena-cmd') worth an epic in Jira? With the smaller (take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline) as a first story therein? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote: On 08/06/15 15:47, aj...@virginia.edu wrote: In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket? arq.cmdline.CmdLineArgs The whole cmd support does more than Apache Commons CLI. Around command line processing is support for grouping and reuse across commands, and an execution model. There are a lot of commands -- Apache Commons CLI would also cause changes in syntax. e.g. arq.cmd does not treat -- and - differently; combined POSIX-like options aren't supported. (jena.cmdline looks like some partial copy to get older development working.) A useful goal might be to have a module jena-cmd which is, after SDB, TDB and the rest, the set of command line tools we deem to be the public set of commands (some of the old stuff needs retiring or at least incompatibly brought into the general style - e.g. rdfcompare). People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased. A useful but bounded step might be to take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline (not tried this, so there may be a forgotten dependency). Andy --- A. Soroka The University of Virginia Library
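To illustrate the syntax change Andy mentions: POSIX-style parsers (as in Commons CLI's POSIX mode) allow bundled short options, so `-abc` means `-a -b -c`, while Jena's current commands treat `-abc` as one long option. A toy expansion, with an illustrative class name, shows the difference:

```java
import java.util.ArrayList;
import java.util.List;

/** Expands POSIX-style combined short options: "-abc" -> "-a", "-b", "-c". */
public class PosixArgs {
    public static List<String> expand(String... args) {
        List<String> out = new ArrayList<>();
        for (String a : args) {
            if (a.length() > 2 && a.charAt(0) == '-' && a.charAt(1) != '-') {
                for (char c : a.substring(1).toCharArray())
                    out.add("-" + c);   // split bundled single-character flags
            } else {
                out.add(a);             // "--long" and single "-x" pass through unchanged
            }
        }
        return out;
    }
}
```

Adopting a library that does this would silently change the meaning of existing multi-character single-dash options such as `-out`, which is exactly the incompatibility Andy is flagging.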
Re: Trouble Building Under Eclipse
I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace-- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A. Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
Re: Commons RDF
Wearing my Jena user's hat for a moment, this would be lovely and I would be happy to help with it. A project [1] on which I work persists RDF via some very complex mappings into and out of a JCR repository, and being able to stream it a little more gracefully would be a nice win for us. Those mappings are basically formed out of iterators and transformations, kind of a poor man's Stream API, but we're moving to rebuild over the real Streams API. Maybe this could be generalized into a more popular use case? [1]: http://www.fedora-commons.org/ --- A. Soroka The University of Virginia Library On May 14, 2015, at 3:46 PM, Stian Soiland-Reyes st...@apache.org wrote: I'm also interested in making Jena parsers and serializers usable directly from a Commons RDF perspective, without interaction with intermediate Jena core objects. E.g. something like: Stream<Triple> s = JenaCommonsRDF.read(inputStream, Lang.Turtle) And vice versa for write. Such a bridge should be possible on top of StreamRDF and RIOT, right? Perhaps a worker thread is needed if there are pull vs push issues. Should we start a branch, or first flesh out the rough edges of such a bridge module in the wiki? On 13 May 2015 15:59, Andy Seaborne a...@apache.org wrote: On 12/05/15 15:26, A. Soroka wrote: At: http://commonsrdf.incubator.apache.org/implementations.html it says Apache Jena is considering a compatibility interface that provides and consumes Commons RDF objects. I'm wondering if there have been any experiments to that end, or whether Jena is waiting for some resources to explore that possibility? I would be happy to give a go at making a simple module that just implements the current Commons RDF API types over jena-core in a simple way, to get things started. --- A. Soroka The University of Virginia Library I have some code that mocks up commonsrdf over Jena in the sense that it uses Jena behind the RDFTermFactory; that's the easy bit. It's limited and definitely not a bridge between the two APIs.
It is merely exploring the commonsrdf work. It would mess up the existing interfaces no end to add commonsrdf as interfaces to Model/Resource; and Graph/Triple/Node is generalized RDF so the type model does not fit. It needs a bridge module and a proper module would be good. ((I also have https://github.com/afs/commonsrdf-container which is even more minimal than the simple implementation. Not Jena related.)) Some other interesting projects: An in-memory dataset : JENA-624 Have a specifically in-memory DatasetGraph to complement the current general purpose dataset. Bruno is working on JENA-632 In fact, I can see commonsrdf being at the center of a new API, very Java8 specific, that is oriented around processing RDF stream style - see the email from Paul Houle. Or take StreamRDF and add java8-stream-ness around it (maybe not directly changing but making it the source for java8-streams - some issues of pull-streams and push-stream styles here which are hard when efficiency is considered). Andy
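The "pull vs push" issue Stian and Andy mention - RIOT's StreamRDF pushes triples at a callback, while java.util.stream consumers pull - is classically solved with a worker thread and a blocking queue. A generic sketch (illustrative names; not a real bridge to StreamRDF) using strings in place of triples:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Bridges a push-style producer (like a StreamRDF callback) to a pull-style consumer. */
public class PushToPull {
    private static final String END = "\u0000END"; // sentinel marking end of stream

    /** Runs the push-style source on a worker thread; the caller pulls items off a queue. */
    public static List<String> drain(Consumer<Consumer<String>> source) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        Thread worker = new Thread(() -> {
            source.accept(item -> putQuietly(queue, item)); // push side fills the queue
            putQuietly(queue, END);                         // signal end of stream
        });
        worker.start();
        List<String> out = new ArrayList<>();
        try {
            for (String item = queue.take(); !item.equals(END); item = queue.take())
                out.add(item);                              // pull side blocks for the next item
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    private static void putQuietly(BlockingQueue<String> q, String s) {
        try { q.put(s); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

The bounded queue also gives back-pressure: a fast parser blocks when the consumer falls more than 16 items behind, rather than buffering the whole document.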
code quality tools WAS: Code policies
There seems to be some consensus that it would be nice to bring in some automated code quality facilities for Jena. So far, the ones that have been mentioned are: 1) Sonar, which is on the way-- https://issues.apache.org/jira/browse/INFRA-9469 2) FindBugs, for which good Maven support exists: http://gleclaire.github.io/findbugs-maven-plugin/ 3) PMD, for which, again, good Maven support exists: http://maven.apache.org/plugins/maven-pmd-plugin/ I've made a ticket for trying out FindBugs and PMD: https://issues.apache.org/jira/browse/JENA-941 and I'll happily work it. Maybe we'll like the feedback, maybe not, but it's always good to get more info. --- A. Soroka The University of Virginia Library On May 12, 2015, at 5:22 PM, Bruno P. Kinoshita ki...@apache.org wrote: I think that something like checkstyle, PMD and FindBugs, plus updating the contribution page to ask contributors to review their changes before sending PRs or patches, would help. It would be good to avoid replicating unnecessary policies on the web site, though. Like suggesting that we expect the contributor to use 80 columns. That would mean that we would have to update the checkstyle XML rule file and the web site if we decided to use 120 or any other number. We can probably leave some basic policies (no tabs, no unused imports, etc). WDYT? Bruno From: A. Soroka aj...@virginia.edu To: dev@jena.apache.org dev@jena.apache.org Sent: Wednesday, May 13, 2015 2:07 AM Subject: Code policies From comments on some clean up PRs I submitted over this past weekend, it seems that it would be nice to have some rough code standards that could help newcomers _without_ inhibiting anyone from contributing. Possible policies that came up included: • Don't give a method signature that throws checked exceptions that aren't ever thrown from the code in that method unless an API supertype specs it. • Don't leave unused imports in. Any IDE can solve that problem with one keystroke.
{grin} • If a type declares a supertype that isn't a required declaration, consider whether that clarifies or confuses the intent. The former is okay, the latter not so good. • Minimize the compiler warnings your code throws up. If you use @SuppressWarnings to hide them, please add a comment explaining the situation or a TODO with a potential future fix that would allow removing the suppression. • Remove unused local variables or fields or uninteresting unused private methods. If it's debugging detritus, consider replacing it with good logging code for future use, if that seems likely to become useful. • If there is valuable code in some unused private method, add a @SuppressWarnings with an explanation of when it might become useful. If there is valuable but dead code inside a live method, consider breaking it out into a private method and adding a @SuppressWarnings and an explanation. If we can develop a reasonable list of expectations for contributions (and presumably, for the current code base) I will be happy to write some text for project site pages and try to encode some of the expectations with Maven Checkstyles. To be clear, I'm not suggesting any kind of blocking step in the build process, just a chance for some handy feedback about code submissions. Thoughts? --- A. Soroka The University of Virginia Library
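One of the policies above - every @SuppressWarnings carries a comment explaining why, and ideally a TODO for removing it - looks like this in practice (a made-up example, not Jena code):

```java
/** Illustrates the proposed policy: a @SuppressWarnings always carries its justification. */
public class Warnings {
    // Suppressed because the raw type comes from a legacy caller we cannot change yet;
    // TODO remove once the calling code is generified.
    @SuppressWarnings("unchecked")
    public static <T> T cast(Object o) {
        return (T) o;
    }
}
```

The point of the comment is that a later maintainer can tell at a glance whether the suppression is still needed, instead of having to re-derive the reasoning.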
Re: another clean up suggestion: dead code and its resuscitation
I've laid in a ticket: https://issues.apache.org/jira/browse/JENA-938 and attached a few PRs of reasonable size. They contain the removal of superinterfaces that don't need declaration, checked exceptions that cannot be thrown, and unnecessary typecasts. Those seemed to be entirely non-controversial moves to make. They result in additions and deletions, but the net result is many, many lines that are shorter and easier to read and 439 fewer LOC in total. It's not clear to me that there was consensus about removing never-called private methods, unreachable code (i.e. if (false) {…}) or unused fields. I think I could send in at least one more PR with the removal of unused local variables? That also seems generally non-controversial. --- A. Soroka The University of Virginia Library On May 8, 2015, at 10:14 AM, Claude Warren cla...@xenei.com wrote: I think the catch of exceptions that cannot be thrown can be safely removed. I would also vote up removal of private methods that are never used. Fields are a bit trickier, but then I am probably thinking of parameters and matching an interface... Yeah, I would vote up removing unused methods. Claude On Fri, May 8, 2015 at 3:54 AM, aj...@virginia.edu aj...@virginia.edu wrote: I'm building a PR [1] right now as a sort of think-piece to give us something concrete to look at. I'm building it up out of ONLY things that Eclipse/javac can determine are definitely impossible to execute or redundant or never used, including: - Private methods that are never used. - Superinterfaces that don't need declaration. - Fields that are never used. - Checked exceptions that cannot be thrown. and so far, I'm at about 11,000 lines to delete, which is… a good many. Certainly too many to believe that all are really totally dead stuff that should be gone. {grin} As you point out, some portion of this is stuff that we wouldn't want to lose.
My hope is that we can look over this PR and develop some tickets for the kinds of things to which you refer (e.g. features that didn't make it into SPARQL) and insert some TODOs and so forth. And maybe we can use it as a starting place for actual pruning. I'll send the PR sometime tomorrow. [1] https://github.com/ajs6f/jena/tree/KillDeadThings --- A. Soroka The University of Virginia Library On May 7, 2015, at 7:07 PM, Andy Seaborne a...@apache.org wrote: +1 to removing dead code though what is dead is tricky. In arq and tdb there was some but they included code that is a useful record (e.g. features that didn't make it into SPARQL). I removed obvious junk. Some is checking code that I'd like to leave. I had a look - a regex of if *\( *false *\) but I didn't find much in core (just 2). if(false) requires the compiler to generate no code, as does a final boolean - but in Java 8, does that include effectively final? What were you looking for? I tend to agree that the use of a field makes things worse. Andy On 07/05/15 19:24, Stephen Allen wrote: I'd say just eliminate all of that dead code. Also any commented code as well. We have a source control system, one can always look into the history to get that stuff. Using a field just makes it worse IMO... it'll never get removed if we do that. -Stephen On Thu, May 7, 2015 at 11:26 AM, A. Soroka aj...@virginia.edu wrote: There are a goodly number of pieces (150) of dead code in Jena, of the form: org.apache.jena.mem.HashCommon: void showkeys() { if (false) { System.err.print( "KEYS: " ); // some logging code System.err.println(); } } If I understand this rightly, these are cases where we want to keep some code on deck for potential use. I'd like to suggest that many of these guys might be rewritten with a field or fields in the class, something like: boolean useLoggingCode = false; void showkeys() { if (useLoggingCode) etc. } This would make things a bit clearer and clean out a bunch of compiler warnings.
Does this sound like a good approach? Worth doing? --- A. Soroka The University of Virginia Library -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
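The rewrite proposed in this thread can be made concrete. When the flag is a compile-time constant (`static final boolean`), the JLS treats `if (CONSTANT)` as conditional compilation, so javac still emits no code for the guarded block - the same property Andy notes for `if (false)` - while the named flag documents intent and silences the dead-code warning. A runnable sketch (illustrative names, not the actual HashCommon code):

```java
/** The suggested rewrite: a named constant flag instead of a bare if (false). */
public class HashCommonStyle {
    // Flip to true when debugging key distribution. Because this is a
    // compile-time constant, javac drops the guarded block when it is false,
    // exactly as it does for if (false).
    private static final boolean LOG_KEYS = false;

    public static String showKeys(String[] keys) {
        StringBuilder sb = new StringBuilder();
        if (LOG_KEYS) {
            sb.append("KEYS:");
            for (String k : keys) sb.append(' ').append(k);
        }
        return sb.toString();
    }
}
```

Note Stephen's counterpoint still applies: a non-constant instance field (as in the `boolean useLoggingCode` sketch above) would defeat the dead-code elimination and tends to linger forever; a constant, or outright deletion plus version control, avoids that.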
Re: another clean up suggestion: dead code and its resuscitation
There is now a PR at: https://github.com/apache/jena/pull/58 with much of this work available. The idea is not to merge that gargantuan PR, but to give folks an easy way to see what the code looks like after I took a meat ax to it. {grin} I would be happy to create more reasonable packages of changes from that monster PR for serious review and possible merging. Would a module-by-module approach be best? --- A. Soroka The University of Virginia Library On May 8, 2015, at 10:14 AM, Claude Warren cla...@xenei.com wrote: I think the catch of exceptions that cannot be thrown can be safely removed. I would also vote up removal of private methods that are never used. Fields are a bit trickier, but then I am probably thinking of parameters and matching an interface... Yeah, I would vote up removing unused methods. Claude On Fri, May 8, 2015 at 3:54 AM, aj...@virginia.edu aj...@virginia.edu wrote: I'm building a PR [1] right now as a sort of think-piece to give us something concrete to look at. I'm building it up out of ONLY things that Eclipse/javac can determine are definitely impossible to execute or redundant or never used, including: - Private methods that are never used. - Superinterfaces that don't need declaration. - Fields that are never used. - Checked exceptions that cannot be thrown. and so far, I'm at about 11,000 lines to delete, which is… a good many. Certainly too many to believe that all are really totally dead stuff that should be gone. {grin} As you point out, some portion of this is stuff that we wouldn't want to lose. My hope is that we can look over this PR and develop some tickets for the kinds of things to which you refer (e.g. features that didn't make it into SPARQL) and insert some TODOs and so forth. And maybe we can use it as a starting place for actual pruning. I'll send the PR sometime tomorrow. [1] https://github.com/ajs6f/jena/tree/KillDeadThings --- A.
Soroka The University of Virginia Library On May 7, 2015, at 7:07 PM, Andy Seaborne a...@apache.org wrote: +1 to removing dead code though what is dead is tricky. In arq and tdb there was some but they included code that is a useful record (e.g. features that didn't make it into SPARQL). I removed obvious junk. Some is checking code that I'd like to leave. I had a look - a regex of if *\( *false *\) but I didn't find much in core (just 2). if(false) requires the compiler to generate no code, as does a final boolean - but in Java 8, does that include effectively final? What were you looking for? I tend to agree that the use of a field makes things worse. Andy On 07/05/15 19:24, Stephen Allen wrote: I'd say just eliminate all of that dead code. Also any commented code as well. We have a source control system, one can always look into the history to get that stuff. Using a field just makes it worse IMO... it'll never get removed if we do that. -Stephen On Thu, May 7, 2015 at 11:26 AM, A. Soroka aj...@virginia.edu wrote: There are a goodly number of pieces (150) of dead code in Jena, of the form: org.apache.jena.mem.HashCommon: void showkeys() { if (false) { System.err.print( "KEYS: " ); // some logging code System.err.println(); } } If I understand this rightly, these are cases where we want to keep some code on deck for potential use. I'd like to suggest that many of these guys might be rewritten with a field or fields in the class, something like: boolean useLoggingCode = false; void showkeys() { if (useLoggingCode) etc. } This would make things a bit clearer and clean out a bunch of compiler warnings. Does this sound like a good approach? Worth doing? --- A. Soroka The University of Virginia Library -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
Re: another clean up suggestion: dead code and its resuscitation
I'm building a PR [1] right now as a sort of think-piece to give us something concrete to look at. I'm building it up out of ONLY things that Eclipse/javac can determine are definitely impossible to execute or redundant or never used, including: - Private methods that are never used. - Superinterfaces that don't need declaration. - Fields that are never used. - Checked exceptions that cannot be thrown. and so far, I'm at about 11,000 lines to delete, which is… a good many. Certainly too many to believe that all are really totally dead stuff that should be gone. {grin} As you point out, some portion of this is stuff that we wouldn't want to lose. My hope is that we can look over this PR and develop some tickets for the kinds of things to which you refer (e.g. features that didn't make it into SPARQL) and insert some TODOs and so forth. And maybe we can use it as a starting place for actual pruning. I'll send the PR sometime tomorrow. [1] https://github.com/ajs6f/jena/tree/KillDeadThings --- A. Soroka The University of Virginia Library On May 7, 2015, at 7:07 PM, Andy Seaborne a...@apache.org wrote: +1 to removing dead code though what is dead is tricky. In arq and tdb there was some but they included code that is a useful record (e.g. features that didn't make it into SPARQL). I removed obvious junk. Some is checking code that I'd like to leave. I had a look - a regex of if *\( *false *\) but I didn't find much in core (just 2). if(false) requires the compiler to generate no code, as does a final boolean - but in Java 8, does that include effectively final? What were you looking for? I tend to agree that the use of a field makes things worse. Andy On 07/05/15 19:24, Stephen Allen wrote: I'd say just eliminate all of that dead code. Also any commented code as well. We have a source control system, one can always look into the history to get that stuff. Using a field just makes it worse IMO... it'll never get removed if we do that. -Stephen On Thu, May 7, 2015 at 11:26 AM, A.
Soroka aj...@virginia.edu wrote: There are a goodly number of pieces (150) of dead code in Jena, of the form: org.apache.jena.mem.HashCommon: void showkeys() { if (false) { System.err.print( "KEYS: " ); // some logging code System.err.println(); } } If I understand this rightly, these are cases where we want to keep some code on deck for potential use. I'd like to suggest that many of these guys might be rewritten with a field or fields in the class, something like: boolean useLoggingCode = false; void showkeys() { if (useLoggingCode) etc. } This would make things a bit clearer and clean out a bunch of compiler warnings. Does this sound like a good approach? Worth doing? --- A. Soroka The University of Virginia Library
Re: another possible simplification
Okay, that makes sense, although in some ways it seems more like a rationale for keeping _an_ interface, rather than a rationale that Jena should have its _own_ interface. But keeping Guava types from leaking through makes sense. I will send a PR sometime soon with some Java 8 work in those Cache implementations (e.g. taking advantage of the new Map.computeIfAbsent() method to shorten and tighten some code, and maybe using default method implementations to be a bit DRYer), but I won't alter Jena's Cache type itself. Then everyone can decide whether those are legitimate improvements in implementation without effect on API. --- A. Soroka The University of Virginia Library

On May 7, 2015, at 5:53 AM, Andy Seaborne a...@apache.org wrote: On 06/05/15 22:30, A. Soroka wrote: I've found another candidate for simplification (well, actually for excision): org.apache.jena.atlas.lib.cache contains several classes that implement an interface org.apache.jena.atlas.lib.Cache. This interface very closely resembles Guava's com.google.common.cache.Cache, and I believe that the Guava type could be substituted without too much fuss. The entire package org.apache.jena.atlas.lib.cache could go away, along with the [...]. Does this seem like a worthwhile replacement? --- A. Soroka The University of Virginia Library

We do now use the Guava cache (shaded) with the CacheGuava implementation of Cache. The naming of getIfPresent and getOrFill was put in recently to reflect the Guava design (and the safe atomic getOrFill is quite useful sometimes - not always - TDB has bi-caches that need synchronized changes). Having our own Cache interface means different providers can be used. We may find a better one for certain circumstances. The interface stops Guava-isms like RemovalNotification leaking out too far. So on balance, I'm more inclined to keep it because we have the current interface already. Andy
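The atomic get-or-fill that Andy mentions is exactly what Java 8's Map.computeIfAbsent carries. A minimal sketch (hypothetical class name, not Jena's Cache or CacheGuava):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch (hypothetical name) of a getOrFill-style cache over
// ConcurrentHashMap.computeIfAbsent: the filler runs only when the key
// is absent, and the lookup-plus-insert happens atomically.
public class MapBackedCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    public V getOrFill(K key, Function<K, V> filler) {
        return map.computeIfAbsent(key, filler);
    }

    public static void main(String[] args) {
        MapBackedCache<String, Integer> cache = new MapBackedCache<>();
        System.out.println(cache.getOrFill("key", k -> k.length())); // 3
        System.out.println(cache.getOrFill("key", k -> 99));         // still 3
    }
}
```

This is the kind of tightening the PR above proposes: the implementation shrinks while the Cache interface stays untouched.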
Re: What can be removed/simplified ?
Thank you -- that sounds like a good move to make to prevent myself from breaking backwards compatibility. What would be the best way to incorporate your material into my Java 8-related work? Would it be best to wait for it to be merged, or is that some time away? --- A. Soroka The University of Virginia Library

On May 2, 2015, at 3:50 AM, Claude Warren cla...@xenei.com wrote: I have ExtendedIterator contract tests in the new test suite, so we should have reasonable test cover for the contract. That code is in the old new_test branch and will be in the new contract test branch soon. If you want, I can send you the source to test your implementation with. This will mean adding junit-contracts as a dependency for your tests. Claude

On Fri, May 1, 2015 at 5:26 PM, aj...@virginia.edu wrote: Yes, in that case, the change was no more than extends Filter<T> -> implements Predicate<T>. No other changes. You can take a look at what's going on at: https://github.com/apache/jena/pull/55 and please comment! As a Jena newbie, I need comments. {grin} --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned. Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object, RDFDatatype). It looks like they are not used by the jena codebase and are there for compatibility only. Any reason not to remove them? Andy

-- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
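The removal Andy proposes is the end of the usual deprecate-then-delete cycle: the statics are thin forwards kept for compatibility, so deleting them only breaks callers that ignored the deprecation. A generic sketch of that pattern (all names hypothetical; Jena's real path is Node.createURI -> NodeFactory.createURI):

```java
// Generic sketch of the deprecate-then-delete pattern discussed above
// (names hypothetical, not Jena's real Node/NodeFactory classes).
public class DeprecatedStaticSketch {
    static final class Uri {
        final String value;
        Uri(String value) { this.value = value; }
    }

    // The replacement: construction lives on a factory class.
    static final class Factory {
        static Uri createURI(String s) { return new Uri(s); }
    }

    /** @deprecated kept for compatibility only; use {@link Factory#createURI}. */
    @Deprecated
    static Uri createURI(String s) { return Factory.createURI(s); }

    public static void main(String[] args) {
        // Once no caller uses the deprecated forward, it can simply be deleted.
        System.out.println(Factory.createURI("http://example/").value);
    }
}
```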
Re: Java 8 Streams Was: What can be removed/simplified ?
Thank you for the heads up: I was unaware of Commons Functor. It is nice to see the Commons project put a product in that space. I notice that Functor's basic types do not inherit from the recently introduced Java 8 types (e.g. Function, BiFunction), and that in fact, by a glance at some of its POMs, Functor seems to be using Java 5. Is there some expectation of moving that forward, or is Functor expected to bridge older versions of Java? --- A. Soroka The University of Virginia Library

On May 2, 2015, at 7:14 PM, Bruno P. Kinoshita ki...@apache.org wrote: "It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work." +1 "Does this seem like a useful direction of work? I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it." I will take a look at each item later, but probably others can confirm whether that makes sense or not, since I'm still getting myself more familiar with the Jena code base. But on a side note, I'm planning to start a few dev cycles on Apache Commons Functor in June/July. The idea of the project is to provide FP extensions to Java, much like Commons Lang does for the general language. If while you are working on adding Java 8 to Jena you find yourself creating code that you think could be useful for other projects, please feel free to submit an issue to https://issues.apache.org/jira/browse/FUNCTOR or ping the commons dev mailing list :-) All the best, Bruno

From: aj...@virginia.edu To: dev@jena.apache.org Sent: Saturday, May 2, 2015 6:05 AM Subject: Java 8 Streams Was: What can be removed/simplified ? I've noticed a few more places where some Java 8 changes could be brought into play in the interest of simplification, and in particular, the use of Java 8 Streams seems like a nice way to go. It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work. Here is a small program of incremental changes that I'd like to propose: - We could move NiceIterator's methods up into ExtendedIterator as default implementations and factor NiceIterator out of existence. - Then, we could migrate the API of ExtendedIterator to be a close analog to a subset of the API of Java 8's Stream. (It's not too far away right now.) - Then, we could begin replacing the use of ExtendedIterator, its subtypes (e.g. StmtIterator), and their implementations with Java 8 Streams. That will certainly take a few steps in itself, since ExtendedIterator is in use all over, but I'm confident (perhaps arrogantly so {grin}) that replacing its use at some fairly low-lying levels (I think around and just below TripleStore.find(Triple)) will allow some quick replacement moves at the levels above. - Then, we could begin exposing Stream<T>s in the signatures of new methods on very public-facing types like Model. For example, by analogy to Model.listSubjects() returning ResIterator, there could also be Model.streamSubjects() returning Stream<Resource>. And then, I hope, the community would begin migrating away from the ExtendedIterator methods and to the Java 8 Stream<T> methods, because Stream has so much attractive functionality available. Does this seem like a useful direction of work? I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned.
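A streamSubjects()-style method needs only a small bridge from an existing iterator to a Stream. The shape below is a sketch (class and method names hypothetical), using the JDK's StreamSupport and Spliterators utilities:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch of how a Model.streamSubjects()-style method could be built on
// top of an existing iterator-returning method (names hypothetical).
public class IteratorToStream {
    // Generic Iterator -> Stream bridge; 0 = no spliterator characteristics.
    static <T> Stream<T> asStream(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, 0), false);
    }

    public static void main(String[] args) {
        // Strings stand in for subject Resources.
        List<String> subjects = Arrays.asList("s1", "s2", "s1");
        // e.g. model.streamSubjects().distinct().count()
        long distinct = asStream(subjects.iterator()).distinct().count();
        System.out.println(distinct); // 2
    }
}
```

One attraction of this route is that callers immediately get the whole Stream API (filter, map, distinct, collect) for free.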
Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places
Re: Java 8 Streams Was: What can be removed/simplified ?
Of course you are right about the balance to be made for performance. Perhaps this is a chance for me to check my understanding of Jena's architecture: to my examination, in jena-core there is no possibility to control that balance because jena-core abstractions do not understand the differences between resources that are near compute and those farther away in the network. That only becomes apparent to modules like jena-tdb. Truthfully, the qualities that attract me to this change are not performance or power, but concision and clarity. I'm very familiar with Guava's Iterators, Iterables, FluentIterable, etc. but I don't think they offer much more than Jena's ExtendedIterator now has with respect to API. I certainly wouldn't mind replacing some of Jena's implementation code with functions from Guava, which are exceedingly well-exercised, and if it seems reasonable to increase the footprint in Guava that now obtains in the codebase, I could do that as part of a migration of NiceIterator into ExtendedIterator. My overall aim here (which may or may not be a good or important one in the context of the whole project) is to replace a reasonable amount of the Jena-homegrown portions of both API and implementation with functionally- and ergonomically- equivalent-or-superior common property from the largest possible community. As to fluent syntax for basic types, are you referring to the needful plethora of calls to ResourceFactory.createResource() and .createLiteral() and the like? (Because I'm not a big fan of that sort of thing, myself. {grin}) --- A. Soroka The University of Virginia Library On May 4, 2015, at 2:08 PM, Paul Houle ontolo...@gmail.com wrote: I use the JDK8 stream stuff a lot these days but it certainly has its discontents. In particular the parallel stuff is based on the Fork/Join framework; it seems to do OK on correctness, which puts it ahead of some miracle frameworks for parallelization. 
However, if you understand the rough balance between concurrency overhead, cpu time, and time spent waiting for resources far from the cpu, you can quickly tune ExecutorService to get much better speedup more reliably and also pipeline tasks which makes a big difference. Still I like the idea of being able to turn result sets to streams with a .stream() operator. The Google guava library has a system that does stream()-like operations to Iterables and Iterators and right now I like the syntax better possibly because I have been using it so long (with Jena objects) In the other direction you have Spark, where you are writing what looks like the same kind of code but you have many options in terms of threads, clusters, memory or on-disk, etc. As for those statics, I'd say I want to see a more fluent syntax for common Jena operations. For instance, I use the Jena in-memory model the way that most programmers use hashtables. With the models you have all the cool Resource and Property types but you need to write code to create these things to put them in all the slots and it starts to obscure the simplicity of what is going on. On Mon, May 4, 2015 at 11:50 AM, aj...@virginia.edu aj...@virginia.edu wrote: Thank you for the heads up: I was unaware of Commons Functor. It is nice to see the Commons project put a product in that space. I notice that Functor's basic types do not inherit from the recently introduced Java 8 types (e.g. Function, BiFunction), and that in fact, by a glance at some of its POMs, Functor seems to be using Java 5. Is there some expectation of moving that forward, or is Functor expected to bridge older versions of Java? --- A. Soroka The University of Virginia Library On May 2, 2015, at 7:14 PM, Bruno P. Kinoshita ki...@apache.org wrote: It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work. +1 Does this seem like a useful direction of work? 
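Paul's point about tuning can be made concrete with a small illustration (not Jena code; names hypothetical): with an ExecutorService you choose the pool size to match the workload's mix of CPU time and waiting, rather than accepting the shared fork/join pool that a parallel stream uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration of hand-tuned parallelism via ExecutorService: the pool
// size is an explicit knob, unlike Stream.parallel()'s common pool.
public class TunedPool {
    static int sumSquares(int n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int v = i;
                futures.add(pool.submit(() -> v * v)); // one task per item
            }
            int sum = 0;
            for (Future<Integer> f : futures)
                sum += f.get();                        // join in submission order
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumSquares(8, 4)); // 140
    }
}
```

For I/O-bound tasks one would raise the thread count well above the core count; a parallel stream offers no such per-job choice.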
I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it. I will take a look at each item later, but probably others can confirm whether that makes sense or not, since I'm still getting myself more familiar with the Jena code base. But on a side note, I'm planning to start a few dev cycles on Apache Commons Functor in June/July. The idea of the project is to provide FP extensions to Java, much like Commons Lang does for the general language. If while you are working on adding Java 8 to Jena you find yourself creating code that you think could be useful for other projects, please feel free to submit an issue to https://issues.apache.org/jira/browse/FUNCTOR or ping the commons dev mailing list :-) All the best, Bruno

From: aj...@virginia.edu To: dev@jena.apache.org Sent: Saturday, May 2, 2015 6:05 AM Subject
Java 8 Streams Was: What can be removed/simplified ?
I've noticed a few more places where some Java 8 changes could be brought into play in the interest of simplification, and in particular, the use of Java 8 Streams seems like a nice way to go. It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work. Here is a small program of incremental changes that I'd like to propose: - We could move NiceIterator's methods up into ExtendedIterator as default implementations and factor NiceIterator out of existence. - Then, we could migrate the API of ExtendedIterator to be a close analog to a subset of the API of Java 8's Stream. (It's not too far away right now.) - Then, we could begin replacing the use of ExtendedIterator, its subtypes (e.g. StmtIterator), and their implementations with Java 8 Streams. That will certainly take a few steps in itself, since ExtendedIterator is in use all over, but I'm confident (perhaps arrogantly so {grin}) that replacing its use at some fairly low-lying levels (I think around and just below TripleStore.find(Triple)) will allow some quick replacement moves at the levels above. - Then, we could begin exposing Stream<T>s in the signatures of new methods on very public-facing types like Model. For example, by analogy to Model.listSubjects() returning ResIterator, there could also be Model.streamSubjects() returning Stream<Resource>. And then, I hope, the community would begin migrating away from the ExtendedIterator methods and to the Java 8 Stream<T> methods, because Stream has so much attractive functionality available. Does this seem like a useful direction of work? I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned. Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object
Re: What can be removed/simplified ?
Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object, RDFDatatype). It looks like they are not used by the jena codebase and are there for compatibility only. Any reason not to remove them? Andy

-- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
Re: What can be removed/simplified ?
Yes, in that case, the change was no more than extends Filter<T> -> implements Predicate<T>. No other changes. You can take a look at what's going on at: https://github.com/apache/jena/pull/55 and please comment! As a Jena newbie, I need comments. {grin} --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned. Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object, RDFDatatype). It looks like they are not used by the jena codebase and are there for compatibility only. Any reason not to remove them? Andy

-- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
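The reuse Claude asks about survives the migration because java.util.function.Predicate composes with and()/or()/negate(), much as composed Filter objects did. A small stand-alone sketch (strings stand in for RDF nodes; all names hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of why `extends Filter<T>` -> `implements Predicate<T>` keeps
// reusable, combinable filters: predicates are values that compose.
public class PredicateMigration {
    // Reusable predicates, defined once and shared (as Filters were).
    static final Predicate<String> NON_BLANK = s -> !s.isEmpty();
    static final Predicate<String> SHORT = s -> s.length() < 4;

    static List<String> keepShortNonBlank(List<String> in) {
        return in.stream()
                 .filter(NON_BLANK.and(SHORT)) // combined like composed Filters
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keepShortNonBlank(Arrays.asList("", "abc", "abcdef"))); // [abc]
    }
}
```

The permissions-system pattern (one node predicate applied to subject, predicate, and object to build a triple predicate) maps directly onto this kind of composition.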
Re: What can be removed/simplified ?
Great! Thank you. Would Jena be similarly interested in trying to migrate org.apache.jena.util.iterator.Filter to java.util.function.Predicate? --- A. Soroka The University of Virginia Library On May 1, 2015, at 3:32 AM, Andy Seaborne a...@apache.org wrote: On 30/04/15 17:11, aj...@virginia.edu wrote: I'm a long-time user of Jena, but entirely new to its internals, so this may be a very off-the-mark opinion, but perhaps org.apache.jena.util.iterator.Map1 could be swapped out for java.util.function.Function? --- A. Soroka The University of Virginia Library Hi there, Not off-the-mark at all. There are lots of places where there is old code that can be more naturally written in Java8. Your pull request looks very interesting. Thank you. Andy
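The Map1 -> java.util.function.Function swap can be sketched without any Jena dependency. The interface below is hypothetical (modelled loosely on ExtendedIterator.mapWith); the point is that once the mapping parameter is a plain Function, callers can pass lambdas and method references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Sketch of replacing a Map1-style callback with java.util.function.Function:
// a mapWith default method on a hypothetical iterator interface.
public class Map1ToFunction {
    interface Mapping<T> extends Iterator<T> {
        default <U> Mapping<U> mapWith(Function<? super T, ? extends U> f) {
            Mapping<T> self = this;
            return new Mapping<U>() {
                public boolean hasNext() { return self.hasNext(); }
                public U next() { return f.apply(self.next()); }
            };
        }

        static <T> Mapping<T> over(Iterator<T> it) {
            return new Mapping<T>() {
                public boolean hasNext() { return it.hasNext(); }
                public T next() { return it.next(); }
            };
        }
    }

    static List<Integer> lengths(List<String> in) {
        List<Integer> out = new ArrayList<>();
        // Method reference in place of a one-off Map1 implementation.
        Mapping<Integer> m = Mapping.over(in.iterator()).mapWith(String::length);
        while (m.hasNext()) out.add(m.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("a", "abc"))); // [1, 3]
    }
}
```

The default method is also a miniature of the larger proposal in this thread: NiceIterator-style helpers can live as defaults on the interface itself.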
Re: What can be removed/simplified ?
I'm a long-time user of Jena, but entirely new to its internals, so this may be a very off-the-mark opinion, but perhaps org.apache.jena.util.iterator.Map1 could be swapped out for java.util.function.Function? --- A. Soroka The University of Virginia Library

On Apr 30, 2015, at 10:39 AM, Andy Seaborne a...@apache.org wrote: On 30/04/15 07:20, Claude Warren wrote: While we are at it, I would like to see the restriction that requires Graph.getStatisticsHandler() to return the same instance every time removed. It makes proxies much more difficult. Makes sense. (I would note that statistics for anything involving 2 out of 3 of the args are not provided by any graph that I can find.) Andy

On Wed, Apr 29, 2015 at 8:58 PM, Andy Seaborne a...@apache.org wrote: On 29/04/15 18:17, Claude Warren wrote: I use the following: addAllowed() is used by contract tests to determine if triples can be added. Also, the permission system sets this based on the user's permissions. canBeEmpty() is used by the contract tests to determine if the deleteAll methods should return an empty graph. When is this ever false? Inference graphs? (This is not used in the current codebase as far as I can see.) deleteAllowed() - same use as addAllowed(). iteratorRemoveAllowed() - this is handy to know before the remove is attempted. This isn't honoured everywhere IIRC. You're only looking in jena-core. sizeAccurate() - this is used by the contract testing to determine if delete and add should alter the number of records reported. I am also looking at adding some hash joining capabilities, and knowing if the sizes are accurate may make a difference. But that is all future stuff. FYI: https://github.com/afs/quack These I don't use and can see being removed: findContractSafe() - I don't know what this one means and have never used it. handlesLiteralTyping() - This was used, but obviously since all graphs now have to support literal typing this can be removed.

And presumably not addAllowed(boolean), deleteAllowed(boolean). (I find the boolean form unhelpful because they don't say what triples can and cannot be added/deleted.) So let's try removing: addAllowed(boolean), deleteAllowed(boolean), findContractSafe(), handlesLiteralTyping(). Andy

On Wed, Apr 29, 2015 at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 29/04/15 16:04, Claude Warren wrote: I have no problem retiring FileGraph. Capabilities is another issue. I have used it, and it is used in several places in the contract tests where we have to know if the graph supports transactions and the like. I find it useful. In addition, the information contained in the capabilities is often not easily discoverable (if at all). Transactions aren't in the Capabilities interface. Which aspects of the Capabilities interface? Some look to be out of date (findContractSafe); some are not the right question (handlesLiteralTyping). Andy

On Wed, Apr 29, 2015 at 9:45 AM, Andy Seaborne a...@apache.org wrote: Claude's email about FileGraph prompted me to think for Jena3: What can be removed? What can be simplified? Even while keeping the current APIs, there is going to be stuff that isn't used, isn't used much, or even gets in the way. For maintainability, effectively unused features are noise and risk needing to maintain contracts that users don't actually use. Some things that come to mind in jena-core: FileGraph [*], Capabilities, GraphTransactionHandler, and with advocacy: RDFReaderF, RDFWriterF (with RIOT integration; caution for RDF/XML). Some places where interfaces don't seem to add anything: LiteralLabelImpl (actually the whole LiteralLabel thing is worth looking at - maybe we can pull the whole thing into Node_Literal itself). AnonIds - maybe leave in the RDF API (they cross the boundary), but internally bNodes can be a couple of longs for their state (a UUID in other words, not a UID). Andy

[*] In Java 8, the app, or library code, could do this better as:

    update(() -> { ... graph changes ... })

and update() does the on-disk backup stuff.
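The footnote's update(() -> { ... }) idea can be fleshed out as a small sketch (all names hypothetical; string appends stand in for real file I/O): instead of a FileGraph subclass, the library exposes update(Runnable) and brackets the caller's changes with the backup and commit steps itself.

```java
// Sketch of the update(Runnable) idea from the footnote above: the
// library owns the backup/commit/restore bracketing, the caller passes
// only the graph changes as a lambda (names hypothetical).
public class FileBackedGraph {
    private final StringBuilder log = new StringBuilder();

    public void update(Runnable changes) {
        log.append("backup;");          // stand-in for writing the backup copy
        try {
            changes.run();              // the caller's graph changes
            log.append("commit;");      // stand-in for replacing the file
        } catch (RuntimeException e) {
            log.append("restore;");     // stand-in for restoring the backup
            throw e;
        }
    }

    public String history() { return log.toString(); }

    public static void main(String[] args) {
        FileBackedGraph g = new FileBackedGraph();
        g.update(() -> g.log.append("changes;"));
        System.out.println(g.history()); // backup;changes;commit;
    }
}
```

The design point is inversion of control: the on-disk protocol lives in one place, so no graph subclass has to remember to do the backup.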