Re: Maintenance branches
I think Claude introduced the idea of LTS releases, so I'm curious about whether he thinks that the audience for stability includes people who would use a "stable" series of the kind Osma describes, even without the Apache imprimatur. ajs6f > On Jan 24, 2017, at 2:57 PM, Andy Seaborne wrote: > > > >> On 24/01/17 12:57, Osma Suominen wrote: >> 23.01.2017, 19:31, Andy Seaborne wrote: >> >>> To expand on that: That would mean users could get source code to build >>> themselves, it would not be an "Apache release" and not in maven >>> central. For "products", the legal side of a release probably matters. >> >> Source code yes, but I think it would make sense to set up some kind of >> autobuilder for the stable branch, similar to how snapshots are built >> nightly. It shouldn't be much effort to set this up, but it would be a >> valuable service for users. > > It's not an Apache release. > > Snapshots are specifically allowed for developers, which we take to include anyone > picking up and testing. > > They are not releases. > > Products that want LTS stability will, I believe, want: > * The ASF release legal framework > * Assurance that the LTS will be around for the life of the product > * Ideally, support contracts (3rd party) > > It is likely because they don't have the technical capabilities or resources > in-house to investigate and report, let alone fix. > > The trouble really comes when a "bug fix" is a feature change. If the bug is > not some low-level thing like an NPE, one product's view of a "fix" is another > product's regression. > > (Believe me! It's happening to me right now - a SPARQL fix to comply with the > standard has caused interesting changes.) > > --- > > There are three options here: > > * Current > Advantage: bug fixes, most timely. > Disadvantage: picks up everything > > * A "last release+fixes" branch > Not a release ... unless voted on > Not long term stability (product life : years) > Some extra work > > * LTS > Long term commitment. > More work.
> > And a point about LTS - more bug reports are nice, but contributions of fixes > are much better. > > I'm not convinced that item 2 would be much used - they last only 4 or 6 > months as I understand the concept. > > Events like Jena2->Jena3 are extremely rare. Otherwise, we add features, not > remove them, and backwards compatibility is as good as a stable branch (I would > hope!). The low-cost way of carefully adding to master seems to me best unless > we have additional contributions of fixes (not just reports) or other > resourcing. > > Andy
Re: Maintenance branches
I've had some experience with protocols such as those Osma describes, and I think that they have real value (particularly for large users and sites). And as he says, they can be automated. I would be willing to help with that. I would like to learn more about Apache infrastructure. That having been said, I must also agree with Andy that, much as we might like to provide them, true LTS releases are probably beyond our strength right now. I wonder if any vendors are currently offering such a product? ajs6f > On Jan 24, 2017, at 7:57 AM, Osma Suominen wrote: > > 23.01.2017, 19:31, Andy Seaborne wrote: > >> To expand on that: That would mean users could get source code to build >> themselves, it would not be an "Apache release" and not in maven >> central. For "products", the legal side of a release probably matters. > > Source code yes, but I think it would make sense to set up some kind of > autobuilder for the stable branch, similar to how snapshots are built > nightly. It shouldn't be much effort to set this up, but it would be a > valuable service for users. > >>> Currently when a user discovers and reports a bug in Jena and it gets >>> fixed in master, the user has to choose between waiting for the next >>> release or using a snapshot, >> or cherry picking - it's a distributed version control system! > > You're right, but it takes some effort and understanding of the git tree and > how to build it. > >>> which may have other unrelated issues due >>> to ongoing development. With a stable branch, there would be a third >>> option - like the previous release, but with some bugs fixed. >> >> If we want a proper release, it's a vote - quite doable, just needs >> someone to do it. > > Yes, having more frequent releases and distributing the RM burden further are > both excellent new developments. If new releases are made frequently, there > is less need for a stable branch.
> > I'm not proposing creating such a stable branch at the moment, just pointing > out that if we want to better serve users who need a (semi-)supported > non-development version, a stable branch like this could be a solution that > wouldn't require much extra effort from the developers. > > -Osma > > -- > Osma Suominen > D.Sc. (Tech), Information Systems Specialist > National Library of Finland > P.O. Box 26 (Kaikukatu 4) > 00014 HELSINGIN YLIOPISTO > Tel. +358 50 3199529 > osma.suomi...@helsinki.fi > http://www.nationallibrary.fi
Re: Jena system initialization
> > When a class is loaded, the static initializer is not run. That happens on first use. > Ah, this is what I did not understand correctly. Okay, no problem. IMHO JenaSystem.init is a step in the right direction and a better way than > the current way initialization is done. > No argument here. What would be helpful at this point is concrete improvements/alternatives and concrete evaluation esp in OSGi. [*] I would like to do that, but I am still working on JENA-624, _very_ slowly (bits of Sunday afternoons only). Ah, the joys of all-volunteer projects, right? {grin} --- A. Soroka The University of Virginia Library On Sun, Sep 20, 2015 at 9:53 AM, Andy Seaborne wrote: > On 20/09/15 12:16, A. Soroka wrote: > >> On Sep 18, 2015, at 6:21 PM, Andy Seaborne wrote: >> >>> >>> How would people have the chance to call JenaSystem::set before those static initializers run? Or am I misunderstanding the use of the hook? >>> >>> The documentation >>> >>>/** >>> * Set the {@link JenaSubsystemRegistry}. >>> * To have any effect, this function must be called before any other >>> Jena code, >>> * and especially before calling {@code JenaSystem.init()}. >>> */ >>> >>> Touch Jena and it initialises. >>> >> >> Yes, this is what I don’t understand. If JenaSystem.init() is >> getting called in static initializers (and it is) then that means I must >> somehow >> call JenaSystem.set() in static code, too, or I don’t see how it can happen >> before JenaSystem.init(). Even then, it doesn’t seem that I can >> guarantee that my JenaSystem.set() call will precede JenaSystem.init() >> getting called.
>> > > This works with "*** LOAD" printed and running with debug on: > >JenaSubsystemRegistry r = new JenaSubsystemRegistryBasic() { >@Override >public void load() { >System.err.println("*** LOAD") ; >super.load(); >} >} ; > >// Set the sub-system registry >JenaSystem.setSubsystemRegistry(r); > >// Enable output if required >JenaSystem.DEBUG_INIT = true ; > >// Initialize Jena directly or indirectly >//JenaSystem.init() ; >ModelFactory.createDefaultModel() ; > > and ModelFactory has a static initializer. This is the first call to > "other Jena code". When a class is loaded, the static initializer is not run. That > happens on first use. > > Run the example code above uncommenting the "JenaSystem.init()" and you will > see that the call in ModelFactory returns early (the recursive > initialization problem - much discussed in the comments and a well known > Java issue.) if the explicit JenaSystem.init() is used. > > > IMHO JenaSystem.init is a step in the right direction and a better way than > the current way initialization is done. > > What would be helpful at this point is concrete improvements/alternatives > and concrete evaluation esp in OSGi. [*] > > Andy > > > http://jena.staging.apache.org/documentation/notes/system-initialization.html > > [*] JENA-913 : The OSGi integration testing in the build is broken. > > > >> --- >> A. Soroka >> The University of Virginia Library >> >> >> >
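The load-versus-first-use point above can be demonstrated outside Jena in a few lines. This is a self-contained sketch (nothing here is Jena code; all names are illustrative): `Class.forName` with `initialize=false` loads a class without running its static initializer, and the initializer then runs on first active use, such as reading a non-final static field.

```java
// Demonstrates: loading a class does not run its static initializer;
// first active use (here, reading a non-constant static field) does.
public class InitDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Lazy {
        static { LOG.append("init;"); }   // runs at initialization, not at load
        static int value = 42;            // non-final, so reading it is an active use
    }

    static String run() {
        try {
            // Load (but do not initialize) the class: the static block must not run yet.
            Class.forName(Lazy.class.getName(), false, InitDemo.class.getClassLoader());
            LOG.append("loaded;");

            int v = Lazy.value;           // first active use triggers initialization
            LOG.append("used:").append(v);
            return LOG.toString();
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());        // prints loaded;init;used:42
    }
}
```

This is why a caller who wants `JenaSystem.setSubsystemRegistry` to take effect must make that call before touching any Jena class whose static initializer reaches `JenaSystem.init()`.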
Re: Jena OSGi (was: [] 201508 Release of 23 Clerezza modules)
I've used OSGi enough to understand why Class.forName() is problematic. Some of these uses, however, seem like pretty legitimate dynamic code, for example the assembler subsystem. An OSGi solution to that need might be the OSGi service registry, but that's obviously not useful here. Some of the other uses could be replaced with the use of a plain Java ServiceLoader. I'm not sure what you mean by "Such registrations should instead be done with the java.lang.Class parameters - which can then be used directly." but I think your message was cut off? --- A. Soroka The University of Virginia Library On Sep 10, 2015, at 5:50 PM, Stian Soiland-Reyes <st...@apache.org> wrote: > Last time I looked at interdependencies there were several > Class.forName() calls around in jena-core and jena-arq > - see https://paste.apache.org/5y0W > > Class.forName() depends on the ClassLoader of the caller (by > introspecting the call stack) - but in OSGi there are multiple > ClassLoaders, think of it as one per JAR - and they can only access > packages that are declared as Imports in their META-INF/MANIFEST.MF > > > This would fall apart if the class to be called is not explicitly > included in the OSGi imports. Some of these were for instance with > jena-arq parsers and writers registering themselves by classname in > jena-core - but jena-core can't access jena-arq classes in OSGi > (Although circular imports are technically allowed in OSGi it's not > usually a good idea). > > > Now we have Jena 3, but we still have the duplication between > RDFReaderFImpl in jena-core and IO_JenaReaders in jena-arq - so this > is very much a real problem, because using riot would autoregister its > classnames in RDFReaderFImpl. Third-party callers could also be > registering - although RDFReaderFImpl is screaming "impl impl" all over > the place, so we should be free to change that. > > > Such registrations should instead be done with the java.lang.Class > parameters - which can then be used directly.
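For illustration, a ServiceLoader-based registry along the lines suggested here might look roughly like this. All names below are hypothetical, not Jena's actual API: the point is that implementations are discovered from `META-INF/services` entries (or registered by object), so no class-name string ever passes through `Class.forName()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI: each jar lists its factories in
// META-INF/services/ReaderFactory and they are discovered by ServiceLoader.
interface ReaderFactory {
    String syntax();              // e.g. "TTL"
    Object createReader();        // a real registry would return an RDFReader
}

class ReaderRegistry {
    private final Map<String, ReaderFactory> factories = new HashMap<>();

    ReaderRegistry() {
        // Discovery without Class.forName(): ServiceLoader resolves providers
        // against the appropriate ClassLoader.
        for (ReaderFactory f : ServiceLoader.load(ReaderFactory.class)) {
            register(f);
        }
    }

    // Registration by object (the caller already holds the Class), not by name.
    void register(ReaderFactory f) { factories.put(f.syntax(), f); }

    Object readerFor(String syntax) {
        ReaderFactory f = factories.get(syntax);
        if (f == null) throw new IllegalArgumentException("No reader for " + syntax);
        return f.createReader();
    }
}
```

In OSGi, ServiceLoader has its own wrinkles (the SPI Fluency/"serviceloader mediator" machinery), but it removes the caller-ClassLoader introspection problem that makes Class.forName() fragile across bundles.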
The > > > > > > > > On 10 September 2015 at 22:50, Stian Soiland-Reyes <st...@apache.org> wrote: >> On 10 September 2015 at 18:13, aj...@virginia.edu <aj...@virginia.edu> wrote: >>> If this is a matter of "just a couple of lines in the manifest file" cannot >>> a patch be created to do that in Jena itself? Are there inter-module >>> dependency issues that make that difficult? >> >> >> In theory just setting >> >> <packaging>bundle</packaging> >> >> and using the maven-bundle-plugin >> is enough to auto-generate the correct META-INF metadata for OSGi. >> This can be customized (as we do for the apache-jena-osgi/jena-osgi >> module). >> >> One complication is if the external dependencies are OSGi or not - >> httpclient is one tricky one as it has done the same as Jena and >> provided a separate wrapper httpclient-osgi (and httpcore-osgi) - >> however the way they did this with Maven means that just using it as a >> dependency would still pull in a dependency on the regular httpclient >> library. So if you are a non-OSGi user you would then see the >> httpclient classes in two JARs - which with Maven version resolution >> could easily end up in mismatched versions. >> >> Ironically httpclient-osgi does not depend on httpcore-osgi - so the >> one dependency that it truly needs isn't stated in its pom. >> >> >> In jena-osgi I therefore excluded all those deeper dependencies: >> >> See >> https://github.com/apache/jena/blob/master/apache-jena-osgi/jena-osgi/pom.xml#L165 >> >> To spare Jena-Maven-OSGi users the same issue, I similarly here >> slightly misused the provided scope for the dependencies that >> are not to be dependencies of the final jena-osgi JAR, but which are >> shaded inside. >> https://github.com/apache/jena/blob/master/apache-jena-osgi/jena-osgi/pom.xml#L107 >> >> >> If we move to <packaging>bundle</packaging> then we should get >> httpclient folks to sort out their poms upstream so we can rely on >> them in a cleaner fashion across Jena.
(or put this exclusion rule >> into jena-parent) - I don't think >> copy-pasting that big <exclusions> block around anything that directly >> or indirectly requires things like httpclient is good. >> >> Ideally they should also move to <packaging>bundle</packaging> and >> avoid *-osgi, which would of course simplify things. >> >> >> There are other potential issues such as Class.forName() which the current >> jena-osgi is narrowly cheating around by effectively making a single >> class loader for all of Jena (including, as Reto pointed out, TDB etc) >> >> >> >> >> >> -- >> Stian Soiland-Reyes >> Apache Taverna (incubating), Apache Commons RDF (incubating) >> http://orcid.org/-0001-9842-9718 > > > > -- > Stian Soiland-Reyes > Apache Taverna (incubating), Apache Commons RDF (incubating) > http://orcid.org/-0001-9842-9718
Re: [] 201508 Release of 23 Clerezza modules
If this is a matter of "just a couple of lines in the manifest file" cannot a patch be created to do that in Jena itself? Are there inter-module dependency issues that make that difficult? --- A. Soroka The University of Virginia Library On Sep 10, 2015, at 11:49 AM, Reto Gmür wrote: > On 9 Sep 2015 13:50, "Rob Vesse" wrote: >> >> This seems a little odd to me. It looks like they are placing these >> artifacts in their own group ID. However it still sets a slightly strange >> precedent if Apache Foo can release artifacts named Apache Bar even if >> they do so under their own maven coordinates >> >> Is this something they've been doing for a long time or is this a new >> thing? > > It is something which clerezza had been doing for a very long time. Apache > servicemix does the same for other projects that do not ship OSGi bundles, > see: http://mvnrepository.com/artifact/org.apache.servicemix.bundles > > Of course in an ideal world Jena would be modular and all its jars would > also be OSGi bundles, after all this is just a couple of lines in the > manifest file. > >> >> If new why couldn't they work with us to provide the fixes back to Jena? > > What clerezza is doing is not an actual fix, but rather a wrapping. Stian > did something similar. > > Reto >> >> Rob >> >> On 07/09/2015 17:35, "Andy Seaborne" wrote: >> >>> PMC, >>> >>> Clerezza is proposing redistributing modified Jena 2.13.0 binaries. >>> NOTICE and LICENSE have been changed. These would go into the Apache >>> release maven repo. >>> >>> The binaries are currently at: >>> >>> > https://repository.apache.org/content/repositories/orgapacheclerezza-1009/ >>> org/apache/clerezza/ext/org.apache.jena.jena-core/2.13.0_1/ >>> >>> (Modified version number as well - it does not make clear that 2.13.0_1 >>> is not a Jena-project release.)
>>> >>> Andy >>> >>> Forwarded Message >>> Subject: Re: [] 201508 Release of 23 Clerezza modules >>> Date: Mon, 7 Sep 2015 12:12:23 +0100 >>> From: Andy Seaborne >>> To: d...@clerezza.apache.org >>> >>> On 06/09/15 18:39, Reto Gmür wrote: On Sat, Sep 5, 2015 at 10:21 PM, Andy Seaborne wrote: > On 05/09/15 16:36, Reto Gmür wrote: > >> Hi all, >> >> This is a partial clerezza release of 23 modules bringing the >> following >> improvements: >> >> - Fixed issues preventing rdf.rdfjson and rdf.jena.sparql to expose >> their >> OSGi-DS services >> - Updated to latest version of Jersey >> - Updated Jena Version >> - Contains integration tests >> >> It contains the following artifacts that shall be released to maven >> central: >> > > Where are the convenience binaries? (I didn't see anything on > https://repository.apache.org/#stagingRepositories but may have missed > something) Enabled now. Here: > https://repository.apache.org/content/repositories/orgapacheclerezza-1009 / >>> >>> Could you have used Jena's OSGi artifact? >>> >>> The binaries have had the NOTICE and LICENSE files replaced in both jar >>> and sources.jar. These miss the necessary declarations. >>> >>> Andy >>> Cheers, Reto >>> >>> >>> >> >> >> >>
Re: 3rd party modifying Jena binaries. Re: [] 201508 Release of 23 Clerezza modules
Is it your impression that the "special OSGi spice" additions are something that Jena could reasonably adopt into normal builds? Then maybe they wouldn't feel the need to do this… --- A. Soroka The University of Virginia Library On Sep 9, 2015, at 4:06 PM, Andy Seaborne wrote: > > On 09/09/15 11:49, Rob Vesse wrote: >> This seems a little odd to me. It looks like they are placing these >> artifacts in their own group ID. However it still sets a slightly strange >> precedent if Apache Foo can release artifacts named Apache Bar even if >> they do so under their own maven coordinates > > We do something vaguely similar with Google Guava using "jena-shaded-guava". > The original Guava binaries do not include NOTICE and LICENSE files. But > then we change the class files and sources in accordance with the package > names. Maybe Clerezza should shade to under org.apache.clerezza.ext.jena. > > Clerezza artifact labelling does confuse. > > The modifications to the Jena binaries are that there is other stuff in the jars > for OSGi, and timestamps are "now" not "then". You can't tell by looking at the jars > whether there are code changes, but the related pom looks like a shade-OSGi step. > > This is not specific to Jena - there are other jars that have had the same > process applied to them. > > Removing the NOTICE and LICENSE is a problem. > > They are specific to the modules and ought to be carried over - they can have > more added, but removing the contents of another open source project's NOTICE is a > big no-no. > > Andy > >> Is this something they've been doing for a long time or is this a new >> thing? >> >> If new why couldn't they work with us to provide the fixes back to Jena? >> >> Rob >> >> On 07/09/2015 17:35, "Andy Seaborne" wrote: >> >>> PMC, >>> >>> Clerezza is proposing redistributing modified Jena 2.13.0 binaries. >>> NOTICE and LICENSE have been changed. These would go into the Apache >>> release maven repo.
>>> >>> The binaries are currently at: >>> >>> https://repository.apache.org/content/repositories/orgapacheclerezza-1009/ >>> org/apache/clerezza/ext/org.apache.jena.jena-core/2.13.0_1/ >>> >>> (Modified version number as well - it does not make clear that 2.13.0_1 >>> is not Jena-project release.) >>> >>> Andy >>> >>> Forwarded Message >>> Subject: Re: [] 201508 Release of 23 Clerezza modules >>> Date: Mon, 7 Sep 2015 12:12:23 +0100 >>> From: Andy Seaborne >>> To: d...@clerezza.apache.org >>> >>> On 06/09/15 18:39, Reto Gmür wrote: On Sat, Sep 5, 2015 at 10:21 PM, Andy Seaborne wrote: > On 05/09/15 16:36, Reto Gmür wrote: > >> Hi all, >> >> This is a partial clerezza release of 23 modules bringing the >> following >> improvements: >> >> - Fixed issues preventing rdf.rdfjson and rdf.jena.sparql to expose >> their >> OSGi-DS services >> - Updated to latest version of Jersey >> - Updated Jena Version >> - Contains integration tests >> >> It contains the following artifacts that shall be released to maven >> central: >> > > Where are the convenience binaries? (I didn't see anything on > https://repository.apache.org/#stagingRepositories but may have missed > something) Enabled now. Here: https://repository.apache.org/content/repositories/orgapacheclerezza-1009 / >>> >>> Could you have used Jena's OSGi artifact? >>> >>> The binaries have had the NOTICE and LICENSE files replaced in both jar >>> and sources.jar. These miss the necessary declarations. >>> >>> Andy >>> Cheers, Reto >>> >>> >>> >> >> >> >> >
Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"
I apologize if I am being thick here, but I don't understand how one goes about checking the potential match without some kind of covering resource against which to do that, something with a full representation of the graph. Can you elaborate on how to check the validity of the match? Thank you for taking the time to walk through this! --- A. Soroka The University of Virginia Library On Aug 31, 2015, at 10:04 AM, Claude Warren wrote: > Step 3 is about removing the false positives from the bloom filter. It does > not require an index, it requires checking the values to ensure a match.
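As described in this exchange, the verification data lives on the candidate quad itself, so no covering index is needed. A hypothetical sketch (my own illustrative code, not Claude's implementation) of the three steps: each quad carries a tiny Bloom filter of its four terms (here a 64-bit signature, one bit per term); the scan does a cheap bitwise subset test (step 2), and false positives are removed by comparing the concrete G and S values stored on the quad (step 3).

```java
import java.util.ArrayList;
import java.util.List;

// Per-quad "Bloom filter" as a 64-bit signature: one hashed bit per term.
class BloomQuads {
    static final class Quad {
        final String g, s, p, o;
        final long bits;
        Quad(String g, String s, String p, String o) {
            this.g = g; this.s = s; this.p = p; this.o = o;
            this.bits = bit(g) | bit(s) | bit(p) | bit(o);
        }
    }

    static long bit(String term) {               // hash a term to one of 64 bits
        return 1L << (term.hashCode() & 63);
    }

    // find(G,S,*,*): scan with the filter, then verify candidates exactly.
    static List<Quad> findGS(List<Quad> quads, String g, String s) {
        long pattern = bit(g) | bit(s);          // step 1: pattern filter
        List<Quad> out = new ArrayList<>();
        for (Quad q : quads) {
            if ((q.bits & pattern) != pattern) continue;     // step 2: cheap reject
            if (q.g.equals(g) && q.s.equals(s)) out.add(q);  // step 3: verify terms
        }
        return out;
    }
}
```

Step 3 is just two equality checks on the quad in hand, which is why it does not require traversing a separate "exact" index; the real cost is the step-2 scan over the whole list.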
Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"
I'm still a bit confused as to why you don't regard step 3 as being potentially very expensive. In order to verify a match, we will have to examine an "exact" index, and that (as Andy remarked) is likely to require traversal, or else we throw away all the space gains. Is this technique a way to pay a lot of time for a lot of space savings? Perhaps it is appropriate for an alternative implementation for very large datasets? --- A. Soroka The University of Virginia Library On Aug 31, 2015, at 6:48 AM, Claude Warren <cla...@xenei.com> wrote: > To answer find(G,S,*,*) with bloom filters and return an iterator you > > 1. construct a bloom filter with G and S > 2. scan the list of quads checking for matches. > 3. for each result that matches verify that it has G and S (I have done > this with an extended iterator in Jena) > > result is an iterator that returns all (G,S,*,*) quads. > > similar tests can be performed for any pattern -- same code used. > > Step 2 is the expensive one. But the bloom filter check is so efficient > that it becomes very difficult to perform search operations in less time > than it takes to scan the list. > > Claude > > On Mon, Aug 31, 2015 at 11:01 AM, Andy Seaborne <a...@apache.org> wrote: > >> On 29/08/15 14:55, Claude Warren wrote: >> >>> Something I have been thinking about >>> >>> you could replace GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. with a single >>> bloomfilter implementation. It means a 2 step process to find matches but >>> it might be fast enough and reduce the overhead significantly. >>> >>> I did an in-memory and a relational DB based version recently, but it was >>> just a quick POC. >>> >>> Claude >>> >> >> So we're talking about in-memory, where the items are java classes. A >> quad is 2 slots java overhead + 4 slots for G, S, P, O pointers. That's 48 >> bytes if the heap is >32G and 24 bytes otherwise (compressed pointers or 32 >> bit). >> >> For storage, the key test is "contains" to maintain the "set of" >> semantics.
Something to stop index traversal for each insert would be >> great, but it's still stored, and storing 1 copy, not up to 6, would be good. (Note that >> most data is unique quads.) >> >> The important retrieval operation is find(G,S,P,O) where any of those can be >> a wildcard and return (ideally as a stream) all matching quads with a >> prefix. The multiple indexes exist to find based on prefix. >> >> How would that work for, say find(G,S,*,*) with bloom filters and 1b >> quads? How does the code go from returning G,S,P1,O1 to the next G,S,P1,O2 >> without trying every value for the O slot? >> >> For a hash map based hierarchical index G->S->P->O, it's O(1) to find the >> start of the scan then datastructure iteration. A hash-based structure is not >> necessarily the best choice [*] but it's a baseline to discuss. >> >> And in memory, will a bloom filter-based system be faster? Because of >> false positives, isn't a definitive index still needed? If one is kept, >> not 6, there could be great space gains, but every quad returned is a >> top-to-bottom traversal of that index (which is now not a range index). >> >> The design should work for 1+ billion in-memory quads - that's the way the >> world is going. >> >> So each quad is reduced to a >>> single bloom filter comprising 4 items (15 bytes). >>> >> >>Andy >> >> [*] even in memory, it might be worth allocating internal ids and working >> in longs like a disk based system because it is more compact - naive >> hashmaps take a lot of space when storing small items like quads. >> tradeoffs, tradeoffs, ... >> >> >> >> >>> On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka <aj...@virginia.edu> wrote: >>> >>> Hey, folks-- >>>> >>>> There hasn't been too much feedback on my proposal for a journaling >>>> DatasetGraph: >>>> >>>> https://github.com/ajs6f/jena/tree/JournalingDatasetgraph >>>> >>>> which was and is to be a step towards JENA-624: Develop a new in-memory >>>> RDF Dataset implementation.
So I'm moving on to look at the real problem: >>>> an in-memory DatasetGraph with high concurrency, for use with modern >>>> hardware running many, many threads in large core memory. >>>> >>>> I'm beginning to sketch out rough code, and I'd like to run some design >>>> decisions past the list to get cri
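The hash-map-based hierarchical index G->S->P->O that Andy uses as a baseline in this thread can be sketched in a few lines (illustrative code, not Jena's implementation): find(G,S,*,*) is O(1) hash lookups to reach the start of the scan, and from there it is pure iteration over the matching subtree, with no per-quad verification.

```java
import java.util.*;
import java.util.stream.Stream;

// Baseline hierarchical index: a hash map chain G -> S -> P -> Set<O>.
class NestedIndex {
    private final Map<String, Map<String, Map<String, Set<String>>>> gspo = new HashMap<>();

    void add(String g, String s, String p, String o) {
        gspo.computeIfAbsent(g, k -> new HashMap<>())
            .computeIfAbsent(s, k -> new HashMap<>())
            .computeIfAbsent(p, k -> new HashSet<>())
            .add(o);
    }

    // find(G,S,*,*): O(1) to locate the S entry, then stream its P/O children.
    Stream<String[]> findGS(String g, String s) {
        Map<String, Set<String>> po =
            gspo.getOrDefault(g, Map.of()).getOrDefault(s, Map.of());
        return po.entrySet().stream()
                 .flatMap(e -> e.getValue().stream()
                                .map(o -> new String[] { g, s, e.getKey(), o }));
    }
}
```

This illustrates why multiple index orderings (GSPO, SPOG, ...) exist: each one makes a different wildcard pattern a cheap prefix lookup, at the cost of storing every quad up to six times.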
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
Thanks for the feedback! I can see how one Bloom filter could be used with an accompanying structure to replace one of the indexes, but I don't quite see how one could replace all of them-- can you elaborate? --- A. Soroka The University of Virginia Library On Aug 29, 2015, at 9:55 AM, Claude Warren cla...@xenei.com wrote: Something I have been thinking about you could replace GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. with a single bloomfilter implementation. It means a 2 step process to find matches but it might be fast enough and reduce the overhead significantly. I did an in-memory and a relational DB based version recently, but it was just a quick POC. Claude On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka aj...@virginia.edu wrote: Hey, folks-- There hasn't been too much feedback on my proposal for a journaling DatasetGraph: https://github.com/ajs6f/jena/tree/JournalingDatasetgraph which was and is to be a step towards JENA-624: Develop a new in-memory RDF Dataset implementation. So I'm moving on to look at the real problem: an in-memory DatasetGraph with high concurrency, for use with modern hardware running many, many threads in large core memory. I'm beginning to sketch out rough code, and I'd like to run some design decisions past the list to get criticism/advice/horrified warnings/whatever needs to be said. 1) All-transactional action: i.e. no non-transactional operation. This is obviously a great thing for simplifying my work, but I hope it won't be out of line with the expected uses for this stuff. 2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I figure to play to the strength of in-core-memory operation: raw speed, but obviously this is going to cost space. 3) At least for now, all commits succeed. 4) The use of persistent datastructures to avoid complex and error-prone fine-grained locking regimes. I'm using http://pcollections.org/ for now, but I am in no way committed to it nor do I claim to have thoroughly vetted it. 
It's simple but enough to get started, and that's all I need to bring the real design questions into focus. 5) Snapshot isolation. Transactions do not see commits that occur during their lifetime. Each works entirely from the state of the DatasetGraph at the start of its life. 6) Only as many as one transaction per thread, for now. Transactions are not thread-safe. These are simplifying assumptions that could be relaxed later. My current design operates as follows: At the start of a transaction, a fresh in-transaction reference is taken atomically from the AtomicReference that points to the index block. As operations are performed in the transaction, that in-transaction reference is progressed (in the sense in which any persistent datastructure is progressed) while the operations are recorded. Upon an abort, the in-transaction reference and the record are just thrown away. Upon a commit, the in-transaction reference is thrown away and the operation record is re-run against the main reference (the one that is copied at the beginning of a transaction). That rerun happens inside an atomic update (hence the use of AtomicReference). This all should avoid the need for explicit locking in Jena and should confine any blocking against the indexes to the actual duration of a commit. What do you guys think? --- A. Soroka The University of Virginia Library -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
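A minimal model of the commit scheme described above may make it concrete. This is a hypothetical sketch, with plain immutable `java.util` Sets standing in for pcollections: a transaction snapshots the shared reference at begin, evolves its own copy while recording its operations, and at commit re-runs the record against the *current* shared state inside an atomic update.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class SnapshotStore {
    private final AtomicReference<Set<String>> committed =
        new AtomicReference<>(Set.of());

    class Txn {
        private Set<String> view = committed.get();   // snapshot taken at begin
        private final List<UnaryOperator<Set<String>>> record = new ArrayList<>();

        void add(String quad) {
            UnaryOperator<Set<String>> op = base -> {
                Set<String> next = new HashSet<>(base);
                next.add(quad);
                return Set.copyOf(next);              // "persistent" immutable copy
            };
            view = op.apply(view);                    // the txn sees its own writes
            record.add(op);                           // remember the op for commit
        }

        boolean contains(String quad) { return view.contains(quad); }

        void commit() {
            // Replay the record against whatever is committed *now*, atomically.
            committed.updateAndGet(base -> {
                Set<String> s = base;
                for (UnaryOperator<Set<String>> op : record) s = op.apply(s);
                return s;
            });
        }
    }

    Txn begin() { return new Txn(); }
    Set<String> snapshot() { return committed.get(); }
}
```

Replaying the record at commit, rather than swapping in the transaction's own evolved view, is what preserves additions made by a transaction that committed in the meantime; it is also where the cost of this design concentrates, since commits serialize on that atomic update.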
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
In fact, this is why I tried (for a first try) a design with only one transaction committing at a time, which amounts to SW in terms of serializability, I thought. But I am allowing multiple writers to assemble changes in multiple transactions at the same time, and I think that is what will prevent the use of swap-into-commit. Maybe this is a bad trade? Since JENA-624 contemplates very high concurrency, is it worth doing a MR+SW design at all? But MRMW seems very hard. {grin} I had some ideas about structuring indexes in such a way as to allow for more fine-grained locking and using merge for actual MW, but as you point out, locking down to particular resources is not able to guarantee against conflicts between conceptual entities. I also had some nightmares trying to think about how to manage bnodes across multiple writers. --- A. Soroka The University of Virginia Library On Aug 28, 2015, at 6:17 AM, Andy Seaborne a...@apache.org wrote: On 27/08/15 16:53, aj...@virginia.edu wrote: Andy-- Thanks, these comments are really helpful! I've replied in-line in a few places to clarify or answer questions, or ask some of my own. {grin} --- A. Soroka The University of Virginia Library If there are multiple writers, then (1) system aborts will always be possible (conflicting updates) and (2) locking on datastructures is necessary ... or timestamps and vector clocks or some such. Right, see below. Again, there are multiple writers, but they only see themselves, and only one committer. Only one committer at a time prevents conflicts, since there is no schema to violate, but it is a brutal way to deal with the problem. And the re-run scheme of operation means it will be a very real bottleneck. 5) Snapshot isolation. Transactions do not see commits that occur during their lifetime. Each works entirely from the state of the DatasetGraph at the start of its life. But they see their own updates presumably?
Right, that's exactly the purpose of taking their own reference to the persistent datastructures at the start of the transaction. They evolve their datastructures independently. When used in a program, persistent datastructures diverge when two writes act from the same base point. Transactions do more - they are serializing all operations so there is a linear sequence of versions. This is the problem you identify below. 6) Only as many as one transaction per thread, for now. Transactions are not thread-safe. These are simplifying assumptions that could be relaxed later. TDB ended up there as well. There is, internally, a transaction object but it's held in a ThreadLocal and fetched when needed. Otherwise a lot of interfaces need a transaction parameter and it's hard to reuse other code that doesn't pass it through. That's close to what I sketched out. I have taken a second take on transactions with TDB2. This module is an independent transaction system, unlike TDB1 where it's TDB1-specific. https://github.com/afs/mantis/tree/master/dboe-transaction It needs documentation for use on its own but I have used it in another project to coordinate distributed transactions. (dboe = database operating environment) I need to study this more. Obviously, if I can take over some of your work, that would be ideal. My current design operates as follows: snipped Looks good. I don't quite understand the need to record and rerun though - isn't the power of pcollections that there can be old and new roots to the datastructures and commit is swap to the new one, abort is forget the new one. Yeah, but my worry (perhaps just my misunderstanding) is over transactions interacting badly in the presence of snapshot isolation. Let's say we did use the technique of atomic swap, and consider the following scenario: T=-1 The committed datastructures contain triples T.
T=0 Transaction 1 begins, taking a reference to the datastructures T=1 Transaction 2 begins, taking its own reference to the datastructures T=3 Transaction 1 does some updates, adding some triples T_1 to its own branch, resulting in T+T_1. T=4 Transaction 2 does some updates, adding some triples T_2 to its own branch, resulting in T+T_2. T=5 Transaction 1 commits, so that the committed triples are now T + T_1. T=6 Transaction 2 commits, so that the committed triples are now T + T_2. We lost Transaction 1's T_1 triples. I think this technique actually requires _merge_ instead of swap, either merge-into-open-transactions (after a commit) which isn't snapshot isolation or merge-into-commit (instead of swap-into-commit). But there's plenty of chance that I'm just misunderstanding this whole thing. {grin} I have not designed a transaction system over persistent datastructures before, and I welcome correction. I also need to research more about persistent datastructures with merge capability. which is why 2+ writers needs locking or aborts. The common
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
Ah, okay, I see the problem more clearly now. Thanks! It seems to me now that the best immediate road forward is to go to true MR+SW (a write lock for the dataset), since I take from your remarks that you think that would be valuable in itself. That would be straightforward. I have read a few papers that discuss doing MW by locking at the granularity of triple patterns or BGPs, but I have to admit that it will take more study before I am ready to implement something like that. {grin} --- A. Soroka The University of Virginia Library On Aug 28, 2015, at 7:42 AM, Andy Seaborne a...@apache.org wrote: On 28/08/15 12:22, aj...@virginia.edu wrote: In fact, this is why I tried (for a first try) a design with only one transaction committing at a time, which amounts to SW in terms of serializability, I thought. No :-( But I am allowing multiple writers to assemble changes in multiple transactions at the same time, and I think that is what will prevent the use of swap-into-commit. Maybe this is a bad trade? Since JENA-624 contemplates very high concurrency, is it worth doing a MR+SW design at all? But MRMW seems very hard. {grin} I had some ideas about structuring indexes in such a way as to allow for more fine-grained locking and using merge for actual MW, but as you point out, locking down to particular resources is not able to guarantee against conflicts between conceptual entities. I also had some nightmares trying to think about how to manage bnodes across multiple writers. See my example for a counterexample. It's not 2 commits at once to avoid; it is that W2 is reading a pre-W1-commit view of the world.
W1 starts and takes a start-of-transaction pointer to datastructures. W1 reads the account balance as 10.
W2 starts, ditto. W2 reads the account balance as 10.
W1 updates and commits. The account balance visible to any new reader is 15.
W2 updates and commits. The account balance visible to any new reader is 17, but it should be 22. The +5 has been lost.
Your scheme keeps the database datastructures safe, but at the data model level it can cause inconsistency and loss of changes. Either an application-level resolution of changes or something like 2-phase locking is needed, and even then there are issues of non-repeatable reads and phantom reads. https://en.wikipedia.org/wiki/Isolation_%28database_systems%29 It gets very nasty when aggregations (COUNT, SUM) happen. You can get answers that are not from any state of the data that ever existed. Andy
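The aggregation anomaly Andy describes can be shown with a toy example (illustrative only, not Jena code): a reader summing two balances with no isolation, while a writer transfers money between them, computes a total that no state of the data ever had.

```java
// Sketch: a SUM computed without isolation can reflect a state of the data
// that never existed. The interleaving is written out inline for clarity.
public class PhantomSum {
    public static int demo() {
        int[] accounts = {10, 10};          // every real state sums to 20
        int sum = 0;
        sum += accounts[0];                 // reader reads account 0: sees 10
        // writer transfers 5 from account 0 to account 1, "atomically":
        accounts[0] -= 5;
        accounts[1] += 5;                   // state is now {5, 15}, still 20
        sum += accounts[1];                 // reader reads account 1: sees 15
        return sum;                         // 25: no state ever summed to 25
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 25
    }
}
```

Every consistent state sums to 20, but the unisolated reader reports 25.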
Re: JENA-624: Develop a new in-memory RDF Dataset implementation
Andy-- Thanks, these comments are really helpful! I've replied in-line in a few places to clarify or answer questions, or ask some of my own. {grin} --- A. Soroka The University of Virginia Library On Aug 27, 2015, at 5:35 AM, Andy Seaborne a...@apache.org wrote: 1) All-transactional action: i.e. no non-transactional operation. This is obviously a great thing for simplifying my work, but I hope it won't be out of line with the expected uses for this stuff. You could add an auto-commit feature so that any update outside a transaction has a transaction wrapper applied. Feature. I can and will. 2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I figure to play to the strength of in-core-memory operation: raw speed, but obviously this is going to cost space. There are choices :-) esp. in memory, as the datastructure kind can change on the way down. e.g. have a hash map for GS-PO and keep the PO tightly packed (a few for each S) and scan them. Very true. Ideally, I would like to offer some knobs to users to choose their own balance between speed and space. I'll step back and consider a few more designs before going further with this six-way approach. Is that going to be 6 pcollections datastructures, or all held in one datastructure (c.f. BerkeleyDB)? Right now I am looking at a setup with six independent indexes addressed through a single class, but that was just the first thing that came to mind that seemed reasonable. I am not committed to that by any means. If I step away from the six-way, the question changes. 3) At least for now, all commits succeed. 4) The use of persistent datastructures to avoid complex and error-prone fine-grained locking regimes. I'm using http://pcollections.org/ for now, but I am in no way committed to it nor do I claim to have thoroughly vetted it. It's simple but enough to get started, and that's all I need to bring the real design questions into focus.
Is a consequence that there is one truly active writer (and many readers)? Something like that. If you look at the scheme of operation below, all writers are invisible to each other and can write at will, but only one writer can commit at a time. That may very well not be enough concurrency, but it's just a starting place. If there are multiple writers, then (1) system aborts will always be possible (conflicting updates) and (2) locking on datastructures is necessary ... or timestamps and vector clocks or some such. Right, see below. Again, there are multiple writers, but they only see themselves, and only one committer. Only one committer at a time prevents conflicts, since there is no schema to violate, but it is a brutal way to deal with the problem. And the re-run scheme of operation means it will be a very real bottleneck. 5) Snapshot isolation. Transactions do not see commits that occur during their lifetime. Each works entirely from the state of the DatasetGraph at the start of its life. But they see their own updates presumably? Right, that's exactly the purpose of taking their own reference to the persistent datastructures at the start of the transaction. They evolve their datastructures independently. 6) Only as many as one transaction per thread, for now. Transactions are not thread-safe. These are simplifying assumptions that could be relaxed later. TDB ended up there as well. There is, internally, a transaction object, but it's held in a ThreadLocal and fetched when needed. Otherwise a lot of interfaces need a transaction parameter, and it's hard to reuse other code that doesn't pass it through. That's close to what I sketched out. I have taken a second take on transactions with TDB2. This module is an independent transaction system, unlike TDB1 where it's TDB1-specific. https://github.com/afs/mantis/tree/master/dboe-transaction It needs documentation for use on its own, but I have used it in another project to coordinate distributed transactions.
(dboe = database operating environment) I need to study this more. Obviously, if I can take over some of your work, that would be ideal. My current design operates as follows: snipped Looks good. I don't quite understand the need to record and rerun, though - isn't the power of pcollections that there can be old and new roots to the datastructures, and commit is swap to the new one, abort is forget the new one? Yeah, but my worry (perhaps just my misunderstanding) is over transactions interacting badly in the presence of snapshot isolation. Let's say we did use the technique of atomic swap, and consider the following scenario:
T=-1 The committed datastructures contain triples T.
T=0 Transaction 1 begins, taking a reference to the datastructures.
T=1 Transaction 2 begins, taking its own reference to the datastructures.
T=3 Transaction 1 does some updates, adding some triples T_1 to its own branch, resulting in T+T_1.
T=4 Transaction 2 does some
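The ThreadLocal arrangement Andy describes for TDB can be sketched roughly as follows. All names here are illustrative, not the actual TDB classes: the active transaction is registered per thread, so code deep in the call chain can recover it without every interface carrying a transaction parameter.

```java
// Sketch of a per-thread transaction registry, in the style described above.
// One transaction per thread; transactions are not shared across threads.
public class TxnRegistry {
    public static final class Txn {
        public final boolean write;
        Txn(boolean write) { this.write = write; }
    }

    private static final ThreadLocal<Txn> current = new ThreadLocal<>();

    public static void begin(boolean write) {
        if (current.get() != null)
            throw new IllegalStateException("one transaction per thread");
        current.set(new Txn(write));
    }

    // Any code running on this thread can fetch the transaction without
    // it having been passed down through every interface.
    public static Txn get() {
        Txn t = current.get();
        if (t == null)
            throw new IllegalStateException("not in a transaction");
        return t;
    }

    public static void end() { current.remove(); }

    public static void main(String[] args) {
        begin(true);
        System.out.println(get().write); // true
        end();
    }
}
```

The trade-off is as stated in the thread: the ThreadLocal keeps interfaces clean, at the cost of making the "one transaction per thread" assumption load-bearing.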
Re: RDFConnection
Just a thought on ergonomics: it might be nice to separate clear and delete, so instead of RDFConnection::delete either clearing or deleting a graph depending on whether it is the default graph, you have finer control and can clear a non-default graph. --- A. Soroka The University of Virginia Library On Aug 4, 2015, at 6:21 PM, Andy Seaborne a...@apache.org wrote: There's a note in the interface // Query // Maybe more query forms: querySelect(Query)? select(Query)? At the moment, the operations are the basic ones (the SPARQL protocols for query, update and GSP). There's scope to add forms on top. void execSelect(Query query, Consumer<QuerySolution> action) is one possibility. Andy On 04/08/15 16:14, aj...@virginia.edu wrote: Is this a little bit like Sesame 4's new Repository helper type? Not totally the same thing, but similar in that it's bringing a lot of convenience together around the notion of dataset? http://rdf4j.org/doc/4/programming.docbook?view#Stream_based_querying_and_transaction_handling --- A. Soroka The University of Virginia Library On Aug 2, 2015, at 3:05 PM, Andy Seaborne a...@apache.org wrote: Stephen, all, Recently on users@ there was a question about the s-* in java. That got me thinking about an interface to pull together all SPARQL operations into one application-facing place. We have jena-jdbc, and jena-client already - this is my sketch take. [1] RDFConnection Currently, it's a sketch-for-discussion; it's a bit DatasetAccessor-like + SPARQL query + SPARQL Update. And some whole-dataset-REST-ish operations (that Fuseki happens to support). It's a chance to redo things a bit. RDFConnection uses the existing SPARQL+RDF classes and abstractions in ARQ, not strings, [*] rather than putting all app-visible classes in one package. Adding an equivalent of DatabaseClient to represent one place would be good - and add the admin operations, for Fuseki at least. Also, a streaming load possibility. Comments? Specific use cases?
Andy (multi-operation transactions ... later!) [*] You can use strings as well - that's the way to get arbitrary non-standard extensions through. [1] https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/rdfconnection/RDFConnection.java
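To illustrate the kind of layering under discussion (convenience forms like select(query, action) built over one primitive operation), here is a hypothetical, much-simplified stand-in. MiniConnection and its String "solutions" are inventions for the sketch, not the proposed RDFConnection API:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch: one raw operation plus a default-method convenience form on top,
// in the spirit of the execSelect(query, action) idea in the thread.
public class MiniConnectionDemo {
    public interface MiniConnection {
        // The one primitive: run a SELECT, return all solutions.
        List<String> querySelect(String query);

        // Convenience form layered on top: apply an action per solution.
        default void select(String query, Consumer<String> action) {
            querySelect(query).forEach(action);
        }
    }

    public static int countSolutions(List<String> solutions) {
        MiniConnection conn = q -> solutions;   // toy in-memory "endpoint"
        int[] n = {0};
        conn.select("SELECT * WHERE { ?s ?p ?o }", s -> n[0]++);
        return n[0];
    }

    public static void main(String[] args) {
        System.out.println(countSolutions(List.of("a", "b", "c"))); // 3
    }
}
```

The point of the shape is that implementations only supply the primitive, and all the richer forms come for free as defaults.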
Re: Journaling DatasetGraph
Thanks for the feedback Andy. 2/ Datasets that provide support for MW cases and don't provide transactions seem rather unlikely, so maybe document what kind of DatasetGraph is being supported by DatasetGraphWithRecord, then just use the underlying lock. Okay, that's certainly simpler! And it keeps my grubby fingers out of Lock. {grin} 3/ There are two things to protect in DatasetGraphWithRecord: the underlying dataset and the transaction log for supporting abort for writers only. They can have separate mechanisms. Use the dataset lock for the DatasetGraph actions and make the transaction undo log operations be safe by other means. You mean an independent lock visible only inside DatasetGraphWithRecord? .. hmm ... the order of entries in the log may matter, so true parallel MW looks increasingly hard to deal with anyway. Document and not worry for now? My fear has been that MW means a) a log per write-transaction and connections from the transaction to a particular set of states for the indexes b) with those forward states invisible outside the transaction c) and all the nightmare fun of merging states! --- A. Soroka The University of Virginia Library On Aug 4, 2015, at 4:32 PM, Andy Seaborne a...@apache.org wrote: On 03/08/15 17:13, aj...@virginia.edu wrote: I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance. Feedback welcome! A few things occur to me: 1/ The transaction log is for supporting abort for writers only. Nothing needs to be done in DatasetGraphWithRecord for readers. DatasetGraphWithLock does what's needed.
So you don't even need to startRecording for a READ (and the 'commit clears, _end always aborts' approach is an interesting way to do it!). 2/ Datasets that provide support for MW cases and don't provide transactions seem rather unlikely, so maybe document what kind of DatasetGraph is being supported by DatasetGraphWithRecord, then just use the underlying lock. It's not just a case of using ConcurrentHashMap, say, as likely there would be multiple of them for different indexes, and that would give weird consistency issues: different parts get updated safely with respect to part of the datastructure, but it will be visibly different depending on what the reader uses. So I think MW will have additional coordination. 3/ There are two things to protect in DatasetGraphWithRecord: the underlying dataset and the transaction log for supporting abort for writers only. They can have separate mechanisms. Use the dataset lock for the DatasetGraph actions and make the transaction undo log operations be safe by other means. .. hmm ... the order of entries in the log may matter, so true parallel MW looks increasingly hard to deal with anyway. Document and not worry for now? Andy --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 5:04 PM, Andy Seaborne a...@apache.org wrote: The lock provided by the underlying dataset may matter. DatasetGraphs support critical sections. DatasetGraphWithLock uses critical sections of the underlying dataset. I gave an (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors). DatasetGraphWithRecord is relying on single-W for its own datastructures. Andy On 29/07/15 21:22, aj...@virginia.edu wrote: I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord?
That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able
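A minimal sketch of a writer-only undo log of the kind discussed in this thread (stand-in types and names; not the actual DatasetGraphWithRecord code): each add/delete pushes its inverse onto the log, commit drops the log, and abort replays it newest-first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch: a dataset wrapper whose transaction log exists only to support
// abort for writers. "Quads" are stand-in strings.
public class UndoLogDataset {
    private final Set<String> quads = new HashSet<>();
    private final Deque<Runnable> undo = new ArrayDeque<>();

    public void add(String quad) {
        // Record the inverse only if the operation actually changed anything.
        if (quads.add(quad)) undo.push(() -> quads.remove(quad));
    }

    public void delete(String quad) {
        if (quads.remove(quad)) undo.push(() -> quads.add(quad));
    }

    public void commit() { undo.clear(); }     // keep changes, drop the log

    public void abort() {                      // replay inverses, newest first
        while (!undo.isEmpty()) undo.pop().run();
    }

    public boolean contains(String quad) { return quads.contains(quad); }

    public static void main(String[] args) {
        UndoLogDataset dsg = new UndoLogDataset();
        dsg.add("q1");
        dsg.commit();
        dsg.add("q2");
        dsg.delete("q1");
        dsg.abort();   // back to the committed state: q1 present, q2 gone
        System.out.println(dsg.contains("q1") + " " + dsg.contains("q2"));
    }
}
```

Readers never touch the log, which is the point of "nothing needs to be done in DatasetGraphWithRecord for readers" above; note also that the order of undo entries matters, which is why parallel MW writers are hard here.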
Re: RDFConnection
Ah, that makes my distinction pretty meaningless! This abstraction seems meant to rub out just such differences. This does remind me of another potential nice small feature: a Stream<Triple> construct(Query query) method, maybe at first via QueryExecution::execConstructTriples. The AutoCloseable-ity of QueryExecution could pass through to the Stream's AutoCloseable-ity. With clever implementation, eventually some of the methods on Stream (e.g. filter) could get passed through to SPARQL execution. --- A. Soroka The University of Virginia Library On Aug 5, 2015, at 9:37 AM, Rob Vesse rve...@dotnetrdf.org wrote: The main complicating factor is that clear and delete are only separate operations if the storage layer stores graph names separately from graph data, which the SPARQL specification specifically does not require. For storage systems like TDB where only quads are stored, the existence of a named graph is predicated on the existence of some quads in that graph, and so delete is equivalent to clear, because if you remove all quads for a graph TDB doesn't know about that graph any more. The SPARQL specifications actually explicitly call this complication out in several places (search for empty graphs in the SPARQL 1.1 Update spec) and various SPARQL Update behaviours may differ depending on whether the storage layer records the presence of empty graphs or not. Rob On 05/08/2015 13:44, aj...@virginia.edu aj...@virginia.edu wrote: Just a thought on ergonomics: it might be nice to separate clear and delete, so instead of RDFConnection::delete either clearing or deleting a graph depending on whether it is the default graph, you have finer control and can clear a non-default graph. --- A. Soroka The University of Virginia Library On Aug 4, 2015, at 6:21 PM, Andy Seaborne a...@apache.org wrote: There's a note in the interface // Query // Maybe more query forms: querySelect(Query)? select(Query)?
At the moment, the operations are the basic ones (the SPARQL protocols for query, update and GSP). There's scope to add forms on top. void execSelect(Query query, Consumer<QuerySolution> action) is one possibility. Andy On 04/08/15 16:14, aj...@virginia.edu wrote: Is this a little bit like Sesame 4's new Repository helper type? Not totally the same thing, but similar in that it's bringing a lot of convenience together around the notion of dataset? http://rdf4j.org/doc/4/programming.docbook?view#Stream_based_querying_and_transaction_handling --- A. Soroka The University of Virginia Library On Aug 2, 2015, at 3:05 PM, Andy Seaborne a...@apache.org wrote: Stephen, all, Recently on users@ there was a question about the s-* in java. That got me thinking about an interface to pull together all SPARQL operations into one application-facing place. We have jena-jdbc, and jena-client already - this is my sketch take. [1] RDFConnection Currently, it's a sketch-for-discussion; it's a bit DatasetAccessor-like + SPARQL query + SPARQL Update. And some whole-dataset-REST-ish operations (that Fuseki happens to support). It's a chance to redo things a bit. RDFConnection uses the existing SPARQL+RDF classes and abstractions in ARQ, not strings, [*] rather than putting all app-visible classes in one package. Adding an equivalent of DatabaseClient to represent one place would be good - and add the admin operations, for Fuseki at least. Also, a streaming load possibility. Comments? Specific use cases? Andy (multi-operation transactions ... later!) [*] You can use strings as well - that's the way to get arbitrary non-standard extensions through. [1] https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/rdfconnection/RDFConnection.java
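Rob's point upthread about quads-only storage can be illustrated with a toy store (stand-in "graph|triple" strings, not TDB): once a graph's quads are gone, the graph itself is gone, so clear and delete coincide.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: in a quads-only store, "the graph exists" just means
// "some quad names it", so clearing a graph and deleting it are
// the same operation. A store with a separate graph-name registry
// could instead keep an empty graph around after a clear.
public class ClearVsDelete {
    static final Set<String> quads = new HashSet<>();

    public static void add(String graph, String triple) {
        quads.add(graph + "|" + triple);
    }

    public static void clear(String graph) {
        quads.removeIf(q -> q.startsWith(graph + "|"));
    }

    public static boolean graphExists(String graph) {
        return quads.stream().anyMatch(q -> q.startsWith(graph + "|"));
    }

    public static void main(String[] args) {
        add("g1", "s p o");
        clear("g1");                           // remove all quads of g1...
        System.out.println(graphExists("g1")); // ...and g1 no longer "exists"
    }
}
```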
Re: RDFConnection
Is this a little bit like Sesame 4's new Repository helper type? Not totally the same thing, but similar in that it's bringing a lot of convenience together around the notion of dataset? http://rdf4j.org/doc/4/programming.docbook?view#Stream_based_querying_and_transaction_handling --- A. Soroka The University of Virginia Library On Aug 2, 2015, at 3:05 PM, Andy Seaborne a...@apache.org wrote: Stephen, all, Recently on users@ there was a question about the s-* in java. That got me thinking about an interface to pull together all SPARQL operations into one application-facing place. We have jena-jdbc, and jena-client already - this is my sketch take. [1] RDFConnection Currently, it's a sketch-for-discussion; it's a bit DatasetAccessor-like + SPARQL query + SPARQL Update. And some whole-dataset-REST-ish operations (that Fuseki happens to support). It's a chance to redo things a bit. RDFConnection uses the existing SPARQL+RDF classes and abstractions in ARQ, not strings, [*] rather than putting all app-visible classes in one package. Adding an equivalent of DatabaseClient to represent one place would be good - and add the admin operations, for Fuseki at least. Also, a streaming load possibility. Comments? Specific use cases? Andy (multi-operation transactions ... later!) [*] You can use strings as well - that's the way to get arbitrary non-standard extensions through. [1] https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/rdfconnection/RDFConnection.java
Re: Journaling DatasetGraph
I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance. Feedback welcome! --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 5:04 PM, Andy Seaborne a...@apache.org wrote: The lock provided by the underlying dataset may matter. DatasetGraphs support critical sections. DatasetGraphWithLock uses critical sections of the underlying dataset. I gave an (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors). DatasetGraphWithRecord is relying on single-W for its own datastructures. Andy On 29/07/15 21:22, aj...@virginia.edu wrote: I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here?
(No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for reads). There aren't any (the rules are R-safe) - locks are always LockMRSW. [1] http://jena.apache.org/documentation/notes/concurrency-howto.html Andy
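The Lock-quality idea from this thread could look roughly like the following. All names are hypothetical (the real Jena Lock interface differs): a lock reports its own concurrency category as an enum, so a wrapper can check suitability without relying on the type hierarchy.

```java
// Hypothetical sketch: a lock self-reports its concurrency quality so a
// wrapper like DatasetGraphWithRecord could decide whether to use it or
// eclipse it, without instanceof tests against lock subclasses.
public class LockQualityDemo {
    public enum LockQuality { MRSW, MR_PLUS_SW, MUTEX }

    public interface QualifiedLock {
        LockQuality quality();
    }

    // A recording wrapper needs at most one writer at a time, so only
    // MRSW or a plain mutex is acceptable here.
    public static boolean safeForRecording(QualifiedLock lock) {
        LockQuality q = lock.quality();
        return q == LockQuality.MRSW || q == LockQuality.MUTEX;
    }

    public static void main(String[] args) {
        QualifiedLock mrsw = () -> LockQuality.MRSW;
        System.out.println(safeForRecording(mrsw)); // true
    }
}
```

This captures the "categorize itself independently of the type system's inheritance" idea: the enum is the contract, not the class.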
Re: Journaling DatasetGraph
I think I understand the problem now. Assuming I do, I see two cases: 1) The underlying dataset has locking that is _more_ restrictive than MRSW, in which case DatasetGraphWithRecord must expose that locking, lest it break the underlying impl. 2) The underlying dataset has locking that is _less_ restrictive than MRSW, in which case DatasetGraphWithRecord must eclipse that locking, lest it break DatasetGraphWithRecord's impl. So my task is to adopt some careful meaning for "more" and "less" as used above and use it to make DatasetGraphWithRecord's locking more intelligent. I do not see anything in Jena that would answer to the purpose, but maybe I am missing something. {fingers-crossed} --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 5:04 PM, Andy Seaborne a...@apache.org wrote: The lock provided by the underlying dataset may matter. DatasetGraphs support critical sections. DatasetGraphWithLock uses critical sections of the underlying dataset. I gave an (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors). DatasetGraphWithRecord is relying on single-W for its own datastructures. Andy On 29/07/15 21:22, aj...@virginia.edu wrote: I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock?
Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for reads). There aren't any (the rules are R-safe) - locks are always LockMRSW. [1] http://jena.apache.org/documentation/notes/concurrency-howto.html Andy
Re: Journaling DatasetGraph
I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless. Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord? --- A. Soroka The University of Virginia Library On Jul 29, 2015, at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 27/07/15 18:06, aj...@virginia.edu wrote: Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1] Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for reads). There aren't any (the rules are R-safe) - locks are always LockMRSW. [1] http://jena.apache.org/documentation/notes/concurrency-howto.html Andy
Re: Journaling DatasetGraph
Thanks for the feedback, Andy! See comment in-line below. --- A. Soroka The University of Virginia Library On Jul 25, 2015, at 7:43 AM, Andy Seaborne a...@apache.org wrote: A first look - there's quite a lot to do with the release at the moment. Right, I don't expect anyone to get around to much consideration of this until that is over. Good luck! Having a separate set of functionality to the underlying DatasetGraph is good for the MRSW case and with that composition on multiple datasets, text indexes etc etc. For the MR+SW, I think the more connected nature of transactions and implementation might make it harder to have independent functionality but we'll see. I agree. That's why I did this as a wrap-around. I don't think MR+SW _can_ be done that way, but we'll see… Yes - addGraph ought to be a copy. The general dataset where the app can put together a collection of different graph types is the exception but needed for the case of some graphs being inference, maybe some not. As I wrote, I believe that my current code does this solidly and the test shows it, but I'm not sure that the impl is as efficient as possible. Suggestions welcome! One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation. An operation is a pair - the action and the data - not data. I'm not sure I understand the objection here: all classes inherit from Object and virtually all of them add functionality Object does not have and break its equality definition. I certainly understand the view on operations you're taking, but I'm proposing a different one that includes data, action (in my code, that comes in the form of type, not an enumeration, so that I can replace cases in your code with polymorphism) _and_ service type. Adding a quad to a special index might be substantially different than adding it to a dataset. e.g. 
Putting a QuadOperation into a DatasetGraph would cause problems. Because of the equality question? I _think_ I understand this objection; are you saying that logic for things like DatasetGraph::contains becomes problematic? To my mind it implies a more sophisticated type of comparison (using equivalence and not equals()) instead of a different kind of data structure. I'll try to make some corrections to show what I mean and give you something to react to. I may be wrong here, but I'd like to follow out the idea. ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType> public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>> implements ReversibleOperationRecord<OpType> { while, yes, a collection of operations could be an operation datasets don't provide such composite operations so the abstraction is not used. And the reverse of it would be recursive - each operation needs reversing. I am _not_ making the claim here that a collection of operations could be an operation. A record (in my code) is just a record. It is _not_ usable as an aggregate operation and doesn't subtype Operation. There is no use of records as operations nor any intended such use, so no problem. I'd keep log (= list of operations) as a separate concept from the operations themselves. One key operation of a ListBackedOperationRecord is clear and Operations are Or this is a naming thing, is record the log entry or the log itself? Something seems to have been eaten out of your mail (!) but anyway, a record _is_ a separate concept from operation. There is ReversibleOperationRecord and there is Operation and the only relationship between them is that Operation is a parameter type for ReversibleOperationRecord::add and part of the parameter type for ReversibleOperationRecord::consume. As far as names, I'm not sure what you mean-- ReversibleOperationRecord the type? That's a log. It contains Operations, but _is not one itself_.
Is there some specific reason as to why you override the DatasetGraphWithLock lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) One difference is the notion of reversing an operation is not a feature of the operation itself, it's the way it is played back. Partially, this is efficiency (which may not matter) as it reduces the object churn but also it puts undo-playback in one place (e.g. reading and writing from storage, which might be non-heap memory, or a compacted form (or even a disk) for where large+long transactions even on in-memory lead to excessive object use. Just an idea. Yeah, I intentionally separated the two (reverse an
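Andy's "an operation is a pair - the action and the data - not data" point can be sketched as follows. Quad here is a stand-in record, not the Jena class; whether inversion lives on the operation or with playback is a separate design choice, as discussed above.

```java
// Sketch: model an operation as (action, data) rather than a subclass of
// Quad, so quad equality stays purely G/S/P/O and no operation can ever
// masquerade as a Quad inside a DatasetGraph.
public class OperationAsPair {
    public enum Action { ADD, DELETE }

    public record Quad(String g, String s, String p, String o) {}

    public record QuadOperation(Action action, Quad quad) {
        // The inverse, for undo-playback of a log of these operations.
        public QuadOperation inverse() {
            return new QuadOperation(
                action == Action.ADD ? Action.DELETE : Action.ADD, quad);
        }
    }

    public static void main(String[] args) {
        Quad q = new Quad("g", "s", "p", "o");
        QuadOperation add = new QuadOperation(Action.ADD, q);
        // Same data, different operations: the quads are equal...
        System.out.println(q.equals(add.quad()));          // true
        // ...but ADD(q) and DELETE(q) are distinct values.
        System.out.println(add.equals(add.inverse()));     // false
    }
}
```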
Re: Journaling DatasetGraph
One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation. An operation is a pair - the action and the data - not data. e.g. Putting a QuadOperation into a DatasetGraph would cause problems. Andy-- I've thought harder about this and I've realized that whether or not I can make a navel-gazing argument about correctness, the typing is obviously confusing and that's damnation enough. I'll fix this to stop extending Quad. --- A. Soroka The University of Virginia Library On Jul 25, 2015, at 7:43 AM, Andy Seaborne a...@apache.org wrote: On 23/07/15 14:18, aj...@virginia.edu wrote: After a longish conversation with Andy Seaborne, I've worked up a simple journaling DatasetGraph wrapping implementation. The idea is to use journaling to support proper aborting behavior (which I believe this code does) and to add to that a semantic for DatasetGraph::addGraph that copies tuples instead of leaving a reference to the added Graph (which I believe this code also does). Between these two behaviors, the idea is to be able to support transactionality (MRSW only) reasonably well. The idea is (if this code looks like a reasonable direction) to move onwards to an implementation that uses persistent data structures for covering indexes in order to get at least to MR+SW and eventually to attack JENA-624: Develop a new in-memory RDF Dataset implementation. Feedback / advice / criticism greedily desired and welcome! https://github.com/ajs6f/jena/tree/JournalingDatasetgraph https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph --- A. Soroka The University of Virginia Library Hi there, A first look - there's quite a lot to do with the release at the moment. 
Having a separate set of functionality to the underlying DatasetGraph is good for the MRSW case and, with that, for composition on multiple datasets, text indexes, etc. For the MR+SW, I think the more connected nature of transactions and implementation might make it harder to have independent functionality, but we'll see. https://github.com/afs/mantis/tree/master/dboe-transaction is a take on a transaction mechanism. I'm using it at the moment so I'm finding out what works ... and what does not. Yes - addGraph ought to be a copy. The general dataset where the app can put together a collection of different graph types is the exception, but needed for the case of some graphs being inference, maybe some not. One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation. An operation is a pair - the action and the data - not data. e.g. Putting a QuadOperation into a DatasetGraph would cause problems. ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType> [[ public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>> implements ReversibleOperationRecord<OpType> { ]] While, yes, a collection of operations could be an operation, datasets don't provide such composite operations, so the abstraction is not used. And the reverse of it would be recursive - each operation needs reversing. I'd keep the log (= list of operations) as a separate concept from the operations themselves. One key operation of a ListBackedOperationRecord is clear() and Operations are ... Or is this a naming thing: is the record the log entry or the log itself? Is there some specific reason as to why you override the DatasetGraphWithLock lock?
My take on this is: https://github.com/afs/jena-workspace/tree/master/src/main/java/transdsg One difference is that the notion of reversing an operation is not a feature of the operation itself; it's the way it is played back. Partially, this is efficiency (which may not matter), as it reduces object churn, but it also puts undo-playback in one place (e.g. reading and writing from storage, which might be non-heap memory, a compacted form, or even a disk) for cases where large, long transactions, even in-memory, lead to excessive object use. Just an idea. Andy
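Andy's point - that reversal lives in the playback, not in the operation object - can be sketched in plain Java. This is a minimal, hypothetical illustration (the class and method names here are invented for the sketch, not the classes on either branch, and a String stands in for a real Quad):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// An operation is a pair (action, data). The journal inverts actions in
// ONE place, during playback, rather than making each operation know how
// to reverse itself.
enum Action { ADD, DELETE }

class Op {
    final Action action;
    final String quad; // stand-in for a real Quad
    Op(Action action, String quad) { this.action = action; this.quad = quad; }
}

class Journal {
    private final Deque<Op> log = new ArrayDeque<>();

    void record(Action action, String quad) { log.push(new Op(action, quad)); }

    // Abort: replay the log newest-first, inverting each action here, centrally.
    void abort(List<String> store) {
        while (!log.isEmpty()) {
            Op op = log.pop();
            if (op.action == Action.ADD) store.remove(op.quad); // undo an add
            else store.add(op.quad);                            // undo a delete
        }
    }
}
```

Because the inversion happens in the playback loop, the log entries stay plain data, which is what lets the record be stored off-heap or on disk, per the efficiency point above.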
Re: Iter vs. ExtendedIterator
Since I'm trying to get to an understanding from which I can write a PR with some new Javadocs for these types, let me try out the following: Iter should never be used for a return type or parameter type in the public contract of a class. It is only to be used inside implementation code and it can be instantiated only to allow method-chaining as part of a calculation. ExtendedIterator should only be used as a return type or parameter type in the public contract of a class when that is specifically required by a type being implemented. Do those remarks capture facts about the two types? --- A. Soroka The University of Virginia Library On Jul 21, 2015, at 3:36 PM, Andy Seaborne a...@apache.org wrote: On 21/07/15 15:38, A. Soroka wrote: A question came up for me, as a Jena newbie, in the course of JENA-966: LazyIterator. The type ExtendedIterator in jena-core is used widely through jena-core. It features several convenient methods for use with iteration, like mapping through functions, filtering, and concatenation. The type Iter in jena-base is used widely through jena-base and jena-arq. It features many convenient methods for use with iteration, like everything ExtendedIterator does plus much more (e.g. folding, selecting, reducing…). What is the difference in use for these two types? Why are they distinct? Is there some means by which it can be made clear when to use each and why? I would be happy to write a simple class Javadoc for Iter (which currently has none at all) to let folks know when to use it, if someone will explain that to me. --- A. Soroka The University of Virginia Library Iter is used in SDB and TDB as well, where there are lots of iterators for all sorts of things. ExtendedIterator only works with ExtendedIterator. Not everything generates ExtendedIterators. Iter is for working with java.util.Iterator; it is a different style where the statics are more important than the class methods.
It does allow chaining but generally I don't think that style is very common in the code base. Andy
Journaling DatasetGraph
After a longish conversation with Andy Seaborne, I've worked up a simple journaling DatasetGraph wrapping implementation. The idea is to use journaling to support proper aborting behavior (which I believe this code does) and to add to that a semantic for DatasetGraph::addGraph that copies tuples instead of leaving a reference to the added Graph (which I believe this code also does). Between these two behaviors, the idea is to be able to support transactionality (MRSW only) reasonably well. The idea is (if this code looks like a reasonable direction) to move onwards to an implementation that uses persistent data structures for covering indexes in order to get at least to MR+SW and eventually to attack JENA-624: Develop a new in-memory RDF Dataset implementation. Feedback / advice / criticism greedily desired and welcome! https://github.com/ajs6f/jena/tree/JournalingDatasetgraph https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph --- A. Soroka The University of Virginia Library
Re: Iter vs. ExtendedIterator
Okay, so if I were writing some new code in a Jena module, and I needed to do some of the tasks for which these guys have facilities (e.g. filtering), how should I select a type to use? Should I only use ExtendedIterator's methods if the thing I already have in hand is an ExtendedIterator? Put another way, is it ever appropriate to create an ExtendedIterator in a situation in which I am not beholden to do so by interface requirements? Thanks for helping me get some understanding on this. --- A. Soroka The University of Virginia Library On Jul 21, 2015, at 3:36 PM, Andy Seaborne a...@apache.org wrote: Iter is used in SDB and TDB as well, where there are lots of iterators for all sorts of things. ExtendedIterator only works with ExtendedIterator. Not everything generates ExtendedIterators. Iter is for working with java.util.Iterator; it is a different style where the statics are more important than the class methods. It does allow chaining but generally I don't think that style is very common in the code base. Andy
Re: Fuseki and ETags
On Jun 29, 2015, at 9:33 AM, Claude Warren cla...@xenei.com wrote: If there were an ETag per dataset and a method on the dataset to force an ETag reset, would this address the index issue, in that the indexer could reset the ETag when it deemed appropriate? It might-- for that indexer. I would be concerned about setups in which another process acted against the data out of sight of Fuseki. But would the ETag be on ARQ's Dataset itself? If I understand what's going on here correctly (debatable at best), Dataset should not have any HTTP concerns mixed into it. ETag would be on something closer to Fuseki's DataService, which I do not think would normally be accessible to an indexer which is only aware of what's on disk… but this is all from my understanding of the architecture, which is pretty minimal. {grin} Maybe some kind of last-changed timestamp could reasonably go on Dataset to support this kind of function? In any case I would go with the first choice. It definitely seems like the most bang for the least buck. Is there anything that prohibits sending both an ETag and a constant Expires? I haven't looked but I recall they are not mutually exclusive. Yes, I think you are correct. I suppose a bad ETag will never be known to be such as long as it is inside the range of a still-good Expires, but that is a question for the administrator configuring Fuseki, it seems to me. There is also Cache-Control, of course, in the same field of functionality. --- A. Soroka The University of Virginia Library
Re: Fuseki and ETags
I can only speak for the use cases I actually know about. ETags would get used, because the most important web app in my concern that is potentially a client to Fuseki would be able to use them. But that is just one case. JENA-626 would be great in any regard. --- A. Soroka The University of Virginia Library On Jun 29, 2015, at 12:20 PM, Andy Seaborne a...@apache.org wrote: There is no case of external modification of the database which Fuseki is running. A disaster will occur otherwise. [Modifying externally while running requires a different approach (e.g. switching between two copies of the database ... maybe ... so many ways to corrupt a database ... ).] ETags are a quite technical solution - will any system actually use it for real, even if it is the right solution? We wouldn't want to find out that ETag support does not get used. For the SPARQL Protocol case (with query strings), it might not really get used. Has caching of requests including query strings rolled out to any degree? (a point from discussion in JENA-388). If query strings currently cause no caching by intermediaries in practice, will clients cache, which is the case of one client reissuing the same query? Possible, but is it likely? See also JENA-626 SPARQL Query Caching. That would make a difference - different client apps starting up often ask the same query to get started. Andy On 29/06/15 16:03, Claude Warren wrote: I am not familiar with how the indexing interplays with the rest of the Jena system. My assumption is, like you, that we only want the ETag in the Fuseki layer. However, to generate an ETag it seems like Fuseki will need to be able to ask the underlying dataset when the last change occurred, but then you also want to know if indexing has changed, so that results may be changed as well.
If we consider ETag generation separate from the Dataset, then the ETag generator could register as a listener to the dataset and react whenever a change occurs to the model. This doesn't solve the problem of responding to index updates. However, whatever interface the listener uses to trigger an ETag change could just as well be done by an indexer. Is there an indexer listener interface (a la Model/Graph listeners)? In this solution the ETag gets input from any registered component. I think that each registered component should have a name and a value. The ETag generator would retain the most recent value for each registered component and generate a new ETag when a value changes. So I see a class with two methods: void ETagGenerator.change( String name, String value ) and String ETagGenerator.getTag(); // to retrieve the current tag. Claude On Mon, Jun 29, 2015 at 2:50 PM, aj...@virginia.edu wrote: On Jun 29, 2015, at 9:33 AM, Claude Warren cla...@xenei.com wrote: If there were an ETag per dataset and a method on the dataset to force an ETag reset, would this address the index issue, in that the indexer could reset the ETag when it deemed appropriate? It might-- for that indexer. I would be concerned about setups in which another process acted against the data out of sight of Fuseki. But would the ETag be on ARQ's Dataset itself? If I understand what's going on here correctly (debatable at best), Dataset should not have any HTTP concerns mixed into it. ETag would be on something closer to Fuseki's DataService, which I do not think would normally be accessible to an indexer which is only aware of what's on disk… but this is all from my understanding of the architecture, which is pretty minimal. {grin} Maybe some kind of last-changed timestamp could reasonably go on Dataset to support this kind of function? In any case I would go with the first choice. It definitely seems like the most bang for the least buck.
Is there anything that prohibits sending both an ETag and a constant Expires? I haven't looked but I recall they are not mutually exclusive. Yes, I think you are correct. I suppose a bad ETag will never be known to be such as long as it is inside the range of a still-good Expires, but that is a question for the administrator configuring Fuseki, it seems to me. There is also Cache-Control, of course, in the same field of functionality. --- A. Soroka The University of Virginia Library
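Claude's proposed two-method ETagGenerator could look something like the minimal sketch below. This is purely illustrative (nothing in Fuseki has this class); in particular, the choice to render the component state directly as the tag, rather than hashing it, is an assumption made to keep the sketch short:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed ETagGenerator: each registered component (dataset,
// text indexer, ...) reports a (name, value) pair when it changes; the tag
// reflects the latest value of every component, so a change to any one of
// them yields a new ETag. TreeMap keeps the rendering order stable.
class ETagGenerator {
    private final Map<String, String> components = new TreeMap<>();

    public synchronized void change(String name, String value) {
        components.put(name, value);
    }

    // A real implementation would probably hash this; quoting the rendered
    // component state is enough here, since ETags are opaque strings.
    public synchronized String getTag() {
        return '"' + components.toString().replace('"', '\'') + '"';
    }
}
```

An indexer would simply call change("index", someVersionMarker) whenever it finished an update, and Fuseki would call getTag() when answering a conditional request.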
Re: CMS diff: Reviewing Contributions
Good point. (Speaking as someone who regularly has to be corrected about this {grin}.) --- A. Soroka The University of Virginia Library On Jun 29, 2015, at 12:57 PM, Andy Seaborne a...@apache.org wrote: Good comments - I've made some revisions to the page based on this input. It reminded me to add a request for pull requests to have commits focused on the pull requests/contribution functionality, not details of how the code has evolved up to that point (i.e. the internal history). Different audiences. Andy On 26/06/15 16:12, A. Soroka wrote: Clone URL (Committers only): https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/getting_involved%2Freviewing_contributions.mdtext A. Soroka Index: trunk/content/getting_involved/reviewing_contributions.mdtext === --- trunk/content/getting_involved/reviewing_contributions.mdtext (revision 1655891) +++ trunk/content/getting_involved/reviewing_contributions.mdtext (working copy) @@ -29,6 +29,19 @@ @author tags will not prevent a contribution being accepted but **should** be removed by the committer who integrates the contribution. +## Code style + +Jena does not have a particular formal code style specification at this time, but here are some simple tips for keeping your contribution in good order: + +- Don't create a method signature that throws checked exceptions that aren't ever actually thrown from the code in that method unless an API supertype specifies that signature. Otherwise, clients of your code will have to include unnecessary handling code. +- Don't leave unused imports in your code. Any IDE can solve that problem with one keystroke. :) +- If a type declares a supertype that isn't a required declaration, consider whether that clarifies or confuses the intent. The former is okay, the latter not so good. +- Minimize the new compiler warnings your patch creates.
If you use @SuppressWarnings to hide them, please add a comment explaining the situation or a TODO with a potential future fix that would allow removing the suppression. +- Remove unused local variables or fields or uninteresting unused private methods. If it's debugging detritus, consider replacing it with good logging code for future use, if that seems likely to become useful. +- If there is valuable code in some unused private method, add a @SuppressWarnings("unused") with an explanation of when it might become useful. If there is valuable but unused code inside a used method, consider breaking it out into a private method and adding a @SuppressWarnings("unused") and an explanation. + + + ## Contribution to Apache The Apache License states that any contribution to an Apache project is automatically considered to be contributed to the Apache foundation and thus liable for inclusion in an Apache project **unless** the contributor explicitly states otherwise.
Re: [jira] [Commented] (JENA-966) LazyIterator
Right, I updated my comment right after I made it, when I noticed the difference. I shouldn't think it matters which one to keep. LazyIterator is a little shorter to write. :) There are a number of other Iterators (noted in the comments to that ticket) that seem to be deprecate-able. E.g. SingletonIterator has equivalent Guava functionality, and UniqueExtendedIterator has its own comments suggesting that it be deprecated (New development should use UniqueFilter…). As I said in an earlier message, I will issue a reworked PR #79 with those suggestions, but I will not touch the lazy iterators. --- A. Soroka The University of Virginia Library On Jun 24, 2015, at 11:32 AM, Claude Warren cla...@xenei.com wrote: Yes. LazyIterator implements ExtendedIterator. LateBindingIterator implements Iterator. My plan -- probably won't execute until tomorrow night -- is to complete the implementation of LazyIterator for both (2.13.1 and 3.0.0) and then deprecate LateBinding in favor of Lazy, as ExtendedIterator implements Iterator. Though I could very easily be swayed to alter LateBindingIterator to implement ExtendedIterator and deprecate LazyIterator. Claude On Wed, Jun 24, 2015 at 4:13 PM, A. Soroka (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599547#comment-14599547 ] A. Soroka commented on JENA-966: Is there any difference between {{LateBindingIterator}} and {{LazyIterator}}? LazyIterator Key: JENA-966 URL: https://issues.apache.org/jira/browse/JENA-966 Project: Apache Jena Issue Type: Bug Components: Core Affects Versions: Jena 3.0.0 Reporter: Claude Warren Assignee: Claude Warren LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that new LazyIterator<Model>() { @Override public ExtendedIterator<Model> create() { ...
}}; would work. However, LazyIterator does not override: removeNext(), andThen(), toList(), and toSet(). I believe these should be implemented in the class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
Re: [jira] [Comment Edited] (JENA-966) LazyIterator
The Stream API is definitely significantly different from the Function API. Maybe you mean that Stream is significantly different from Iterator (which it surely is)? Everything you say about deprecation seems very fair to me, and that's what happened to a number of other types (e.g. in jena-core). I had a PR in to actually remove these guys under discussion. (Mentioned in the comments to this ticket.) I will rework that to only deprecate them and add advice on how to use the Java 8 idioms instead, and we can take that into your proposed other ticket. As far as Java 8 adoption, I appreciate the difficulties there. I will step out of the way and let others discuss that, because I am lucky to be in a position where the issue is not too urgent. --- A. Soroka The University of Virginia Library On Jun 22, 2015, at 4:23 PM, Claude Warren cla...@xenei.com wrote: I must have misunderstood a post from Andy then. My error. I thought there was a comment from Andy that indicated that Supplier was part of Stream and that Stream was significantly different from Function. As I said, my error and I am happy to drop that point. As for the code base, for any publicly accessible surfaces we have to consider that they may be used outside of our base in products built upon Jena. If the class was exposed, then removing it should not be cut and done. (I learned this lesson the hard way.) Naturally this does not apply to internal code where the interface remains the same and the implementation changes. Thus my suggestion to fill out the class. I also proposed that we open an Epic to discuss how to move toward the Function approach you propose. I think that we will need a two-pronged approach. Retain the current interfaces that have been publicly available while marking them as deprecated and pointing to the Function approach as the replacement. I would think that we could mark as deprecated and indicate that they will be removed in 3.1.0 (or some such).
Perhaps we should discuss how long to keep deprecated bits around before removing them. I think that if 3.x.y deprecates something, then 3.x+1.0 should be the earliest it should be removed. I also have concerns about the full-on Java 8 adoption path we are on, as there are cases where Java 8 is not available. Working for IBM I can tell you that we still support Java 6 and that the Java 6 IBM ships is patched to resolve security issues. But the fact remains that there are environments that are not at Java 8 and won't be there any time soon. Our customers are reticent to move Java versions in their large environments. I have a project where we are going to have to back-port Jena 2.13.1 to Java 6 (if possible). But that is neither here nor there with regards to the topic at hand. Claude On Mon, Jun 22, 2015 at 9:03 PM, A. Soroka (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596545#comment-14596545 ] A. Soroka edited comment on JENA-966 at 6/22/15 8:02 PM: - Just as a sidenote for anyone following this who is not familiar with the use of {{Supplier}}: {{Supplier}} is not part of the Java 8 Stream API, it's part of the Function API. It is a very simple SAM interface that is intended to hold a computation. So one might do: {code:language=java} // the only costs of the line below are an assignment and object creation Supplier<Foo> fooForLater = () -> expensiveComputation(); // do some other stuff // the line below is where we pay the cost of expensiveComputation() Foo myFoo = fooForLater.get(); {code} was (Author: ajs6f): Just as a sidenote for anyone following this who is not familiar with the use of {{Supplier}}: {{Supplier}} is not part of the Java 8 Stream API, it's part of the Function API. It is a very simple SAM interface that is intended to hold a computation.
So one might do: {code:language=java} // the only cost of the line below is an assignment Supplier<Foo> fooForLater = () -> expensiveComputation(); // do some other stuff // the line below is where we pay the cost of expensiveComputation() Foo myFoo = fooForLater.get(); {code} LazyIterator Key: JENA-966 URL: https://issues.apache.org/jira/browse/JENA-966 Project: Apache Jena Issue Type: Bug Components: Core Affects Versions: Jena 3.0.0 Reporter: Claude Warren Assignee: Claude Warren LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that new LazyIterator<Model>() { @Override public ExtendedIterator<Model> create() { ... }}; would work. However, LazyIterator does not override: removeNext(), andThen(), toList(), and
Re: [jira] [Commented] (JENA-966) LazyIterator
How about using a Java 8 Supplier<Iterator<T>>? That's pretty lazy. --- A. Soroka The University of Virginia Library On Jun 17, 2015, at 5:07 AM, Claude Warren cla...@xenei.com wrote: I wanted to use it in an application. Is there a replacement? On Wed, Jun 17, 2015 at 9:10 AM, Andy Seaborne (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589458#comment-14589458 ] Andy Seaborne commented on JENA-966: There are no uses of this class in the codebase anymore. We can remove it. LazyIterator Key: JENA-966 URL: https://issues.apache.org/jira/browse/JENA-966 Project: Apache Jena Issue Type: Bug Components: Core Affects Versions: Jena 3.0.0 Reporter: Claude Warren Assignee: Claude Warren LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that new LazyIterator<Model>() { @Override public ExtendedIterator<Model> create() { ... }}; would work. However, LazyIterator does not override: removeNext(), andThen(), toList(), and toSet(). I believe these should be implemented in the class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
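The Supplier-based laziness suggested above can be sketched in a few lines of plain Java. This is purely illustrative (the class name is invented, and it wraps plain java.util.Iterator rather than Jena's ExtendedIterator): the underlying iterator is not created until the first call touches it, which is exactly what LazyIterator's create() defers.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// The supplier is only invoked on the first hasNext()/next() call,
// so construction of the underlying iterator is deferred until needed.
class LazySupplierIterator<T> implements Iterator<T> {
    private final Supplier<Iterator<T>> source;
    private Iterator<T> delegate; // null until first use

    LazySupplierIterator(Supplier<Iterator<T>> source) { this.source = source; }

    private Iterator<T> delegate() {
        if (delegate == null) delegate = source.get();
        return delegate;
    }

    @Override public boolean hasNext() { return delegate().hasNext(); }
    @Override public T next() { return delegate().next(); }
}
```

The upshot is that the "lazy" behavior needs no abstract subclassing at all: any expression producing an iterator can be deferred by wrapping it in a lambda.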
Re: CLI libraries
In the PR I submitted a day or two ago, I added a DEPRECATED: Please use riot instead message to the help of rdfcat, but I didn't have it emit that message to stderr on all runs. That seems like a good move to me. I'll add that, and we can then decide whether to go all the way to having that be the _only_ thing rdfcat does in Jena 3. --- A. Soroka The University of Virginia Library On Jun 10, 2015, at 6:39 AM, Andy Seaborne a...@apache.org wrote: On 09/06/15 17:11, aj...@virginia.edu wrote: I don't see any actual references in the documentation to rdfcat. Perhaps it can be deprecated? Interesting question - how to deprecate a command line tool? Print a This is deprecated message to stderr? As a Jena3 step we can be faster with migration. jena.rdfcat = only a message saying use riot? But first - is riot a good enough replacement? Does it need more documentation? (probably yes, as facilities got added incrementally: --formatted=FORMAT needs to be the default output style and streaming requires intervention). Andy --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote: People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased.
Re: Trouble Building Under Eclipse
I work on other projects for which we separate the lifecycles of the main product and ancillary or supporting products (e.g. configuration for Checkstyle) and it works well so long as: 1) The sidecar artifacts are available from Maven Central or an appropriate more specific repository. This avoids any annoying double-build situations. 2) The cost of building/publishing the sidecar artifacts is low. This is because it's done less frequently and therefore less expertise develops in the community about doing it. As always in dev workflows, YMMV, but shaded Guava does seem to me like a good candidate. If the conversation about project code style picks up again (and I will be trying to move that forward in a message tomorrow) then artifacts related thereto might also be good candidates. --- A. Soroka The University of Virginia Library On Jun 10, 2015, at 5:47 AM, Andy Seaborne a...@apache.org wrote: On 09/06/15 16:26, aj...@virginia.edu wrote: Okay, now I get why we're sticking with shading in Guava, at least for now (since this seems like the kind of problem that OSGi solves and hopefully Jigsaw will solve). Are there objections to ejecting shaded Guava from the main dev effort into its own orbit? Or is there a dev cycle associated to the main one that makes sense as a home for Guava? I don't mind either way - doesn't seem like a clear-cut right or wrong. Currently, we have a single build and it produces a single consistent cut of versions (e.g. the binary distribution includes dependencies). jena-shaded-guava is the same version as the main jena version. One release vote. How often do Guava versions change? 16, 17, 18 were close together (a few months) but 18, the latest, was Aug 2014. Andy --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 3:11 PM, Andy Seaborne a...@apache.org wrote: Hadoop/Elephas is an example of a general problem with Guava.
By reputation, upgrading Guava across versions has been problematic - subtle and not-so-subtle changes of behaviour or removed code. When Jena is used as a library, the system or application in which it is used might use Guava itself - and need a specific version. But Jena uses Guava and needs a specific version with certain code in it, which might be different. We are isolating Jena's use of Guava from the system in which Jena is used. Hadoop has very strong requirements on Guava versions - it might well apply to other user applications as well. We do <exclude/> it, in the sense that the dependency-reduced-pom.xml POM of jena-shaded-guava does not mention com.google.guava:guava. Elephas picks up the Hadoop dependency. Andy On 08/06/15 14:26, aj...@virginia.edu wrote: I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right. To summarise what is happening: The POM file in the maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source which names the dependent Google Guava in a dependency. Result: a certain degree of chaos.
Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name; otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi; it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop, which as you know has an ancient Guava. It might be good to keep it out of the normal build/release cycle; then you would get the jena-guava shade from Maven Central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or voted+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name) On 4 Jun 2015 14:33, aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project
Re: TDB2
Is there some high level overview of Lizard/Mantis/TDB2 yet extant? Like the kind of thing we might see at a conference? In any event, thanks for working on this-- it's great to know that Jena will be able to cluster soon. --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 1:24 PM, Andy Seaborne a...@apache.org wrote: On 08/06/15 17:48, Marco Neumann wrote: is TDB2 going to replace TDB or is TDB2 a new cluster product? Whatever people (users, developers) want. Migrating DBs is not as easy as upgrading code. Running oaj.tdb and oaj.tdb2 side by side (TDB2 is itself 7 maven modules ATM - some can be combined as they are small and just a good idea at the time). TDB2 is not the cluster (that's Lizard). Mantis started as the separation out of the low level code needed for Lizard. Initially a validation of the reworking of transactions and data structures, a little extra work has made it viable as TDB2. Andy (oaj = org.apache.jena) Marco On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne a...@apache.org wrote: Informational announcement: TDB2 TDB2 is a reworking of TDB based on updated implementations of transactions and transactional data structures for project Lizard (a clustered SPARQL store). TDB2 has: * Arbitrary scale write-once transactions * New transaction system - can add other first class components (e.g. text indexes, cache tables) * Models work across transaction boundaries * Cleaner, simpler, more maintainable. TDB2 databases are not compatible with TDB databases. It uses a more efficient encoding for RDF terms. [1] Being a database, the new indexing and transaction code needs time to settle to bring the maturity up. I'm using that tech in Lizard development. Andy TDB2 code: https://github.com/afs/mantis/tree/master/tdb2 Lizard slides: http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard [1] An upgrade path using TDB1-style encoding is possible; it is a one-way upgrade path and not reversible [2].
TDB2 adds control files for the copy-on-write data structures that TDB1 does not understand. [2] Actually, if the encoding is compatible, what will happen is that TDB1 will see the database at the time of the upgrade. Welcome to copy-on-write immutable data structures.
Re: Trouble Building Under Eclipse
Okay, now I get why we're sticking with shading Guava, at least for now (since this seems like the kind of problem that OSGi solves and hopefully Jigsaw will solve). Are there objections to ejecting shaded Guava from the main dev effort into its own orbit? Or is there a dev cycle associated to the main one that makes sense as a home for Guava? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 3:11 PM, Andy Seaborne a...@apache.org wrote: Hadoop/Elephas is an example of a general problem with Guava. By reputation, upgrading Guava across versions has been problematic - subtle and not-so-subtle changes of behaviour or removed code. When Jena is used as a library, the system or application in which it is used might use Guava itself - and need a specific version. But Jena uses Guava and needs a specific version with certain code in it, which might be different. We are isolating Jena's use of Guava from the system in which Jena is used. Hadoop has very strong requirements on Guava versions - it might well apply to other user applications as well. We do <exclude/> in the sense that the dependency-reduced-pom.xml POM of jena-shaded-guava does not mention com.google.guava:guava. Elephas picks up the Hadoop dependency. Andy On 08/06/15 14:26, aj...@virginia.edu wrote: I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right.
To summarise what is happening: the POM file in the Maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact, with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency. Result: a certain degree of chaos. Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name, otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi, it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop which as you know has an ancient Guava. It might be good to keep it out of the normal build/release cycle, then you would get the jena-guava shade from Maven central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or vote+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name) On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace-- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A.
Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
On Jun 8, 2015, at 6:35 PM, Andy Seaborne a...@apache.org wrote: less - there is no transactionality across the contained graphs. (Model.graph transactions aren't connected to dataset transactions) Ah, glad I asked! {grin} As far as models-as-views-of-datasets: is it true that all that is needed for this is a good in-memory dataset? It would be useful for working in-memory. For example, the default union graph can be made to work efficiently, as can dataset transactions. Okay, so it's more that having a good in-memory dataset would be helpful here? I'm just trying to establish if you see the in-memory dataset improvement as _blocking_ models-as-views or just that models-as-views would be worth more and work better accompanied by a better in-memory dataset. What about datasets that are much too large for memory? Or impls of Dataset that incur network latency in operation? Or do these cases just imply the need for the right kinds of laziness in views on Datasets? Models from TDB are already views. public class GraphTDB extends GraphView … Cool. So we already have that laziness in hand in the form of GraphView. --- A. Soroka The University of Virginia Library
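The "view" idea above - a graph computed on demand from the dataset's storage rather than copied out of it - can be sketched without any Jena types. This is an illustrative toy (the class and method names are mine, not Jena's); GraphView is the real Jena version of the same pattern:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** A "graph" as a lazy view over a "dataset", in the spirit of GraphTDB extends GraphView. */
public class DatasetView {
    // Storage is per named graph; the views below never copy it eagerly.
    private final Map<String, Set<String>> triplesByGraph = new HashMap<>();

    public void add(String graph, String triple) {
        triplesByGraph.computeIfAbsent(graph, g -> new TreeSet<>()).add(triple);
    }

    /** View of one named graph, computed on demand from current storage. */
    public Set<String> graphView(String graph) {
        return triplesByGraph.getOrDefault(graph, Set.of());
    }

    /** Union default graph: also derived lazily from the per-graph storage. */
    public Set<String> unionView() {
        Set<String> all = new TreeSet<>();
        triplesByGraph.values().forEach(all::addAll);
        return all;
    }
}
```

Because nothing is materialized up front, the same approach works whether the backing storage is in memory, on disk, or remote - the laziness question ajs6f raises is about how much each view operation costs, not whether views are possible.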
Re: CLI libraries
I don't see any actual references in the documentation to rdfcat. Perhaps it can be deprecated? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote: People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased.
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
So to be clear, part of the idea here is to boost the visibility of transactions, and one of the things that wants doing as part of that is to provide copy-on-add-graph semantics for the in-memory dataset so that transactionality is coherent across such a dataset. Right now it is instead a sort of patchwork of whatever forms of transactionality were available in the graphs that have been added to it, which isn't an attractive thing to advertise, and may not even really work all the time. As far as models-as-views-of-datasets: is it true that all that is needed for this is a good in-memory dataset? What about datasets that are much too large for memory? Or impls of Dataset that incur network latency in operation? Or do these cases just imply the need for the right kinds of laziness in views on Datasets? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 3:23 PM, Andy Seaborne a...@apache.org wrote: On 08/06/15 10:25, Claude Warren wrote: What exactly is this review asking? Change in strategy or change in docs? Both :-) concurrency-howto does not mention transactions except in passing. It should be more pro-transactions IMO. A possibility is that Datasets are all transactional, even if that is only DatasetGraphWithLock; no Dataset.supportsTransactions - it's always true. Remove Dataset.getLock. concurrency-howto would be for model-only use. Everything else is transactional in style. The documentation should reflect this preferred style. If we had (hi ajs6f!) an in-memory dataset as well as the general container one, and the in-memory one were transactional, copy-in for addGraph, we could make models be views of datasets always. Creating a model would have an implicit Dataset if a free-standing model. Andy On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org wrote: Andy Seaborne created JENA-957: -- Summary: Review concurrency howto in the light of transactions.
Key: JENA-957 URL: https://issues.apache.org/jira/browse/JENA-957 Project: Apache Jena Issue Type: Bug Reporter: Andy Seaborne Priority: Minor http://jena.apache.org/documentation/notes/concurrency-howto.html Include {{DatasetGraphWithLock}}. Consider if that should be the default for in-memory and general datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
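For readers unfamiliar with DatasetGraphWithLock: it provides transaction-style begin/commit semantics by wrapping the dataset in a multiple-reader/single-writer (MRSW) lock. The underlying locking pattern, shown here in plain Java with illustrative names (not Jena API), is:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** MRSW (multiple-reader/single-writer) guard - the pattern DatasetGraphWithLock applies. */
public class LockedStore {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String read(String key) {
        lock.readLock().lock();      // many readers may hold the read lock at once
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(String key, String value) {
        lock.writeLock().lock();     // a writer gets exclusive access
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This gives serialized writers and isolated readers, which is enough to make Dataset.supportsTransactions always true as Andy suggests - though without the abort/rollback that a fully transactional store such as TDB offers.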
Re: Trouble Building Under Eclipse
I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right. To summarise what is happening: the POM file in the Maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact, with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency. Result: a certain degree of chaos. Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name, otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi, it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop which as you know has an ancient Guava.
It might be good to keep it out of the normal build/release cycle, then you would get the jena-guava shade from Maven central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or vote+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name) On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace-- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A. Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
CLI libraries
In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket? --- A. Soroka The University of Virginia Library
Re: CLI libraries
Okay, that makes sense. Is the larger move (the construction of 'jena-cmd') worth an epic in Jira? With the smaller (take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline) as a first story therein? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote: On 08/06/15 15:47, aj...@virginia.edu wrote: In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket? arq.cmdline.CmdLineArgs The whole cmd support does more than Apache Commons CLI. Around command line processing is support for grouping and reuse across commands, and an execution model. There are a lot of commands -- Apache Commons CLI would also cause changes in syntax. e.g. arq.cmd does not treat -- and - differently; combined POSIX-like options aren't supported. (jena.cmdline looks like some partial copy to get older development working.) A useful goal might be to have a module jena-cmd which is, after SDB, TDB and the rest, the set of command line tools we deem to be the public set of commands (some of the old stuff needs retiring or at least incompatibly brought into the general style - e.g. rdfcompare). People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased. A useful but bounded step might be to take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline (not tried this, so there may be a forgotten dependency). Andy --- A. Soroka The University of Virginia Library
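To illustrate the syntax change Andy mentions: POSIX-style parsers (as in Commons CLI's POSIX mode) allow bundled short options, so `-abc` means `-a -b -c`, while Jena's current commands treat `-abc` as one long option. A toy expansion, with an illustrative class name, shows the difference:

```java
import java.util.ArrayList;
import java.util.List;

/** Expands POSIX-style combined short options: "-abc" -> "-a", "-b", "-c". */
public class PosixArgs {
    public static List<String> expand(String... args) {
        List<String> out = new ArrayList<>();
        for (String a : args) {
            if (a.length() > 2 && a.charAt(0) == '-' && a.charAt(1) != '-') {
                for (char c : a.substring(1).toCharArray())
                    out.add("-" + c);   // split bundled single-character flags
            } else {
                out.add(a);             // "--long" and single "-x" pass through unchanged
            }
        }
        return out;
    }
}
```

Adopting a library that does this would silently change the meaning of existing multi-character single-dash options such as `-out`, which is exactly the incompatibility Andy is flagging.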
Re: Trouble Building Under Eclipse
I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace-- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A. Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
Re: Commons RDF
Wearing my Jena user's hat for a moment, this would be lovely and I would be happy to help with it. A project [1] on which I work persists RDF via some very complex mappings into and out of a JCR repository, and being able to stream it a little more gracefully would be a nice win for us. Those mappings are basically formed out of iterators and transformations, kind of a poor man's Stream API, but we're moving to rebuild over the real Streams API. Maybe this could be generalized into a more popular use case? [1]: http://www.fedora-commons.org/ --- A. Soroka The University of Virginia Library On May 14, 2015, at 3:46 PM, Stian Soiland-Reyes st...@apache.org wrote: I'm also interested in making Jena parsers and serializers usable directly from a Commons RDF perspective, without interaction with intermediate Jena core objects. E.g. something like: Stream<Triple> s = JenaCommonsRDF.read(inputStream, Lang.Turtle) And vice versa for write. Such a bridge should be possible on top of StreamRDF and RIOT, right? Perhaps a worker thread is needed if there are pull vs push issues. Should we start a branch, or first flesh out the rough edges of such a bridge module in the wiki? On 13 May 2015 15:59, Andy Seaborne a...@apache.org wrote: On 12/05/15 15:26, A. Soroka wrote: At: http://commonsrdf.incubator.apache.org/implementations.html it says Apache Jena is considering a compatibility interface that provides and consumes Commons RDF objects. I'm wondering if there have been any experiments to that end, or whether Jena is waiting for some resources to explore that possibility? I would be happy to give a go at making a simple module that just implements the current Commons RDF API types over jena-core in a simple way, to get things started. --- A. Soroka The University of Virginia Library I have some code that mocks up commonsrdf over Jena in the sense that it uses Jena behind the RDFTermFactory; that's the easy bit. It's limited and definitely not a bridge between the two APIs.
It is merely exploring the commonsrdf work. It would mess up the existing interfaces no end to add commonsrdf as interfaces to Model/Resource; and Graph/Triple/Node is generalized RDF so the type model does not fit. It needs a bridge module and a proper module would be good. ((I also have https://github.com/afs/commonsrdf-container which is even more minimal than the simple implementation. Not Jena related.)) Some other interesting projects: An in-memory dataset : JENA-624 Have a specifically in-memory DatasetGraph to complement the current general purpose dataset. Bruno is working on JENA-632 In fact, I can see commonsrdf being at the center of a new API, very Java8 specific, that is oriented around processing RDF stream style - see the email from Paul Houle. Or take StreamRDF and add java8-stream-ness around it (maybe not directly changing but making it the source for java8-streams - some issues of pull-streams and push-stream styles here which are hard when efficiency is considered). Andy
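The "pull vs push" issue Stian and Andy mention - RIOT's StreamRDF pushes triples at a callback, while java.util.stream consumers pull - is classically solved with a worker thread and a blocking queue. A generic sketch (illustrative names; not a real bridge to StreamRDF) using strings in place of triples:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Bridges a push-style producer (like a StreamRDF callback) to a pull-style consumer. */
public class PushToPull {
    private static final String END = "\u0000END"; // sentinel marking end of stream

    /** Runs the push-style source on a worker thread; the caller pulls items off a queue. */
    public static List<String> drain(Consumer<Consumer<String>> source) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        Thread worker = new Thread(() -> {
            source.accept(item -> putQuietly(queue, item)); // push side fills the queue
            putQuietly(queue, END);                         // signal end of stream
        });
        worker.start();
        List<String> out = new ArrayList<>();
        try {
            for (String item = queue.take(); !item.equals(END); item = queue.take())
                out.add(item);                              // pull side blocks for the next item
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    private static void putQuietly(BlockingQueue<String> q, String s) {
        try { q.put(s); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

The bounded queue also gives back-pressure: a fast parser blocks when the consumer falls more than 16 items behind, rather than buffering the whole document.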
code quality tools WAS: Code policies
There seems to be some consensus that it would be nice to bring in some automated code quality facilities for Jena. So far, the ones that have been mentioned are: 1) Sonar, which is on the way-- https://issues.apache.org/jira/browse/INFRA-9469 2) FindBugs, for which good Maven support exists: http://gleclaire.github.io/findbugs-maven-plugin/ 3) PMD, for which, again, good Maven support exists: http://maven.apache.org/plugins/maven-pmd-plugin/ I've made a ticket for trying out FindBugs and PMD: https://issues.apache.org/jira/browse/JENA-941 and I'll happily work it. Maybe we'll like the feedback, maybe not, but it's always good to get more info. --- A. Soroka The University of Virginia Library On May 12, 2015, at 5:22 PM, Bruno P. Kinoshita ki...@apache.org wrote: I think that something like checkstyle, PMD and FindBugs, plus updating the contribution page to ask contributors to review their changes before sending PRs or patches, would help. It would be good to avoid replicating unnecessary policies on the web site, though. Like suggesting that we expect the contributor to use 80 columns. That would mean that we would have to update the checkstyle XML rule file and the web site if we decided to use 120 or any other number. We can probably leave some basic policies (no tabs, no unused imports, etc). WDYT? Bruno From: A. Soroka aj...@virginia.edu To: dev@jena.apache.org dev@jena.apache.org Sent: Wednesday, May 13, 2015 2:07 AM Subject: Code policies From comments on some clean up PRs I submitted over this past weekend, it seems that it would be nice to have some rough code standards that could help newcomers _without_ inhibiting anyone from contributing. Possible policies that came up included: • Don't give a method signature that throws checked exceptions that aren't ever thrown from the code in that method unless an API supertype specs it. • Don't leave unused imports in. Any IDE can solve that problem with one keystroke.
{grin} • If a type declares a supertype that isn't a required declaration, consider whether that clarifies or confuses the intent. The former is okay, the latter not so good. • Minimize the compiler warnings your code throws up. If you use @SuppressWarnings to hide them, please add a comment explaining the situation or a TODO with a potential future fix that would allow removing the suppression. • Remove unused local variables or fields or uninteresting unused private methods. If it's debugging detritus, consider replacing it with good logging code for future use, if that seems likely to become useful. • If there is valuable code in some unused private method, add a @SuppressWarnings with an explanation of when it might become useful. If there is valuable but dead code inside a live method, consider breaking it out into a private method and adding a @SuppressWarnings and an explanation. If we can develop a reasonable list of expectations for contributions (and presumably, for the current code base) I will be happy to write some text for project site pages and try to encode some of the expectations with Maven Checkstyles. To be clear, I'm not suggesting any kind of blocking step in the build process, just a chance for some handy feedback about code submissions. Thoughts? --- A. Soroka The University of Virginia Library
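One of the policies above - every @SuppressWarnings carries a comment explaining why, and ideally a TODO for removing it - looks like this in practice (a made-up example, not Jena code):

```java
/** Illustrates the proposed policy: a @SuppressWarnings always carries its justification. */
public class Warnings {
    // Suppressed because the raw type comes from a legacy caller we cannot change yet;
    // TODO remove once the calling code is generified.
    @SuppressWarnings("unchecked")
    public static <T> T cast(Object o) {
        return (T) o;
    }
}
```

The point of the comment is that a later maintainer can tell at a glance whether the suppression is still needed, instead of having to re-derive the reasoning.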
Re: another clean up suggestion: dead code and its resuscitation
I've laid in a ticket: https://issues.apache.org/jira/browse/JENA-938 and attached a few PRs of reasonable size. They contain the removal of superinterfaces that don't need declaration, checked exceptions that cannot be thrown, and unnecessary typecasts. Those seemed to be entirely non-controversial moves to make. They result in additions and deletions, but the net result is many, many lines that are shorter and easier to read and 439 fewer LOC in total. It's not clear to me that there was consensus about removing never-called private methods, unreachable code (i.e. if (false) {…}) or unused fields. I think I could send in at least one more PR with the removal of unused local variables? That also seems generally non-controversial. --- A. Soroka The University of Virginia Library On May 8, 2015, at 10:14 AM, Claude Warren cla...@xenei.com wrote: I think the catch of exceptions that cannot be thrown can be safely removed. I would also vote up removal of private methods that are never used. Fields are a bit trickier, but then I am probably thinking of parameters and matching an interface... Yeah, I would vote up removing unused methods. Claude On Fri, May 8, 2015 at 3:54 AM, aj...@virginia.edu aj...@virginia.edu wrote: I'm building a PR [1] right now as a sort of think-piece to give us something concrete to look at. I'm building it up out of ONLY things that Eclipse/javac can determine are definitely impossible to execute or redundant or never used, including: - Private methods that are never used. - Superinterfaces that don't need declaration. - Fields that are never used. - Checked exceptions that cannot be thrown. and so far, I'm at about 11,000 lines to delete, which is… a good many. Certainly too many to believe that all are really totally dead stuff that should be gone. {grin} As you point out, some portion of this is stuff that we wouldn't want to lose.
My hope is that we can look over this PR and develop some tickets for the kinds of things to which you refer (e.g. features that didn't make it into SPARQL) and insert some TODOs and so forth. And maybe we can use it as a starting place for actual pruning. I'll send the PR sometime tomorrow. [1] https://github.com/ajs6f/jena/tree/KillDeadThings --- A. Soroka The University of Virginia Library On May 7, 2015, at 7:07 PM, Andy Seaborne a...@apache.org wrote: +1 to removing dead code though what is dead is tricky. In arq and tdb there was some but they included code that is a useful record (e.g. features that didn't make it into SPARQL). I removed obvious junk. Some is checking code that I'd like to leave. I had a look - a regex of if *\( *false *\) but I didn't find much in core (just 2). if(false) requires the compiler to generate no code, as does a final boolean - but in Java 8, does that include effectively final? What were you looking for? I tend to agree that the use of a field makes things worse. Andy On 07/05/15 19:24, Stephen Allen wrote: I'd say just eliminate all of that dead code. Also any commented code as well. We have a source control system, one can always look into the history to get that stuff. Using a field just makes it worse IMO... it'll never get removed if we do that. -Stephen On Thu, May 7, 2015 at 11:26 AM, A. Soroka aj...@virginia.edu wrote: There are a goodly number of pieces (150) of dead code in Jena, of the form: org.apache.jena.mem.HashCommon: void showkeys() { if (false) { System.err.print( "KEYS: " ); // some logging code System.err.println(); } } If I understand this rightly, these are cases where we want to keep some code on deck for potential use. I'd like to suggest that many of these guys might be rewritten with a field or fields in the class, something like: boolean useLoggingCode = false; void showkeys() { if (useLoggingCode) etc. } This would make things a bit clearer and clean out a bunch of compiler warnings.
Does this sound like a good approach? Worth doing? --- A. Soroka The University of Virginia Library -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
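The rewrite proposed in this thread can be made concrete. When the flag is a compile-time constant (`static final boolean`), the JLS treats `if (CONSTANT)` as conditional compilation, so javac still emits no code for the guarded block - the same property Andy notes for `if (false)` - while the named flag documents intent and silences the dead-code warning. A runnable sketch (illustrative names, not the actual HashCommon code):

```java
/** The suggested rewrite: a named constant flag instead of a bare if (false). */
public class HashCommonStyle {
    // Flip to true when debugging key distribution. Because this is a
    // compile-time constant, javac drops the guarded block when it is false,
    // exactly as it does for if (false).
    private static final boolean LOG_KEYS = false;

    public static String showKeys(String[] keys) {
        StringBuilder sb = new StringBuilder();
        if (LOG_KEYS) {
            sb.append("KEYS:");
            for (String k : keys) sb.append(' ').append(k);
        }
        return sb.toString();
    }
}
```

Note Stephen's counterpoint still applies: a non-constant instance field (as in the `boolean useLoggingCode` sketch above) would defeat the dead-code elimination and tends to linger forever; a constant, or outright deletion plus version control, avoids that.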
Re: another clean up suggestion: dead code and its resuscitation
There is now a PR at: https://github.com/apache/jena/pull/58 with much of this work available. The idea is not to merge that gargantuan PR, but to give folks an easy way to see what the code looks like after I took a meat ax to it. {grin} I would be happy to create more reasonable packages of changes from that monster PR for serious review and possible merging. Would a module-by-module approach be best? --- A. Soroka The University of Virginia Library On May 8, 2015, at 10:14 AM, Claude Warren cla...@xenei.com wrote: I think the catch of exceptions that cannot be thrown can be safely removed. I would also vote up removal of private methods that are never used. Fields are a bit trickier, but then I am probably thinking of parameters and matching an interface... Yeah, I would vote up removing unused methods. Claude On Fri, May 8, 2015 at 3:54 AM, aj...@virginia.edu aj...@virginia.edu wrote: I'm building a PR [1] right now as a sort of think-piece to give us something concrete to look at. I'm building it up out of ONLY things that Eclipse/javac can determine are definitely impossible to execute or redundant or never used, including: - Private methods that are never used. - Superinterfaces that don't need declaration. - Fields that are never used. - Checked exceptions that cannot be thrown. and so far, I'm at about 11,000 lines to delete, which is… a good many. Certainly too many to believe that all are really totally dead stuff that should be gone. {grin} As you point out, some portion of this is stuff that we wouldn't want to lose. My hope is that we can look over this PR and develop some tickets for the kinds of things to which you refer (e.g. features that didn't make it into SPARQL) and insert some TODOs and so forth. And maybe we can use it as a starting place for actual pruning. I'll send the PR sometime tomorrow. [1] https://github.com/ajs6f/jena/tree/KillDeadThings --- A.
Soroka The University of Virginia Library On May 7, 2015, at 7:07 PM, Andy Seaborne a...@apache.org wrote: +1 to removing dead code though what is dead is tricky. In arq and tdb there was some but they included code that is a useful record (e.g. features that didn't make it into SPARQL). I removed obvious junk. Some is checking code that I'd like to leave. I had a look - a regex of if *\( *false *\) but I didn't find much in core (just 2). if(false) requires the compiler to generate no code, as does a final boolean - but in Java 8, does that include effectively final? What were you looking for? I tend to agree that the use of a field makes things worse. Andy On 07/05/15 19:24, Stephen Allen wrote: I'd say just eliminate all of that dead code. Also any commented code as well. We have a source control system, one can always look into the history to get that stuff. Using a field just makes it worse IMO... it'll never get removed if we do that. -Stephen On Thu, May 7, 2015 at 11:26 AM, A. Soroka aj...@virginia.edu wrote: There are a goodly number of pieces (150) of dead code in Jena, of the form: org.apache.jena.mem.HashCommon: void showkeys() { if (false) { System.err.print( "KEYS: " ); // some logging code System.err.println(); } } If I understand this rightly, these are cases where we want to keep some code on deck for potential use. I'd like to suggest that many of these guys might be rewritten with a field or fields in the class, something like: boolean useLoggingCode = false; void showkeys() { if (useLoggingCode) etc. } This would make things a bit clearer and clean out a bunch of compiler warnings. Does this sound like a good approach? Worth doing? --- A. Soroka The University of Virginia Library -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
Re: another clean up suggestion: dead code and its resuscitation
I'm building a PR [1] right now as a sort of think-piece to give us something concrete to look at. I'm building it up out of ONLY things that Eclipse/javac can determine are definitely impossible to execute or redundant or never used, including: - Private methods that are never used. - Superinterfaces that don't need declaration. - Fields that are never used. - Checked exceptions that cannot be thrown. and so far, I'm at about 11,000 lines to delete, which is… a good many. Certainly too many to believe that all are really totally dead stuff that should be gone. {grin} As you point out, some portion of this is stuff that we wouldn't want to lose. My hope is that we can look over this PR and develop some tickets for the kinds of things to which you refer (e.g. features that didn't make it into SPARQL) and insert some TODOs and so forth. And maybe we can use it as a starting place for actual pruning. I'll send the PR sometime tomorrow. [1] https://github.com/ajs6f/jena/tree/KillDeadThings --- A. Soroka The University of Virginia Library On May 7, 2015, at 7:07 PM, Andy Seaborne a...@apache.org wrote: +1 to removing dead code though what is dead is tricky. In arq and tdb there was some but they included code that is a useful record (e.g. features that didn't make it into SPARQL). I removed obvious junk. Some is checking code that I'd like to leave. I had a look - a regex of if *\( *false *\) but I didn't find much in core (just 2). if(false) requires the compiler to generate no code, as does a final boolean - but in Java 8, does that include effectively final? What were you looking for? I tend to agree that the use of a field makes things worse. Andy On 07/05/15 19:24, Stephen Allen wrote: I'd say just eliminate all of that dead code. Also any commented code as well. We have a source control system, one can always look into the history to get that stuff. Using a field just makes it worse IMO... it'll never get removed if we do that. -Stephen On Thu, May 7, 2015 at 11:26 AM, A.
Soroka aj...@virginia.edu wrote: There are a goodly number of pieces (150) of dead code in Jena, of the form: org.apache.jena.mem.HashCommon: void showkeys() { if (false) { System.err.print( "KEYS: " ); // some logging code System.err.println(); } } If I understand this rightly, these are cases where we want to keep some code on deck for potential use. I'd like to suggest that many of these guys might be rewritten with a field or fields in the class, something like: boolean useLoggingCode = false; void showkeys() { if (useLoggingCode) etc. } This would make things a bit clearer and clean out a bunch of compiler warnings. Does this sound like a good approach? Worth doing? --- A. Soroka The University of Virginia Library
Re: another possible simplification
Okay, that makes sense, although in some ways it seems more like a rationale for keeping _an_ interface, rather than a rationale that Jena should have its _own_ interface. But keeping Guava types from leaking through makes sense. I will send a PR sometime soon with some Java 8 work in those Cache implementations (e.g. taking advantage of the new Map.computeIfAbsent() method to shorten and tighten some code, and maybe using default method implementations to be a bit DRYer), but I won't alter Jena's Cache type itself. Then everyone can decide whether those are legitimate improvements in implementation without effect on API. --- A. Soroka The University of Virginia Library

On May 7, 2015, at 5:53 AM, Andy Seaborne a...@apache.org wrote: On 06/05/15 22:30, A. Soroka wrote: I've found another candidate for simplification (well, actually for excision): org.apache.jena.atlas.lib.cache contains several classes that implement an interface org.apache.jena.atlas.lib.Cache. This interface very closely resembles Guava's com.google.common.cache.Cache, and I believe that the Guava type could be substituted without too much fuss. The entire package org.apache.jena.atlas.lib.cache could go away, along with the [...]. Does this seem like a worthwhile replacement? --- A. Soroka The University of Virginia Library

We do now use the Guava cache (shaded) with the CacheGuava implementation of Cache. The naming of getIfPresent and getOrFill was put in recently to reflect the Guava design (and the safe atomic getOrFill is quite useful sometimes - not always - TDB has bi-caches that need synchronized changes). Having our own Cache interface means different providers can be used. We may find a better one for certain circumstances. The interface stops Guava-isms like RemovalNotification leaking out too far. So on balance, I'm more inclined to keep it because we have the current interface already. Andy
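The atomic get-or-fill that Andy mentions is exactly what Java 8's Map.computeIfAbsent carries. A minimal sketch (hypothetical class name, not Jena's Cache or CacheGuava):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch (hypothetical name) of a getOrFill-style cache over
// ConcurrentHashMap.computeIfAbsent: the filler runs only when the key
// is absent, and the lookup-plus-insert happens atomically.
public class MapBackedCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    public V getOrFill(K key, Function<K, V> filler) {
        return map.computeIfAbsent(key, filler);
    }

    public static void main(String[] args) {
        MapBackedCache<String, Integer> cache = new MapBackedCache<>();
        System.out.println(cache.getOrFill("key", k -> k.length())); // 3
        System.out.println(cache.getOrFill("key", k -> 99));         // still 3
    }
}
```

This is the kind of tightening the PR above proposes: the implementation shrinks while the Cache interface stays untouched.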
Re: What can be removed/simplified ?
Thank you -- that sounds like a good move to make to prevent myself from breaking backwards compatibility. What would be the best way to incorporate your material into my Java 8-related work? Would it be best to wait for it to be merged, or is that some time away? --- A. Soroka The University of Virginia Library

On May 2, 2015, at 3:50 AM, Claude Warren cla...@xenei.com wrote: I have ExtendedIterator contract tests in the new test suite, so we should have reasonable test cover for the contract. That code is in the old new_test branch and will be in the new contract test branch soon. If you want, I can send you the source to test your implementation with. This will mean adding junit-contracts as a dependency for your tests. Claude

On Fri, May 1, 2015 at 5:26 PM, aj...@virginia.edu wrote: Yes, in that case, the change was no more than extends Filter<T> -> implements Predicate<T>. No other changes. You can take a look at what's going on at: https://github.com/apache/jena/pull/55 and please comment! As a Jena newbie, I need comments. {grin} --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned. Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object, RDFDatatype). It looks like they are not used by the jena codebase and are there for compatibility only. Any reason not to remove them? Andy

-- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
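The removal Andy proposes is the end of the usual deprecate-then-delete cycle: the statics are thin forwards kept for compatibility, so deleting them only breaks callers that ignored the deprecation. A generic sketch of that pattern (all names hypothetical; Jena's real path is Node.createURI -> NodeFactory.createURI):

```java
// Generic sketch of the deprecate-then-delete pattern discussed above
// (names hypothetical, not Jena's real Node/NodeFactory classes).
public class DeprecatedStaticSketch {
    static final class Uri {
        final String value;
        Uri(String value) { this.value = value; }
    }

    // The replacement: construction lives on a factory class.
    static final class Factory {
        static Uri createURI(String s) { return new Uri(s); }
    }

    /** @deprecated kept for compatibility only; use {@link Factory#createURI}. */
    @Deprecated
    static Uri createURI(String s) { return Factory.createURI(s); }

    public static void main(String[] args) {
        // Once no caller uses the deprecated forward, it can simply be deleted.
        System.out.println(Factory.createURI("http://example/").value);
    }
}
```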
Re: Java 8 Streams Was: What can be removed/simplified ?
Thank you for the heads up: I was unaware of Commons Functor. It is nice to see the Commons project put a product in that space. I notice that Functor's basic types do not inherit from the recently introduced Java 8 types (e.g. Function, BiFunction), and that in fact, by a glance at some of its POMs, Functor seems to be using Java 5. Is there some expectation of moving that forward, or is Functor expected to bridge older versions of Java? --- A. Soroka The University of Virginia Library

On May 2, 2015, at 7:14 PM, Bruno P. Kinoshita ki...@apache.org wrote: "It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work." +1 "Does this seem like a useful direction of work? I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it." I will take a look at each item later, but probably others can confirm whether that makes sense or not, since I'm still getting myself more familiar with the Jena code base. But on a side note, I'm planning to start a few dev cycles on Apache Commons Functor in June/July. The idea of the project is to provide FP extensions to Java, much like Commons Lang does for the general language. If while you are working on adding Java 8 to Jena you find yourself creating code that you think could be useful for other projects, please feel free to submit an issue to https://issues.apache.org/jira/browse/FUNCTOR or ping the commons dev mailing list :-) All the best, Bruno

From: aj...@virginia.edu To: dev@jena.apache.org Sent: Saturday, May 2, 2015 6:05 AM Subject: Java 8 Streams Was: What can be removed/simplified ? I've noticed a few more places where some Java 8 changes could be brought into play in the interest of simplification, and in particular, the use of Java 8 Streams seems like a nice way to go. It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work. Here is a small program of incremental changes that I'd like to propose: - We could move NiceIterator's methods up into ExtendedIterator as default implementations and factor NiceIterator out of existence. - Then, we could migrate the API of ExtendedIterator to be a close analog to a subset of the API of Java 8's Stream. (It's not too far away right now.) - Then, we could begin replacing the use of ExtendedIterator, its subtypes (e.g. StmtIterator), and their implementations with Java 8 Streams. That will certainly take a few steps in itself, since ExtendedIterator is in use all over, but I'm confident (perhaps arrogantly so {grin}) that replacing its use at some fairly low-lying levels (I think around and just below TripleStore.find(Triple)) will allow some quick replacement moves at the levels above. - Then, we could begin exposing Stream<T>s in the signatures of new methods on very public-facing types like Model. For example, by analogy to Model.listSubjects() returning ResIterator, there could also be Model.streamSubjects() returning Stream<Resource>. And then, I hope, the community would begin migrating away from the ExtendedIterator methods and to the Java 8 Stream<T> methods, because Stream has so much attractive functionality available. Does this seem like a useful direction of work? I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned.
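A streamSubjects()-style method needs only a small bridge from an existing iterator to a Stream. The shape below is a sketch (class and method names hypothetical), using the JDK's StreamSupport and Spliterators utilities:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch of how a Model.streamSubjects()-style method could be built on
// top of an existing iterator-returning method (names hypothetical).
public class IteratorToStream {
    // Generic Iterator -> Stream bridge; 0 = no spliterator characteristics.
    static <T> Stream<T> asStream(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, 0), false);
    }

    public static void main(String[] args) {
        // Strings stand in for subject Resources.
        List<String> subjects = Arrays.asList("s1", "s2", "s1");
        // e.g. model.streamSubjects().distinct().count()
        long distinct = asStream(subjects.iterator()).distinct().count();
        System.out.println(distinct); // 2
    }
}
```

One attraction of this route is that callers immediately get the whole Stream API (filter, map, distinct, collect) for free.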
Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places
Re: Java 8 Streams Was: What can be removed/simplified ?
Of course you are right about the balance to be made for performance. Perhaps this is a chance for me to check my understanding of Jena's architecture: to my examination, in jena-core there is no possibility to control that balance because jena-core abstractions do not understand the differences between resources that are near compute and those farther away in the network. That only becomes apparent to modules like jena-tdb. Truthfully, the qualities that attract me to this change are not performance or power, but concision and clarity. I'm very familiar with Guava's Iterators, Iterables, FluentIterable, etc. but I don't think they offer much more than Jena's ExtendedIterator now has with respect to API. I certainly wouldn't mind replacing some of Jena's implementation code with functions from Guava, which are exceedingly well-exercised, and if it seems reasonable to increase the footprint in Guava that now obtains in the codebase, I could do that as part of a migration of NiceIterator into ExtendedIterator. My overall aim here (which may or may not be a good or important one in the context of the whole project) is to replace a reasonable amount of the Jena-homegrown portions of both API and implementation with functionally- and ergonomically- equivalent-or-superior common property from the largest possible community. As to fluent syntax for basic types, are you referring to the needful plethora of calls to ResourceFactory.createResource() and .createLiteral() and the like? (Because I'm not a big fan of that sort of thing, myself. {grin}) --- A. Soroka The University of Virginia Library On May 4, 2015, at 2:08 PM, Paul Houle ontolo...@gmail.com wrote: I use the JDK8 stream stuff a lot these days but it certainly has its discontents. In particular the parallel stuff is based on the Fork/Join framework; it seems to do OK on correctness, which puts it ahead of some miracle frameworks for parallelization. 
However, if you understand the rough balance between concurrency overhead, cpu time, and time spent waiting for resources far from the cpu, you can quickly tune ExecutorService to get much better speedup more reliably and also pipeline tasks which makes a big difference. Still I like the idea of being able to turn result sets to streams with a .stream() operator. The Google guava library has a system that does stream()-like operations to Iterables and Iterators and right now I like the syntax better possibly because I have been using it so long (with Jena objects) In the other direction you have Spark, where you are writing what looks like the same kind of code but you have many options in terms of threads, clusters, memory or on-disk, etc. As for those statics, I'd say I want to see a more fluent syntax for common Jena operations. For instance, I use the Jena in-memory model the way that most programmers use hashtables. With the models you have all the cool Resource and Property types but you need to write code to create these things to put them in all the slots and it starts to obscure the simplicity of what is going on. On Mon, May 4, 2015 at 11:50 AM, aj...@virginia.edu aj...@virginia.edu wrote: Thank you for the heads up: I was unaware of Commons Functor. It is nice to see the Commons project put a product in that space. I notice that Functor's basic types do not inherit from the recently introduced Java 8 types (e.g. Function, BiFunction), and that in fact, by a glance at some of its POMs, Functor seems to be using Java 5. Is there some expectation of moving that forward, or is Functor expected to bridge older versions of Java? --- A. Soroka The University of Virginia Library On May 2, 2015, at 7:14 PM, Bruno P. Kinoshita ki...@apache.org wrote: It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work. +1 Does this seem like a useful direction of work? 
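Paul's point about tuning can be made concrete with a small illustration (not Jena code; names hypothetical): with an ExecutorService you choose the pool size to match the workload's mix of CPU time and waiting, rather than accepting the shared fork/join pool that a parallel stream uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration of hand-tuned parallelism via ExecutorService: the pool
// size is an explicit knob, unlike Stream.parallel()'s common pool.
public class TunedPool {
    static int sumSquares(int n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int v = i;
                futures.add(pool.submit(() -> v * v)); // one task per item
            }
            int sum = 0;
            for (Future<Integer> f : futures)
                sum += f.get();                        // join in submission order
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumSquares(8, 4)); // 140
    }
}
```

For I/O-bound tasks one would raise the thread count well above the core count; a parallel stream offers no such per-job choice.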
I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it. I will take a look at each item later, but probably others can confirm whether that makes sense or not, since I'm still getting myself more familiar with the Jena code base. But on a side note, I'm planning to start a few dev cycles on Apache Commons Functor in June/July. The idea of the project is to provide FP extensions to Java, much like Commons Lang does for the general language. If while you are working on adding Java 8 to Jena you find yourself creating code that you think could be useful for other projects, please feel free to submit an issue to https://issues.apache.org/jira/browse/FUNCTOR or ping the commons dev mailing list :-) All the best, Bruno

From: aj...@virginia.edu To: dev@jena.apache.org Sent: Saturday, May 2, 2015 6:05 AM Subject
Java 8 Streams Was: What can be removed/simplified ?
I've noticed a few more places where some Java 8 changes could be brought into play in the interest of simplification, and in particular, the use of Java 8 Streams seems like a nice way to go. It would let Jena cut out a fair bit of API and implementation code in favor of letting Java itself do the work. Here is a small program of incremental changes that I'd like to propose: - We could move NiceIterator's methods up into ExtendedIterator as default implementations and factor NiceIterator out of existence. - Then, we could migrate the API of ExtendedIterator to be a close analog to a subset of the API of Java 8's Stream. (It's not too far away right now.) - Then, we could begin replacing the use of ExtendedIterator, its subtypes (e.g. StmtIterator), and their implementations with Java 8 Streams. That will certainly take a few steps in itself, since ExtendedIterator is in use all over, but I'm confident (perhaps arrogantly so {grin}) that replacing its use at some fairly low-lying levels (I think around and just below TripleStore.find(Triple)) will allow some quick replacement moves at the levels above. - Then, we could begin exposing Stream<T>s in the signatures of new methods on very public-facing types like Model. For example, by analogy to Model.listSubjects() returning ResIterator, there could also be Model.streamSubjects() returning Stream<Resource>. And then, I hope, the community would begin migrating away from the ExtendedIterator methods and to the Java 8 Stream<T> methods, because Stream has so much attractive functionality available. Does this seem like a useful direction of work? I believe it could be undertaken without being disruptive, and even without too much code churn except when introducing Stream into the core. If it sounds like a good idea, I would be happy to begin it. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned. Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object
Re: What can be removed/simplified ?
Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object, RDFDatatype). It looks like they are not used by the jena codebase and are there for compatibility only. Any reason not to remove them? Andy

-- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
Re: What can be removed/simplified ?
Yes, in that case, the change was no more than extends Filter<T> -> implements Predicate<T>. No other changes. You can take a look at what's going on at: https://github.com/apache/jena/pull/55 and please comment! As a Jena newbie, I need comments. {grin} --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:19 PM, Claude Warren cla...@xenei.com wrote: An example is: org.apache.jena.security.utils.RDFListSecFilter, which filters results based on user access and is used wherever an RDFList (or an iterator on one) is returned. Claude

On Fri, May 1, 2015 at 5:12 PM, aj...@virginia.edu wrote: Oh, now I think I understand your point better. Yes, I have already trawled that code and worked over those reusable guys, and yes, you will certainly still be able to combine and reuse Predicates in the same way that you have used Filters. When I get this PR in, you can see some examples of that. A Java 8 Predicate is just an interface that looks much like Jena's Filter, which can benefit from the -> lambda syntax and which is designed to fit into the Java 8 language APIs (e.g. for use with Streams). --- A. Soroka The University of Virginia Library

On May 1, 2015, at 12:07 PM, Claude Warren cla...@xenei.com wrote: We have a number of places where Filter objects are created and reused (usually due to complexity or to reduce the code footprint in terms of debugging). Will it still be possible to define these complex filters and use them in multiple places? The permissions system does this in that it creates a filter for RDFNodes and then applies them to the 3 elements in a triple to create a single filter for triples. There are several cases like this. I will have to look at the permissions code to find a concrete example, but I think this is the case. Claude

On Fri, May 1, 2015 at 4:53 PM, aj...@virginia.edu wrote: "As for the Filter implementation, will that be transparent to filter implementations? I assume so." I think this was in response to my question about Filter? If you mean that things that currently implement Filter (outside of Jena's own code) will not be greatly affected, then yes, so I would hope. I will @Deprecated Filter and its methods, but that seems to me to be all that is needed for this first step. I should have a PR with this later today, when you can observe some real code and give me feedback. --- A. Soroka The University of Virginia Library

On May 1, 2015, at 11:47 AM, Claude Warren cla...@xenei.com wrote: I don't see any reason not to remove the Node functions. As for the Filter implementation, will that be transparent to filter implementations? I assume so.

On Fri, May 1, 2015 at 4:16 PM, Andy Seaborne a...@apache.org wrote: (mainly for Claude - I did check jena-permissions and didn't see any usage) There are a bunch of deprecated statics in Node (the correct way is to use NodeFactory): Node.createAnon(), Node.createAnon(AnonId), Node.createLiteral(LiteralLabel), Node.createURI(String), Node.createVariable(String), Node.createLiteral(String), Node.createLiteral(String, String, boolean), Node.createLiteral(String, String, RDFDatatype), Node.createLiteral(String, RDFDatatype), Node.createUncachedLiteral(Object, String, RDFDatatype), Node.createUncachedLiteral(Object, RDFDatatype). It looks like they are not used by the jena codebase and are there for compatibility only. Any reason not to remove them? Andy

-- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
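The reuse Claude asks about survives the migration because java.util.function.Predicate composes with and()/or()/negate(), much as composed Filter objects did. A small stand-alone sketch (strings stand in for RDF nodes; all names hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of why `extends Filter<T>` -> `implements Predicate<T>` keeps
// reusable, combinable filters: predicates are values that compose.
public class PredicateMigration {
    // Reusable predicates, defined once and shared (as Filters were).
    static final Predicate<String> NON_BLANK = s -> !s.isEmpty();
    static final Predicate<String> SHORT = s -> s.length() < 4;

    static List<String> keepShortNonBlank(List<String> in) {
        return in.stream()
                 .filter(NON_BLANK.and(SHORT)) // combined like composed Filters
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keepShortNonBlank(Arrays.asList("", "abc", "abcdef"))); // [abc]
    }
}
```

The permissions-system pattern (one node predicate applied to subject, predicate, and object to build a triple predicate) maps directly onto this kind of composition.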
Re: What can be removed/simplified ?
Great! Thank you. Would Jena be similarly interested in trying to migrate org.apache.jena.util.iterator.Filter to java.util.function.Predicate? --- A. Soroka The University of Virginia Library On May 1, 2015, at 3:32 AM, Andy Seaborne a...@apache.org wrote: On 30/04/15 17:11, aj...@virginia.edu wrote: I'm a long-time user of Jena, but entirely new to its internals, so this may be a very off-the-mark opinion, but perhaps org.apache.jena.util.iterator.Map1 could be swapped out for java.util.function.Function? --- A. Soroka The University of Virginia Library Hi there, Not off-the-mark at all. There are lots of places where there is old code that can be more naturally written in Java8. Your pull request looks very interesting. Thank you. Andy
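The Map1 -> java.util.function.Function swap can be sketched without any Jena dependency. The interface below is hypothetical (modelled loosely on ExtendedIterator.mapWith); the point is that once the mapping parameter is a plain Function, callers can pass lambdas and method references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Sketch of replacing a Map1-style callback with java.util.function.Function:
// a mapWith default method on a hypothetical iterator interface.
public class Map1ToFunction {
    interface Mapping<T> extends Iterator<T> {
        default <U> Mapping<U> mapWith(Function<? super T, ? extends U> f) {
            Mapping<T> self = this;
            return new Mapping<U>() {
                public boolean hasNext() { return self.hasNext(); }
                public U next() { return f.apply(self.next()); }
            };
        }

        static <T> Mapping<T> over(Iterator<T> it) {
            return new Mapping<T>() {
                public boolean hasNext() { return it.hasNext(); }
                public T next() { return it.next(); }
            };
        }
    }

    static List<Integer> lengths(List<String> in) {
        List<Integer> out = new ArrayList<>();
        // Method reference in place of a one-off Map1 implementation.
        Mapping<Integer> m = Mapping.over(in.iterator()).mapWith(String::length);
        while (m.hasNext()) out.add(m.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("a", "abc"))); // [1, 3]
    }
}
```

The default method is also a miniature of the larger proposal in this thread: NiceIterator-style helpers can live as defaults on the interface itself.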
Re: What can be removed/simplified ?
I'm a long-time user of Jena, but entirely new to its internals, so this may be a very off-the-mark opinion, but perhaps org.apache.jena.util.iterator.Map1 could be swapped out for java.util.function.Function? --- A. Soroka The University of Virginia Library

On Apr 30, 2015, at 10:39 AM, Andy Seaborne a...@apache.org wrote: On 30/04/15 07:20, Claude Warren wrote: While we are at it, I would like to see the restriction that requires Graph.getStatisticsHandler() to return the same instance every time removed. It makes proxies much more difficult. Makes sense. (I would note that statistics for anything involving 2 out of 3 of the args are not provided by any graph that I can find.) Andy

On Wed, Apr 29, 2015 at 8:58 PM, Andy Seaborne a...@apache.org wrote: On 29/04/15 18:17, Claude Warren wrote: I use the following: addAllowed() is used by contract tests to determine if triples can be added. Also, the permission system sets this based on the user's permissions. canBeEmpty() is used by the contract tests to determine if the deleteAll methods should return an empty graph. When is this ever false? Inference graphs? (This is not used in the current codebase as far as I can see.) deleteAllowed() - same use as addAllowed(). iteratorRemoveAllowed() - this is handy to know before the remove is attempted. This isn't honoured everywhere IIRC. You're only looking in jena-core. sizeAccurate() - this is used by the contract testing to determine if delete and add should alter the number of records reported. I am also looking at adding some hash joining capabilities, and knowing if the sizes are accurate may make a difference. But that is all future stuff. FYI: https://github.com/afs/quack These I don't use and can see being removed: findContractSafe() - I don't know what this one means and have never used it. handlesLiteralTyping() - This was used, but obviously since all graphs now have to support literal typing this can be removed.

And presumably not addAllowed(boolean), deleteAllowed(boolean). (I find the boolean form unhelpful because they don't say what triples can and cannot be added/deleted.) So let's try removing: addAllowed(boolean), deleteAllowed(boolean), findContractSafe(), handlesLiteralTyping(). Andy

On Wed, Apr 29, 2015 at 4:14 PM, Andy Seaborne a...@apache.org wrote: On 29/04/15 16:04, Claude Warren wrote: I have no problem retiring FileGraph. Capabilities is another issue. I have used it, and it is used in several places in the contract tests where we have to know if the graph supports transactions and the like. I find it useful. In addition, the information contained in the capabilities is often not easily discoverable (if at all). Transactions aren't in the Capabilities interface. Which aspects of the Capabilities interface? Some look to be out of date (findContractSafe); some are not the right question (handlesLiteralTyping). Andy

On Wed, Apr 29, 2015 at 9:45 AM, Andy Seaborne a...@apache.org wrote: Claude's email about FileGraph prompted me to think for Jena3: What can be removed? What can be simplified? Even while keeping the current APIs, there is going to be stuff that isn't used, isn't used much, or even gets in the way. For maintainability, effectively unused features are noise and risk needing to maintain contracts that users don't actually use. Some things that come to mind in jena-core: FileGraph [*], Capabilities, GraphTransactionHandler, and with advocacy: RDFReaderF, RDFWriterF (with RIOT integration; caution for RDF/XML). Some places where interfaces don't seem to add anything: LiteralLabelImpl (actually the whole LiteralLabel thing is worth looking at - maybe we can pull the whole thing into Node_Literal itself). AnonIds - maybe leave in the RDF API (they cross the boundary), but internally bNodes can be a couple of longs for their state (a UUID in other words, not a UID). Andy

[*] In Java 8, the app, or library code, could do this better as:

    update(() -> { ... graph changes ... })

and update() does the on-disk backup stuff.
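The footnote's update(() -> { ... }) idea can be fleshed out as a small sketch (all names hypothetical; string appends stand in for real file I/O): instead of a FileGraph subclass, the library exposes update(Runnable) and brackets the caller's changes with the backup and commit steps itself.

```java
// Sketch of the update(Runnable) idea from the footnote above: the
// library owns the backup/commit/restore bracketing, the caller passes
// only the graph changes as a lambda (names hypothetical).
public class FileBackedGraph {
    private final StringBuilder log = new StringBuilder();

    public void update(Runnable changes) {
        log.append("backup;");          // stand-in for writing the backup copy
        try {
            changes.run();              // the caller's graph changes
            log.append("commit;");      // stand-in for replacing the file
        } catch (RuntimeException e) {
            log.append("restore;");     // stand-in for restoring the backup
            throw e;
        }
    }

    public String history() { return log.toString(); }

    public static void main(String[] args) {
        FileBackedGraph g = new FileBackedGraph();
        g.update(() -> g.log.append("changes;"));
        System.out.println(g.history()); // backup;changes;commit;
    }
}
```

The design point is inversion of control: the on-disk protocol lives in one place, so no graph subclass has to remember to do the backup.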