I second that. I am now finalising the ES integration and should have a production-quality implementation ready in about a week. At that point I would like you guys to have a look at the implementation and provide feedback. Once you have upgraded Lucene to 6.4.1, I can merge the code into the jena-text module and do a round of testing.
Thanks,
Anuj Kumar

On 2 Mar 2017 22:28, "A. Soroka" <[email protected]> wrote:

> I do agree that trying to juggle different versions of Lucene libraries is
> probably not a realistic option right now. Luckily (if I understand the
> conversation thus far correctly) we have a solid alternative; getting our
> current Lucene dependency upgraded should allow us to (eventually) merge
> Anuj's work into the mainstream of development. Someone please tell me if I
> have that wrong! :grin:
>
> Let me reiterate that this seems like very good work and speaking for
> myself, I certainly want to get it included into Jena. It's just a question
> of fitting it in correctly, which might take a bit of time.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> > On Mar 1, 2017, at 1:27 PM, Osma Suominen <[email protected]> wrote:
> >
> > Hi Anuj!
> >
> > I have nothing against modularity in general. However, I cannot see how
> > your proposal could work in practice for the Fuseki build, due to the
> > reasons I mentioned in my previous message (and Adam seemed to concur).
> >
> > In any case, I'll see what I can do to get the Lucene upgrade moving
> > again. If all current Jena modules (ie jena-text and jena-spatial) were
> > upgraded to Lucene 6.4.1, then you could just add your ES classes to
> > jena-text, right? I think that would be better for everyone than having
> > to maintain your own separate module.
> >
> > -Osma
> >
> > 01.03.2017, 16:59, anuj kumar wrote:
> >> I personally have no preference as to how the code in Jena should be
> >> structured, as long as I am able to use it :).
> >> I have a personal preference for doing it in a specific way because,
> >> IMO, it is modular, which makes it much easier to maintain in the long
> >> run. But again, it may not be the quickest one.
> >>
> >> I have already been given a deadline by the company to have the ES
> >> extension implemented in the next 15 days :).
> >> What this means is that I will be maintaining the ES code extension to
> >> Jena Text at least locally for some time to come. I would be more than
> >> happy to contribute to the Jena community whatever is required to have
> >> a proper ElasticSearch implementation in place, whether within the
> >> jena-text module or as a separate module. Until Lucene and Solr are
> >> upgraded to the latest version, I will have to maintain a separate
> >> module for jena-text-es.
> >>
> >> Cheers!
> >> Anuj Kumar
> >>
> >> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <[email protected]> wrote:
> >>
> >>> Osma--
> >>>
> >>> The short answer is that yes, given the right tools you _can_ have
> >>> different versions of code accessible in different ways. The longer
> >>> answer is that it's probably not a viable alternative for Jena for
> >>> this problem, at least not without a lot of other change.
> >>>
> >>> You are right to point to the classloader mechanism as being at the
> >>> heart of this question, but I must alter your remark just slightly.
> >>> From "the Java classloader only sees a single, flat package/class
> >>> namespace and a set of compiled classes" to "ANY GIVEN Java
> >>> classloader only sees a single, flat package/class namespace and a
> >>> set of compiled classes".
> >>>
> >>> This is the fact that OSGi uses to make it possible to maintain strict
> >>> module boundaries (and even dynamic module relationships at run-time).
> >>> Each OSGi bundle sees its own classloader, and the framework is
> >>> responsible for connecting bundles up to ensure that every bundle has
> >>> what it needs in the way of types to function, based on metadata that
> >>> the bundles provide to the framework. It's an incredibly powerful
> >>> system (I use it every day and enjoy it enormously) but it's also very
> >>> "heavy" and requires a good deal of investment to use. In particular,
> >>> it's probably too large to put _inside_ Jena.
> >>> (I frequently put Jena inside an OSGi instance, on the other hand.)
> >>>
> >>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
> >>> this kind, but it's really meant for the JDK itself, not application
> >>> libraries. In theory, we could "roll our own" classloader management
> >>> for this problem. That sounds like more than a bit of a rabbit hole to
> >>> me. There might be another, more lightweight, toolkit out there to
> >>> this purpose, but I'm not aware of any myself.
> >>>
> >>> Otherwise, yes, you get into shading and the like. We have to do that
> >>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
> >>> hardly a thing we want to do any more of than needed, I don't think.
> >>>
> >>> ---
> >>> A. Soroka
> >>> The University of Virginia Library
> >>>
> >>> [1] http://openjdk.java.net/projects/jigsaw/
> >>>
> >>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <[email protected]> wrote:
> >>>>
> >>>> Hi Anuj!
> >>>>
> >>>> Thanks for the clarification.
> >>>>
> >>>> However, I'm still not sure I understand the situation completely. I
> >>>> know Maven can perform a lot of tricks, but Maven modules are just
> >>>> convenient ways to structure a Java project. Maven cannot change the
> >>>> fact that at runtime, module divisions don't really matter (except
> >>>> that they usually correspond to package sub-namespaces) and the Java
> >>>> classloader only sees a single, flat package/class namespace and a
> >>>> set of compiled classes (usually within JARs) in the classpath that
> >>>> it needs to check to find the right classes, and if there are two
> >>>> versions of the same library (eg Lucene) with overlapping class
> >>>> names, that's going to cause trouble. The only way around that is to
> >>>> shade some of the libraries, i.e. rename them so that they end up in
> >>>> another, non-conflicting namespace.
> >>>> Apparently Elasticsearch also did some of that in the past [1] but
> >>>> nowadays tries to avoid it.
> >>>>
> >>>> Does your assumption 1 ("At a given point in time, only a single
> >>>> Indexing Technology is used") imply that in the assembler
> >>>> configuration, you cannot have ja:loadClass declarations for both
> >>>> Lucene and ES backends? Or how do you run something like Fuseki that
> >>>> contains (in a single big JAR) both the jena-text and jena-text-es
> >>>> modules with all their dependencies, one of which requires the Lucene
> >>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
> >>>> ensure that only one of them is used at a time, and that the Java
> >>>> classloader, even though it has access to both versions of Lucene,
> >>>> only loads classes from the single, correct one and not the other? Or
> >>>> do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
> >>>> packages, so that you don't end up with two Lucene versions within
> >>>> the same Fuseki JAR?
> >>>>
> >>>> -Osma
> >>>>
> >>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
> >>>>
> >>>> 01.03.2017, 11:03, anuj kumar wrote:
> >>>>> Hi Osma,
> >>>>>
> >>>>> I understand what you are saying. There are ways to mitigate risks
> >>>>> and balance the refactoring without affecting the existing modules.
> >>>>> But I will not delve into those now. I am not an expert in Jena to
> >>>>> convincingly say that it is possible, without any hiccups. But I can
> >>>>> take a guess and say that it is indeed possible :)
> >>>>>
> >>>>> For the question: "is it even possible to mix modules that depend on
> >>>>> different versions of the Lucene libraries within the same project?"
> >>>>>
> >>>>> I actually do not understand what you mean by mixing modules. I
> >>>>> assume you mean having jena-text and jena-text-es as dependencies in
> >>>>> a build without causing the build to conflict.
> >>>>> If that is what you mean then the answer is yes, it is possible and
> >>>>> quite simple as well. Let me explain how it is possible. But before
> >>>>> that, some assumptions which I want to call out explicitly.
> >>>>>
> >>>>> *Assumptions:*
> >>>>> 1. At a given point in time, only a single Indexing Technology is
> >>>>>    used for text based indexing and searching via Jena. What this
> >>>>>    means is that we will either use Lucene Implementation OR Solr
> >>>>>    Implementation OR ES Implementation at any given point in time.
> >>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
> >>>>>    but only on jena-text classes, if at all.
> >>>>>
> >>>>> Based on these assumptions it is possible to create a build that
> >>>>> contains jena-text based common classes + ES specific classes
> >>>>> without any compatibility issues. And it is in fact quite simple. I
> >>>>> did it in the current jena-text-es module and ran the entire build,
> >>>>> which succeeded.
> >>>>> The key is to include the latest Lucene dependencies at the very
> >>>>> beginning in the pom and then include the jena-text dependency.
> >>>>> Maven will then automatically resolve the dependency issues by
> >>>>> including the Lucene libraries that we included in our ES-specific
> >>>>> pom. Have a look at the pom of the jena-text-es module here to see
> >>>>> how it can be done:
> >>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
> >>>>>
> >>>>> Thanks,
> >>>>> Anuj Kumar
> >>>>>
> >>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <[email protected]> wrote:
> >>>>>
> >>>>>> Hi Anuj,
> >>>>>>
> >>>>>> I understand your concerns. However, we also need to balance
> >>>>>> between the needs of individual modules/features and the whole
> >>>>>> codebase. I'm willing to put in the effort to keep the other
> >>>>>> modules up to date with newer Lucene versions.
> >>>>>> Lucene upgrade requirements are well documented; the only hitches
> >>>>>> seen in JENA-1250 were related to how jena-text (ab)used some
> >>>>>> Lucene features that were dropped from newer versions.
> >>>>>>
> >>>>>> A perhaps stupid question to more experienced Java developers: is
> >>>>>> it even possible to mix modules that depend on different versions
> >>>>>> of the Lucene libraries within the same project? In my (quite
> >>>>>> limited) understanding of Java projects and libraries, this
> >>>>>> requires special arrangements (e.g. shading) as the Java
> >>>>>> package/class namespace is shared by all the code running within
> >>>>>> the same JVM.
> >>>>>>
> >>>>>> So can you create, say, a Fuseki build that contains the current
> >>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
> >>>>>> module (depending on Lucene 6.4.1) without any compatibility
> >>>>>> issues?
> >>>>>>
> >>>>>> -Osma
> >>>>>>
> >>>>>> 01.03.2017, 00:47, anuj kumar wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> My 2 Cents :
> >>>>>>>
> >>>>>>> The reason I proposed to have separate modules for Lucene, Solr
> >>>>>>> and ES is exactly for avoiding the "All or Nothing" approach we
> >>>>>>> need to take if we club them all together. If they stay together
> >>>>>>> and if in the near future I want to upgrade ES to another version,
> >>>>>>> I also need to again upgrade Lucene and Solr and possibly another
> >>>>>>> implementation that may have been added during the time. As we all
> >>>>>>> know, this means weeks of work if not months to get the changes
> >>>>>>> released. This will personally de-motivate me to do anything and I
> >>>>>>> will probably start maintaining my version of Jena-Text as that
> >>>>>>> would be much simpler to do than to upgrade and test and in the
> >>>>>>> process own (read: fix bugs) the upgrade for each and every
> >>>>>>> technology.
> >>>>>>>
> >>>>>>> If they are developed as separate modules, they can evolve
> >>>>>>> independently of each other and we can avoid situations where we
> >>>>>>> can't upgrade to the latest version of Lucene because we do not
> >>>>>>> know what effect it will have on the Solr Implementation.
> >>>>>>>
> >>>>>>> We can start with having a separate Module for Jena Text ES and
> >>>>>>> see how things go. If they go well, we could extract Solr and
> >>>>>>> Lucene out of Jena Text.
> >>>>>>>
> >>>>>>> Again this is just a suggestion based on my limited industry
> >>>>>>> experience.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Anuj Kumar
> >>>>>>>
> >>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
> >>>>>>>>
> >>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
> >>>>>>>>> ? In other words, might it be better to factor out between -text
> >>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
> >>>>>>>>
> >>>>>>>> I certainly wouldn't object to that, but somebody has to
> >>>>>>>> volunteer to do the actual work!
> >>>>>>>>
> >>>>>>>>> I don't use the Solr component now, but I could easily see so
> >>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position
> >>>>>>>>> to do any work to maintain it, so consider that just a very
> >>>>>>>>> small and blurry data point. :)
> >>>>>>>>
> >>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
> >>>>>>>> how to get it running...
> >>>>>>>> If you could just try that with some toy data, then your data
> >>>>>>>> point would be a lot less blurry :) I haven't used Solr for
> >>>>>>>> anything, so I'm not very familiar with how to set it up, and the
> >>>>>>>> jena-text instructions are pretty vague unfortunately.
> >>>>>>>>
> >>>>>>>> -Osma
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Osma Suominen
> >>>>>>>> D.Sc. (Tech), Information Systems Specialist
> >>>>>>>> National Library of Finland
> >>>>>>>> P.O. Box 26 (Kaikukatu 4)
> >>>>>>>> 00014 HELSINGIN YLIOPISTO
> >>>>>>>> Tel. +358 50 3199529
> >>>>>>>> [email protected]
> >>>>>>>> http://www.nationallibrary.fi
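
A footnote on the Maven point discussed in the quoted thread. The sketch below is a deliberately simplified model of Maven's documented "nearest definition wins, first declaration breaks ties" mediation rule; it is not Maven's actual resolver. The class and method names (MediationSketch, Dep, resolve, example) are hypothetical, and the jena-text version "x.y.z" is a placeholder. Only the Lucene versions 4.9.1 and 6.4.1 come from the thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class MediationSketch {

    /** One node in a dependency graph: artifact key, version, own dependencies. */
    public static final class Dep {
        final String key;
        final String version;
        final List<Dep> children;

        public Dep(String key, String version, Dep... children) {
            this.key = key;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Breadth-first walk over the declared dependencies. The first version
     * seen for a key is the one nearest to the root (ties go to the one
     * declared first); any later version of the same key is dropped.
     */
    public static Map<String, String> resolve(List<Dep> direct) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Queue<Dep> queue = new ArrayDeque<>(direct);
        while (!queue.isEmpty()) {
            Dep d = queue.remove();
            if (chosen.putIfAbsent(d.key, d.version) == null) {
                queue.addAll(d.children); // follow only the winning version
            }
        }
        return chosen;
    }

    /** A pom that declares jena-text (which pulls in Lucene 4.9.1) and Lucene 6.4.1 directly. */
    public static List<Dep> example() {
        Dep oldLucene = new Dep("org.apache.lucene:lucene-core", "4.9.1");
        Dep newLucene = new Dep("org.apache.lucene:lucene-core", "6.4.1");
        Dep jenaText = new Dep("org.apache.jena:jena-text", "x.y.z", oldLucene); // placeholder version
        return Arrays.asList(jenaText, newLucene);
    }

    public static void main(String[] args) {
        System.out.println(resolve(example()));
    }
}
```

In this model the directly declared Lucene 6.4.1 wins over the transitive 4.9.1, so the build ends up with exactly one Lucene version. That is consistent with the build succeeding, but it also means jena-text code compiled against 4.9.1 would run against 6.4.1 at runtime, which is the compatibility concern raised in the thread.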
