I second that. I am now finalising the ES integration and should have a
good production-quality implementation ready in a week's time. At that
point I would like you guys to have a look at the implementation and provide
feedback. Once you have upgraded Lucene to 6.4.1, I can merge the code into
the jena-text module and do a round of testing.

Thanks,
Anuj Kumar

On 2 Mar 2017 22:28, "A. Soroka" <[email protected]> wrote:

> I do agree that trying to juggle different versions of Lucene libraries is
> probably not a realistic option right now. Luckily (if I understand the
> conversation thus far correctly) we have a solid alternative: getting our
> current Lucene dependency upgraded should allow us to (eventually) merge
> Anuj's work into the mainstream of development. Someone please tell me if I
> have that wrong! :grin:
>
> Let me reiterate that this seems like very good work and speaking for
> myself, I certainly want to get it included into Jena. It's just a question
> of fitting it in correctly, which might take a bit of time.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> > On Mar 1, 2017, at 1:27 PM, Osma Suominen <[email protected]> wrote:
> >
> > Hi Anuj!
> >
> > I have nothing against modularity in general. However, I cannot see how
> > your proposal could work in practice for the Fuseki build, due to the
> > reasons I mentioned in my previous message (and Adam seemed to concur).
> >
> > In any case, I'll see what I can do to get the Lucene upgrade moving
> > again. If all current Jena modules (i.e. jena-text and jena-spatial) were
> > upgraded to Lucene 6.4.1, then you could just add your ES classes to
> > jena-text, right? I think that would be better for everyone than having to
> > maintain your own separate module.
> >
> > -Osma
> >
> > 01.03.2017, 16:59, anuj kumar wrote:
> >> I personally have no preference as to how the code in Jena should be
> >> structured, as long as I am able to use it :).
> >> I prefer doing it in a specific way because, IMO, it is modular, which
> >> makes it much easier to maintain in the long run. But again, it may not
> >> be the quickest one.
> >>
> >> I have already been given a deadline by the company to have the ES
> >> extension implemented in the next 15 days :). What this means is that I
> >> will be maintaining the ES code extension to Jena Text, at least locally,
> >> for some time to come. I would be more than happy to contribute to the
> >> Jena community whatever is required to have a proper Elasticsearch
> >> implementation in place, whether within the jena-text module or as a
> >> separate module. Until Lucene and Solr are upgraded to the latest
> >> version, I will have to maintain a separate module for jena-text-es.
> >>
> >> Cheers!
> >> Anuj Kumar
> >>
> >>
> >> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <[email protected]> wrote:
> >>
> >>> Osma--
> >>>
> >>> The short answer is that yes, given the right tools you _can_ have
> >>> different versions of code accessible in different ways. The longer
> >>> answer is that it's probably not a viable alternative for Jena for this
> >>> problem, at least not without a lot of other change.
> >>>
> >>> You are right to point to the classloader mechanism as being at the
> >>> heart of this question, but I must alter your remark just slightly: from
> >>> "the Java classloader only sees a single, flat package/class namespace
> >>> and a set of compiled classes" to "ANY GIVEN Java classloader only sees
> >>> a single, flat package/class namespace and a set of compiled classes".
> >>>
> >>> This is the fact that OSGi uses to make it possible to maintain strict
> >>> module boundaries (and even dynamic module relationships at run-time).
> >>> Each OSGi bundle sees its own classloader, and the framework is
> >>> responsible for connecting bundles up to ensure that every bundle has
> >>> what it needs in the way of types to function, based on metadata that
> >>> the bundles provide to the framework. It's an incredibly powerful system
> >>> (I use it every day and enjoy it enormously) but it's also very "heavy"
> >>> and requires a good deal of investment to use. In particular, it's
> >>> probably too large to put _inside_ Jena. (I frequently put Jena inside
> >>> an OSGi instance, on the other hand.)
> >>>
> >>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
> >>> this kind, but it's really meant for the JDK itself, not application
> >>> libraries. In theory, we could "roll our own" classloader management for
> >>> this problem. That sounds like more than a bit of a rabbit hole to me.
> >>> There might be another, more lightweight, toolkit out there for this
> >>> purpose, but I'm not aware of any myself.
> >>>
> >>> Otherwise, yes, you get into shading and the like. We have to do that
> >>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
> >>> hardly a thing we want to do any more of than needed, I don't think.
> >>>
> >>> ---
> >>> A. Soroka
> >>> The University of Virginia Library
> >>>
> >>> [1] http://openjdk.java.net/projects/jigsaw/
> >>>
> >>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <[email protected]> wrote:
> >>>>
> >>>> Hi Anuj!
> >>>>
> >>>> Thanks for the clarification.
> >>>>
> >>>> However, I'm still not sure I understand the situation completely. I
> >>>> know Maven can perform a lot of tricks, but Maven modules are just
> >>>> convenient ways to structure a Java project. Maven cannot change the
> >>>> fact that at runtime, module divisions don't really matter (except that
> >>>> they usually correspond to package sub-namespaces). The Java classloader
> >>>> only sees a single, flat package/class namespace and a set of compiled
> >>>> classes (usually within JARs) in the classpath that it needs to check to
> >>>> find the right classes. If there are two versions of the same library
> >>>> (e.g. Lucene) with overlapping class names, that's going to cause
> >>>> trouble. The only way around that is to shade some of the libraries,
> >>>> i.e. rename them so that they end up in another, non-conflicting
> >>>> namespace. Apparently Elasticsearch also did some of that in the past
> >>>> [1] but nowadays tries to avoid it.
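
As a rough sketch of what shading looks like in a Maven build: the maven-shade-plugin can relocate a package into another namespace at packaging time. This is illustrative only and not taken from any Jena or Elasticsearch build. The target package `org.example.shaded.lucene` is hypothetical, and the plugin version is omitted here although it would normally be pinned. The services transformer is included on the assumption that Lucene's service-loader files under META-INF/services need to follow the renamed classes.

```xml
<!-- Sketch only: relocates Lucene classes into a hypothetical private namespace -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- original package prefix to rewrite -->
            <pattern>org.apache.lucene</pattern>
            <!-- hypothetical target namespace, chosen for this example -->
            <shadedPattern>org.example.shaded.lucene</shadedPattern>
          </relocation>
        </relocations>
        <transformers>
          <!-- merges META-INF/services entries from the shaded JARs -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the relocated copy and an unshaded Lucene of another version no longer share class names, at the cost of a larger JAR and a build that is harder to debug.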
> >>>>
> >>>> Does your assumption 1 ("At a given point in time, only a single
> >>>> Indexing Technology is used") imply that in the assembler configuration,
> >>>> you cannot have ja:loadClass declarations for both Lucene and ES
> >>>> backends? Or how do you run something like Fuseki that contains (in a
> >>>> single big JAR) both the jena-text and jena-text-es modules with all
> >>>> their dependencies, one of which requires the Lucene 4.x classes and the
> >>>> other one the Lucene 6.4.1 classes? How do you ensure that only one of
> >>>> them is used at a time, and that the Java classloader, even though it
> >>>> has access to both versions of Lucene, only loads classes from the
> >>>> single, correct one and not the other? Or do you need to have separate
> >>>> "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with
> >>>> two Lucene versions within the same Fuseki JAR?
> >>>>
> >>>> -Osma
> >>>>
> >>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
> >>>>
> >>>> 01.03.2017, 11:03, anuj kumar wrote:
> >>>>> Hi Osma,
> >>>>>
> >>>>> I understand what you are saying. There are ways to mitigate risks and
> >>>>> balance the refactoring without affecting the existing modules. But I
> >>>>> will not delve into those now. I am not enough of an expert in Jena to
> >>>>> say convincingly that it is possible without any hiccups. But I can
> >>>>> take a guess and say that it is indeed possible :)
> >>>>>
> >>>>> For the question: "is it even possible to mix modules that depend on
> >>>>> different versions of the Lucene libraries within the same project?"
> >>>>>
> >>>>> I actually do not understand what you mean by mixing modules. I assume
> >>>>> you mean having jena-text and jena-text-es as dependencies in a build
> >>>>> without causing the build to conflict. If that is what you mean, then
> >>>>> the answer is yes, it is possible, and quite simple as well. Let me
> >>>>> explain how. But before that, some assumptions which I want to call out
> >>>>> explicitly.
> >>>>>
> >>>>> *Assumptions:*
> >>>>> 1. At a given point in time, only a single Indexing Technology is used
> >>>>> for text-based indexing and searching via Jena. What this means is that
> >>>>> we will either use the Lucene Implementation OR the Solr Implementation
> >>>>> OR the ES Implementation at any given point in time.
> >>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1-specific classes
> >>>>> but only on jena-text classes, if at all.
> >>>>>
> >>>>> Based on these assumptions, it is possible to create a build that
> >>>>> contains the common jena-text classes + ES-specific classes without any
> >>>>> compatibility issues. And it is in fact quite simple. I did it in the
> >>>>> current jena-text-es module and ran the entire build, which succeeded.
> >>>>> The key is to include the latest Lucene dependencies at the very
> >>>>> beginning of the pom and then include the jena-text dependency. Maven
> >>>>> will then automatically resolve the dependency issues by including the
> >>>>> Lucene libraries that we included in our ES-specific pom. Have a look at
> >>>>> the pom of the jena-text-es module here to see how it can be done:
> >>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
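
For illustration, a minimal sketch of the kind of pom arrangement described here. This is not the actual jena-text-es pom (see the link for that): the `jena.version` property is a placeholder assumed to be defined elsewhere, and the particular Lucene artifacts listed are an assumption. Maven's mediation rule is "nearest definition wins", so a Lucene version declared directly in the module's pom takes precedence over the one pulled in transitively through jena-text. Declaration order only breaks ties between dependencies at the same depth.

```xml
<!-- Sketch only: a module that pins Lucene 6.4.1 and also depends on jena-text -->
<dependencies>
  <!-- Declared directly, so these versions win over jena-text's transitive Lucene -->
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>6.4.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>6.4.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>6.4.1</version>
  </dependency>

  <!-- jena.version is a placeholder property, assumed to be set in a parent pom -->
  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>jena-text</artifactId>
    <version>${jena.version}</version>
  </dependency>
</dependencies>
```

Running `mvn dependency:tree` on such a module shows which Lucene version was selected. This only puts a single Lucene version on the classpath. It does not by itself make code compiled against Lucene 4.x work with 6.4.1, so any jena-text class that actually gets loaded must not rely on APIs removed after 4.x.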
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Anuj Kumar
> >>>>>
> >>>>>
> >>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <[email protected]> wrote:
> >>>>>
> >>>>>> Hi Anuj,
> >>>>>>
> >>>>>> I understand your concerns. However, we also need to balance between
> >>>>>> the needs of individual modules/features and the whole codebase. I'm
> >>>>>> willing to put in the effort to keep the other modules up to date with
> >>>>>> newer Lucene versions. Lucene upgrade requirements are well documented;
> >>>>>> the only hitches seen in JENA-1250 were related to how jena-text
> >>>>>> (ab)used some Lucene features that were dropped from newer versions.
> >>>>>>
> >>>>>> A perhaps stupid question to more experienced Java developers: is it
> >>>>>> even possible to mix modules that depend on different versions of the
> >>>>>> Lucene libraries within the same project? In my (quite limited)
> >>>>>> understanding of Java projects and libraries, this requires special
> >>>>>> arrangements (e.g. shading) as the Java package/class namespace is
> >>>>>> shared by all the code running within the same JVM.
> >>>>>>
> >>>>>> So can you create, say, a Fuseki build that contains the current
> >>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
> >>>>>> module (depending on Lucene 6.4.1) without any compatibility issues?
> >>>>>>
> >>>>>> -Osma
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 01.03.2017, 00:47, anuj kumar wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> My 2 Cents :
> >>>>>>>
> >>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES
> >>>>>>> is exactly to avoid the "All or Nothing" approach we need to take if
> >>>>>>> we club them all together. If they stay together and in the near
> >>>>>>> future I want to upgrade ES to another version, I will also need to
> >>>>>>> upgrade Lucene and Solr again, and possibly another implementation
> >>>>>>> that may have been added in the meantime. As we all know, this means
> >>>>>>> weeks of work, if not months, to get the changes released. This will
> >>>>>>> personally de-motivate me from doing anything, and I will probably
> >>>>>>> start maintaining my own version of Jena-Text, as that would be much
> >>>>>>> simpler than upgrading and testing and, in the process, owning (read:
> >>>>>>> fixing bugs in) the upgrade for each and every technology.
> >>>>>>>
> >>>>>>> If they are developed as separate modules, they can evolve
> >>>>>>> independently of each other, and we can avoid situations where we
> >>>>>>> can't upgrade to the latest version of Lucene because we do not know
> >>>>>>> what effect it will have on the Solr Implementation.
> >>>>>>>
> >>>>>>> We can start with a separate module for Jena Text ES and see how
> >>>>>>> things go. If they go well, we could extract Solr and Lucene out of
> >>>>>>> Jena Text.
> >>>>>>>
> >>>>>>> Again, this is just a suggestion based on my limited industry
> >>>>>>> experience.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Anuj Kumar
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <[email protected]> wrote:
> >>>>>>>
> >>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
> >>>>>>>>
> >>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
> >>>>>>>>> ? In other words, might it be better to factor out between -text and
> >>>>>>>>> -spatial and _then_ try to upgrade the Lucene version?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to
> >>>>>>>> do the actual work!
> >>>>>>>>
> >>>>>>>>> I don't use the Solr component now, but I could easily see so
> >>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position to do
> >>>>>>>>> any work to maintain it, so consider that just a very small and blurry
> >>>>>>>>> data point. :)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how
> >>>>>>>> to get it running... If you could just try that with some toy data,
> >>>>>>>> then your data point would be a lot less blurry :) I haven't used
> >>>>>>>> Solr for anything, so I'm not very familiar with how to set it up,
> >>>>>>>> and the jena-text instructions are pretty vague unfortunately.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Osma
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Osma Suominen
> >>>>>>>> D.Sc. (Tech), Information Systems Specialist
> >>>>>>>> National Library of Finland
> >>>>>>>> P.O. Box 26 (Kaikukatu 4)
> >>>>>>>> 00014 HELSINGIN YLIOPISTO
> >>>>>>>> Tel. +358 50 3199529
> >>>>>>>> [email protected]
> >>>>>>>> http://www.nationallibrary.fi
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>
