I agree, Osma. If Lucene is upgraded to 6.4.1, it would be much easier for me to integrate the Elasticsearch implementation.
But I am still waiting for someone to give me a hint on how I can index multiple predicate values. This is the most pressing issue for me currently.

Thanks,
Anuj Kumar

On 1 Mar 2017 19:27, "Osma Suominen" <[email protected]> wrote:

> Hi Anuj!
>
> I have nothing against modularity in general. However, I cannot see how your proposal could work in practice for the Fuseki build, due to the reasons I mentioned in my previous message (and Adam seemed to concur).
>
> In any case, I'll see what I can do to get the Lucene upgrade moving again. If all current Jena modules (ie jena-text and jena-spatial) were upgraded to Lucene 6.4.1, then you could just add your ES classes to jena-text, right? I think that would be better for everyone than having to maintain your own separate module.
>
> -Osma
>
> 01.03.2017, 16:59, anuj kumar wrote:
>
>> I personally have no preference as to how the code in Jena should be structured, as long as I am able to use it :).
>> I have a personal preference for doing it in a specific way because, IMO, it is modular, which makes it much easier to maintain in the long run. But again, it may not be the quickest one.
>>
>> I have already been given a deadline by the company to have the ES extension implemented in the next 15 days :). What this means is that I will be maintaining the ES code extension to Jena Text at least locally for some time to come. I would be more than happy to contribute to the Jena community whatever is required to have a proper Elasticsearch implementation in place, whether within the jena-text module or as a separate module. Until Lucene and Solr are upgraded to the latest version, I will have to maintain a separate module for jena-text-es.
>>
>> Cheers!
>> Anuj Kumar
>>
>>
>> On Wed, Mar 1, 2017 at 3:36 PM, A.
>> Soroka <[email protected]> wrote:
>>
>>> Osma--
>>>
>>> The short answer is that yes, given the right tools you _can_ have different versions of code accessible in different ways. The longer answer is that it's probably not a viable alternative for Jena for this problem, at least not without a lot of other change.
>>>
>>> You are right to point to the classloader mechanism as being at the heart of this question, but I must alter your remark just slightly. From "the Java classloader only sees a single, flat package/class namespace and a set of compiled classes" to "ANY GIVEN Java classloader only sees a single, flat package/class namespace and a set of compiled classes".
>>>
>>> This is the fact that OSGi uses to make it possible to maintain strict module boundaries (and even dynamic module relationships at run-time). Each OSGi bundle sees its own classloader, and the framework is responsible for connecting bundles up to ensure that every bundle has what it needs in the way of types to function, based on metadata that the bundles provide to the framework. It's an incredibly powerful system (I use it every day and enjoy it enormously) but it's also very "heavy" and requires a good deal of investment to use. In particular, it's probably too large to put _inside_ Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>
>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of this kind, but it's really meant for the JDK itself, not application libraries. In theory, we could "roll our own" classloader management for this problem. That sounds like more than a bit of a rabbit hole to me. There might be another, more lightweight, toolkit out there to this purpose, but I'm not aware of any myself.
>>>
>>> Otherwise, yes, you get into shading and the like.
>>> We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>
>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <[email protected]> wrote:
>>>
>>>> Hi Anuj!
>>>>
>>>> Thanks for the clarification.
>>>>
>>>> However, I'm still not sure I understand the situation completely. I know Maven can perform a lot of tricks, but Maven modules are just convenient ways to structure a Java project. Maven cannot change the fact that at runtime, module divisions don't really matter (except that they usually correspond to package sub-namespaces) and the Java classloader only sees a single, flat package/class namespace and a set of compiled classes (usually within JARs) in the classpath that it needs to check to find the right classes, and if there are two versions of the same library (eg Lucene) with overlapping class names, that's going to cause trouble. The only way around that is to shade some of the libraries, i.e. rename them so that they end up in another, non-conflicting namespace. Apparently Elasticsearch also did some of that in the past [1] but nowadays tries to avoid it.
>>>>
>>>> Does your assumption 1 ("At a given point in time, only a single Indexing Technology is used") imply that in the assembler configuration, you cannot have ja:loadClass declarations for both Lucene and ES backends? Or how do you run something like Fuseki that contains (in a single big JAR) both the jena-text and jena-text-es modules with all their dependencies, one of which requires the Lucene 4.x classes and the other one the Lucene 6.4.1 classes?
>>>> How do you ensure that only one of them is used at a time, and that the Java classloader, even though it has access to both versions of Lucene, only loads classes from the single, correct one and not the other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>
>>>> -Osma
>>>>
>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>
>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>
>>>>> Hi Osma,
>>>>>
>>>>> I understand what you are saying. There are ways to mitigate risks and balance the refactoring without affecting the existing modules. But I will not delve into those now. I am not enough of a Jena expert to say convincingly that it is possible without any hiccups. But I can take a guess and say that it is indeed possible :)
>>>>>
>>>>> For the question: "is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project?"
>>>>>
>>>>> I actually do not understand what you mean by mixing modules. I assume you mean having jena-text and jena-text-es as dependencies in a build without causing the build to conflict. If that is what you mean, then the answer is yes, it is possible and quite simple as well. Let me explain how it is possible. But first, some assumptions that I want to call out explicitly.
>>>>>
>>>>> *Assumptions:*
>>>>> 1. At a given point in time, only a single Indexing Technology is used for text based indexing and searching via Jena. What this means is that we will either use the Lucene Implementation OR the Solr Implementation OR the ES Implementation at any given point in time.
>>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes but only on jena-text classes, if at all.
>>>>>
>>>>> Based on these assumptions it is possible to create a build that contains jena-text based common classes + ES specific classes without any compatibility issues. And it is in fact quite simple. I did it in the current jena-text-es module and ran the entire build, which succeeded.
>>>>> The key is to include the latest Lucene dependencies at the very beginning of the pom and then include the jena-text dependency. Maven will then automatically resolve the dependency issues by including the Lucene libraries that we included in our ES specific pom. Have a look at the pom of the jena-text-es module here to see how it can be done:
>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Anuj Kumar
>>>>>
>>>>>
>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <[email protected]> wrote:
>>>>>
>>>>>> Hi Anuj,
>>>>>>
>>>>>> I understand your concerns. However, we also need to balance between the needs of individual modules/features and the whole codebase. I'm willing to put in the effort to keep the other modules up to date with newer Lucene versions. Lucene upgrade requirements are well documented, the only hitches seen in JENA-1250 were related to how jena-text (ab)used some Lucene features that were dropped from newer versions.
>>>>>>
>>>>>> A perhaps stupid question to more experienced Java developers: is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project? In my (quite limited) understanding of Java projects and libraries, this requires special arrangements (e.g. shading) as the Java package/class namespace is shared by all the code running within the same JVM.
>>>>>>
>>>>>> So can you create, say, a Fuseki build that contains the current jena-text module (depending on Lucene 4.x) and the new jena-text-es module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>>
>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> My 2 cents:
>>>>>>>
>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is exactly to avoid the "All or Nothing" approach we need to take if we club them all together. If they stay together and if in the near future I want to upgrade ES to another version, I also need to upgrade Lucene and Solr again, and possibly another implementation that may have been added in the meantime. As we all know, this means weeks of work, if not months, to get the changes released. This will personally demotivate me from doing anything, and I will probably start maintaining my own version of Jena-Text, as that would be much simpler to do than to upgrade and test and in the process own (read: fix bugs in) the upgrade for each and every technology.
>>>>>>>
>>>>>>> If they are developed as separate modules, they can evolve independently of each other and we can avoid situations where we can't upgrade to the latest version of Lucene because we do not know what effect it will have on the Solr Implementation.
>>>>>>>
>>>>>>> We can start with having a separate module for Jena Text ES and see how things go. If they go well, we could extract Solr and Lucene out of Jena Text.
>>>>>>>
>>>>>>> Again, this is just a suggestion based on my limited industry experience.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anuj Kumar
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <[email protected]> wrote:
>>>>>>>
>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>
>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>> ? In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>
>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do the actual work!
>>>>>>>>
>>>>>>>>> I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)
>>>>>>>>
>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get it running... If you could just try that with some toy data, then your data point would be a lot less blurry :) I haven't used Solr for anything, so I'm not very familiar with how to set it up, and the jena-text instructions are pretty vague unfortunately.
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Osma Suominen
>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>> National Library of Finland
>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>> Tel. +358 50 3199529
>>>>>>>> [email protected]
>>>>>>>> http://www.nationallibrary.fi
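
A note on the classloader point made in this thread ("ANY GIVEN Java classloader only sees a single, flat package/class namespace"): the following is a minimal Java sketch of that idea, not code from Jena, Fuseki or OSGi. The names ClassLoaderIsolationDemo, Library and IsolatingLoader are made up for illustration. It assumes Java 9 or later (for InputStream.readAllBytes) and that the compiled class can be read as a resource through the application classloader. It loads the same class through two independent loaders and compares the results.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderIsolationDemo {

    /** Hypothetical stand-in for a library class that exists in two versions. */
    public static class Library {
    }

    /** Loader that defines one named class itself instead of asking its parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(String isolatedName, ClassLoader parent) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // usual parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    byte[] bytes = readClassBytes(name);
                    loaded = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }

        /** Reads the compiled bytes of the class through the parent loader. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Library through two independent loaders and returns both results. */
    public static Class<?>[] loadTwice() {
        String name = Library.class.getName();
        ClassLoader parent = Library.class.getClassLoader();
        try {
            Class<?> first = new IsolatingLoader(name, parent).loadClass(name);
            Class<?> second = new IsolatingLoader(name, parent).loadClass(name);
            return new Class<?>[] {first, second};
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?>[] pair = loadTwice();
        System.out.println("same name:  " + pair[0].getName().equals(pair[1].getName()));
        System.out.println("same class: " + (pair[0] == pair[1]));
    }
}
```

The two Class objects share a fully qualified name but are distinct types, because each loader has its own namespace. A single flat classpath, such as one big Fuseki JAR, has only one such namespace, which is why two Lucene versions with overlapping class names would collide there unless one of them is shaded.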
