Hi Osma,

I briefly looked at the pull request. I believe we need to upgrade Lucene and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene 4.9.1.
Also, how do I log into issues.apache.org, and where should I file this issue?

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:

> Hi Anuj,
>
> It's great that we found agreement over this!
>
> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
> as an intermediate step). I'll wait for comments on the PR and if people
> think it's OK I will merge it soon to Jena master. Meanwhile, you can
> already base your ES implementation on that branch [2] if you like.
>
> Could you please open a JIRA issue on issues.apache.org explaining the
> Elasticsearch support feature, so that we have a place for tracking this
> work, request comments etc.
>
> Also I suggest we move the discussion around this to the developers' list
> (d...@jena.apache.org) where it's more appropriate.
>
> -Osma
>
> [1] https://github.com/apache/jena/pull/219
> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>
> 03.03.2017, 02:45, anuj kumar wrote:
>
>> I second that. I am now finalising the integration of ES and should have a
>> good production quality implementation ready in a week's time. At that
>> time I would want you guys to have a look at the implementation and
>> provide feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge
>> the code in jena-text module and do a round of testing.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>
>>> I do agree that trying to juggle different versions of Lucene libraries is
>>> probably not a realistic option right now. Luckily (if I understand the
>>> conversation thus far correctly) we have a solid alternative; getting our
>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>> Anuj's work into the mainstream of development. Someone please tell me
>>> if I have that wrong!
>>> :grin:
>>>
>>> Let me reiterate that this seems like very good work and speaking for
>>> myself, I certainly want to get it included into Jena. It's just a
>>> question of fitting it in correctly, which might take a bit of time.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>
>>>> Hi Anuj!
>>>>
>>>> I have nothing against modularity in general. However, I cannot see how
>>>> your proposal could work in practice for the Fuseki build, due to the
>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>
>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>> again. If all current Jena modules (ie jena-text and jena-spatial) were
>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>> jena-text, right? I think that would be better for everyone than having
>>>> to maintain your own separate module.
>>>>
>>>> -Osma
>>>>
>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>
>>>>> I personally have no preference as to how the code in Jena should be
>>>>> structured, as long as I am able to use it :).
>>>>> I have a personal preference of doing it in a specific way because IMO,
>>>>> it is modular which makes it much easier to maintain in the long run.
>>>>> But again it may not be the quickest one.
>>>>>
>>>>> I already have been given a deadline, by the company to have ES
>>>>> extension implemented in the next 15 days :). What this means is that
>>>>> I will be maintaining the ES code extension to Jena Text at least
>>>>> locally for a coming period of time. I would be more than happy to
>>>>> contribute to Jena community whatever is required to have a proper
>>>>> ElasticSearch Implementation in place, whether within jena-text module
>>>>> or as a separate module.
>>>>> Until Lucene and Solr are upgraded to the latest
>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>
>>>>> Cheers!
>>>>> Anuj Kumar
>>>>>
>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>
>>>>>> Osma--
>>>>>>
>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>> different versions of code accessible in different ways. The longer
>>>>>> answer is that it's probably not a viable alternative for Jena for
>>>>>> this problem, at least not without a lot of other change.
>>>>>>
>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>> heart of this question, but I must alter your remark just slightly.
>>>>>> From "the Java classloader only sees a single, flat package/class
>>>>>> namespace and a set of compiled classes" to "ANY GIVEN Java
>>>>>> classloader only sees a single, flat package/class namespace and a
>>>>>> set of compiled classes".
>>>>>>
>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>>> what it needs in the way of types to function, based on metadata that
>>>>>> the bundles provide to the framework. It's an incredibly powerful
>>>>>> system (I use it every day and enjoy it enormously) but it's also very
>>>>>> "heavy" and requires a good deal of investment to use. In particular,
>>>>>> it's probably too large to put _inside_ Jena. (I frequently put Jena
>>>>>> inside an OSGi instance, on the other hand.)
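
The point that any given classloader has its own namespace can be illustrated with a minimal Java sketch. This is not Jena or OSGi code: `ClassLoaderIsolationDemo`, `Widget` and `IsolatedLoader` are hypothetical names, and `Widget` only stands in for a library class such as a Lucene type. The sketch assumes Java 9 or later and that the compiled class files are readable as resources.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderIsolationDemo {

    /** Hypothetical stand-in for a library type such as a Lucene class. */
    public static class Widget {}

    /** A loader that defines classes itself instead of delegating to the application class path. */
    static final class IsolatedLoader extends ClassLoader {
        IsolatedLoader() {
            super(null); // null parent: only JDK bootstrap classes are delegated
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Read the class bytes from wherever the application copy lives.
            String resource = "/" + name.replace('.', '/') + ".class";
            try (InputStream in = ClassLoaderIsolationDemo.class.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes(); // requires Java 9 or later
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Widget through a fresh isolated loader. */
    static Class<?> loadIsolated() {
        try {
            return new IsolatedLoader().loadClass(Widget.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = loadIsolated();
        Class<?> second = loadIsolated();
        System.out.println("same name:  " + first.getName().equals(second.getName()));
        System.out.println("same class: " + (first == second));
        System.out.println("same as application copy: " + (first == Widget.class));
    }
}
```

Each loader ends up with its own `Widget` type under the same fully qualified name, which is the property that lets a framework keep two versions of a library apart. On a single flat class path there is only one loader for application classes, so only one version of a given class name can be in effect.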
>>>>>>
>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>> libraries. In theory, we could "roll our own" classloader management
>>>>>> for this problem. That sounds like more than a bit of a rabbit hole to
>>>>>> me. There might be another, more lightweight, toolkit out there to
>>>>>> this purpose, but I'm not aware of any myself.
>>>>>>
>>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>>
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>>
>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>
>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>>>
>>>>>>> Hi Anuj!
>>>>>>>
>>>>>>> Thanks for the clarification.
>>>>>>>
>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>>> fact that at runtime, module divisions don't really matter (except
>>>>>>> that they usually correspond to package sub-namespaces) and the Java
>>>>>>> classloader only sees a single, flat package/class namespace and a
>>>>>>> set of compiled classes (usually within JARs) in the classpath that
>>>>>>> it needs to check to find the right classes, and if there are two
>>>>>>> versions of the same library (eg Lucene) with overlapping class
>>>>>>> names, that's going to cause trouble. The only way around that is to
>>>>>>> shade some of the libraries, i.e.
>>>>>>> rename them so that they end up in another, non-conflicting
>>>>>>> namespace. Apparently Elasticsearch also did some of that in the
>>>>>>> past [1] but nowadays tries to avoid it.
>>>>>>>
>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>>> modules with all their dependencies, one of which requires the
>>>>>>> Lucene 4.x classes and the other one the Lucene 6.4.1 classes? How
>>>>>>> do you ensure that only one of them is used at a time, and that the
>>>>>>> Java classloader, even though it has access to both versions of
>>>>>>> Lucene, only loads classes from the single, correct one and not the
>>>>>>> other? Or do you need to have separate "Fuseki-Lucene" and
>>>>>>> "Fuseki-ES" packages, so that you don't end up with two Lucene
>>>>>>> versions within the same Fuseki JAR?
>>>>>>>
>>>>>>> -Osma
>>>>>>>
>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>
>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>
>>>>>>>> Hi Osma,
>>>>>>>>
>>>>>>>> I understand what you are saying. There are ways to mitigate risks
>>>>>>>> and balance the refactoring without affecting the existing modules.
>>>>>>>> But I will not delve into those now. I am not an expert in Jena to
>>>>>>>> convincingly say that it is possible, without any hiccups.
>>>>>>>> But I can take a guess and say that it is indeed possible :)
>>>>>>>>
>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>
>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>> assume you mean having jena-text and jena-text-es as dependencies in
>>>>>>>> a build without causing the build to conflict. If that is what you
>>>>>>>> mean then the answer is yes, it is possible and quite simple as
>>>>>>>> well. Let me explain how it is possible. But before that, some
>>>>>>>> assumptions which I want to call out explicitly.
>>>>>>>>
>>>>>>>> *Assumption:*
>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>> used for text based indexing and searching via Jena. What this means
>>>>>>>> is that we will either use Lucene Implementation OR Solr
>>>>>>>> Implementation OR ES Implementation at any given point in time.
>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>>> but only on jena-text classes, if at all.
>>>>>>>>
>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>> contains jena-text based common classes + ES specific classes
>>>>>>>> without any compatibility issues. And it is in fact quite simple. I
>>>>>>>> did it in the current jena-text-es module and ran the entire build
>>>>>>>> which succeeded.
>>>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>>> beginning in the pom and then include jena-text dependency. Maven
>>>>>>>> will then automatically resolve the dependency issues by including
>>>>>>>> the Lucene libraries that we included in our es specific pom.
>>>>>>>> Have a look at the pom of jena-text-es module here to see how it can
>>>>>>>> be done:
>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anuj Kumar
>>>>>>>>
>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>>>>>
>>>>>>>>> Hi Anuj,
>>>>>>>>>
>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>> between the needs of individual modules/features and the whole
>>>>>>>>> codebase. I'm willing to put in the effort to keep the other
>>>>>>>>> modules up to date with newer Lucene versions. Lucene upgrade
>>>>>>>>> requirements are well documented, the only hitches seen in
>>>>>>>>> JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>
>>>>>>>>> A perhaps stupid question to more experienced Java developers: is
>>>>>>>>> it even possible to mix modules that depend on different versions
>>>>>>>>> of the Lucene libraries within the same project? In my (quite
>>>>>>>>> limited) understanding of Java projects and libraries, this
>>>>>>>>> requires special arrangements (e.g. shading) as the Java
>>>>>>>>> package/class namespace is shared by all the code running within
>>>>>>>>> the same JVM.
>>>>>>>>>
>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
>>>>>>>>> module (depending on Lucene 6.4.1) without any compatibility
>>>>>>>>> issues?
>>>>>>>>>
>>>>>>>>> -Osma
>>>>>>>>>
>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> My 2 Cents:
>>>>>>>>>>
>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr
>>>>>>>>>> and ES is exactly for avoiding the "All or Nothing" approach we
>>>>>>>>>> need to take if we club them all together. If they stay together
>>>>>>>>>> and if in the near future I want to upgrade ES to another version,
>>>>>>>>>> I also need to again upgrade Lucene and Solr and possibly another
>>>>>>>>>> implementation that may have been added during the time. As we all
>>>>>>>>>> know, this means weeks of work if not months to get the changes
>>>>>>>>>> released. This will personally de-motivate me to do anything and I
>>>>>>>>>> will probably start maintaining my version of Jena-Text as that
>>>>>>>>>> would be much simpler to do than to upgrade and test and in the
>>>>>>>>>> process own (read: fix bugs) the upgrade for each and every
>>>>>>>>>> technology.
>>>>>>>>>>
>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>> independently of each other and we can avoid situations where we
>>>>>>>>>> can't upgrade to latest version of Lucene because we do not know
>>>>>>>>>> what effect it will have on Solr Implementation.
>>>>>>>>>>
>>>>>>>>>> We can start with having a separate Module for Jena Text ES and
>>>>>>>>>> see how things go. If they go well, we could extract Solr and
>>>>>>>>>> Lucene out of Jena Text.
>>>>>>>>>>
>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>> experience.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuj Kumar
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>>>>>>>
>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>
>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>
>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>> volunteer to do the actual work!
>>>>>>>>>>>
>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position
>>>>>>>>>>>> to do any work to maintain it, so consider that just a very
>>>>>>>>>>>> small and blurry data point. :)
>>>>>>>>>>>
>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>>>> how to get it running... If you could just try that with some toy
>>>>>>>>>>> data, then your data point would be a lot less blurry :) I
>>>>>>>>>>> haven't used Solr for anything, so I'm not very familiar with how
>>>>>>>>>>> to set it up, and the jena-text instructions are pretty vague
>>>>>>>>>>> unfortunately.
>>>>>>>>>>>
>>>>>>>>>>> -Osma
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Osma Suominen
>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>> National Library of Finland
>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>> osma.suomi...@helsinki.fi
>>>>>>>>>>> http://www.nationallibrary.fi

--
*Anuj Kumar*