Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

anuj kumar Wed, 01 Mar 2017 07:34:03 -0800

BTW, I have one more question:

How do I add more than one field to be indexed in my Index?
Basically, if I want to index both rdfs:label and rdfs:comment in the same
index document, how do I do it?


I tried:

EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
entDef.setPrimaryPredicate(RDFS.label);
entDef.setGraphField(GRAPH_FIELD_NAME);
entDef.set("comment", RDFS.comment.asNode());

But it doesn't work. Could you please point me to a way to do it? This
is an important piece of functionality I need.

Thanks,
Anuj Kumar


On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> I personally have no preference as to how the code in Jena should be
> structured, as long as I am able to use it :).
> I have a personal preference for doing it in a specific way because, IMO, it
> is modular, which makes it much easier to maintain in the long run. But
> again, it may not be the quickest one.
>
> I have already been given a deadline by the company to have the ES extension
> implemented in the next 15 days :). What this means is that I will be
> maintaining the ES code extension to Jena Text, at least locally, for the
> coming period of time. I would be more than happy to contribute to the Jena
> community whatever is required to have a proper ElasticSearch
> implementation in place, whether within the jena-text module or as a separate
> module. Until Lucene and Solr are upgraded to the latest
> version, I will have to maintain a separate module for jena-text-es.
>
> Cheers!
> Anuj Kumar
>
>
> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>
>> Osma--
>>
>> The short answer is that yes, given the right tools you _can_ have
>> different versions of code accessible in different ways. The longer answer
>> is that it's probably not a viable alternative for Jena for this problem,
>> at least not without a lot of other change.
>>
>> You are right to point to the classloader mechanism as being at the heart
>> of this question, but I must alter your remark just slightly. From "the
>> Java classloader only sees a single, flat package/class namespace and a set
>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>> flat package/class namespace and a set of compiled classes".
>>
>> This is the fact that OSGi uses to make it possible to maintain strict
>> module boundaries (and even dynamic module relationships at run-time). Each
>> OSGi bundle sees its own classloader, and the framework is responsible for
>> connecting bundles up to ensure that every bundle has what it needs in the
>> way of types to function, based on metadata that the bundles provide to the
>> framework. It's an incredibly powerful system (I use it every day and enjoy
>> it enormously) but it's also very "heavy" and requires a good deal of
>> investment to use. In particular, it's probably too large to put _inside_
>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>
>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>> this kind, but it's really meant for the JDK itself, not application
>> libraries. In theory, we could "roll our own" classloader management for
>> this problem. That sounds like more than a bit of a rabbit hole to me.
>> There might be another, more lightweight, toolkit out there for this
>> purpose, but I'm not aware of any myself.
>>
>> Otherwise, yes, you get into shading and the like. We have to do that for
>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>> thing we want to do any more of than needed, I don't think.
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>> [1] http://openjdk.java.net/projects/jigsaw/
>>
>> > On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>> >
>> > Hi Anuj!
>> >
>> > Thanks for the clarification.
>> >
>> > However, I'm still not sure I understand the situation completely. I
>> > know Maven can perform a lot of tricks, but Maven modules are just
>> > convenient ways to structure a Java project. Maven cannot change the fact
>> > that at runtime, module divisions don't really matter (except that they
>> > usually correspond to package sub-namespaces) and the Java classloader only
>> > sees a single, flat package/class namespace and a set of compiled classes
>> > (usually within JARs) in the classpath that it needs to check to find the
>> > right classes, and if there are two versions of the same library (e.g.
>> > Lucene) with overlapping class names, that's going to cause trouble. The
>> > only way around that is to shade some of the libraries, i.e. rename them so
>> > that they end up in another, non-conflicting namespace. Apparently
>> > Elasticsearch also did some of that in the past [1] but nowadays tries to
>> > avoid it.
>> >
>> > Does your assumption 1 ("At a given point in time, only a single
>> > Indexing Technology is used") imply that in the assembler configuration,
>> > you cannot have ja:loadClass declarations for both Lucene and ES backends?
>> > Or how do you run something like Fuseki that contains (in a single big JAR)
>> > both the jena-text and jena-text-es modules with all their dependencies,
>> > one of which requires the Lucene 4.x classes and the other one the Lucene
>> > 6.4.1 classes? How do you ensure that only one of them is used at a time,
>> > and that the Java classloader, even though it has access to both versions
>> > of Lucene, only loads classes from the single, correct one and not the
>> > other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>> > packages, so that you don't end up with two Lucene versions within the same
>> > Fuseki JAR?
>> >
>> > -Osma
>> >
>> > [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>> >
>> > 01.03.2017, 11:03, anuj kumar wrote:
>> >> Hi Osma,
>> >>
>> >> I understand what you are saying. There are ways to mitigate risks and
>> >> balance the refactoring without affecting the existing modules. But I will
>> >> not delve into those now. I am not enough of a Jena expert to say
>> >> convincingly that it is possible without any hiccups. But I can take a
>> >> guess and say that it is indeed possible :)
>> >>
>> >> For the question: "is it even possible to mix modules that depend on
>> >> different versions of the Lucene libraries within the same project?"
>> >>
>> >> I actually do not understand what you mean by mixing modules. I assume you
>> >> mean having jena-text and jena-text-es as dependencies in a build without
>> >> causing the build to conflict. If that is what you mean, then the answer is
>> >> yes, it is possible and quite simple as well. Let me explain how it is
>> >> possible. But before that, some assumptions which I want to call out
>> >> explicitly.
>> >>
>> >> *Assumptions:*
>> >> 1. At a given point in time, only a single Indexing Technology is used for
>> >> text based indexing and searching via Jena. What this means is that we will
>> >> either use Lucene Implementation OR Solr Implementation OR ES
>> >> Implementation at any given point in time.
>> >> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
>> >> only on jena-text classes, if at all.
>> >>
>> >> Based on these assumptions it is possible to create a build that contains
>> >> jena-text based common classes + ES specific classes without any
>> >> compatibility issues. And it is in fact quite simple. I did it in the
>> >> current jena-text-es module and ran the entire build, which succeeded.
>> >> The key is to include the latest Lucene dependencies at the very beginning
>> >> in the pom and then include the jena-text dependency. Maven will then
>> >> automatically resolve the dependency issues by including the Lucene
>> >> libraries that we included in our ES-specific pom. Have a look at the pom of
>> >> the jena-text-es module here to see how it can be done:
>> >> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>> >>
>> >>
>> >> Thanks,
>> >> Anuj Kumar
>> >>
>> >>
>> >> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>> >>
>> >>> Hi Anuj,
>> >>>
>> >>> I understand your concerns. However, we also need to balance between the
>> >>> needs of individual modules/features and the whole codebase. I'm willing to
>> >>> put in the effort to keep the other modules up to date with newer Lucene
>> >>> versions. Lucene upgrade requirements are well documented, the only hitches
>> >>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>> >>> features that were dropped from newer versions.
>> >>>
>> >>> A perhaps stupid question to more experienced Java developers: is it even
>> >>> possible to mix modules that depend on different versions of the Lucene
>> >>> libraries within the same project? In my (quite limited) understanding of
>> >>> Java projects and libraries, this requires special arrangements (e.g.
>> >>> shading) as the Java package/class namespace is shared by all the code
>> >>> running within the same JVM.
>> >>>
>> >>> So can you create, say, a Fuseki build that contains the current jena-text
>> >>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>> >>> on Lucene 6.4.1) without any compatibility issues?
>> >>>
>> >>> -Osma
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> 01.03.2017, 00:47, anuj kumar wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> My 2 cents:
>> >>>>
>> >>>> The reason I proposed to have separate modules for Lucene, Solr and ES is
>> >>>> exactly to avoid the "All or Nothing" approach we need to take if we
>> >>>> club them all together. If they stay together and in the near future I
>> >>>> want to upgrade ES to another version, I also need to upgrade Lucene
>> >>>> and Solr again, and possibly another implementation that may have been
>> >>>> added in the meantime. As we all know, this means weeks of work, if not
>> >>>> months, to get the changes released. This will personally demotivate me
>> >>>> from doing anything, and I will probably start maintaining my own version
>> >>>> of Jena-Text, as that would be much simpler to do than to upgrade and test
>> >>>> and, in the process, own (read: fix bugs in) the upgrade for each and
>> >>>> every technology.
>> >>>>
>> >>>> If they are developed as separate modules, they can evolve independently
>> >>>> of each other and we can avoid situations where we can't upgrade to the
>> >>>> latest version of Lucene because we do not know what effect it will have
>> >>>> on the Solr implementation.
>> >>>>
>> >>>> We can start with having a separate module for Jena Text ES and see how
>> >>>> things go. If they go well, we could extract Solr and Lucene out of
>> >>>> Jena Text.
>> >>>>
>> >>>> Again, this is just a suggestion based on my limited industry experience.
>> >>>>
>> >>>> Thanks,
>> >>>> Anuj Kumar
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>> >>>>
>> >>>>> 28.02.2017, 17:12, A. Soroka wrote:
>> >>>>>
>> >>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>> >>>>>> ? In other words, might it be better to factor out between -text and
>> >>>>>> -spatial and _then_ try to upgrade the Lucene version?
>> >>>>>>
>> >>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>> >>>>> the actual work!
>> >>>>>
>> >>>>>> I don't use the Solr component now, but I could easily see so doing...
>> >>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>> >>>>>> maintain it, so consider that just a very small and blurry data point.
>> >>>>>> :)
>> >>>>>>
>> >>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>> >>>>> it running... If you could just try that with some toy data, then your data
>> >>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>> >>>>> I'm not very familiar with how to set it up, and the jena-text instructions
>> >>>>> are pretty vague unfortunately.
>> >>>>>
>> >>>>>
>> >>>>> -Osma
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Osma Suominen
>> >>>>> D.Sc. (Tech), Information Systems Specialist
>> >>>>> National Library of Finland
>> >>>>> P.O. Box 26 (Kaikukatu 4)
>> >>>>> 00014 HELSINGIN YLIOPISTO
>> >>>>> Tel. +358 50 3199529
>> >>>>> osma.suomi...@helsinki.fi
>> >>>>> http://www.nationallibrary.fi
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Osma Suominen
>> >>> D.Sc. (Tech), Information Systems Specialist
>> >>> National Library of Finland
>> >>> P.O. Box 26 (Kaikukatu 4)
>> >>> 00014 HELSINGIN YLIOPISTO
>> >>> Tel. +358 50 3199529
>> >>> osma.suomi...@helsinki.fi
>> >>> http://www.nationallibrary.fi
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Osma Suominen
>> > D.Sc. (Tech), Information Systems Specialist
>> > National Library of Finland
>> > P.O. Box 26 (Kaikukatu 4)
>> > 00014 HELSINGIN YLIOPISTO
>> > Tel. +358 50 3199529
>> > osma.suomi...@helsinki.fi
>> > http://www.nationallibrary.fi
>>
>>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*
