Just FYI, I was able to index multiple fields in Elasticsearch using the Jena Text capability. The issue was in my Elasticsearch code, where I was doing an insert every time instead of an update :/
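For anyone hitting the same thing, here is a minimal sketch of the difference. It assumes each indexed property value arrives as a separate write for the same entity, and it uses an in-memory map as a stand-in for the index. Nothing here is the Elasticsearch client API or my actual code; the class and method names (UpsertSketch, insert, upsert) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a document index keyed by entity URI. */
public class UpsertSketch {
    // document id -> (field name -> values)
    private final Map<String, Map<String, List<String>>> docs = new HashMap<>();

    /** "Insert" semantics: the new document replaces whatever was stored under the id. */
    public void insert(String id, String field, String value) {
        Map<String, List<String>> doc = new HashMap<>();
        List<String> values = new ArrayList<>();
        values.add(value);
        doc.put(field, values);
        docs.put(id, doc); // overwrites any previously indexed fields
    }

    /** "Update" (upsert) semantics: create the document if missing, else merge the field in. */
    public void upsert(String id, String field, String value) {
        docs.computeIfAbsent(id, k -> new HashMap<>())
            .computeIfAbsent(field, k -> new ArrayList<>())
            .add(value);
    }

    /** Returns the stored document, or an empty map if the id is unknown. */
    public Map<String, List<String>> get(String id) {
        return docs.getOrDefault(id, new HashMap<>());
    }

    public static void main(String[] args) {
        String id = "http://example.org/s"; // made-up entity URI

        UpsertSketch inserted = new UpsertSketch();
        inserted.insert(id, "label", "A label");
        inserted.insert(id, "comment", "A comment");
        // The second insert replaced the document, so the label is gone.
        System.out.println(inserted.get(id).containsKey("label")); // false

        UpsertSketch upserted = new UpsertSketch();
        upserted.upsert(id, "label", "A label");
        upserted.upsert(id, "comment", "A comment");
        // Both fields live in the same document.
        System.out.println(upserted.get(id).containsKey("label")); // true
    }
}
```

With insert semantics only the last field written survives, which looks exactly like "only one field gets indexed".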
Cheers!
Anuj Kumar

On Wed, Mar 1, 2017 at 7:40 PM, anuj kumar <[email protected]> wrote:

> Thanks Osma. I sent my previous email just a minute early. I will try your
> suggestion and if it doesn't work will send you the entire example.
>
> Thanks again.
> Anuj
>
> On 1 Mar 2017 19:36, "Osma Suominen" <[email protected]> wrote:
>
>> Hi Anuj!
>>
>> Generally I use assembler descriptions to configure the jena-text index.
>> An example with multiple properties (SKOS label properties) is here:
>> https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#creating-a-text-index
>>
>> For examples on how to use assembler descriptions from Java code, take a
>> look at the jena-text unit tests. They generally contain a snippet of
>> assembler definition that configures the text index in a particular way,
>> then test that it does what it should when using that configuration.
>>
>> You didn't provide a full example. What is your data and what query did
>> you use? What results did you expect? What happened instead?
>>
>> One possible problem in your configuration is that you have set the
>> primary predicate to rdfs:label, but not set a field for it. Try adding
>> this:
>>
>> entDef.set("label", RDFS.label.asNode());
>>
>> For querying everything else but the default field, you need to specify
>> the predicate at query time. With your configuration, it should be
>> possible to query rdfs:comment values like this:
>>
>> ?s text:query (rdfs:comment "word") .
>>
>> Hope this helps!
>>
>> -Osma
>>
>> 01.03.2017, 17:33, anuj kumar wrote:
>>
>>> BTW, I have one more question:
>>>
>>> How do I add more than one field to be indexed in my Index?
>>> Basically, if I want to index rdfs:label , rdfs:comment in the same index
>>> document, how do I do it?
>>>
>>> I tried :
>>>
>>> EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
>>> entDef.setPrimaryPredicate(RDFS.label);
>>> entDef.setGraphField(GRAPH_FIELD_NAME);
>>> entDef.set("comment", RDFS.comment.asNode());
>>>
>>> But it doesn't work. Can you please point me to a way to do it?
>>> This is an important piece of functionality I need.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <[email protected]> wrote:
>>>
>>>> I personally have no preference as to how the code in Jena should be
>>>> structured, as long as I am able to use it :).
>>>> I have a personal preference for doing it in a specific way because,
>>>> IMO, it is modular, which makes it much easier to maintain in the long
>>>> run. But again, it may not be the quickest one.
>>>>
>>>> I have already been given a deadline by the company to have the ES
>>>> extension implemented in the next 15 days :). What this means is that I
>>>> will be maintaining the ES code extension to Jena Text at least locally
>>>> for the coming period of time. I would be more than happy to contribute
>>>> to the Jena community whatever is required to have a proper
>>>> ElasticSearch implementation in place, whether within the jena-text
>>>> module or as a separate module. Until Lucene and Solr are upgraded to
>>>> the latest version, I will have to maintain a separate module for
>>>> jena-text-es.
>>>>
>>>> Cheers!
>>>> Anuj Kumar
>>>>
>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <[email protected]> wrote:
>>>>
>>>>> Osma--
>>>>>
>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>> different versions of code accessible in different ways. The longer
>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>> problem, at least not without a lot of other change.
>>>>>
>>>>> You are right to point to the classloader mechanism as being at the
>>>>> heart of this question, but I must alter your remark just slightly.
>>>>> From "the Java classloader only sees a single, flat package/class
>>>>> namespace and a set of compiled classes" to "ANY GIVEN Java classloader
>>>>> only sees a single, flat package/class namespace and a set of compiled
>>>>> classes".
>>>>>
>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>> what it needs in the way of types to function, based on metadata that
>>>>> the bundles provide to the framework. It's an incredibly powerful system
>>>>> (I use it every day and enjoy it enormously) but it's also very "heavy"
>>>>> and requires a good deal of investment to use. In particular, it's
>>>>> probably too large to put _inside_ Jena. (I frequently put Jena inside
>>>>> an OSGi instance, on the other hand.)
>>>>>
>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>> There might be another, more lightweight, toolkit out there to this
>>>>> purpose, but I'm not aware of any myself.
>>>>>
>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>>
>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>
>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <[email protected]> wrote:
>>>>>
>>>>>> Hi Anuj!
>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>> fact that at runtime, module divisions don't really matter (except that
>>>>>> they usually correspond to package sub-namespaces) and the Java
>>>>>> classloader only sees a single, flat package/class namespace and a set
>>>>>> of compiled classes (usually within JARs) in the classpath that it
>>>>>> needs to check to find the right classes, and if there are two versions
>>>>>> of the same library (eg Lucene) with overlapping class names, that's
>>>>>> going to cause trouble. The only way around that is to shade some of
>>>>>> the libraries, i.e. rename them so that they end up in another,
>>>>>> non-conflicting namespace. Apparently Elasticsearch also did some of
>>>>>> that in the past [1] but nowadays tries to avoid it.
>>>>>>
>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>>> ensure that only one of them is used at a time, and that the Java
>>>>>> classloader, even though it has access to both versions of Lucene, only
>>>>>> loads classes from the single, correct one and not the other? Or do you
>>>>>> need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that
>>>>>> you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>
>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>
>>>>>>> Hi Osma,
>>>>>>>
>>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>> will not delve into those now. I am not enough of an expert in Jena to
>>>>>>> convincingly say that it is possible without any hiccups. But I can
>>>>>>> take a guess and say that it is indeed possible :)
>>>>>>>
>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>
>>>>>>> I actually do not understand what you mean by mixing modules. I assume
>>>>>>> you mean having jena-text and jena-text-es as dependencies in a build
>>>>>>> without causing the build to conflict. If that is what you mean, then
>>>>>>> the answer is yes, it is possible and quite simple as well. Let me
>>>>>>> explain how it is possible. But before that, some assumptions which I
>>>>>>> want to call out explicitly.
>>>>>>>
>>>>>>> *Assumptions:*
>>>>>>> 1. At a given point in time, only a single Indexing Technology is used
>>>>>>> for text based indexing and searching via Jena. What this means is
>>>>>>> that we will either use Lucene Implementation OR Solr Implementation
>>>>>>> OR ES Implementation at any given point in time.
>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>> but only on jena-text classes, if at all.
>>>>>>>
>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>> contains jena-text based common classes + ES specific classes without
>>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>>> the current jena-text-es module and ran the entire build, which
>>>>>>> succeeded. The key is to include the latest Lucene dependencies at the
>>>>>>> very beginning in the pom and then include the jena-text dependency.
>>>>>>> Maven will then automatically resolve the dependency issues by
>>>>>>> including the Lucene libraries that we included in our ES specific
>>>>>>> pom. Have a look at the pom of the jena-text-es module here to see how
>>>>>>> it can be done:
>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anuj Kumar
>>>>>>>
>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Anuj,
>>>>>>>>
>>>>>>>> I understand your concerns. However, we also need to balance between
>>>>>>>> the needs of individual modules/features and the whole codebase. I'm
>>>>>>>> willing to put in the effort to keep the other modules up to date
>>>>>>>> with newer Lucene versions. Lucene upgrade requirements are well
>>>>>>>> documented; the only hitches seen in JENA-1250 were related to how
>>>>>>>> jena-text (ab)used some Lucene features that were dropped from newer
>>>>>>>> versions.
>>>>>>>>
>>>>>>>> A perhaps stupid question to more experienced Java developers: is it
>>>>>>>> even possible to mix modules that depend on different versions of the
>>>>>>>> Lucene libraries within the same project? In my (quite limited)
>>>>>>>> understanding of Java projects and libraries, this requires special
>>>>>>>> arrangements (e.g. shading) as the Java package/class namespace is
>>>>>>>> shared by all the code running within the same JVM.
>>>>>>>>
>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
>>>>>>>> module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> My 2 Cents :
>>>>>>>>>
>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and
>>>>>>>>> ES is exactly to avoid the "All or Nothing" approach we need to take
>>>>>>>>> if we club them all together. If they stay together and if in the
>>>>>>>>> near future I want to upgrade ES to another version, I also need to
>>>>>>>>> again upgrade Lucene and Solr and possibly another implementation
>>>>>>>>> that may have been added during the time. As we all know, this means
>>>>>>>>> weeks of work, if not months, to get the changes released. This will
>>>>>>>>> personally de-motivate me to do anything and I will probably start
>>>>>>>>> maintaining my version of Jena-Text, as that would be much simpler
>>>>>>>>> to do than to upgrade and test and in the process own (read: fix
>>>>>>>>> bugs in) the upgrade for each and every technology.
>>>>>>>>>
>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>> independently of each other and we can avoid situations where we
>>>>>>>>> can't upgrade to the latest version of Lucene because we do not know
>>>>>>>>> what effect it will have on the Solr Implementation.
>>>>>>>>>
>>>>>>>>> We can start with having a separate Module for Jena Text ES and see
>>>>>>>>> how things go. If they go well, we could extract Solr and Lucene out
>>>>>>>>> of Jena Text.
>>>>>>>>>
>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>> experience.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>
>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>
>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer
>>>>>>>>>> to do the actual work!
>>>>>>>>>>
>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position to
>>>>>>>>>>> do any work to maintain it, so consider that just a very small and
>>>>>>>>>>> blurry data point. :)
>>>>>>>>>>
>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how
>>>>>>>>>> to get it running... If you could just try that with some toy data,
>>>>>>>>>> then your data point would be a lot less blurry :) I haven't used
>>>>>>>>>> Solr for anything, so I'm not very familiar with how to set it up,
>>>>>>>>>> and the jena-text instructions are pretty vague unfortunately.
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Osma Suominen
>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>> National Library of Finland
>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>> [email protected]
>>>>>>>>>> http://www.nationallibrary.fi
>>>>
>>>> --
>>>> *Anuj Kumar*

--
*Anuj Kumar*
