Thanks, Osma. I sent my previous email just a minute too early. I will try your suggestion and, if it doesn't work, will send you the entire example.
Thanks again.
Anuj

On 1 Mar 2017 19:36, "Osma Suominen" <[email protected]> wrote:

> Hi Anuj!
>
> Generally I use assembler descriptions to configure the jena-text index.
> An example with multiple properties (SKOS label properties) is here:
> https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#creating-a-text-index
>
> For examples on how to use assembler descriptions from Java code, take a
> look at the jena-text unit tests. They generally contain a snippet of
> assembler definition that configures the text index in a particular way,
> then test that it does what it should when using that configuration.
>
> You didn't provide a full example. What is your data and what query did
> you use? What results did you expect? What happened instead?
>
> One possible problem in your configuration is that you have set the
> primary predicate to rdfs:label, but not set a field for it. Try adding
> this:
>
>     entDef.set("label", RDFS.label.asNode());
>
> For querying everything else but the default field, you need to specify
> the predicate at query time. With your configuration, it should be possible
> to query rdfs:comment values like this:
>
>     ?s text:query (rdfs:comment "word") .
>
> Hope this helps!
>
> -Osma
>
> 01.03.2017, 17:33, anuj kumar wrote:
>
>> BTW, I have one more question:
>>
>> How do I add more than one field to be indexed in my index?
>> Basically, if I want to index rdfs:label and rdfs:comment in the same
>> index document, how do I do it?
>>
>> I tried:
>>
>>     EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
>>     entDef.setPrimaryPredicate(RDFS.label);
>>     entDef.setGraphField(GRAPH_FIELD_NAME);
>>     entDef.set("comment", RDFS.comment.asNode());
>>
>> But it doesn't work. Can you please point me to a way to do it? This
>> is an important piece of functionality I need.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <[email protected]> wrote:
>>
>>> I personally have no preference as to how the code in Jena should be
>>> structured, as long as I am able to use it :).
>>> I have a personal preference for doing it in a specific way because, IMO,
>>> it is modular, which makes it much easier to maintain in the long run.
>>> But again, it may not be the quickest one.
>>>
>>> I have already been given a deadline by the company to have the ES
>>> extension implemented in the next 15 days :). What this means is that I
>>> will be maintaining the ES code extension to Jena Text at least locally
>>> for a coming period of time. I would be more than happy to contribute to
>>> the Jena community whatever is required to have a proper ElasticSearch
>>> implementation in place, whether within the jena-text module or as a
>>> separate module. Until Lucene and Solr are upgraded to the latest
>>> version, I will have to maintain a separate module for jena-text-es.
>>>
>>> Cheers!
>>> Anuj Kumar
>>>
>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <[email protected]> wrote:
>>>
>>>> Osma--
>>>>
>>>> The short answer is that yes, given the right tools you _can_ have
>>>> different versions of code accessible in different ways. The longer
>>>> answer is that it's probably not a viable alternative for Jena for this
>>>> problem, at least not without a lot of other change.
>>>>
>>>> You are right to point to the classloader mechanism as being at the
>>>> heart of this question, but I must alter your remark just slightly. From
>>>> "the Java classloader only sees a single, flat package/class namespace
>>>> and a set of compiled classes" to "ANY GIVEN Java classloader only sees
>>>> a single, flat package/class namespace and a set of compiled classes".
>>>>
>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>> module boundaries (and even dynamic module relationships at run-time).
>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>> responsible for connecting bundles up to ensure that every bundle has
>>>> what it needs in the way of types to function, based on metadata that
>>>> the bundles provide to the framework. It's an incredibly powerful system
>>>> (I use it every day and enjoy it enormously) but it's also very "heavy"
>>>> and requires a good deal of investment to use. In particular, it's
>>>> probably too large to put _inside_ Jena. (I frequently put Jena inside
>>>> an OSGi instance, on the other hand.)
>>>>
>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>> this kind, but it's really meant for the JDK itself, not application
>>>> libraries. In theory, we could "roll our own" classloader management for
>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>> There might be another, more lightweight, toolkit out there to this
>>>> purpose, but I'm not aware of any myself.
>>>>
>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>
>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <[email protected]> wrote:
>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> Thanks for the clarification.
>>>>>
>>>>> However, I'm still not sure I understand the situation completely. I
>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>> convenient ways to structure a Java project.
>>>>> Maven cannot change the fact that at runtime, module divisions don't
>>>>> really matter (except that they usually correspond to package
>>>>> sub-namespaces) and the Java classloader only sees a single, flat
>>>>> package/class namespace and a set of compiled classes (usually within
>>>>> JARs) in the classpath that it needs to check to find the right
>>>>> classes, and if there are two versions of the same library (e.g.
>>>>> Lucene) with overlapping class names, that's going to cause trouble.
>>>>> The only way around that is to shade some of the libraries, i.e.
>>>>> rename them so that they end up in another, non-conflicting namespace.
>>>>> Apparently Elasticsearch also did some of that in the past [1] but
>>>>> nowadays tries to avoid it.
>>>>>
>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>> Indexing Technology is used") imply that in the assembler
>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>> ensure that only one of them is used at a time, and that the Java
>>>>> classloader, even though it has access to both versions of Lucene,
>>>>> only loads classes from the single, correct one and not the other? Or
>>>>> do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages,
>>>>> so that you don't end up with two Lucene versions within the same
>>>>> Fuseki JAR?
>>>>>
>>>>> -Osma
>>>>>
>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>
>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>
>>>>>> Hi Osma,
>>>>>>
>>>>>> I understand what you are saying.
>>>>>> There are ways to mitigate risks and balance the refactoring without
>>>>>> affecting the existing modules. But I will not delve into those now.
>>>>>> I am not an expert in Jena to convincingly say that it is possible,
>>>>>> without any hiccups. But I can take a guess and say that it is indeed
>>>>>> possible :)
>>>>>>
>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>
>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>> assume you mean having jena-text and jena-text-es as dependencies in
>>>>>> a build without causing the build to conflict. If that is what you
>>>>>> mean, then the answer is yes, it is possible and quite simple as
>>>>>> well. Let me explain how it is possible. But before that, some
>>>>>> assumptions which I want to call out explicitly.
>>>>>>
>>>>>> *Assumptions:*
>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>> used for text based indexing and searching via Jena. What this means
>>>>>> is that we will either use Lucene Implementation OR Solr
>>>>>> Implementation OR ES Implementation at any given point in time.
>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>> but only on jena-text classes, if at all.
>>>>>>
>>>>>> Based on these assumptions it is possible to create a build that
>>>>>> contains jena-text based common classes + ES specific classes without
>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>> the current jena-text-es module and ran the entire build, which
>>>>>> succeeded.
>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>> beginning in the pom and then include the jena-text dependency.
>>>>>> Maven will then automatically resolve the dependency issues by
>>>>>> including the Lucene libraries that we included in our ES specific
>>>>>> pom. Have a look at the pom of the jena-text-es module here to see
>>>>>> how it can be done:
>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>
>>>>>> Thanks,
>>>>>> Anuj Kumar
>>>>>>
>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Anuj,
>>>>>>>
>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>> between the needs of individual modules/features and the whole
>>>>>>> codebase. I'm willing to put in the effort to keep the other modules
>>>>>>> up to date with newer Lucene versions. Lucene upgrade requirements
>>>>>>> are well documented, the only hitches seen in JENA-1250 were related
>>>>>>> to how jena-text (ab)used some Lucene features that were dropped
>>>>>>> from newer versions.
>>>>>>>
>>>>>>> A perhaps stupid question to more experienced Java developers: is it
>>>>>>> even possible to mix modules that depend on different versions of
>>>>>>> the Lucene libraries within the same project? In my (quite limited)
>>>>>>> understanding of Java projects and libraries, this requires special
>>>>>>> arrangements (e.g. shading) as the Java package/class namespace is
>>>>>>> shared by all the code running within the same JVM.
>>>>>>>
>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
>>>>>>> module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>>
>>>>>>> -Osma
>>>>>>>
>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> My 2 Cents:
>>>>>>>>
>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and
>>>>>>>> ES is exactly for avoiding the "All or Nothing" approach we need to
>>>>>>>> take if we club them all together. If they stay together and if in
>>>>>>>> the near future I want to upgrade ES to another version, I also
>>>>>>>> need to again upgrade Lucene and Solr and possibly another
>>>>>>>> implementation that may have been added during the time. As we all
>>>>>>>> know, this means weeks of work if not months to get the changes
>>>>>>>> released. This will personally de-motivate me to do anything and I
>>>>>>>> will probably start maintaining my version of Jena-Text as that
>>>>>>>> would be much simpler to do than to upgrade and test and in the
>>>>>>>> process own (read: fix bugs) the upgrade for each and every
>>>>>>>> technology.
>>>>>>>>
>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>> independently of each other and we can avoid situations where we
>>>>>>>> can't upgrade to the latest version of Lucene because we do not
>>>>>>>> know what effect it will have on the Solr Implementation.
>>>>>>>>
>>>>>>>> We can start with having a separate Module for Jena Text ES and see
>>>>>>>> how things go. If they go well, we could extract Solr and Lucene
>>>>>>>> out of Jena Text.
>>>>>>>>
>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>> experience.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anuj Kumar
>>>>>>>>
>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>
>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>
>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer
>>>>>>>>> to do the actual work!
>>>>>>>>>
>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position
>>>>>>>>>> to do any work to maintain it, so consider that just a very small
>>>>>>>>>> and blurry data point. :)
>>>>>>>>>
>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>> how to get it running... If you could just try that with some toy
>>>>>>>>> data, then your data point would be a lot less blurry :) I haven't
>>>>>>>>> used Solr for anything, so I'm not very familiar with how to set
>>>>>>>>> it up, and the jena-text instructions are pretty vague
>>>>>>>>> unfortunately.
>>>>>>>>>
>>>>>>>>> -Osma
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Osma Suominen
>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>> National Library of Finland
>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>> [email protected]
>>>>>>>>> http://www.nationallibrary.fi
>>>
>>> --
>>> *Anuj Kumar*
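
Anuj's remark in the quoted thread, that declaring the newer Lucene dependencies directly in the jena-text-es pom makes Maven pick them over the older version pulled in through jena-text, follows from Maven's "nearest definition wins" mediation rule. The toy model below is only a sketch of that rule, not Maven's actual code: the class `NearestWinsSketch`, its `Dep` type and the `x.y.z` version for jena-text are hypothetical, while the Lucene versions 4.9.1 and 6.4.1 are the ones mentioned in the thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of "nearest wins" dependency mediation; not Maven's actual code. */
public class NearestWinsSketch {

    /** One node of a dependency tree: an artifact, its version, its own dependencies. */
    static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Breadth-first walk over the tree. The first version seen for an artifact
     * is the one closest to the root; ties at equal depth go to the earlier
     * declaration. Nodes that lose the conflict are not expanded further.
     */
    static Map<String, String> mediate(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Queue<Dep> queue = new ArrayDeque<>(directDependencies);
        while (!queue.isEmpty()) {
            Dep dep = queue.remove();
            if (chosen.putIfAbsent(dep.artifact, dep.version) == null) {
                queue.addAll(dep.children);
            }
        }
        return chosen;
    }

    /** Resolves the Lucene version for a pom that declares Lucene 6.4.1 directly. */
    static String resolveLucene(boolean luceneDeclaredFirst) {
        Dep lucene6 = new Dep("lucene-core", "6.4.1");
        // "x.y.z" is a placeholder version; the thread only says jena-text needs Lucene 4.x.
        Dep jenaText = new Dep("jena-text", "x.y.z", new Dep("lucene-core", "4.9.1"));
        List<Dep> direct = luceneDeclaredFirst
                ? Arrays.asList(lucene6, jenaText)
                : Arrays.asList(jenaText, lucene6);
        return mediate(direct).get("lucene-core");
    }

    public static void main(String[] args) {
        System.out.println("Lucene declared first: " + resolveLucene(true));
        System.out.println("Lucene declared last:  " + resolveLucene(false));
    }
}
```

In this model the directly declared Lucene wins because it sits one level closer to the root than the copy reached through jena-text. Note that this only decides which single Lucene version ends up in one build; it does not let 4.x and 6.4.1 coexist in the same JVM, which is the runtime concern Osma raises.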

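
Osma's suggestion in the quoted reply (add a field mapping for the primary predicate alongside the rdfs:comment mapping) can be put together with Anuj's original snippet roughly as follows. This is a minimal sketch that assumes jena-text is on the class path; the class name `MultiFieldEntityDefinition` and the values chosen for `DOC_TYPE`, `FIELD_TO_SEARCH` and `GRAPH_FIELD_NAME` are assumptions, since the thread does not show them.

```java
import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.vocabulary.RDFS;

public class MultiFieldEntityDefinition {

    // Assumed values: the thread does not show what these constants contain.
    static final String DOC_TYPE = "uri";
    static final String FIELD_TO_SEARCH = "label";
    static final String GRAPH_FIELD_NAME = "graph";

    /** Builds an entity definition that indexes rdfs:label and rdfs:comment. */
    static EntityDefinition build() {
        EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
        entDef.setPrimaryPredicate(RDFS.label);
        entDef.setGraphField(GRAPH_FIELD_NAME);
        // Osma's suggested addition: an explicit field for the primary predicate.
        entDef.set("label", RDFS.label.asNode());
        // Anuj's additional field for rdfs:comment.
        entDef.set("comment", RDFS.comment.asNode());
        return entDef;
    }

    public static void main(String[] args) {
        EntityDefinition entDef = build();
        // Show which index field each predicate is mapped to.
        System.out.println("rdfs:label   -> " + entDef.getField(RDFS.label.asNode()));
        System.out.println("rdfs:comment -> " + entDef.getField(RDFS.comment.asNode()));
    }
}
```

With a definition like this, the default field is searched when no predicate is given, and rdfs:comment values are reached by naming the predicate in the query, as in Osma's `?s text:query (rdfs:comment "word") .` example.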