I second Osma's congrats!

Do we want to take this into account:

https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E ?

In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?

I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)

---
A. Soroka
The University of Virginia Library

> On Feb 28, 2017, at 3:20 AM, Osma Suominen <[email protected]> wrote:
>
> Hi Anuj!
>
> Congratulations for getting the PoC working!
>
> I'm not sure I like the idea of having a separate jena-text-es module.
>
> Am I right that your main concern with creating a separate module is that the
> Elasticsearch client library requires a newer Lucene version than what
> jena-text currently uses? In that case, I think the solution should be
> upgrading the Lucene version everywhere, i.e. the current jena-text and
> jena-spatial modules. This work has already started (see JENA-1250) but it
> has recently stalled and has not yet been merged.
>
> I don't think it should be a problem to have multiple implementations
> (Lucene, Solr, ES) within the same module. Ideally a lot of the
> infrastructure could be shared (which is of course possible also with
> separate modules, as you have done), and I would hope that also the unit
> tests could be reused for the different implementations, although that is
> currently not the case (the unit tests only target Lucene).
>
> The Solr side of jena-text has unfortunately bitrotted even more than the
> Lucene support. I've previously suggested that it should be removed entirely
> [1], but there were no responses to my suggestion at the time.
>
> -Osma
>
> [1] https://www.mail-archive.com/[email protected]/msg16380.html
>
> 27.02.2017, 14:08, anuj kumar wrote:
>> Hi All,
>>
>> *Apologies for the long email.*
>>
>> As some of you know, I have been working on extending Jena to support
>> ElasticSearch for text indexing (in addition to Lucene and Solr).
>>
>> I have come to a point where I have basic (read: non-prod) code that can
>> index rdfs:label text data into ElasticSearch 5.2.1.
>> The code is working and testable. You simply have to download ElasticSearch
>> 5.2.1 and run it locally to execute the test within the ES
>> implementation.
>> The code is NOT production ready, just PoC code. You can find the
>> first cut of the code here: https://github.com/EaseTech/jena (look inside
>> the module jena-text-es)
>>
>> I need feedback from the Jena maintainers and community on the
>> structuring of the code, as this is important for me to finalize before I
>> move on to implementing the full-blown, production-ready code for the Jena Text
>> ElasticSearch integration.
>>
>> Here is a short description of what I did and the reasoning behind it:
>>
>> 1. Created a separate module, *jena-text-es*, that extends *jena-text*
>> AND excludes all the Lucene-related and Solr-related dependencies. The
>> reason I had to do this was that the *jena-text* module depends on Lucene version
>> 4.9.1, whereas ElasticSearch 5.2.1 depends on Lucene 6.4.1. This
>> resulted in Lucene version conflicts when I created the code for
>> ElasticSearch support within the *jena-text* module. Thus the need to
>> create a separate module.
>> 2. A side effect of creating a separate module was that I had to extend the
>> TextDataSetFactory.java class present in the *jena-text* module to include
>> methods for creating ElasticSearch index objects. I named it
>> ESTextDataSetFactory.
>> At this point in time I do not know if this is the
>> right approach or if Jena ALWAYS instantiates index objects using the
>> TextDataSetFactory.java class. My initial investigation showed it is fine,
>> but I would like the Jena experts to please confirm.
>> 3. I have tested a simple integration with ElasticSearch by defining a test
>> class under
>> src/test/java/org/apache/jena/query/text/TestBuildTextDataSet.java. You can
>> run this test by first starting an instance of ElasticSearch 5.2.1 locally.
>>
>> *My Queries*
>> 1. Is it acceptable to the Jena community that I create a separate module
>> for ElasticSearch support and call it *jena-text-es*?
>> 2. Is it fine if I extend the TextDataSetFactory.java class within the
>> *jena-text-es* module?
>>
>> *Food for Thought*
>>
>> While implementing the ElasticSearch integration, I could not help but
>> notice that the module *jena-text* not only contains the core classes for
>> performing text queries, but also contains technology-specific (e.g.
>> Lucene and Solr) classes.
>> IMO, these should be separate and defined in their own modules to enable
>> separation of concerns.
>> This will also make maintenance easier and allow extensions to be added later
>> on.
>>
>> I think we should have the following modules:
>>
>> jena-text - Containing core Jena text specific classes that are technology
>> agnostic.
>> jena-text-lucene - Lucene specific implementation of Jena-Text
>> jena-text-solr - Solr specific implementation of Jena-Text
>> jena-text-es - ElasticSearch specific implementation of Jena-Text
>>
>> What does everyone think?
>>
>> Thanks,
>> Anuj Kumar
>>
>>
>> On Tue, Feb 14, 2017 at 2:27 PM, anuj kumar <[email protected]> wrote:
>>
>>> My saviour Osma. It worked :)
>>> Thanks for pointing that out. Really appreciate it.
>>> I am now on to my next task: implementing the actual code for ElasticSearch
>>> integration with Jena.
>>>
>>> Thanks once again.
>>>
>>> Anuj Kumar
>>>
>>> On Tue, Feb 14, 2017 at 2:22 PM, Osma Suominen <[email protected]>
>>> wrote:
>>>
>>>> 14.02.2017, 15:15, anuj kumar wrote:
>>>>
>>>>> I will do it. But I need to first get the simple test working in order to
>>>>> move forward. I hope someone here can help me.
>>>>>
>>>>
>>>> Maybe you need to add an implementWith declaration to TextAssembler.java?
>>>>
>>>>
>>>> -Osma
>>>>
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> [email protected]
>>>> http://www.nationallibrary.fi
>>>>
>>>
>>> --
>>> *Anuj Kumar*
>>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> [email protected]
> http://www.nationallibrary.fi
