Hi Anuj!

I have nothing against modularity in general. However, I cannot see how your proposal could work in practice for the Fuseki build, for the reasons I mentioned in my previous message (and Adam seemed to concur).

In any case, I'll see what I can do to get the Lucene upgrade moving again. If all current Jena modules (i.e. jena-text and jena-spatial) were upgraded to Lucene 6.4.1, then you could just add your ES classes to jena-text, right? I think that would be better for everyone than having to maintain your own separate module.

-Osma

01.03.2017, 16:59, anuj kumar wrote:
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I do lean towards doing it in a specific way because, IMO, it is modular,
which makes it much easier to maintain in the long run. But again, it may
not be the quickest one.

I have already been given a deadline by the company to have the ES
extension implemented in the next 15 days :). This means that I will be
maintaining the ES extension to jena-text, at least locally, for some time
to come. I would be more than happy to contribute whatever is required to
the Jena community to have a proper Elasticsearch implementation in place,
whether within the jena-text module or as a separate module. Until Lucene
and Solr are upgraded to the latest version, I will have to maintain a
separate jena-text-es module.

Cheers!
Anuj Kumar


On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:

Osma--

The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.

You are right to point to the classloader mechanism as being at the heart
of this question, but I would alter your remark just slightly, from "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
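To make the "any given classloader" point concrete, here is a minimal Java sketch (assuming Java 11+) that gives each Lucene version its own `URLClassLoader`. The JAR paths `lib/lucene-core-4.9.1.jar` and `lib/lucene-core-6.4.1.jar` are hypothetical placeholders, and `TwoLuceneVersions`, `isolatedLoader` and `tryLoad` are illustrative names, not Jena code. Each loader resolves `org.apache.lucene.util.Version` in its own namespace, so the two `Class` objects share a name but are distinct types at runtime.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

public final class TwoLuceneVersions {

    /** Builds a loader that sees only the given JARs plus the JDK's own classes. */
    static URLClassLoader isolatedLoader(Path... jars) {
        URL[] urls = new URL[jars.length];
        for (int i = 0; i < jars.length; i++) {
            try {
                urls[i] = jars[i].toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad path: " + jars[i], e);
            }
        }
        // Parent is the platform loader, so the application classpath is NOT visible.
        return new URLClassLoader(urls, ClassLoader.getPlatformClassLoader());
    }

    /** Loads a class through the given loader, or returns null if that loader cannot see it. */
    static Class<?> tryLoad(ClassLoader loader, String className) {
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths: point these at real lucene-core JARs to try it out.
        try (URLClassLoader oldLucene = isolatedLoader(Path.of("lib/lucene-core-4.9.1.jar"));
             URLClassLoader newLucene = isolatedLoader(Path.of("lib/lucene-core-6.4.1.jar"))) {
            Class<?> a = tryLoad(oldLucene, "org.apache.lucene.util.Version");
            Class<?> b = tryLoad(newLucene, "org.apache.lucene.util.Version");
            if (a == null || b == null) {
                System.out.println("lucene-core JARs not found under lib/");
                return;
            }
            // Same fully qualified name, but two different runtime classes.
            System.out.println(a.getName() + " same class? " + (a == b));
        }
    }
}
```

The catch is that nothing wires the two worlds together: code loaded by one loader cannot use the other loader's Lucene types directly, which is the bookkeeping a framework has to take over.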

This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)

Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight toolkit out there for this
purpose, but I'm not aware of any myself.

Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble), but I don't think
it's something we want to do any more of than we need to.

---
A. Soroka
The University of Virginia Library

[1] http://openjdk.java.net/projects/jigsaw/

On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:

Hi Anuj!

Thanks for the clarification.

However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just a
convenient way to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces). The Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) on the classpath that it searches to find the right
classes. If there are two versions of the same library (e.g. Lucene) with
overlapping class names, that's going to cause trouble. The only way
around that is to shade some of the libraries, i.e. rename them so that
they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
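One way to see whether two copies of a library are competing on the classpath is to ask the classloader for every resource matching a class file, then check which copy the class was actually defined from. A minimal Java sketch, assuming the application (system) classloader; `ClasspathCopies`, `allCopies` and `loadedFrom` are illustrative names, and `org.apache.lucene.index.IndexWriter` is only a default example. `ClassLoader.getResources` lists every classpath entry that contains the class file, while the class itself comes from just one of them (normally the first on the classpath).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public final class ClasspathCopies {

    /** Every classpath location that contains a .class file for this class name. */
    static List<URL> allCopies(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(ClassLoader.getSystemClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The JAR or directory the loaded class actually came from; empty for JDK classes. */
    static Optional<URL> loadedFrom(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? Optional.empty() : Optional.ofNullable(source.getLocation());
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        List<URL> copies = allCopies(name);
        System.out.println(copies.size() + " copies of " + name + ":");
        copies.forEach(url -> System.out.println("  " + url));
        try {
            Class<?> cls = Class.forName(name);
            System.out.println("actually loaded from: "
                    + loadedFrom(cls).map(URL::toString).orElse("(JDK or unknown)"));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

With two lucene-core JARs on the classpath, the first list would show two locations but only one of them is used; code compiled against the other version then runs against classes it was not built for, which may surface as linkage errors such as `NoSuchMethodError`.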

Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?

-Osma

[1] https://www.elastic.co/blog/to-shade-or-not-to-shade

01.03.2017, 11:03, anuj kumar wrote:
Hi Osma,

I understand what you are saying. There are ways to mitigate the risks
and balance the refactoring without affecting the existing modules, but I
will not delve into those now. I am not enough of a Jena expert to say
convincingly that it is possible without any hiccups, but my educated
guess is that it is indeed possible :)

As for the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"

I do not actually understand what you mean by mixing modules. I assume
you mean having jena-text and jena-text-es as dependencies in a build
without causing conflicts. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how. But
before that, here are some assumptions which I want to call out
explicitly.

*Assumptions:*
1. At a given point in time, only a single Indexing Technology is used
for text-based indexing and searching via Jena. This means that we will
use either the Lucene implementation OR the Solr implementation OR the ES
implementation at any given point in time.
2. The Fuseki build does not depend on any Lucene 4.9.1-specific classes,
only on jena-text classes, if at all.

Based on these assumptions, it is possible to create a build that
contains the common jena-text classes plus the ES-specific classes without
any compatibility issues. In fact it is quite simple: I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to declare the latest Lucene dependencies at the very beginning
of the pom and then include the jena-text dependency. Maven will then
automatically resolve the version conflict by picking the Lucene libraries
that we declared in our ES-specific pom. Have a look at the pom of the
jena-text-es module to see how it can be done:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
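The behaviour relied on here is Maven's dependency mediation. As documented, Maven picks the "nearest definition" in the dependency tree, and only when two versions sit at the same depth does the first declaration win. The Java sketch below is a simplified model of that rule, not Maven's actual code; `NearestWins`, `Dep` and the `3.x` version given to jena-text are hypothetical. It walks the tree breadth-first in declaration order and keeps the first version it sees for each artifact.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class NearestWins {

    /** A node in a simplified dependency tree: artifact, version, its own dependencies. */
    static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = List.of(children);
        }
    }

    /**
     * Breadth-first walk in declaration order: the first version seen for an artifact is
     * the nearest one (ties at equal depth go to the earlier declaration). A losing
     * version's own dependencies are not followed.
     */
    static Map<String, String> resolve(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(directDependencies);
        while (!queue.isEmpty()) {
            Dep dep = queue.removeFirst();
            if (chosen.putIfAbsent(dep.artifact, dep.version) == null) {
                queue.addAll(dep.children);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical jena-text-es pom: jena-text pulls in Lucene 4.9.1 transitively.
        Dep jenaText = new Dep("jena-text", "3.x", new Dep("lucene-core", "4.9.1"));
        Dep lucene6 = new Dep("lucene-core", "6.4.1");

        // Lucene declared first, as suggested in the message...
        System.out.println(resolve(List.of(lucene6, jenaText)));
        // ...or declared last: the direct (depth-1) dependency wins either way.
        System.out.println(resolve(List.of(jenaText, lucene6)));
    }
}
```

In this model the Lucene 6.4.1 declaration wins because it is a direct dependency, whatever its position in the pom. The remaining risk is that the build then contains a single lucene-core, so jena-text classes compiled against Lucene 4.x run against 6.4.1, and any API that changed between those versions may only fail at runtime.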


Thanks,
Anuj Kumar


On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

Hi Anuj,

I understand your concerns. However, we also need to balance the needs
of individual modules/features against those of the whole codebase. I'm
willing to put in the effort to keep the other modules up to date with
newer Lucene versions. Lucene upgrade requirements are well documented;
the only hitches seen in JENA-1250 were related to how jena-text (ab)used
some Lucene features that were dropped from newer versions.

A perhaps stupid question to more experienced Java developers: is it even
possible to mix modules that depend on different versions of the Lucene
libraries within the same project? In my (quite limited) understanding of
Java projects and libraries, this requires special arrangements (e.g.
shading) as the Java package/class namespace is shared by all the code
running within the same JVM.

So can you create, say, a Fuseki build that contains the current jena-text
module (depending on Lucene 4.x) and the new jena-text-es module (depending
on Lucene 6.4.1) without any compatibility issues?

-Osma




01.03.2017, 00:47, anuj kumar wrote:

Hi,

My 2 cents:

The reason I proposed to have separate modules for Lucene, Solr and ES is
precisely to avoid the "All or Nothing" approach we would need to take if
we club them all together. If they stay together and in the near future I
want to upgrade ES to another version, I would also need to upgrade Lucene
and Solr again, and possibly any other implementation added in the
meantime. As we all know, this means weeks if not months of work to get
the changes released. This would personally demotivate me from doing
anything, and I would probably start maintaining my own version of
jena-text, as that would be much simpler than upgrading, testing and, in
the process, owning (read: fixing bugs in) the upgrade for each and every
technology.

If they are developed as separate modules, they can evolve independently
of each other, and we can avoid situations where we can't upgrade to the
latest version of Lucene because we do not know what effect it will have
on the Solr implementation.

We can start with a separate module for Jena Text ES and see how things
go. If they go well, we could extract Solr and Lucene out of Jena Text.

Again, this is just a suggestion based on my limited industry experience.

Thanks,
Anuj Kumar



On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

28.02.2017, 17:12, A. Soroka wrote:

https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?


I certainly wouldn't object to that, but somebody has to volunteer
to do
the actual work!

I don't use the Solr component now, but I could easily see myself doing
so... that's pretty vague, I know, and I'm not in a position to do any
work to maintain it, so consider that just a very small and blurry data
point. :)


Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your
data point would be a lot less blurry :) I haven't used Solr for anything,
so I'm not very familiar with how to set it up, and the jena-text
instructions are pretty vague unfortunately.


-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi