There’s something enticing when thinking of Lucene and Solr as independent
codebases. I’ve always thought of Lucene as core search (indexing,
analysis, tokenization, etc…) and Solr as a search experience. Lucene is
more a library (or set of libraries) used by applications providing search
Hi Christine!
> * After a while (perhaps with Lucene 10.0 or perhaps at some other natural
> point) we re-arrive at the "together or separate" question. If splitting
> worked well then Solr promotion to TLP could be a natural next step
My whole point is that I think the split is by and large
Perhaps Christine! That's a nice idea!
On naming, it would probably have to be something snazzier than "Search" as
you get at. It would probably not be a good trademark, and would imply that
Lucene & Solr are the only things in ASF that could be "Search". Who knows,
one day Vespa or something
Thanks Christine!
I genuinely like this idea.
This actually gets us what we want without having to handle everything at
the same time, and also gives us time to see if the split is working or
not. This process also ensures that both Lucene and Solr maintain the
symbiotic relationship at least
Perhaps a bit of a wildcard question or thought ... would any split out
top-level project necessarily be called "Apache Solr" or could the split out
project be called "Apache " with "Apache Solr" as its initial
sub-project and over time there may be other sub-projects added? No particular
name
Hello.
The discussion subject here has two parts i.e. "Lucene-Solr split" and "Solr
promoted to TLP" and I'd be curious what doing the former separately ahead of
the latter might look like and/or if consensus around that would be different?
Thinking aloud, as a hypothetical scenario like.
*
> Would this not be eased to some extent if the initial committer base of both
> the projects was the same?
"Who has commit karma to a project" is a separate question from "Who
will make commits in practice". Having Lucene committers retain their
status as Solr committers only helps if they're
> This might sound a bit harsh, but maybe Lucene devs helping with Solr has let
> Solr off the hook a bit too much? I actually like the fact that the split
> causes Solr to figure out its own situation and focus on its problems.
Well said.
> Take our ongoing test flakiness woes and SolrCloud
> Would this not be eased to some extent if the initial committer base
> of both the projects was the same?
This is what I originally suggested; somebody (can't remember who) said
it should be voluntary. I'm really open to either option.
D.
Would this not be eased to some extent if the initial committer base
of both the projects was the same?
On Wed, May 13, 2020 at 10:44 PM Jason Gerlowski wrote:
>
> There's nothing wrong with a harsh "sink or swim" approach if the
> risks are bearable. If the worst case risk here is that we have
There's nothing wrong with a harsh "sink or swim" approach if the
risks are bearable. If the worst case risk here is that we have a few
rough releases as we smooth out the process, I'm all on board with
"sink or swim". But by the same token - "sink or swim" gets less
appealing as the risks
I choose to be more optimistic wrt «Solr committers» ability to integrate new
and changed Lucene APIs in Solr. You do not need to be a Lucene committer in
order to learn how to USE the Lucene APIs, and I believe there are several
«Solr committers» who already possess those skills and are pretty
Jason, I hear your arguments and think of them FOR a split
This might sound a bit harsh, but maybe Lucene devs helping with Solr has
let Solr off the hook a bit too much? I actually like the fact that the
split causes Solr to figure out its own situation and focus on
its problems.
Regardless of
Wanted to add my two cents to the mix, though I'm a little late as the
vote has already progressed pretty far.
I'm against a split. From the points raised, I agree that Lucene has
much to gain. But Solr has a lot to lose.
Lucene devs would be freed from keeping Solr usage up to date. That's
a
It's hard to make a decision because it seems to have pros and cons.
Basically, I agree to separate but there are some questions.
So I will not vote right now.
1) Release version
Currently, versions of Lucene and Solr are aligned, how will they be
managed in the future?
Other people took
I'll give a perspective that comes more from the user's / "market" point
of view as at OSC we onboard lots of new organizations into Solr.
- Most new users incorrectly think of Solr as an independent Apache
project, and many will have little knowledge or awareness of Lucene itself
until given the
One advantage I find with the way Elasticsearch and Lucene interact is that ES
doesn’t depend on the master branch. We upgrade our master branch frequently
to keep up to date with the latest release branch, and that lets us find
regressions or API problems pretty quickly, but it also insulates
My two cents:
As a Lucene-heavy developer, I have several times found maintaining Solr
dependencies while making large changes a bit cumbersome. I believe
Lucene and Solr should exist in a symbiotic relationship but not
tightly coupled with each other.
On Mon, May 11, 2020 at 7:22 PM Erik Hatcher
Without reading much or replying to any specific points made on this thread,
here are my raw thoughts on this age-old topic (finally coming out of my
cocoon after taking things in for a bit)
Solr is a search -server- with distributed capabilities, that leverages the
magic of Lucene
On 11/05/2020 15:09, Dawid Weiss wrote:
>> Maybe I'm alone in this, but (better) Lucene compatibility is one of the
>> reasons why our company chose Solr over ElasticSearch.
>
> I fail to see anything supporting superior Lucene
> compatibility of one vs. another.
Yeah you're right. It's since
> Maybe I'm alone in this, but (better) Lucene compatibility is one of the
> reasons why our company chose Solr over ElasticSearch.
There are a number of Elasticsearch developers working on Lucene core
(or maybe rather Lucene developers working at Elasticsearch?). And
there are Solr developers
On Sun, May 10, 2020 at 3:41 PM Bram Van Dam wrote:
>
> On 10/05/2020 08:20, David Smiley wrote:
> > An idea just occurred to me that may help make a split nicer for Solr
> > than it is today. Solr could use a branch of the Lucene project that's
> > used for the Solr project.
>
> Maybe I'm alone
On Mon, May 11, 2020 at 1:17 AM Shawn Heisey wrote:
> I think the presence of Solr in the codebase
> has diluted Lucene's releases, making them come far too quickly. I
> would bet that without Solr, Lucene would probably be somewhere in 6.x,
> not 8.x.
>
Actually I think that Lucene would be
On 5/10/2020 3:41 PM, Michael McCandless wrote:
I think the costs (I agree: they are high) are a one-time thing, while
the benefits are long term, and accrue/multiply with time. We should
make decisions like this with the long-term benefits in mind.
I expect Lucene and Solr to have long
On Tue, May 5, 2020 at 4:59 PM Tomás Fernández Löbbe
wrote:
On Tue, May 5, 2020 at 12:37 PM Dawid Weiss wrote:
>
>> > I read “promotion to TLP” as if this was some achievement that needs to
>> be celebrated now.
>>
>> I honestly believe it is an achievement for a project to receive
>> top-level
On Wed, May 6, 2020 at 4:24 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:
Offtopic: Gus, I'm looking at Vespa core for Solr someday
>
+1, Vespa looks really fascinating! Plus it is released under ASL 2 as
well. And the world clearly needs more open-source search engines.
Mike
On Thu, May 7, 2020 at 1:07 PM Bram Van Dam wrote:
> The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
>
> Sounds like there are quite a few tasks to complete to get this done.
> Splitting the build and codebase. Presumably a bunch of
Better stick to Lucene snapshot versions. Commit the Lucene change first, then
solr. If the Lucene change is not mature enough to commit to Lucene, it is
probably not mature enough for Solr either. Avoid hacks or forks, spend some
longer time to get it right.
If things get removed from Lucene
On Sun, May 10, 2020 at 11:55 AM Mike Drob wrote:
> Solr maintaining a fork of Lucene sounds like exactly the situation that
> led to the original merge, where there are two sets of divergent development
>
Exactly
Solr maintaining a fork of Lucene sounds like exactly the situation that
led to the original merge, where there are two sets of divergent development
On Sun, May 10, 2020 at 1:20 AM David Smiley
wrote:
> I agree with Doug that the burden of proof is on keeping the codebases
> together instead
On 10/05/2020 08:20, David Smiley wrote:
> An idea just occurred to me that may help make a split nicer for Solr
> than it is today. Solr could use a branch of the Lucene project that's
> used for the Solr project.
Maybe I'm alone in this, but (better) Lucene compatibility is one of the
reasons
On Sun, May 10, 2020 at 8:20 AM David Smiley
wrote:
> I wonder if ElasticSearch tries to do this on their side too; does it?
>
Yes, Elasticsearch regularly upgrades to new snapshots of Lucene[1][2],
often multiple times per minor version. It helps give Lucene more test and
performance coverage,
I agree with Doug that the burden of proof is on keeping the codebases
together instead of the reverse. I liken it to a marriage; it has to work
well for both parties. It seems to be mostly beneficial for Solr but
much less so for Lucene.
BTW an even better example than the huge FuzzyQuery
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
Sounds like there are quite a few tasks to complete to get this done.
Splitting the build and codebase. Presumably a bunch of administration
within Apache/the PMC. Setting up
There are definitely pros and cons of splitting vs. being a single project.
The bigger pains for me until now have been the following ones:
Digging Solr failures
The theory is that Solr failures can help find Lucene bugs that Lucene tests
wouldn't catch, and while this occurred a couple times, I
I personally feel that all the current issues can be solved by actually
working on those problems instead of splitting and calling it a day. I
don't really think that splitting provides any benefit to the Solr side of
things at all, however I also completely agree that it would make things
easier
Offtopic: Gus, I'm looking at Vespa core for Solr someday. I used to work
at Yahoo! and I was a strong advocate of replacing Vespa with Solr for our
team. A quick benchmark back then proved to me that Vespa was faster than
Solr for our usecase and I rested my case. Both Lucene and Vespa have
>
> IMO, if we need to say “we can’t release X because it breaks Y”, or “we
> need to release X to be able to release Y”, the projects are not really
> independent, and “the PMCs will overlap” won’t take us very far.
>
This. I don't think the two really can be separated. Any separation will
I can speak from experience that working with a snapshot is much
cleaner than working with submodules. We have done this in Elasticsearch for
a very long time now and our process here works just fine. It has a
bunch of advantages over a direct / source dependency like solr has
right now. I recall that
Except for the logistics of enacting the split, I see no valid reason for
keeping the projects together. Git submodule is the magic that we have to
ease any potential discomfort. However, the effort needed to split feels
absolutely massive, so I'm not sure if it is worth the hassle.
On Wed, 6 May,
> If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene
> with all its modules), Solr and PyLucene. That's what I mean.
Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
1. Lucene Core (Lucene with all its modules)
2. Solr
3. PyLucene
The thing is:
On Tue, May 5, 2020 at 12:37 PM Dawid Weiss wrote:
> > I read “promotion to TLP” as if this was some achievement that needs to
> be celebrated now.
>
> I honestly believe it is an achievement for a project to receive
> top-level status. It's a sign of having a community of users,
> committers
> I read “promotion to TLP” as if this was some achievement that needs to be
> celebrated now.
I honestly believe it is an achievement for a project to receive
top-level status. It's a sign of having a community of users,
committers and processes mature enough to empower its further
development.
> Question: When Lucene no longer has the Solr test suite to help catch bugs,
> how long would it take from a Lucene commit before Solr/ES Jenkins
> instances would have had time to produce a build and run tests? Would it be
> possible to set up a trigger in Solr Jenkins?
It depends how
I don’t agree with the argument “Solr outgrew being a subproject of
Lucene”. I read “promotion to TLP” as if this was some achievement that
needs to be celebrated now. Solr didn’t become a TLP years ago because the
decision then was to merge with Lucene development, thinking they would
progress
On Tue, May 5, 2020 at 11:41 AM Jan Høydahl wrote:
As it is today, developers have had to do necessary Solr changes at the same
> time when doing changes in Lucene. This is not really fair to the (mainly)
> Lucene developers. It is not fair to Solr either, as such work might be
> done in a hasty
Thanks for bringing it up Dawid.
I’ve asked myself the same question several times over the last couple of
years, and have kind of been waiting for someone to make the proposal :)
In my head, Solr has out-grown being a sub project of Lucene, like hadoop,
mahout, nutch and tika before it.
The
Personally I feel the burden of proof should not be why they should be
split up, but the other way - "what arguments can be made for keeping them
together?"
I would be curious if people can make the argument for keeping them
together...
-Doug
On Tue, May 5, 2020 at 10:29 AM Michael McCandless <
On Mon, May 4, 2020 at 2:13 PM Dawid Weiss wrote:
> This sounds like a decision has already been made.
>
> No. I plan to send a VOTE thread nonetheless. A vote thread is just
> that -- a vote. If the majority decides both projects
> should stay together it's still a decision. A discussion without
On Mon, May 4, 2020 at 5:28 PM Gézapeti Cseh wrote:
I think separating the git repository and even the release schedules could
> be done under the same TLP.
>
It would solve most of the technical issues reflected in the first mail and
> there would be more time and data to
>
Hmm that is
I think separating the git repository and even the release schedules could
be done under the same TLP.
It would solve most of the technical issues reflected in the first mail and
there would be more time and data to see if creating Apache Solr again is
something the PMC would want to do
gp
On
Perhaps I didn't clarify this so far: my own interests (personal and
business) are shared equally between Solr and Lucene (we have products
that have plain Lucene underneath and we maintain products and systems
that use Solr). So I am going to have a foot in both worlds no matter
the outcome. I
> This sounds like a decision has already been made.
No. I plan to send a VOTE thread nonetheless. A vote thread is just
that -- a vote. If the majority decides both projects
should stay together it's still a decision. A discussion without any
resolution is going to dissolve over time into no
This is an interesting approach, Michael. I took it a bit further by
excluding all authors with only a single commit[1], since I think GitHub
PRs tend to highlight that kind of contribution more. Since 2012 I found 24
lucene-only, 31 solr-only, and 77 (about 58%) contributing to both. Since
2018,
Interesting data Michael. I am not sure though that the shared commits tell
us that there are people that contribute to both projects. Eventually, an
API change/update in Lucene will require a change in Solr (but not vice
versa). Those commits will still occur in both projects, only on the Solr
I always like to look at data when making a big decision, so I
gathered some statistics about authors and commits to git over the
history of the project. I wanted to see what these statistics could
tell us about the degree of overlap between the two projects and
whether it has changed over time.
Dear Lucene and Solr developers!
A few days ago, I initiated a discussion among PMC members about
potential pros and cons of splitting the project into separate Lucene
and Solr entities by promoting Solr to its own top-level Apache
project (TLP). Let me share with you the motivation for such an