Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-16 Thread Dennis Gove
There’s something enticing when thinking of Lucene and Solr as independent
codebases. I’ve always thought of Lucene as core search (indexing,
analysis, tokenization, etc…) and Solr as a search experience. Lucene is
more a library (or set of libraries) used by applications providing search
experiences. Solr is just one of those applications - it provides the
experience of search as a service, and feels focused on making search
approachable and palatable to search novices.

The work I put into streaming expressions was born out of a desire to more
widely expose search functionality. The streaming API drew me in because it
exposed a new way to interact with core search functionality, and
expressions came out of wanting to make it all easy to use for end users. I
didn’t, and don’t, give a whole lot of thought to the internals of Lucene.
I like a good user experience and I see Solr as an application trying to
provide that.

I do, however, have concerns about the long-term impact of a split. Lucene
is able to set a very explicit N-1 backward compatibility policy because it
can have less immediate concern for the downstream user. And this is not to
denigrate Lucene at all - in fact I agree with that policy for core search
functionality. If and when incompatible changes lead to significant gains
they can be, and are, made. Inefficient older ways are not brought forward
further than necessary. Solr has to be concerned with its end users, who
may be relative search novices, when considering backward incompatible
changes. More thought is given to the experience and impact of upgrades.
How are those issues dealt with across replicas and shards? What will
happen in a cloud made up of Lucene indexes of varying versions? My concern
revolves around what happens if (when) Solr falls behind Lucene. Will it
ever be able to catch up? There’s an argument to be made that Solr being a
consistent N versions behind Lucene has some value to the Solr project.
But, what happens if Solr gets a slower release cadence? Will it fall
further and further behind? Will its inability to use the latest and
greatest in Lucene be the impetus for a community-splitting fork? Will a
new search application come along without the legacy concerns of Solr and
become a more enticing option? Perhaps, to all of that. I can’t really say.

What I can say is I don’t think it’s appropriate to stifle the growth, or
in this case the change, of a community because of fear of the unknown.
Yes, I am worried that a project split will lead to trouble and issues for
Solr, and some of those fears are born out of how I know my company uses
Solr. But I also think a lot of good could come out of a split. It’d be
exciting to see how a Lucene community advances the state of the art of
core search, and how a Solr community provides a clean and easily
digestible search experience to end users. Will Lucene become more
embeddable? Will Solr become more plug-n-play?

I’m a fan of Christine’s suggestion of first executing a code and release
split and later, after seeing the impact of such a split, deciding on a
project split. Full disclosure, Christine and I work at the same company. I
think independent codebases will in the end benefit both, though I do agree
there is more inherent and immediate risk to Solr.
- Dennis

On Fri, May 15, 2020 at 4:03 AM Dawid Weiss wrote:

> Hi Christine!
>
> > * After a while (perhaps with Lucene 10.0 or perhaps at some other
> natural point) we re-arrive at the "together or separate" question. If
> splitting worked well then Solr promotion to TLP could be a natural next
> step
>
> My whole point is that I think the split is by and large already there:
> the mailing lists, the issues, the codebase (git constitutes common
> storage but the build system and nearly everything else is pretty much
> independent, with Solr consuming Lucene artifacts). I also believe the
> will to separate the projects has been with (some of) us for a long
> time and postponing this decision won't change anything.
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-15 Thread Dawid Weiss
Hi Christine!

> * After a while (perhaps with Lucene 10.0 or perhaps at some other natural 
> point) we re-arrive at the "together or separate" question. If splitting 
> worked well then Solr promotion to TLP could be a natural next step

My whole point is that I think the split is by and large already there:
the mailing lists, the issues, the codebase (git constitutes common
storage but the build system and nearly everything else is pretty much
independent, with Solr consuming Lucene artifacts). I also believe the
will to separate the projects has been with (some of) us for a long
time and postponing this decision won't change anything.

Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Doug Turnbull
Perhaps Christine! That's a nice idea!

On naming, it would probably have to be something snazzier than "Search", as
you hint. "Search" would probably not make a good trademark, and would imply
that Lucene & Solr are the only things in ASF that could be "Search". Who knows,
one day Vespa or something else search related could become an ASF project
with different governance & committers.

-Doug

On Thu, May 14, 2020 at 2:16 PM Anshum Gupta wrote:

> Thanks Christine!
>
> I genuinely like this idea.
>
> This actually gets us what we want without having to handle everything at
> the same time, and also gives us time to see if the split is working or
> not. This process also ensures that both Lucene and Solr maintain the
> symbiotic relationship at least at the beginning.
>
>
> On Thu, May 14, 2020 at 9:35 AM Christine Poerschke wrote:
>
>> Hello.
>>
>> The discussion subject here has two parts i.e. "Lucene-Solr split" and
>> "Solr promoted to TLP" and I'd be curious what doing the former separately
>> ahead of the latter might look like and/or if consensus around that would
>> be different?
>>
>> Thinking aloud, a hypothetical scenario might look like this:
>> * For the 8.x series Lucene and Solr release together as before.
>> * With 9.0 the releases begin to split: Lucene has a 9.0 release and Solr
>> has a release that uses Lucene 9.0 (and which may be called Solr 9.0 or
>> which may be called something else like Solr 2021.0 or something). Both
>> releases happen at the same time and it being an 8-to-9 major release might
>> help with user communications clarity.
>> * Lucene and Solr now live in separate repos, development progresses,
>> there are releases for one or other or both. We adapt to the split approach
>> and still being one project and one dev mailing list and community helps,
>> hopefully, with that adjustment.
>> * After a while (perhaps with Lucene 10.0 or perhaps at some other
>> natural point) we re-arrive at the "together or separate" question. If
>> splitting worked well then Solr promotion to TLP could be a natural next
>> step, or if getting back together might be better for both parties then
>> from the next major release things would be combined again.
>>
>> Christine
>>
>> On 2020/05/04 09:10:35, Dawid Weiss wrote:
>> > Dear Lucene and Solr developers!
>> >
>> > A few days ago, I initiated a discussion among PMC members about
>> > potential pros and cons of splitting the project into separate Lucene
>> > and Solr entities by promoting Solr to its own top-level Apache
>> > project (TLP). Let me share with you the motivation for such an action
>> > and some follow-up thoughts I heard from other PMC members so far.
>> >
>> > Please read this e-mail carefully. Both the PMC and I look forward to
>> > hearing your opinion. This is a DISCUSS thread and it will be followed
>> > next week by a VOTE thread. This is our shared project and we should
>> > all shape its future responsibly.
>> >
>> > The big question is this: “Is this the right time to split Solr and
>> > Lucene into two independent projects?”.
>> >
>> > Here are several technical considerations that drove me to ask the
>> > question above (in no order of priorities):
>> >
>> > 1) Precommit/ test times. These are crazy high. If we split into two
>> > projects we can pretty much cut all of Lucene testing out of Solr (and
>> > likewise), making development a bit more fun again.
>> >
>> > 2) Build system itself and source release packaging. The current
>> > combined codebase is a *beast* to maintain. Working with gradle on
>> > both projects at once made me realise how little the two have in
>> > common. The code layout, the dependencies, even the workflow of people
>> > working on these projects... The build (both ant and gradle) is full
>> > of Solr and Lucene-specific exceptions and hooks that could be more
>> > elegantly solved if moved to each project independently.
>> >
>> > 3) Packaging. There is no single source distribution package for
>> > Solr+Lucene. They are already "independent" there. Why should Lucene
>> > and Solr always be released at the same pace? Does it always make
>> > sense?
>> >
>> > 4) Solr is essentially taking in Lucene and its dependencies as a
>> > whole (so is Elasticsearch and many other projects). In my opinion
>> > this makes Lucene eligible for refactoring and
>> > maintenance as a separate component. The learning curve for people
>> > coming to each project separately is going to be gentler than trying
>> > to dive into the combined codebase.
>> >
>> > 5) Mailing lists, build servers. Mailing lists for users are already
>> > separated. I think this is yet another indication that Solr is
>> > something more than a component within Lucene. It is perceived as an
>> > independent entity and used as an independent product. I would really
>> > like to have separate mailing lists for these two projects (this
>> > includes build and test results) as it would make life easier: if your
>> > focus is more on Lucene 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Anshum Gupta
Thanks Christine!

I genuinely like this idea.

This actually gets us what we want without having to handle everything at
the same time, and also gives us time to see if the split is working or
not. This process also ensures that both Lucene and Solr maintain the
symbiotic relationship at least at the beginning.


On Thu, May 14, 2020 at 9:35 AM Christine Poerschke wrote:

> Hello.
>
> The discussion subject here has two parts i.e. "Lucene-Solr split" and
> "Solr promoted to TLP" and I'd be curious what doing the former separately
> ahead of the latter might look like and/or if consensus around that would
> be different?
>
> Thinking aloud, a hypothetical scenario might look like this:
> * For the 8.x series Lucene and Solr release together as before.
> * With 9.0 the releases begin to split: Lucene has a 9.0 release and Solr
> has a release that uses Lucene 9.0 (and which may be called Solr 9.0 or
> which may be called something else like Solr 2021.0 or something). Both
> releases happen at the same time and it being an 8-to-9 major release might
> help with user communications clarity.
> * Lucene and Solr now live in separate repos, development progresses,
> there are releases for one or other or both. We adapt to the split approach
> and still being one project and one dev mailing list and community helps,
> hopefully, with that adjustment.
> * After a while (perhaps with Lucene 10.0 or perhaps at some other natural
> point) we re-arrive at the "together or separate" question. If splitting
> worked well then Solr promotion to TLP could be a natural next step, or if
> getting back together might be better for both parties then from the next
> major release things would be combined again.
>
> Christine
>
> On 2020/05/04 09:10:35, Dawid Weiss wrote:
> > Dear Lucene and Solr developers!
> >
> > A few days ago, I initiated a discussion among PMC members about
> > potential pros and cons of splitting the project into separate Lucene
> > and Solr entities by promoting Solr to its own top-level Apache
> > project (TLP). Let me share with you the motivation for such an action
> > and some follow-up thoughts I heard from other PMC members so far.
> >
> > Please read this e-mail carefully. Both the PMC and I look forward to
> > hearing your opinion. This is a DISCUSS thread and it will be followed
> > next week by a VOTE thread. This is our shared project and we should
> > all shape its future responsibly.
> >
> > The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
> >
> > Here are several technical considerations that drove me to ask the
> > question above (in no order of priorities):
> >
> > 1) Precommit/ test times. These are crazy high. If we split into two
> > projects we can pretty much cut all of Lucene testing out of Solr (and
> > likewise), making development a bit more fun again.
> >
> > 2) Build system itself and source release packaging. The current
> > combined codebase is a *beast* to maintain. Working with gradle on
> > both projects at once made me realise how little the two have in
> > common. The code layout, the dependencies, even the workflow of people
> > working on these projects... The build (both ant and gradle) is full
> > of Solr and Lucene-specific exceptions and hooks that could be more
> > elegantly solved if moved to each project independently.
> >
> > 3) Packaging. There is no single source distribution package for
> > Solr+Lucene. They are already "independent" there. Why should Lucene
> > and Solr always be released at the same pace? Does it always make
> > sense?
> >
> > 4) Solr is essentially taking in Lucene and its dependencies as a
> > whole (so is Elasticsearch and many other projects). In my opinion
> > this makes Lucene eligible for refactoring and
> > maintenance as a separate component. The learning curve for people
> > coming to each project separately is going to be gentler than trying
> > to dive into the combined codebase.
> >
> > 5) Mailing lists, build servers. Mailing lists for users are already
> > separated. I think this is yet another indication that Solr is
> > something more than a component within Lucene. It is perceived as an
> > independent entity and used as an independent product. I would really
> > like to have separate mailing lists for these two projects (this
> > includes build and test results) as it would make life easier: if your
> > focus is more on Lucene (or Solr), you would only need to track half
> > of the current traffic.
> >
> >
> > As I already mentioned, the discussion among PMC members highlighted
> > some initial concerns and reasons why the project should perhaps
> > remain glued together. These are outlined below with some of the
> > counter-arguments presented under each concern to avoid repetition of
> > the same content from the PMC mailing list (they’re copied from the
> > private discussion list).
> >
> > 1) Both projects may gradually split their ways after the separation
> 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Christine Poerschke
Perhaps a bit of a wildcard question or thought ... would any split out 
top-level project necessarily be called "Apache Solr" or could the split out 
project be called "Apache " with "Apache Solr" as its initial 
sub-project and over time there may be other sub-projects added? No particular 
name in mind, "Apache Search" might be too obvious, just wondering in principle.

Christine

On 2020/05/04 09:10:35, Dawid Weiss wrote:
> Dear Lucene and Solr developers!
> 
> A few days ago, I initiated a discussion among PMC members about
> potential pros and cons of splitting the project into separate Lucene
> and Solr entities by promoting Solr to its own top-level Apache
> project (TLP). Let me share with you the motivation for such an action
> and some follow-up thoughts I heard from other PMC members so far.
> 
> Please read this e-mail carefully. Both the PMC and I look forward to
> hearing your opinion. This is a DISCUSS thread and it will be followed
> next week by a VOTE thread. This is our shared project and we should
> all shape its future responsibly.
> 
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
> 
> Here are several technical considerations that drove me to ask the
> question above (in no order of priorities):
> 
> 1) Precommit/ test times. These are crazy high. If we split into two
> projects we can pretty much cut all of Lucene testing out of Solr (and
> likewise), making development a bit more fun again.
> 
> 2) Build system itself and source release packaging. The current
> combined codebase is a *beast* to maintain. Working with gradle on
> both projects at once made me realise how little the two have in
> common. The code layout, the dependencies, even the workflow of people
> working on these projects... The build (both ant and gradle) is full
> of Solr and Lucene-specific exceptions and hooks that could be more
> elegantly solved if moved to each project independently.
> 
> 3) Packaging. There is no single source distribution package for
> Solr+Lucene. They are already "independent" there. Why should Lucene
> and Solr always be released at the same pace? Does it always make
> sense?
> 
> 4) Solr is essentially taking in Lucene and its dependencies as a
> whole (so is Elasticsearch and many other projects). In my opinion
> this makes Lucene eligible for refactoring and
> maintenance as a separate component. The learning curve for people
> coming to each project separately is going to be gentler than trying
> to dive into the combined codebase.
> 
> 5) Mailing lists, build servers. Mailing lists for users are already
> separated. I think this is yet another indication that Solr is
> something more than a component within Lucene. It is perceived as an
> independent entity and used as an independent product. I would really
> like to have separate mailing lists for these two projects (this
> includes build and test results) as it would make life easier: if your
> focus is more on Lucene (or Solr), you would only need to track half
> of the current traffic.
> 
> 
> As I already mentioned, the discussion among PMC members highlighted
> some initial concerns and reasons why the project should perhaps
> remain glued together. These are outlined below with some of the
> counter-arguments presented under each concern to avoid repetition of
> the same content from the PMC mailing list (they’re copied from the
> private discussion list).
> 
> 1) Both projects may gradually split their ways after the separation
> and even develop “against” each other like it used to be before the
> merge.
> 
> Whether this is a legitimate concern is hard to tell. If Solr goes TLP
> then all existing Lucene committers will automatically become Solr
> committers (unless they opt not to) so there will be both procedural
> ways to prevent this from happening (vetoes) as well as common-sense
> reasons to just cooperate.
> 
> 2) Some people like parallel version numbering (concurrent Solr and
> Lucene releases) as it gives instant clarity which Solr version uses
> which version of Lucene.
> 
> This can still be done on Solr side (it is Solr’s decision to adopt
> any versioning scheme the project feels comfortable with). I
> personally (DW) think this kind of versioning is actually more
> confusing than helpful; Solr should have its own cadence of releases
> driven by features, not sub-component changes. If the “backwards
> compatibility” is a factor then a solution might be to sync on major
> version releases only (e.g., this is how Elasticsearch is handling
> this).
> 
> 3) Solr tests are the first “battlefield” test zone for Lucene changes
> - if it becomes TLP this part will be gone.
> 
> Yes, true. But realistically Solr will have to adopt some kind of
> snapshot-based dependency on Lucene anyway (whether as a git submodule
> or a maven snapshot dependency). So if there are bugs in Lucene they
> will still be detected by Solr tests (and fairly 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Christine Poerschke
Hello.

The discussion subject here has two parts i.e. "Lucene-Solr split" and "Solr 
promoted to TLP" and I'd be curious what doing the former separately ahead of 
the latter might look like and/or if consensus around that would be different?

Thinking aloud, a hypothetical scenario might look like this:
* For the 8.x series Lucene and Solr release together as before.
* With 9.0 the releases begin to split: Lucene has a 9.0 release and Solr has a
release that uses Lucene 9.0 (and which may be called Solr 9.0 or which may be
called something else like Solr 2021.0 or something). Both releases happen at
the same time and it being an 8-to-9 major release might help with user
communications clarity.
* Lucene and Solr now live in separate repos, development progresses, there are
releases for one or other or both. We adapt to the split approach and still
being one project and one dev mailing list and community helps, hopefully, with
that adjustment.
* After a while (perhaps with Lucene 10.0 or perhaps at some other natural 
point) we re-arrive at the "together or separate" question. If splitting worked 
well then Solr promotion to TLP could be a natural next step, or if getting 
back together might be better for both parties then from the next major release 
things would be combined again.

Christine

On 2020/05/04 09:10:35, Dawid Weiss wrote:
> Dear Lucene and Solr developers!
> 
> A few days ago, I initiated a discussion among PMC members about
> potential pros and cons of splitting the project into separate Lucene
> and Solr entities by promoting Solr to its own top-level Apache
> project (TLP). Let me share with you the motivation for such an action
> and some follow-up thoughts I heard from other PMC members so far.
> 
> Please read this e-mail carefully. Both the PMC and I look forward to
> hearing your opinion. This is a DISCUSS thread and it will be followed
> next week by a VOTE thread. This is our shared project and we should
> all shape its future responsibly.
> 
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
> 
> Here are several technical considerations that drove me to ask the
> question above (in no order of priorities):
> 
> 1) Precommit/ test times. These are crazy high. If we split into two
> projects we can pretty much cut all of Lucene testing out of Solr (and
> likewise), making development a bit more fun again.
> 
> 2) Build system itself and source release packaging. The current
> combined codebase is a *beast* to maintain. Working with gradle on
> both projects at once made me realise how little the two have in
> common. The code layout, the dependencies, even the workflow of people
> working on these projects... The build (both ant and gradle) is full
> of Solr and Lucene-specific exceptions and hooks that could be more
> elegantly solved if moved to each project independently.
> 
> 3) Packaging. There is no single source distribution package for
> Solr+Lucene. They are already "independent" there. Why should Lucene
> and Solr always be released at the same pace? Does it always make
> sense?
> 
> 4) Solr is essentially taking in Lucene and its dependencies as a
> whole (so is Elasticsearch and many other projects). In my opinion
> this makes Lucene eligible for refactoring and
> maintenance as a separate component. The learning curve for people
> coming to each project separately is going to be gentler than trying
> to dive into the combined codebase.
> 
> 5) Mailing lists, build servers. Mailing lists for users are already
> separated. I think this is yet another indication that Solr is
> something more than a component within Lucene. It is perceived as an
> independent entity and used as an independent product. I would really
> like to have separate mailing lists for these two projects (this
> includes build and test results) as it would make life easier: if your
> focus is more on Lucene (or Solr), you would only need to track half
> of the current traffic.
> 
> 
> As I already mentioned, the discussion among PMC members highlighted
> some initial concerns and reasons why the project should perhaps
> remain glued together. These are outlined below with some of the
> counter-arguments presented under each concern to avoid repetition of
> the same content from the PMC mailing list (they’re copied from the
> private discussion list).
> 
> 1) Both projects may gradually split their ways after the separation
> and even develop “against” each other like it used to be before the
> merge.
> 
> Whether this is a legitimate concern is hard to tell. If Solr goes TLP
> then all existing Lucene committers will automatically become Solr
> committers (unless they opt not to) so there will be both procedural
> ways to prevent this from happening (vetoes) as well as common-sense
> reasons to just cooperate.
> 
> 2) Some people like parallel version numbering (concurrent Solr and
> Lucene releases) as it gives instant clarity which Solr 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jason Gerlowski
> Would this not be eased to some extent if the initial committer base of both 
> the projects was the same?

"Who has commit karma to a project" is a separate question from "Who
will make commits in practice".  Having Lucene committers retain their
status as Solr committers only helps if they're willing and interested
in keeping Solr up to date.  From discussion on this thread so far,
I'm not sure how much of that interest exists.  After all, avoiding
this solr-update-burden was one of the arguments cited in favor of a
split.

> Contrary to Jason I don't think keeping Solr and Lucene code together helps
> anybody in tackling those issues now or in the future.

That's not the point I was making.  I wasn't saying that the split (or
lack thereof) affects our ability to address test-flakiness (etc.).  I
was citing test-flakiness as an example of how bad we Solr folks have
been historically at prioritizing this sort of work that's crucial for
project health but not tied to a specific feature.  I brought this
historical example up as a parallel or a prediction to how we might do
with a similar task: managing to stay up to date on Lucene.  My whole
point was: "We historically don't do well at getting this sort of work
done; therefore I expect we're going to have some level of lag behind
Lucene"

On Wed, May 13, 2020 at 2:11 PM Dawid Weiss wrote:
>
> > This might sound a bit harsh, but maybe Lucene devs helping with Solr has 
> > let Solr off the hook a bit too much? I actually like the fact that the 
> > split causes Solr to figure out its own situation and focus on its
> > problems.
>
> Well said.
>
> > Take our ongoing test flakiness woes and SolrCloud instability issues as 
> > examples: both are serious threats to the project, both have been around 
> > for years, and both are here to stay for the foreseeable future.
>
> Contrary to Jason I don't think keeping Solr and Lucene code together
> helps anybody in tackling those issues now or in the future. The first
> thing Mark (Miller) did when he started cleaning up the codebase for
> gradle was to *disable* nearly all randomizations and fix certain
> parameters to bring back stability and speed up Solr tests. I bet it
> would be a tad easier if he only had the Solr (or Lucene) side to take
> care of (rather than both Lucene AND Solr).
>
> What is good for Lucene may not be as good for Solr. Maybe removing
> randomizations that currently happen in LuceneTestCase will calm down
> tests? Who knows. Sincerely, I think a split project may bring a clean
> slate for more drastic refactorings and cleanups. A combined codebase
> keeps the status quo we've been in for years.
>
> Dawid



Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Dawid Weiss
> This might sound a bit harsh, but maybe Lucene devs helping with Solr has let 
> Solr off the hook a bit too much? I actually like the fact that the split 
> causes Solr to figure out its own situation and focus on its problems.

Well said.

> Take our ongoing test flakiness woes and SolrCloud instability issues as 
> examples: both are serious threats to the project, both have been around for 
> years, and both are here to stay for the foreseeable future.

Contrary to Jason I don't think keeping Solr and Lucene code together
helps anybody in tackling those issues now or in the future. The first
thing Mark (Miller) did when he started cleaning up the codebase for
gradle was to *disable* nearly all randomizations and fix certain
parameters to bring back stability and speed up Solr tests. I bet it
would be a tad easier if he only had the Solr (or Lucene) side to take
care of (rather than both Lucene AND Solr).

What is good for Lucene may not be as good for Solr. Maybe removing
randomizations that currently happen in LuceneTestCase will calm down
tests? Who knows. Sincerely, I think a split project may bring a clean
slate for more drastic refactorings and cleanups. A combined codebase
keeps the status quo we've been in for years.

Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Dawid Weiss
> Would this not be eased to some extent if the initial committer base
> of both the projects was the same?

This is what I originally suggested; somebody (can't remember who) said
it should be voluntary. I'm really open to either option.

D.




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Atri Sharma
Would this not be eased to some extent if the initial committer base
of both the projects was the same?

On Wed, May 13, 2020 at 10:44 PM Jason Gerlowski wrote:
>
> There's nothing wrong with a harsh "sink or swim" approach if the
> risks are bearable.  If the worst case risk here is that we have a few
> rough releases as we smooth out the process, I'm all on board with
> "sink or swim".  But by the same token - "sink or swim" gets less
> appealing as the risks increase.   No sane person would toss their PFD
> after a shipwreck because they always meant to learn to backstroke.
> So maybe we just disagree on what the worst case harm to Solr looks
> like.  I see the harm being pretty serious: if Solr stagnates its
> Lucene version relative to other offerings, users could go elsewhere
> and the project would lose out on adoption and community.  A Very Bad
> Thing.  But if you don't see this as even a remote possibility, well
> then "sink or swim" makes sense.
>
> > I'd be OK with a stable, robust Solr that got 1-2 major versions behind 
> > Lucene, but was rock-solid with a lower barrier to entry...
>
> If that's an option, I might be too.  But I'm not sure how a
> Lucene-Solr split (or an older Lucene version) does anything to make
> Solr more solid, lower its barrier to entry, etc.  Anecdotally, Solr
> bugs rooted in Lucene seem the minority by far.  And Solr committers
> can put effort into stability/barrier-to-entry as easily now as they
> can in a post-split world.  Is there some connection between the split
> and those -ilities that I'm missing?
>
> > I choose to be more optimistic wrt «Solr committers» ability to integrate 
> > new and changed Lucene APIs in Solr
> I agree that Solr committers _can_ do this work, and that there are
> some awesome committers who straddle the fence and know Lucene very
> well.  I wasn't trying to impugn anyone's efforts, interest or
> expertise.  My point was just that at the end of the day a split
> leaves fewer people around Solr with knowledge of the Lucene APIs and
> their perf implications.  And a split is going to burden those
> remaining people heavily until the roster of Lucene-literate Solr
> committers re-populates.
>
> On Wed, May 13, 2020 at 10:29 AM Jan Høydahl  wrote:
> >
> > I choose to be more optimistic wrt «Solr committers» ability to integrate 
> > new and changed Lucene APIs in Solr. You do not need to be a Lucene 
> > committer in order to learn how to USE the Lucene APIs, and I believe there 
> > are several «Solr committers» who already possess those skills and are
> > pretty deep in Lucene already. Hopefully they are interested in doing
> > Lucene upgrades for Solr, even if that sometimes includes implementing
> > support for a new fieldType (points vs trie), getting rid of 
> > index-time-boost features etc. I may even attempt some of those tasks 
> > myself for the areas of Lucene API I am comfortable with.
> >
> > Jan
> >
> > 13. mai 2020 kl. 16:24 skrev Doug Turnbull 
> > :
> >
> > Jason, I hear your arguments and think of them FOR a split
> >
> > This might sound a bit harsh, but maybe Lucene devs helping with Solr has 
> > let Solr off the hook a bit too much? I actually like the fact that the 
> > split causes Solr to figure out its own situation and focus on its
> > problems.
> >
> > Regardless of the split or not, Solr is going to sink or swim based on the 
> > efforts of Solr committers, not Lucene committers. I don't think Lucene 
> > committers are going to be the ones to really address the systemic issues 
> > with Solr. If anything, I imagine they are "let me fix this so the code 
> > compiles" level of maintenance.
> >
> > "Falling behind Lucene" is counterbalanced to me with "Should Solr be on 
> > cutting-edge Lucene?"
> >
> > I'd be OK with a stable, robust Solr that got 1-2 major versions behind 
> > Lucene, but was rock-solid with a lower barrier to entry...
> >
> > On Wed, May 13, 2020 at 10:07 AM Jason Gerlowski  
> > wrote:
> >>
> >> Wanted to add my two cents to the mix, though I'm a little late as the
> >> vote has already progressed pretty far.
> >>
> >> I'm against a split.  From the points raised, I agree that Lucene has
> >> much to gain.  But Solr has a lot to lose.
> >>
> >> Lucene devs would be freed from keeping Solr usage up to date.  That's
> >> a great improvement for Lucene itself.  But that burden doesn't
> >> disappear - it's just being moved to a different (smaller) group of
> >> committers - who by definition don't know Lucene as well, and are less
> >> suited to the task.  (Lucene devs still might help post-split, but
> >> given that avoiding this burden is one of the arguments made above for
> >> a split, it seems unwise to assume how much this generosity will
> >> continue.)
> >>
> >> One likely result is that Solr will fall behind Lucene. Possibly
> >> permanently behind.  Lucene folks are doing great work to improve
> >> perf, add features etc. so falling behind is a Very Bad Thing.  To
> >> 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jason Gerlowski
There's nothing wrong with a harsh "sink or swim" approach if the
risks are bearable.  If the worst case risk here is that we have a few
rough releases as we smooth out the process, I'm all on board with
"sink or swim".  But by the same token - "sink or swim" gets less
appealing as the risks increase.   No sane person would toss their PFD
after a shipwreck because they always meant to learn to backstroke.
So maybe we just disagree on what the worst case harm to Solr looks
like.  I see the harm being pretty serious: if Solr stagnates its
Lucene version relative to other offerings users could go elsewhere
and the project would lose out on adoption and community.  A Very Bad
Thing.  But if you don't see this as even a remote possibility, well
then "sink or swim" makes sense.

> I'd be OK with a stable, robust Solr that got 1-2 major versions behind 
> Lucene, but was rock-solid with a lower barrier to entry...

If that's an option, I might be too.  But I'm not sure how a
Lucene-Solr split (or an older Lucene version) does anything to make
Solr more solid, lower its barrier to entry, etc.  Anecdotally, Solr
bugs rooted in Lucene seem the minority by far.  And Solr committers
can put effort into stability/barrier-to-entry as easily now as they
can in a post-split world.  Is there some connection between the split
and those -ilities that I'm missing?

> I choose to be more optimistic wrt «Solr committers» ability to integrate new 
> and changed Lucene APIs in Solr
I agree that Solr committers _can_ do this work, and that there are
some awesome committers who straddle the fence and know Lucene very
well.  I wasn't trying to impugn anyone's efforts, interest or
expertise.  My point was just that at the end of the day a split
leaves fewer people around Solr with knowledge of the Lucene APIs and
their perf implications.  And a split is going to burden those
remaining people heavily until the roster of Lucene-literate Solr
committers re-populates.

On Wed, May 13, 2020 at 10:29 AM Jan Høydahl  wrote:
>
> I choose to be more optimistic wrt «Solr committers» ability to integrate new 
> and changed Lucene APIs in Solr. You do not need to be a Lucene committer in 
> order to learn how to USE the Lucene APIs, and I believe there are several 
> «Solr committers» who already possess those skills and are pretty deep in
> Lucene already. Hopefully they are interested in doing Lucene upgrades for
> Solr, even if that sometimes includes implementing support for a new
> fieldType (points vs trie), getting rid of index-time-boost features etc. I 
> may even attempt some of those tasks myself for the areas of Lucene API I am 
> comfortable with.
>
> Jan
>
> 13. mai 2020 kl. 16:24 skrev Doug Turnbull 
> :
>
> Jason, I hear your arguments and think of them FOR a split
>
> This might sound a bit harsh, but maybe Lucene devs helping with Solr has let 
> Solr off the hook a bit too much? I actually like the fact that the split 
> causes Solr to figure out its own situation and focus on its problems.
>
> Regardless of the split or not, Solr is going to sink or swim based on the 
> efforts of Solr committers, not Lucene committers. I don't think Lucene 
> committers are going to be the ones to really address the systemic issues 
> with Solr. If anything, I imagine they are "let me fix this so the code 
> compiles" level of maintenance.
>
> "Falling behind Lucene" is counterbalanced to me with "Should Solr be on 
> cutting-edge Lucene?"
>
> I'd be OK with a stable, robust Solr that got 1-2 major versions behind 
> Lucene, but was rock-solid with a lower barrier to entry...
>
> On Wed, May 13, 2020 at 10:07 AM Jason Gerlowski  
> wrote:
>>
>> Wanted to add my two cents to the mix, though I'm a little late as the
>> vote has already progressed pretty far.
>>
>> I'm against a split.  From the points raised, I agree that Lucene has
>> much to gain.  But Solr has a lot to lose.
>>
>> Lucene devs would be freed from keeping Solr usage up to date.  That's
>> a great improvement for Lucene itself.  But that burden doesn't
>> disappear - it's just being moved to a different (smaller) group of
>> committers - who by definition don't know Lucene as well, and are less
>> suited to the task.  (Lucene devs still might help post-split, but
>> given that avoiding this burden is one of the arguments made above for
>> a split, it seems unwise to assume how much this generosity will
>> continue.)
>>
>> One likely result is that Solr will fall behind Lucene. Possibly
>> permanently behind.  Lucene folks are doing great work to improve
>> perf, add features etc. so falling behind is a Very Bad Thing.  To
>> Solr, Lucene is not the same as Jetty or Jackson which Solr can fall
>> behind on without significant detriment.  Lucene and the core search
>> functionality it offers is what brings people to Solr (or Elastic).
>> Putting ourselves in a position to fall behind on Lucene does a huge
>> disservice to our users, and loses Solr one of its greatest
>> 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jan Høydahl
I choose to be more optimistic wrt «Solr committers» ability to integrate new 
and changed Lucene APIs in Solr. You do not need to be a Lucene committer in 
order to learn how to USE the Lucene APIs, and I believe there are several 
«Solr committers» who already possess those skills and are pretty deep in Lucene
already. Hopefully they are interested in doing Lucene upgrades for Solr, even
if that sometimes includes implementing support for a new fieldType (points vs
trie), getting rid of index-time-boost features etc. I may even attempt some of 
those tasks myself for the areas of Lucene API I am comfortable with.

Jan

> 13. mai 2020 kl. 16:24 skrev Doug Turnbull 
> :
> 
> Jason, I hear your arguments and think of them FOR a split
> 
> This might sound a bit harsh, but maybe Lucene devs helping with Solr has let 
> Solr off the hook a bit too much? I actually like the fact that the split 
> causes Solr to figure out its own situation and focus on its problems.
> 
> Regardless of the split or not, Solr is going to sink or swim based on the 
> efforts of Solr committers, not Lucene committers. I don't think Lucene 
> committers are going to be the ones to really address the systemic issues 
> with Solr. If anything, I imagine they are "let me fix this so the code 
> compiles" level of maintenance. 
> 
> "Falling behind Lucene" is counterbalanced to me with "Should Solr be on 
> cutting-edge Lucene?" 
> 
> I'd be OK with a stable, robust Solr that got 1-2 major versions behind 
> Lucene, but was rock-solid with a lower barrier to entry... 
> 
> On Wed, May 13, 2020 at 10:07 AM Jason Gerlowski wrote:
> Wanted to add my two cents to the mix, though I'm a little late as the
> vote has already progressed pretty far.
> 
> I'm against a split.  From the points raised, I agree that Lucene has
> much to gain.  But Solr has a lot to lose.
> 
> Lucene devs would be freed from keeping Solr usage up to date.  That's
> a great improvement for Lucene itself.  But that burden doesn't
> disappear - it's just being moved to a different (smaller) group of
> committers - who by definition don't know Lucene as well, and are less
> suited to the task.  (Lucene devs still might help post-split, but
> given that avoiding this burden is one of the arguments made above for
> a split, it seems unwise to assume how much this generosity will
> continue.)
> 
> One likely result is that Solr will fall behind Lucene. Possibly
> permanently behind.  Lucene folks are doing great work to improve
> perf, add features etc. so falling behind is a Very Bad Thing.  To
> Solr, Lucene is not the same as Jetty or Jackson which Solr can fall
> behind on without significant detriment.  Lucene and the core search
> functionality it offers is what brings people to Solr (or Elastic).
> Putting ourselves in a position to fall behind on Lucene does a huge
> disservice to our users, and loses Solr one of its greatest
> advantages.
> 
> I hope that in the case of a split, the Solr community would rise to
> the occasion and prevent this.  But my personal judgement is that it's
> unlikely.  I hate to be negative, and I hope to be proven wrong, but
> that's how things look to me.  We (Solr folks) have a bad track record
> of addressing things with less-tangible, less-sellable benefits.  Take
> our ongoing test flakiness woes and SolrCloud instability issues as
> examples: both are serious threats to the project, both have been
> around for years, and both are here to stay for the foreseeable
> future.
> 
> If conditions were different in a way that made "falling behind" less
> likely, I'd be all for a split.  But given (1) our recent track record
> of addressing these sorts of issues, (2) our test flakiness which will
> make identifying "Lucene snapshot upgrade" bugs exceedingly difficult,
> and (3) the current economic conditions which may make it harder for
> committers to negotiate time from their employers to work on Lucene
> updates...now seems like a bad time to attempt a split.  It will harm
> Solr more than it helps Lucene.
> 
> On Tue, May 12, 2020 at 3:37 PM Namgyu Kim wrote:
> >
> > It's hard to make a decision because it seems to have pros and cons.
> > Basically, I agree to separate but there are some questions.
> > So I won't vote right now.
> >
> > 1) Release version
> > Currently, versions of Lucene and Solr are aligned, how will they be 
> > managed in the future?
> > Other people took Elasticsearch as an example... But it was an independent 
> > project from the beginning.
> > So there is no problem with the Lucene version. (Elasticsearch 7.7 and 
> > Lucene 8.5.1)
> > I'm sure that if we make Solr an independent project, the version numbers
> > will drift apart. (like Lucene 8.6.2 and Solr 8.9.1)
> > But it's also strange to suddenly restart Solr's version numbering. (Solr
> > 1.0)
> > Of course it's a matter of adaptation, but it's likely to cause some
> > 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Doug Turnbull
Jason, I hear your arguments and think of them FOR a split

This might sound a bit harsh, but maybe Lucene devs helping with Solr has
let Solr off the hook a bit too much? I actually like the fact that the
split causes Solr to figure out its own situation and focus on
its problems.

Regardless of the split or not, Solr is going to sink or swim based on the
efforts of Solr committers, not Lucene committers. I don't think Lucene
committers are going to be the ones to really address the systemic issues
with Solr. If anything, I imagine they are "let me fix this so the code
compiles" level of maintenance.

"Falling behind Lucene" is counterbalanced to me with "Should Solr be on
cutting-edge Lucene?"

I'd be OK with a stable, robust Solr that got 1-2 major versions behind
Lucene, but was rock-solid with a lower barrier to entry...

On Wed, May 13, 2020 at 10:07 AM Jason Gerlowski 
wrote:

> Wanted to add my two cents to the mix, though I'm a little late as the
> vote has already progressed pretty far.
>
> I'm against a split.  From the points raised, I agree that Lucene has
> much to gain.  But Solr has a lot to lose.
>
> Lucene devs would be freed from keeping Solr usage up to date.  That's
> a great improvement for Lucene itself.  But that burden doesn't
> disappear - it's just being moved to a different (smaller) group of
> committers - who by definition don't know Lucene as well, and are less
> suited to the task.  (Lucene devs still might help post-split, but
> given that avoiding this burden is one of the arguments made above for
> a split, it seems unwise to assume how much this generosity will
> continue.)
>
> One likely result is that Solr will fall behind Lucene. Possibly
> permanently behind.  Lucene folks are doing great work to improve
> perf, add features etc. so falling behind is a Very Bad Thing.  To
> Solr, Lucene is not the same as Jetty or Jackson which Solr can fall
> behind on without significant detriment.  Lucene and the core search
> functionality it offers is what brings people to Solr (or Elastic).
> Putting ourselves in a position to fall behind on Lucene does a huge
> disservice to our users, and loses Solr one of its greatest
> advantages.
>
> I hope that in the case of a split, the Solr community would rise to
> the occasion and prevent this.  But my personal judgement is that it's
> unlikely.  I hate to be negative, and I hope to be proven wrong, but
> that's how things look to me.  We (Solr folks) have a bad track record
> of addressing things with less-tangible, less-sellable benefits.  Take
> our ongoing test flakiness woes and SolrCloud instability issues as
> examples: both are serious threats to the project, both have been
> around for years, and both are here to stay for the foreseeable
> future.
>
> If conditions were different in a way that made "falling behind" less
> likely, I'd be all for a split.  But given (1) our recent track record
> of addressing these sorts of issues, (2) our test flakiness which will
> make identifying "Lucene snapshot upgrade" bugs exceedingly difficult,
> and (3) the current economic conditions which may make it harder for
> committers to negotiate time from their employers to work on Lucene
> updates...now seems like a bad time to attempt a split.  It will harm
> Solr more than it helps Lucene.
>
> On Tue, May 12, 2020 at 3:37 PM Namgyu Kim  wrote:
> >
> > It's hard to make a decision because it seems to have pros and cons.
> > Basically, I agree to separate but there are some questions.
> > So I won't vote right now.
> >
> > 1) Release version
> > Currently, versions of Lucene and Solr are aligned, how will they be
> managed in the future?
> > Other people took Elasticsearch as an example... But it was an
> independent project from the beginning.
> > So there is no problem with the Lucene version. (Elasticsearch 7.7 and
> Lucene 8.5.1)
> > I'm sure that if we make Solr an independent project, the version numbers
> will drift apart. (like Lucene 8.6.2 and Solr 8.9.1)
> > But it's also strange to suddenly restart Solr's version numbering. (Solr
> 1.0)
> > Of course it's a matter of adaptation, but it's likely to cause some
> confusion for existing users.
> >
> > 2) Complementary relationship
> > When Lucene and Solr are built together, Solr can always maintain the
> latest Lucene.
> > In my personal opinion, it's a great advantage of Solr.
> > Because Solr doesn't have to suffer from Lucene API changes and has
> the latest library.
> > But it will be difficult if Solr becomes independent.
> > If Solr tracks the master branch of Lucene in a separate
> repository (project), can it always check and reflect Lucene's API changes?
> >
> > On Tue, May 12, 2020 at 10:12 PM Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
> >>
> >> I'll give a perspective that comes more from the user's / "market"
> point of view as at OSC we onboard lots of new organizations into Solr.
> >>
> >> - Most new users incorrectly think of Solr as an 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jason Gerlowski
Wanted to add my two cents to the mix, though I'm a little late as the
vote has already progressed pretty far.

I'm against a split.  From the points raised, I agree that Lucene has
much to gain.  But Solr has a lot to lose.

Lucene devs would be freed from keeping Solr usage up to date.  That's
a great improvement for Lucene itself.  But that burden doesn't
disappear - it's just being moved to a different (smaller) group of
committers - who by definition don't know Lucene as well, and are less
suited to the task.  (Lucene devs still might help post-split, but
given that avoiding this burden is one of the arguments made above for
a split, it seems unwise to assume how much this generosity will
continue.)

One likely result is that Solr will fall behind Lucene. Possibly
permanently behind.  Lucene folks are doing great work to improve
perf, add features etc. so falling behind is a Very Bad Thing.  To
Solr, Lucene is not the same as Jetty or Jackson which Solr can fall
behind on without significant detriment.  Lucene and the core search
functionality it offers is what brings people to Solr (or Elastic).
Putting ourselves in a position to fall behind on Lucene does a huge
disservice to our users, and loses Solr one of its greatest
advantages.

I hope that in the case of a split, the Solr community would rise to
the occasion and prevent this.  But my personal judgement is that it's
unlikely.  I hate to be negative, and I hope to be proven wrong, but
that's how things look to me.  We (Solr folks) have a bad track record
of addressing things with less-tangible, less-sellable benefits.  Take
our ongoing test flakiness woes and SolrCloud instability issues as
examples: both are serious threats to the project, both have been
around for years, and both are here to stay for the foreseeable
future.

If conditions were different in a way that made "falling behind" less
likely, I'd be all for a split.  But given (1) our recent track record
of addressing these sorts of issues, (2) our test flakiness which will
make identifying "Lucene snapshot upgrade" bugs exceedingly difficult,
and (3) the current economic conditions which may make it harder for
committers to negotiate time from their employers to work on Lucene
updates...now seems like a bad time to attempt a split.  It will harm
Solr more than it helps Lucene.

On Tue, May 12, 2020 at 3:37 PM Namgyu Kim  wrote:
>
> It's hard to make a decision because it seems to have pros and cons.
> Basically, I agree to separate but there are some questions.
> So I won't vote right now.
>
> 1) Release version
> Currently, versions of Lucene and Solr are aligned, how will they be managed 
> in the future?
> Other people took Elasticsearch as an example... But it was an independent 
> project from the beginning.
> So there is no problem with the Lucene version. (Elasticsearch 7.7 and Lucene 
> 8.5.1)
> I'm sure that if we make Solr an independent project, the version numbers
> will drift apart. (like Lucene 8.6.2 and Solr 8.9.1)
> But it's also strange to suddenly restart Solr's version numbering. (Solr 1.0)
> Of course it's a matter of adaptation, but it's likely to cause some confusion
> for existing users.
>
> 2) Complementary relationship
> When Lucene and Solr are built together, Solr can always maintain the latest 
> Lucene.
> In my personal opinion, it's a great advantage of Solr.
> Because Solr doesn't have to suffer from Lucene API changes and has the latest
> library.
> But it will be difficult if Solr becomes independent.
> If Solr tracks the master branch of Lucene in a separate repository (project),
> can it always check and reflect Lucene's API changes?
>
> On Tue, May 12, 2020 at 10:12 PM Doug Turnbull 
>  wrote:
>>
>> I'll give a perspective that comes more from the user's / "market" point of 
>> view as at OSC we onboard lots of new organizations into Solr.
>>
>> - Most new users incorrectly think of Solr as an independent Apache project, 
>> and many will have little knowledge or awareness of Lucene itself until 
>> given the full history of Lucene, Solr, Elasticsearch... or they have to 
>> dive into the code/write a plugin
>>
>> - Most orgs / managers think in terms of "Solr" (as in "Solr" vs
>> "Elasticsearch" vs "Vespa", etc.). So the starting point for new devs / folks
>> is from the Solr angle
>>
>> - Lucene, when discussed, is understood more colloquially as a Solr 
>> dependency
>>
>> - If someone brings down the code to do some kind of work or investigation, 
>> there's typically surprise that Lucene and Solr are bundled together.
>>
>> - There's further surprise as the projects are indeed so different: Lucene 
>> and Solr tests, for example, look little alike. They seem to have different
>> coding styles / practices. One has more server-like and distributed system
>> concerns; the other is clearly a low-level library for doing search work...
>>
>> I personally have a hard time explaining to new users the rationale for 
>> keeping these together, 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-12 Thread Namgyu Kim
It's hard to make a decision because it seems to have pros and cons.
Basically, I agree to separate but there are some questions.
So I won't vote right now.

1) Release version
Currently, versions of Lucene and Solr are aligned, how will they be
managed in the future?
Other people took Elasticsearch as an example... But it was an independent
project from the beginning.
So there is no problem with the Lucene version. (Elasticsearch 7.7 and
Lucene 8.5.1)
I'm sure that if we make Solr an independent project, the version numbers
will drift apart. (like Lucene 8.6.2 and Solr 8.9.1)
But it's also strange to suddenly restart Solr's version numbering. (Solr
1.0)
Of course it's a matter of adaptation, but it's likely to cause some
confusion for existing users.

2) Complementary relationship
When Lucene and Solr are built together, Solr can always maintain the
latest Lucene.
In my personal opinion, it's a great advantage of Solr.
Because Solr doesn't have to suffer from Lucene API changes and has the latest
library.
But it will be difficult if Solr becomes independent.
If Solr tracks the master branch of Lucene in a separate repository (project),
can it always check and reflect Lucene's API changes?

On Tue, May 12, 2020 at 10:12 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> I'll give a perspective that comes more from the user's / "market" point
> of view as at OSC we onboard lots of new organizations into Solr.
>
> - Most new users incorrectly think of Solr as an independent Apache
> project, and many will have little knowledge or awareness of Lucene itself
> until given the full history of Lucene, Solr, Elasticsearch... or they have
> to dive into the code/write a plugin
>
> - Most orgs / managers think in terms of "Solr" (as in "Solr" vs
> "Elasticsearch" vs "Vespa", etc.). So the starting point for new devs / folks
> is from the Solr angle
>
> - Lucene, when discussed, is understood more colloquially as a Solr
> dependency
>
> - If someone brings down the code to do some kind of work or
> investigation, there's typically surprise that Lucene and Solr are bundled
> together.
>
> - There's further surprise as the projects are indeed so different: Lucene
> and Solr tests, for example, look little alike. They seem to have different
> coding styles / practices. One has more server-like and distributed system
> concerns; the other is clearly a low-level library for doing search work...
>
> I personally have a hard time explaining to new users the rationale for
> keeping these together, and it only increases the barrier to entry (to both
> projects) to have this added complexity of two very different code bases
> munged together.
>
> Just my 2 cents...
> -Doug
>
> On Tue, May 12, 2020 at 7:30 AM Alan Woodward 
> wrote:
>
>> One advantage I find with the way Elasticsearch and Lucene interact is
>> that ES doesn’t depend on the master branch.  We upgrade our master branch
>> frequently to keep up to date with the latest release branch, and that lets
>> us find regressions or API problems pretty quickly, but it also insulates
>> us from having to make big changes immediately.  I find this really useful
>> for things like deprecations.  Let’s say we deprecate a particular API in
>> the release branch, and remove it entirely in master.  Currently, that
>> means Solr needs to immediately switch over to the new API in its master
>> branch.  But the whole point of doing deprecations first is that it gives
>> users time to find issues with the replacements - if we find that the
>> replacement API doesn’t quite fit in ES, we have time to work out either
>> how to change our code, or to improve the new API, but because the
>> deprecated version is still there we’re not blocked from upgrading and
>> getting other improvements.  Solr, meanwhile, may end up with a hacky
>> workaround because that’s what got tests passing for the Lucene developer;
>> or worse, we end up just copying the deprecated API wholesale into Solr and
>> abandoning it there - witness TrieField or UninvertingReader.
>>
>> > On 11 May 2020, at 19:05, Atri Sharma  wrote:
>> >
>> > My two cents:
>> >
>> > As a Lucene-heavy developer, I have several times found maintaining Solr
>> > dependencies while making large changes a bit cumbersome. I believe
>> > Lucene and Solr should exist in a symbiotic relationship but not
>> > tightly coupled with each other.
>> >
>> >
>> > On Mon, May 11, 2020 at 7:22 PM Erik Hatcher 
>> wrote:
>> >>
>> >> Without reading much or replying to any specific points made on this
>> thread, here's my raw thoughts on this age-old topic (finally  coming
>> out of my cocoon after taking things in for a bit)
>> >>
>> >> Solr is a search -server- with distributed capabilities, that
>> leverages the magic of Lucene underneath.  Solr depends on Lucene, is a
>> consumer of it.  Lucene is a tight search -library- with little to no
>> external dependencies.  Their purposes and end-users are different.
>> >>
>> >> I was never really 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-12 Thread Doug Turnbull
I'll give a perspective that comes more from the user's / "market" point
of view as at OSC we onboard lots of new organizations into Solr.

- Most new users incorrectly think of Solr as an independent Apache
project, and many will have little knowledge or awareness of Lucene itself
until given the full history of Lucene, Solr, Elasticsearch... or they have
to dive into the code/write a plugin

- Most orgs / managers think in terms of "Solr" (as in "Solr" vs
"Elasticsearch" vs "Vespa", etc.). So the starting point for new devs / folks
is from the Solr angle

- Lucene, when discussed, is understood more colloquially as a Solr
dependency

- If someone brings down the code to do some kind of work or investigation,
there's typically surprise that Lucene and Solr are bundled together.

- There's further surprise as the projects are indeed so different: Lucene
and Solr tests, for example, look little alike. They seem to have different
coding styles / practices. One has more server-like and distributed system
concerns; the other is clearly a low-level library for doing search work...

I personally have a hard time explaining to new users the rationale for
keeping these together, and it only increases the barrier to entry (to both
projects) to have this added complexity of two very different code bases
munged together.

Just my 2 cents...
-Doug

On Tue, May 12, 2020 at 7:30 AM Alan Woodward  wrote:

> One advantage I find with the way Elasticsearch and Lucene interact is
> that ES doesn’t depend on the master branch.  We upgrade our master branch
> frequently to keep up to date with the latest release branch, and that lets
> us find regressions or API problems pretty quickly, but it also insulates
> us from having to make big changes immediately.  I find this really useful
> for things like deprecations.  Let’s say we deprecate a particular API in
> the release branch, and remove it entirely in master.  Currently, that
> means Solr needs to immediately switch over to the new API in its master
> branch.  But the whole point of doing deprecations first is that it gives
> users time to find issues with the replacements - if we find that the
> replacement API doesn’t quite fit in ES, we have time to work out either
> how to change our code, or to improve the new API, but because the
> deprecated version is still there we’re not blocked from upgrading and
> getting other improvements.  Solr, meanwhile, may end up with a hacky
> workaround because that’s what got tests passing for the Lucene developer;
> or worse, we end up just copying the deprecated API wholesale into Solr and
> abandoning it there - witness TrieField or UninvertingReader.
>
> > On 11 May 2020, at 19:05, Atri Sharma  wrote:
> >
> > My two cents:
> >
> > As a Lucene-heavy developer, I have several times found maintaining Solr
> > dependencies while making large changes a bit cumbersome. I believe
> > Lucene and Solr should exist in a symbiotic relationship but not
> > tightly coupled with each other.
> >
> >
> > On Mon, May 11, 2020 at 7:22 PM Erik Hatcher wrote:
> >>
> >> Without reading much or replying to any specific points made on this
> >> thread, here's my raw thoughts on this age-old topic (finally coming
> >> out of my cocoon after taking things in for a bit)
> >>
> >> Solr is a search -server- with distributed capabilities, that leverages
> >> the magic of Lucene underneath.  Solr depends on Lucene, is a consumer of
> >> it.  Lucene is a tight search -library- with little to no external
> >> dependencies.  Their purposes and end-users are different.
> >>
> >> I was never really for the grand unification of Lucene and Solr back in
> >> the day because:
> >>
> >> - Solr's developer experience would be greatly streamlined, faster,
> >> cleaner, leaner, and focused
> >> - Having Lucene change when Solr doesn't (yet) adapt to those changes
> >> leads to confusion and inconsistency, loose wires hanging out of the wall
> >> unconnected or duct taped together
> >> - It simply makes sense to keep Lucene versioned and tightly controlled
> >> for upgrades, various testing configurations varying Lucene versions,
> >> within Solr
> >> - Solr could have a very concerted upgrade effort for Lucene capability
> >> jumps, with a focused upgrade effort at the changed/improved/added touch
> >> points just like other dependencies within Solr (like Tika and Jetty)
> >>
> >> Those points all kinda say the same thing: Solr depends on
> >> "lucene.jar" and I'm in the camp that thinks Solr and Lucene development,
> >> communities, and end-users/consumers would all greatly benefit from a fancy
> >> new TLP and focused community for solr.apache.org and a tight(er)
> >> relationship with the Lucene community as an involved and vested consumer.
> >>
> >> Erik
> >>
> >
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-12 Thread Alan Woodward
One advantage I find with the way Elasticsearch and Lucene interact is that ES 
doesn’t depend on the master branch.  We upgrade our master branch frequently 
to keep up to date with the latest release branch, and that lets us find 
regressions or API problems pretty quickly, but it also insulates us from 
having to make big changes immediately.  I find this really useful for things 
like deprecations.  Let’s say we deprecate a particular API in the release 
branch, and remove it entirely in master.  Currently, that means Solr needs to 
immediately switch over to the new API in its master branch.  But the whole 
point of doing deprecations first is that it gives users time to find issues 
with the replacements - if we find that the replacement API doesn’t quite fit 
in ES, we have time to work out either how to change our code, or to improve 
the new API, but because the deprecated version is still there we’re not 
blocked from upgrading and getting other improvements.  Solr, meanwhile, may 
end up with a hacky workaround because that’s what got tests passing for the 
Lucene developer; or worse, we end up just copying the deprecated API wholesale 
into Solr and abandoning it there - witness TrieField or UninvertingReader.

> On 11 May 2020, at 19:05, Atri Sharma  wrote:
> 
> My two cents:
> 
> As a Lucene heavy developer, I have several times found maintaining Solr
> dependencies while making large changes a bit cumbersome. I believe
> Lucene and Solr should exist in a symbiotic relationship but not
> tightly coupled with each other.
> 
> 
> On Mon, May 11, 2020 at 7:22 PM Erik Hatcher  wrote:
>> 
>> Without reading much or replying to any specific points made on this thread, 
>> here's my raw thoughts on this age-old topic (finally  coming out of my 
>> cocoon after taking things in for a bit)
>> 
>> Solr is a search -server- with distributed capabilities, that leverages the 
>> magic of Lucene underneath.  Solr depends on Lucene, is a consumer of it.  
>> Lucene is a tight search -library- with little to no external dependencies.  
>> Their purposes and end-users are different.
>> 
>> I was never really for the grand unification of Lucene and Solr back in the 
>> day because:
>> 
>> - Solr's developer experience would be greatly streamlined, faster, cleaner, 
>> leaner, and focused
>> - Having Lucene change when Solr doesn't (yet) adapt to those changes leads 
>> to confusion and inconsistency, loose wires hanging out of the wall 
>> unconnected or duct taped together
>> - It simply makes sense to keep Lucene versioned and tightly controlled for 
>> upgrades, various testing configurations varying Lucene versions, within Solr
>> - Solr could have a very concerted upgrade effort for Lucene capability 
>> jumps, with a focused upgrade effort at the changed/improved/added touch 
>> points just like other dependencies within Solr (like Tika and Jetty)
>> 
>> Those points all kinda say the same thing: Solr depends on "lucene.jar" 
>> and I'm in the camp that thinks Solr and Lucene development, communities, 
>> and end-users/consumers would all greatly benefit from a fancy new TLP and 
>> focused community for solr.apache.org and a tight(er) relationship with the 
>> Lucene community as an involved and vested consumer.
>> 
>> Erik
>> 
> 
> 
> -- 
> Regards,
> 
> Atri
> Apache Concerted
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Atri Sharma
My two cents:

As a Lucene heavy developer, I have several times found maintaining Solr
dependencies while making large changes a bit cumbersome. I believe
Lucene and Solr should exist in a symbiotic relationship but not
tightly coupled with each other.


On Mon, May 11, 2020 at 7:22 PM Erik Hatcher  wrote:
>
> Without reading much or replying to any specific points made on this thread, 
> here's my raw thoughts on this age-old topic (finally  coming out of my 
> cocoon after taking things in for a bit)
>
> Solr is a search -server- with distributed capabilities, that leverages the 
> magic of Lucene underneath.  Solr depends on Lucene, is a consumer of it.  
> Lucene is a tight search -library- with little to no external dependencies.  
> Their purposes and end-users are different.
>
> I was never really for the grand unification of Lucene and Solr back in the 
> day because:
>
>   - Solr's developer experience would be greatly streamlined, faster, 
> cleaner, leaner, and focused
>   - Having Lucene change when Solr doesn't (yet) adapt to those changes 
> leads to confusion and inconsistency, loose wires hanging out of the wall 
> unconnected or duct taped together
>   - It simply makes sense to keep Lucene versioned and tightly controlled for 
> upgrades, various testing configurations varying Lucene versions, within Solr
>   - Solr could have a very concerted upgrade effort for Lucene capability 
> jumps, with a focused upgrade effort at the changed/improved/added touch 
> points just like other dependencies within Solr (like Tika and Jetty)
>
> Those points all kinda say the same thing: Solr depends on "lucene.jar" 
> and I'm in the camp that thinks Solr and Lucene development, communities, and 
> end-users/consumers would all greatly benefit from a fancy new TLP and 
> focused community for solr.apache.org and a tight(er) relationship with the 
> Lucene community as an involved and vested consumer.
>
> Erik
>


-- 
Regards,

Atri
Apache Concerted




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Erik Hatcher
Without reading much or replying to any specific points made on this thread, 
here's my raw thoughts on this age-old topic (finally  coming out of my 
cocoon after taking things in for a bit)

Solr is a search -server- with distributed capabilities, that leverages the 
magic of Lucene underneath.  Solr depends on Lucene, is a consumer of it.  
Lucene is a tight search -library- with little to no external dependencies.  
Their purposes and end-users are different.

I was never really for the grand unification of Lucene and Solr back in the day 
because:

  - Solr's developer experience would be greatly streamlined, faster, cleaner, 
leaner, and focused
  - Having Lucene change when Solr doesn't (yet) adapt to those changes leads 
to confusion and inconsistency, loose wires hanging out of the wall unconnected 
or duct taped together
  - It simply makes sense to keep Lucene versioned and tightly controlled for 
upgrades, various testing configurations varying Lucene versions, within Solr
  - Solr could have a very concerted upgrade effort for Lucene capability 
jumps, with a focused upgrade effort at the changed/improved/added touch points 
just like other dependencies within Solr (like Tika and Jetty)

Those points all kinda say the same thing: Solr depends on "lucene.jar" and 
I'm in the camp that thinks Solr and Lucene development, communities, and 
end-users/consumers would all greatly benefit from a fancy new TLP and focused 
community for solr.apache.org and a tight(er) relationship with the Lucene 
community as an involved and vested consumer.

Erik



Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Bram Van Dam
On 11/05/2020 15:09, Dawid Weiss wrote:
>> Maybe I'm alone in this, but (better) Lucene compatibility is one of the
>> reasons why our company chose Solr over ElasticSearch.
> 
> I fail to see anything supporting superior Lucene
> compatibility of one vs. another.

Yeah you're right. It's since been pointed out to me that ES uses
vanilla Lucene these days. I hope Solr can remain in the same boat, it
would be a shame to fork Lucene and lose the benefits of compatibility.

 - Bram




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Dawid Weiss
> Maybe I'm alone in this, but (better) Lucene compatibility is one of the
> reasons why our company chose Solr over ElasticSearch.

There are a number of Elasticsearch developers working on Lucene core
(or maybe rather Lucene developers working at Elasticsearch?). And
there are Solr developers working on Lucene features which cascade to
Elasticsearch functionality. Both projects follow Lucene snapshot
releases. I fail to see anything supporting superior Lucene
compatibility of one vs. another.

Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Simon Willnauer
On Sun, May 10, 2020 at 3:41 PM Bram Van Dam  wrote:
>
> On 10/05/2020 08:20, David Smiley wrote:
> > An idea just occurred to me that may help make a split nicer for Solr
> > than it is today.  Solr could use a branch of the Lucene project that's
> > used for the Solr project.
>
> Maybe I'm alone in this, but (better) Lucene compatibility is one of the
> reasons why our company chose Solr over ElasticSearch.

I thought about this for a while and I do wonder if you could elaborate
on what makes Solr have a better compatibility with Lucene. That's
certainly something elasticsearch would want to catch up on since it
sounds like a clear benefit for users. Maybe I just misunderstood what
you meant hence couldn't make much sense out of it.

simon





Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Adrien Grand
On Mon, May 11, 2020 at 1:17 AM Shawn Heisey  wrote:

> I think the presence of Solr in the codebase
> has diluted Lucene's releases, making them come far too quickly.  I
> would bet that without Solr, Lucene would probably be somewhere in 6.x,
> not 8.x.
>

Actually I think that Lucene would be on 8.x without Solr too. We did:
 - 5.0 to drop support for 3.x indices,
 - 6.0 to introduce points and require Java 8+,
 - 7.0 to introduce doc-value iterators and change how norms are encoded,
 - 8.0 for impacts / block-max WAND, which required breaking changes on the
Similarity API.

It would have been challenging to expose these changes with fewer major
releases without significantly delaying some very appealing features, which
has downsides too.

-- 
Adrien


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Shawn Heisey

On 5/10/2020 3:41 PM, Michael McCandless wrote:
> I think the costs (I agree: they are high) are a one-time thing, while
> the benefits are long term, and accrue/multiply with time.  We should
> make decisions like this with the long-term benefits in mind.
>
> I expect Lucene and Solr to have long healthy lives ahead, and that
> means this one-time cost will eventually be amortized and made
> minuscule/negligible compared to the long-term benefits to both projects.


+1

I think that those with a primary interest in Lucene would be strongly 
in favor of this split.  I think the presence of Solr in the codebase 
has diluted Lucene's releases, making them come far too quickly.  I 
would bet that without Solr, Lucene would probably be somewhere in 6.x, 
not 8.x.


My personal interests are with Solr.  I have little interest in Lucene 
code.  I think it would be beneficial in both directions to have them be 
separate.


To protect against the divergence which prompted the joining of the two 
codebases, I do think it would be a good idea for a few committers to 
remain with both projects, but I can say unequivocally that if the 
projects split, I will only want to keep those privileges on a new Solr TLP.


Thanks,
Shawn




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Tue, May 5, 2020 at 4:59 PM Tomás Fernández Löbbe 
wrote:

On Tue, May 5, 2020 at 12:37 PM Dawid Weiss  wrote:
>
>> > I read “promotion to TLP” as if this was some achievement that needs to
>> be celebrated now.
>>
>> I honestly believe it is an achievement for a project to receive
>> top-level status. It's a sign of having a community of users,
>> committers and processes mature enough to empower its further
>> development.
>>
>
+1


> My point is that this is not something new. Solr is a mature product and
> has had the community and process in place for a long time.
>

I agree it's not new.  But that, to me, means that we have already waited
too long to promote Solr up to its own top-level Apache project.

This should have been done long ago.

Mike McCandless

http://blog.mikemccandless.com


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Wed, May 6, 2020 at 4:24 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Offtopic: Gus, I'm looking at Vespa core for Solr someday
>

+1, Vespa looks really fascinating!  Plus it is released under ASL 2 as
well.  And the world clearly needs more open-source search engines.

Mike McCandless

http://blog.mikemccandless.com

>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Thu, May 7, 2020 at 1:07 PM Bram Van Dam  wrote:

> > The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
>
> Sounds like there are quite a few tasks to complete to get this done.
> Splitting the build and codebase. Presumably a bunch of administration
> within Apache/the PMC. Setting up infrastructure etc.
>

+1

> These are the costs, to be paid up front in the currency of someone's
> time. The benefits are less clear. Faster build times and easier
> maintenance sound attractive, but when will those benefits be visible?
> Next month? Or in a year?
>

I think the costs (I agree: they are high) are a one-time thing, while the
benefits are long term, and accrue/multiply with time.  We should make
decisions like this with the long-term benefits in mind.

I expect Lucene and Solr to have long healthy lives ahead, and that means
this one-time cost will eventually be amortized and made
minuscule/negligible compared to the long-term benefits to both projects.


> Whoever will be doing this work should probably ask themselves the
> questions: is this the best use of their time?
>

+1

Also, since we "just" completed the Gradle migration in master, hopefully
that is still fresh on people's minds, and separating the Lucene and Solr
builds will then be easier.

Mike McCandless

http://blog.mikemccandless.com


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Jan Høydahl
Better stick to Lucene snapshot versions. Commit the Lucene change first, then 
solr. If the Lucene change is not mature enough to commit to Lucene, it is 
probably not mature enough for Solr either. Avoid hacks or forks, spend some 
longer time to get it right.

If things get removed from Lucene and we want to support it in Solr for another 
major version, consider moving the code to solr git, under the Lucene package 
namespace. But it should not be something we do often.

Jan

>> On 10 May 2020, at 21:23, Gus Heck wrote:
> 
> 
>>> On Sun, May 10, 2020 at 11:55 AM Mike Drob  wrote:
>> Solr maintaining a fork of Lucene sounds like exactly the situation that led 
>> to the original merge, where there are two sets of divergent development
> 
> Exactly


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Gus Heck
On Sun, May 10, 2020 at 11:55 AM Mike Drob  wrote:

> Solr maintaining a fork of Lucene sounds like exactly the situation that
> led to the original merge, where there are two sets of divergent development
>

Exactly


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Mike Drob
Solr maintaining a fork of Lucene sounds like exactly the situation that
led to the original merge, where there are two sets of divergent development

On Sun, May 10, 2020 at 1:20 AM David Smiley 
wrote:

> I agree with Doug that the burden of proof is on keeping the codebases
> together instead of the reverse.  I liken it to a marriage; it has to work
> well for both parties.  It seems to be mostly beneficial for Solr but
> much less so for Lucene.
>
> BTW an even better example than the huge FuzzyQuery case was the loss of
> an entire postings format that Solr was using -- LUCENE-9116.  That one was
> caught thanks to Solr tests and prevented the release.  The huge FuzzyQuery, on
> the other hand, was released.  I hope that with a split project, we're able
> to do Solr side tests quickly enough prior to Lucene doing releases.  I
> wonder if ElasticSearch tries to do this on their side too; does it?
>
> An idea just occurred to me that may help make a split nicer for Solr than
> it is today.  Solr could use a branch of the Lucene project that's used for
> the Solr project.  That's just impossible today due to the single
> codebase.  This affords the possibility of changes that are not endorsed on
> the Lucene side (i.e. that would not make it into a real Lucene release).
> An example of this are API changes like LUCENE-8159 or perhaps making
> some classes public so that Solr can access them without awkward hacks.
> Put differently, like some companies maintain forks of Lucene/Solr, in the
> future, Solr should be able to have its fork of Lucene likewise.  Should
> this approach be adopted, Solr would want to keep this to a minimum to keep
> upkeep of the branch low, and the branch _would_ need upkeep (e.g. running
> tests), so it's not a total panacea.  On the other hand, if Solr strictly
> only releases with released Lucene versions, then this is way nicer from a
> versioning and artifact management (i.e. publishing to Maven) point of
> view.  It's nice to have options.
>
>
> ~ David
>
>
> On Thu, May 7, 2020 at 1:07 PM Bram Van Dam  wrote:
>
>> > The big question is this: “Is this the right time to split Solr and
>> > Lucene into two independent projects?”.
>>
>> Sounds like there are quite a few tasks to complete to get this done.
>> Splitting the build and codebase. Presumably a bunch of administration
>> within Apache/the PMC. Setting up infrastructure etc.
>>
>> These are the costs, to be paid up front in the currency of someone's
>> time. The benefits are less clear. Faster build times and easier
>> maintenance sound attractive, but when will those benefits be visible?
>> Next month? Or in a year?
>>
>> Whoever will be doing this work should probably ask themselves the
>> questions: is this the best use of their time?
>>
>>  - Bram
>>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Bram Van Dam
On 10/05/2020 08:20, David Smiley wrote:
> An idea just occurred to me that may help make a split nicer for Solr
> than it is today.  Solr could use a branch of the Lucene project that's
> used for the Solr project.

Maybe I'm alone in this, but (better) Lucene compatibility is one of the
reasons why our company chose Solr over ElasticSearch.




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Adrien Grand
On Sun, May 10, 2020 at 8:20 AM David Smiley 
wrote:

> I wonder if ElasticSearch tries to do this on their side too; does it?
>

Yes, Elasticsearch regularly upgrades to new snapshots of Lucene[1][2],
often multiple times per minor version. It helps give Lucene more test and
performance coverage, and also makes it easier for us to identify which
particular Lucene change contributed to an improvement or regression in
Elasticsearch.

[1]
https://github.com/elastic/elasticsearch/search?o=desc&q=lucene+snapshot&s=author-date&type=Commits
[2] https://github.com/elastic/elasticsearch/pull/56175

-- 
Adrien


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread David Smiley
I agree with Doug that the burden of proof is on keeping the codebases
together instead of the reverse.  I liken it to a marriage; it has to work
well for both parties.  It seems to be mostly beneficial for Solr but
much less so for Lucene.

BTW an even better example than the huge FuzzyQuery case was the loss of an
entire postings format that Solr was using -- LUCENE-9116.  That one was
caught thanks to Solr tests and prevented the release.  The huge FuzzyQuery, on
the other hand, was released.  I hope that with a split project, we're able
to do Solr side tests quickly enough prior to Lucene doing releases.  I
wonder if ElasticSearch tries to do this on their side too; does it?

An idea just occurred to me that may help make a split nicer for Solr than
it is today.  Solr could use a branch of the Lucene project that's used for
the Solr project.  That's just impossible today due to the single
codebase.  This affords the possibility of changes that are not endorsed on
the Lucene side (i.e. that would not make it into a real Lucene release).
An example of this are API changes like LUCENE-8159 or perhaps making some
classes public so that Solr can access them without awkward hacks.  Put
differently, like some companies maintain forks of Lucene/Solr, in the
future, Solr should be able to have its fork of Lucene likewise.  Should
this approach be adopted, Solr would want to keep this to a minimum to keep
upkeep of the branch low, and the branch _would_ need upkeep (e.g. running
tests), so it's not a total panacea.  On the other hand, if Solr strictly
only releases with released Lucene versions, then this is way nicer from a
versioning and artifact management (i.e. publishing to Maven) point of
view.  It's nice to have options.

~ David


On Thu, May 7, 2020 at 1:07 PM Bram Van Dam  wrote:

> > The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
>
> Sounds like there are quite a few tasks to complete to get this done.
> Splitting the build and codebase. Presumably a bunch of administration
> within Apache/the PMC. Setting up infrastructure etc.
>
> These are the costs, to be paid up front in the currency of someone's
> time. The benefits are less clear. Faster build times and easier
> maintenance sound attractive, but when will those benefits be visible?
> Next month? Or in a year?
>
> Whoever will be doing this work should probably ask themselves the
> questions: is this the best use of their time?
>
>  - Bram
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-07 Thread Bram Van Dam
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.

Sounds like there are quite a few tasks to complete to get this done.
Splitting the build and codebase. Presumably a bunch of administration
within Apache/the PMC. Setting up infrastructure etc.

These are the costs, to be paid up front in the currency of someone's
time. The benefits are less clear. Faster build times and easier
maintenance sound attractive, but when will those benefits be visible?
Next month? Or in a year?

Whoever will be doing this work should probably ask themselves the
question: is this the best use of their time?

 - Bram




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-07 Thread Adrien Grand
There are definitely pros and cons of splitting vs. being a single project.
The bigger pains for me until now have been the following ones:

Digging Solr failures

The theory is that Solr failures can help find Lucene bugs that Lucene tests
wouldn't catch, and while this occurred a couple of times, I found the
benefit-cost ratio to not be interesting, as Solr tests can be especially
hard to debug, being integration tests that use threading and usually don't
reproduce with failing seeds.

Synchronized releases

Releasing at the same time has created interesting situations in the past.
For instance, we have had several Lucene patch releases with empty
changelogs. Another problem is that the more changes go into a release, the
more likely you would find last-minute blockers and need to respin.
Splitting would help keep the scope of each release smaller and reduce
chances of needing to respin.

The argument has been made that we would still need to coordinate major
releases because of backward compatibility guarantees, but to be clear
we're talking about something that would be much more lightweight than the
coordination that we require today. It would be totally fine if Solr
released new major versions several months after Lucene, the only
requirement is to have the same cadence.

Adapting Solr to Lucene changes

Lucene embraces its N-1 backward compatibility policy to move forward, but
Solr is reluctant to. For instance, uninverting and numeric fields were
deprecated more than 4 years ago (in favor of doc values and points
respectively), but they are still used in Solr, and the task of finding a way
to keep the functionality working after removing the feature from Lucene
fell on the plate of the person who drove the deprecation/removal in
Lucene. While some might argue that a full split might move the cursor too
far in the other direction, I feel that this is something we should work on
addressing, even if the final decision is to not split.

In the end, I think that Lucene and Solr should keep a close relationship,
but reducing coupling would help. I wish the two projects had a
relationship that looked more like the relationship that we have with
OpenJDK, testing early-access builds, embracing new features and
deprecations, but without forcing tight coupling. I don't have strong
feelings about splitting PMCs and making Solr a TLP vs. remaining the same
project, but I wish we would at least make Solr depend on Lucene JARs and
decouple builds/releases.


On Mon, May 4, 2020 at 11:11 AM Dawid Weiss  wrote:

> Dear Lucene and Solr developers!
>
> A few days ago, I initiated a discussion among PMC members about
> potential pros and cons of splitting the project into separate Lucene
> and Solr entities by promoting Solr to its own top-level Apache
> project (TLP). Let me share with you the motivation for such an action
> and some follow-up thoughts I heard from other PMC members so far.
>
> Please read this e-mail carefully. Both the PMC and I look forward to
> hearing your opinion. This is a DISCUSS thread and it will be followed
> next week by a VOTE thread. This is our shared project and we should
> all shape its future responsibly.
>
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
>
> Here are several technical considerations that drove me to ask the
> question above (in no order of priorities):
>
> 1) Precommit/ test times. These are crazy high. If we split into two
> projects we can pretty much cut all of Lucene testing out of Solr (and
> likewise), making development a bit more fun again.
>
> 2) Build system itself and source release packaging. The current
> combined codebase is a *beast* to maintain. Working with gradle on
> both projects at once made me realise how little the two have in
> common. The code layout, the dependencies, even the workflow of people
>
> working on these projects... The build (both ant and gradle) is full
> of Solr and Lucene-specific exceptions and hooks that could be more
> elegantly solved if moved to each project independently.
>
> 3) Packaging. There is no single source distribution package for
> Solr+Lucene. They are already "independent" there. Why should Lucene
> and Solr always be released at the same pace? Does it always make
> sense?
>
> 4) Solr is essentially taking in Lucene and its dependencies as a
> whole (so is Elasticsearch and many other projects). In my opinion
> this makes Lucene eligible for refactoring and
> maintenance as a separate component. The learning curve for people
> coming to each project separately is going to be gentler than trying
> to dive into the combined codebase.
>
> 5) Mailing lists, build servers. Mailing lists for users are already
> separated. I think this is yet another indication that Solr is
> something more than a component within Lucene. It is perceived as an
> independent entity and used as an independent product. I would really
> like to have separate mailing lists for 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Anshum Gupta
I personally feel that all the current issues can be solved by actually
working on those problems instead of splitting and calling it a day. I
don't really think that splitting provides any benefit to the Solr side of
things at all, however I also completely agree that it would make things
easier in the Lucene world.

- Test times: We can always run the tests separately and I think that's
what everyone does most of the time. Splitting the project wouldn't really
get us much here.
- Builds: I agree that the build system is interlinked and non-trivial to
manage, especially when you have a task like switching over to a new system
(thanks for all the effort around that) but most of it is behind us at this
point. Both Lucene and Solr do have different ways of handling things, but
the difference in workflow for people wouldn't really work here, especially
as we would retain the committer bits across the newly split projects, if
that happens. If we think there is a unified or common way to work with
both projects, let's instead move in that direction and make it easier for
folks.
- Packaging: Another thing that is almost completely disconnected at the
moment. Solr and Lucene releases don't have much in common other than the
release numbers and so the back-compat requirements.
- About the dependency bit - I wouldn't really compare Solr with
Elasticsearch only because they are fundamentally different in terms of one
being an Apache project, and the other not being one. Moreover, right now,
Solr and Lucene are the same project.

> Personally I feel the burden of proof should not be why they should be
> split up, but the other way - "what arguments can be made for keeping them
> together?"


I disagree with that statement. While the reasons that were highlighted
are certainly points of concern, I don't think that splitting the project
is a way to solve the problem. Suggesting that the responsibility of
justifying undertaking such a task should lie on folks who support the
current state doesn't seem reasonable to me.

The reasons for keeping the project unified overlap with a lot of what has
already been mentioned w.r.t. the reasons to combine them.

To reiterate, I completely agree with the concerns and that we should do
something about them to make it easier and more attractive for contributors,
but splitting the project isn't really a solution.

Will splitting be good for Solr? I don't think so. Not objectively at
least. This will harm Solr in my opinion and not solve any of the above
problems for the new Solr TLP.

Will splitting be good for Lucene? Absolutely. It will make things easier
for Lucene as the complexity around a distributed system will reduce, in
addition to a lot of other things, technically and community-wise.

Like Simon, I am completely pro doing the right thing, irrespective of the
difficulty involved, but we need to all be aware of the pros and the cons
of this step, and make an informed decision.


-Anshum



On Wed, May 6, 2020 at 11:21 AM Gus Heck  wrote:

> IMO, if we need to say “we can’t release X because it breaks Y”, or “we
>> need to release X to be able to release Y”, the projects are not really
>> independent, and “the PMCs will overlap” won’t take us very far.
>>
>
> This. I don't think the two really can be separated. Any separation will
> merely be artificial, and/or an excuse for throwing stuff over the wall.
> The sooner incompatibilities or difficulties are identified the better.
> Definitely not in favor of splitting.
>
> Really, we are effectively "search.apache.org" (or I suppose "
> java-search.apache.org") and the lucene name as the TLP is just a
> legacy thing. We can have components (as does hc.apache.org) but Solr
> can't live without Lucene, so fostering a sense of separation is going to
> be bad for Solr.
>
> If someday we reach a point where some other library could swap into Solr
> to replace Lucene, then maybe.
>
> My opinion, YMMV :)
>
>
> On Wed, May 6, 2020 at 5:40 AM Simon Willnauer 
> wrote:
>
>> I can speak from experience that working with a snapshot is much
>> cleaner than working with submodules. We do this in elasticsearch for
>> a very long time now and our process here works just fine. It has a
>> bunch of advantages over a direct / source dependency like solr has
>> right now. I recall that someone else already mentioned some of them
>> like working on somewhat more stable codebase etc. do refactorings and
>> integration when there are people dedicated to it and have enough time
>> to do it properly.
>>
>> Regarding the effort of a split, I think that not doing something
>> because it's a lot of work will just cause a ton of issues down the
>> road. Doing the right thing is a lot of work that's for sure but we
>> can start working on this in baby steps an we can all help. Like we
>> can gradually do this, start with website, lists then build system
>> etc. or start with build first and do website last. It's ok to apply
>> progress over perfection here. We all 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Ishan Chattopadhyaya
Off-topic: Gus, I'm looking at Vespa core for Solr someday. I used to work
at Yahoo! and I was a strong advocate of replacing Vespa with Solr for our
team. A quick benchmark back then proved to me that Vespa was faster than
Solr for our use case and I rested my case. Both Lucene and Vespa have
improved, so I'd love to do that experiment again.

On Wed, 6 May, 2020, 11:51 pm Gus Heck,  wrote:

> IMO, if we need to say “we can’t release X because it breaks Y”, or “we
>> need to release X to be able to release Y”, the projects are not really
>> independent, and “the PMCs will overlap” won’t take us very far.
>>
>
> This. I don't think the two really can be separated. Any separation will
> merely be artificial, and/or an excuse for throwing stuff over the wall.
> The sooner incompatibilities or difficulties are identified the better.
> Definitely not in favor of splitting.
>
> Really, we are effectively "search.apache.org" (or I suppose "
> java-search.apache.org") and the lucene name as the TLP is just a
> legacy thing. We can have components (as does hc.apache.org) but Solr
> can't live without Lucene, so fostering a sense of separation is going to
> be bad for Solr.
>
> If someday we reach a point where some other library could swap into Solr
> to replace Lucene, then maybe.
>
> My opinion, YMMV :)
>
>
> On Wed, May 6, 2020 at 5:40 AM Simon Willnauer 
> wrote:
>
>> I can speak from experience that working with a snapshot is much
>> cleaner than working with submodules. We do this in elasticsearch for
>> a very long time now and our process here works just fine. It has a
>> bunch of advantages over a direct / source dependency like solr has
>> right now. I recall that someone else already mentioned some of them
>> like working on somewhat more stable codebase etc. do refactorings and
>> integration when there are people dedicated to it and have enough time
>> to do it properly.
>>
>> Regarding the effort of a split, I think that not doing something
>> because it's a lot of work will just cause a ton of issues down the
>> road. Doing the right thing is a lot of work that's for sure but we
>> can start working on this in baby steps an we can all help. Like we
>> can gradually do this, start with website, lists then build system
>> etc. or start with build first and do website last. It's ok to apply
>> progress over perfection here. We all want this to be done properly
>> and we are all here to help, at least I am.
>>
>> simon
>>
>> On Wed, May 6, 2020 at 10:51 AM Ishan Chattopadhyaya
>>  wrote:
>> >
>> > Except the logistics of enacting the split, I see no valid reason of
>> keeping the projects together. Git submodule is the magic that we have to
>> ease any potential discomfort. However, the effort needed to split feels
>> absolutely massive, so I'm not sure if it is worth the hassle.
>> >
>> > On Wed, 6 May, 2020, 1:31 pm Dawid Weiss, 
>> wrote:
>> >>
>> >> > If you go to lucene.apache.org, you'll see three things: Lucene
>> Core (Lucene with all it's modules), Solr and PyLucene. That's what I mean.
>> >>
>> >> Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
>> >>
>> >> 1. Lucene Core (Lucene with all it's modules)
>> >> 2. Solr
>> >> 3. PyLucene
>> >>
>> >> The thing is: (1) is already a TLP - that's just Lucene. My call is to
>> >> make (2) a TLP. (3) I can't tell much about because I don't know
>> >> PyLucene as well as I do Solr and Lucene... But it seems to me that
>> >> PyLucene fits much better under "Lucene" umbrella, even the name
>> >> suggests that.
>> >>
>> >>
>> >>
>> >> Dawid
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Gus Heck
>
> IMO, if we need to say “we can’t release X because it breaks Y”, or “we
> need to release X to be able to release Y”, the projects are not really
> independent, and “the PMCs will overlap” won’t take us very far.
>

This. I don't think the two really can be separated. Any separation will
merely be artificial, and/or an excuse for throwing stuff over the wall.
The sooner incompatibilities or difficulties are identified the better.
Definitely not in favor of splitting.

Really, we are effectively "search.apache.org" (or I suppose "
java-search.apache.org") and the lucene name as the TLP is just a
legacy thing. We can have components (as does hc.apache.org) but Solr can't
live without Lucene, so fostering a sense of separation is going to be bad
for Solr.

If someday we reach a point where some other library could swap into Solr
to replace Lucene, then maybe.

My opinion, YMMV :)


On Wed, May 6, 2020 at 5:40 AM Simon Willnauer 
wrote:

> I can speak from experience that working with a snapshot is much
> cleaner than working with submodules. We do this in elasticsearch for
> a very long time now and our process here works just fine. It has a
> bunch of advantages over a direct / source dependency like solr has
> right now. I recall that someone else already mentioned some of them
> like working on somewhat more stable codebase etc. do refactorings and
> integration when there are people dedicated to it and have enough time
> to do it properly.
>
> Regarding the effort of a split, I think that not doing something
> because it's a lot of work will just cause a ton of issues down the
> road. Doing the right thing is a lot of work that's for sure but we
> can start working on this in baby steps an we can all help. Like we
> can gradually do this, start with website, lists then build system
> etc. or start with build first and do website last. It's ok to apply
> progress over perfection here. We all want this to be done properly
> and we are all here to help, at least I am.
>
> simon
>
> On Wed, May 6, 2020 at 10:51 AM Ishan Chattopadhyaya
>  wrote:
> >
> > Except the logistics of enacting the split, I see no valid reason of
> keeping the projects together. Git submodule is the magic that we have to
> ease any potential discomfort. However, the effort needed to split feels
> absolutely massive, so I'm not sure if it is worth the hassle.
> >
> > On Wed, 6 May, 2020, 1:31 pm Dawid Weiss,  wrote:
> >>
> >> > If you go to lucene.apache.org, you'll see three things: Lucene Core
> (Lucene with all it's modules), Solr and PyLucene. That's what I mean.
> >>
> >> Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
> >>
> >> 1. Lucene Core (Lucene with all it's modules)
> >> 2. Solr
> >> 3. PyLucene
> >>
> >> The thing is: (1) is already a TLP - that's just Lucene. My call is to
> >> make (2) a TLP. (3) I can't tell much about because I don't know
> >> PyLucene as well as I do Solr and Lucene... But it seems to me that
> >> PyLucene fits much better under "Lucene" umbrella, even the name
> >> suggests that.
> >>
> >>
> >>
> >> Dawid
> >>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Simon Willnauer
I can speak from experience that working with a snapshot is much
cleaner than working with submodules. We have been doing this in
Elasticsearch for a very long time now and our process works just
fine. It has a bunch of advantages over a direct / source dependency
like Solr has right now. I recall that someone else already mentioned
some of them, such as working on a somewhat more stable codebase and
doing refactorings and integration when there are people dedicated to
it who have enough time to do it properly.

Regarding the effort of a split, I think that not doing something
because it's a lot of work will just cause a ton of issues down the
road. Doing the right thing is a lot of work, that's for sure, but we
can start working on this in baby steps and we can all help. We can
do this gradually: start with the website and lists, then the build
system, etc., or start with the build first and do the website last.
It's ok to apply progress over perfection here. We all want this to
be done properly and we are all here to help, at least I am.

simon

On Wed, May 6, 2020 at 10:51 AM Ishan Chattopadhyaya
 wrote:
>
> Except the logistics of enacting the split, I see no valid reason of keeping 
> the projects together. Git submodule is the magic that we have to ease any 
> potential discomfort. However, the effort needed to split feels absolutely 
> massive, so I'm not sure if it is worth the hassle.
>
> On Wed, 6 May, 2020, 1:31 pm Dawid Weiss,  wrote:
>>
>> > If you go to lucene.apache.org, you'll see three things: Lucene Core 
>> > (Lucene with all it's modules), Solr and PyLucene. That's what I mean.
>>
>> Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
>>
>> 1. Lucene Core (Lucene with all it's modules)
>> 2. Solr
>> 3. PyLucene
>>
>> The thing is: (1) is already a TLP - that's just Lucene. My call is to
>> make (2) a TLP. (3) I can't tell much about because I don't know
>> PyLucene as well as I do Solr and Lucene... But it seems to me that
>> PyLucene fits much better under "Lucene" umbrella, even the name
>> suggests that.
>>
>>
>>
>> Dawid
>>




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Ishan Chattopadhyaya
Except for the logistics of enacting the split, I see no valid reason for
keeping the projects together. Git submodule is the magic that we have to
ease any potential discomfort. However, the effort needed to split feels
absolutely massive, so I'm not sure if it is worth the hassle.

On Wed, 6 May, 2020, 1:31 pm Dawid Weiss,  wrote:

> > If you go to lucene.apache.org, you'll see three things: Lucene Core
> (Lucene with all it's modules), Solr and PyLucene. That's what I mean.
>
> Hmm... Maybe I'm dim but that's essentially what I want to do. Look:
>
> 1. Lucene Core (Lucene with all it's modules)
> 2. Solr
> 3. PyLucene
>
> The thing is: (1) is already a TLP - that's just Lucene. My call is to
> make (2) a TLP. (3) I can't tell much about because I don't know
> PyLucene as well as I do Solr and Lucene... But it seems to me that
> PyLucene fits much better under "Lucene" umbrella, even the name
> suggests that.
>
>
>
> Dawid
>
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Dawid Weiss
> If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene 
> with all it's modules), Solr and PyLucene. That's what I mean.

Hmm... Maybe I'm dim but that's essentially what I want to do. Look:

1. Lucene Core (Lucene with all its modules)
2. Solr
3. PyLucene

The thing is: (1) is already a TLP - that's just Lucene. My call is to
make (2) a TLP. I can't say much about (3) because I don't know
PyLucene as well as I do Solr and Lucene... But it seems to me that
PyLucene fits much better under the "Lucene" umbrella; even the name
suggests that.



Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Tomás Fernández Löbbe
On Tue, May 5, 2020 at 12:37 PM Dawid Weiss  wrote:

> > I read “promotion to TLP” as if this was some achievement that needs to
> be celebrated now.
>
> I honestly believe it is an achievement for a project to receive
> top-level status. It's a sign of having a community of users,
> committers and processes mature enough to empower its further
> development.
>

My point is that this is not something new. Solr is a mature product and
has had the community and process in place for a long time.


>
> > It’s technically true that Solr is a subproject of Lucene, but so is
> Lucene Core, and I don’t see Lucene Core being promoted to TLP
>
> I don't think these are same magnitude components, sorry. I can name
> at least a few projects that depend on Lucene alone (core + extras)
> and I can name companies using Solr as a product but I can't name a
> single project that would depend on lucene-core alone (without any
> other lucene-* dependency). Maybe there is something like this but
> it's definitely an outlier example of a typical use case.
>

If you go to lucene.apache.org, you'll see three things: Lucene Core
(Lucene with all its modules), Solr and PyLucene. That's what I mean.


>
> Dawid
>
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Dawid Weiss
> I read “promotion to TLP” as if this was some achievement that needs to be 
> celebrated now.

I honestly believe it is an achievement for a project to receive
top-level status. It's a sign of having a community of users,
committers and processes mature enough to empower its further
development.

> It’s technically true that Solr is a subproject of Lucene, but so is Lucene 
> Core, and I don’t see Lucene Core being promoted to TLP

I don't think these are components of the same magnitude, sorry. I can
name at least a few projects that depend on Lucene alone (core + extras)
and I can name companies using Solr as a product, but I can't name a
single project that would depend on lucene-core alone (without any
other lucene-* dependency). Maybe there is something like this but
it's definitely an outlier, not a typical use case.

Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Dawid Weiss
> Question: When Lucene no longer has the Solr test suite to help catch bugs, 
> how long time would it take from a Lucene commit, before Solr/ES Jenkins 
> instances would have had time to produce a build and run tests? Would it be 
> possible to setup a trigger in Solr Jenkins?

It depends how the code is organized after Lucene becomes a
subcomponent. If it's a regular dependency (on a *-SNAPSHOT version)
then the trigger would have to be dual (any commit on Solr or any
commit on Lucene). If the code is organized around a git submodule
with Lucene then bumping a version on a submodule would effectively
trigger a CI build. This "bumping" can be automated on certain
branches (such as master) so effectively it'd be immediately ready for
testing...

It's not really that relevant to this discussion but if you're curious
what this looks like I created an example submodule setup reflecting
current master here, try it:

git clone g...@github.com:dweiss/lucene-solr.git -b split/solr
cd lucene-solr/

You'll see the 'lucene/' folder is empty. It is a submodule. When you issue:

git submodule status

you'll see which git revision that submodule is on:

-e5092db7915ac49d0ade0591e7b52176657c380c lucene

You can get the sub repositories in their respective versions by doing:

git submodule init
git submodule update

When you cd into lucene now you'll see it is a separate repository
(that things can be committed to, branches switched, etc.).

git status
HEAD detached at e5092db791
nothing to commit, working tree clean

Submodules in git have an extra advantage over snapshot dependencies:
they always point at a given revision of a submodule *exactly* so each
and every commit in the parent repository has exact versions of each
submodule recorded in git history. Of course not everything is rosy -
working with submodule-organized repositories does have a darker side
too (new git workflows to be learned, switching incompatible branches
can be tricky, etc.).
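
The exact-revision property can be tried out without touching the real
repositories. What follows is only a rough sketch under stated assumptions:
the scratch repositories "lib" (standing in for Lucene) and "app" (standing
in for Solr) are made up for illustration and are not part of the example
setup above. It shows that the parent keeps pointing at the revision it
recorded even after the submodule's own repository moves on.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repositories: "lib" stands in for Lucene, "app" for Solr.
work=$(mktemp -d)
cd "$work"

# Throwaway identity so commits succeed on a machine without git user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Create the library repository with one commit and remember its revision.
git init -q lib
git -C lib commit -q --allow-empty -m "lib: first commit"
first=$(git -C lib rev-parse HEAD)

# Create the parent repository and add lib as a submodule.
# protocol.file.allow=always is needed by newer git versions for local paths.
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C app commit -q -m "app: pin lib"

# A later commit in lib does not move the revision recorded in app.
git -C lib commit -q --allow-empty -m "lib: second commit"

# HEAD:lib resolves to the commit id stored for the submodule in app's tree.
pinned=$(git -C app rev-parse HEAD:lib)
echo "recorded in app: $pinned"
echo "lib first commit: $first"
```

Moving the parent to a newer revision is then an explicit, reviewable commit
in the parent (check out the new revision inside the submodule, then
"git add lib" and commit), which is the "bumping" step mentioned earlier.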

D.




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Tomás Fernández Löbbe
I don’t agree with the argument “Solr outgrew being a subproject of
Lucene”. I read “promotion to TLP” as if this was some achievement that
needs to be celebrated now. Solr didn’t become a TLP years ago because the
decision then was to merge with Lucene development, thinking they would
progress better together than separated. It’s technically true that Solr is
a subproject of Lucene, but so is Lucene Core, and I don’t see Lucene Core
being promoted to TLP. They are both part of the same Apache project, which
for historical reasons is called Lucene.

> I would be curious if people can make the argument for keeping them
> together...

I think the same arguments that were used 10 years ago to merge the
projects are as valid now, some of them presented in Dawid’s email. Faster
development, better coverage, code in the right places.[1]

IMO, if we need to say “we can’t release X because it breaks Y”, or “we
need to release X to be able to release Y”, the projects are not really
independent, and “the PMCs will overlap” won’t take us very far.

> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
This is not the question we should be asking ourselves right now. It
assumes the split is happening, and that’s what we are trying to discuss
here. The question in my mind is “Is splitting Lucene and Solr into
different projects beneficial for them? Is this going to make them both
better?”

> As it is today, developers have had to do necessary Solr changes at the
> same time when doing changes in Lucene. This is not really fair to the
> (mainly) Lucene developers. It is not fair to Solr either, as such work
> might be done in a hasty fashion and/or in a sub optimal way due to lack of
> familiarity with Solr code base; like we unfortunately have seen a couple
> of times in the past (not trying to blame anyone).

This, I agree, is a pain point for keeping them together. That said, while
not all, most currently active committers joined the project while this was
already a thing; it’s not something that was imposed later on the majority
of us.

> With Lucene as a dependency, Solr can choose to stay on same Lucene
> version for a couple of releases while taking the time to work out the
> proper way to adapt to changed Lucene APIs or to sort out performance
> issues.

I agree with this and I believe it’s a point in favor of keeping them
together (and in part discussed 10 years ago when projects merged). Keeping
them on the same repo forces Solr to use the latest Lucene, helping find
issues/bugs soon, hopefully before they are released.


[1]
https://mail-archives.apache.org/mod_mbox/lucene-general/201002.mbox/%3c9ac0c6aa1002240832x1a8e3309k6799d75b8d19d...@mail.gmail.com%3e

On Tue, May 5, 2020 at 8:56 AM Michael McCandless 
wrote:

> On Tue, May 5, 2020 at 11:41 AM Jan Høydahl  wrote:
>
> As it is today, deveopers have had to do necessary Solr changes at the
>> same time when doing changes in Lucene. This is not really fair to the
>> (mainly) Lucene developers. It is not fair to Solr either, as such work
>> might be done in a hasty fashion and/or in a sub optimal way due to lack of
>> familiarity with Solr code base; like we unfortunately have seen a couple
>> of times in the past (not trying to blame anyone). With Lucene as a
>> dependency, Solr can choose to stay on same Lucene version for a couple of
>> releases while taking the time to work out the proper way to adapt to
>> changed Lucene APIs or to sort out performance issues.
>>
>
> +1, that is a great point, Jan.
>
> This will mean that the (any) necessary Solr source code changes that go
> along with a Lucene change will (sometimes) be done with higher quality,
> more thought, better expertise, etc., which I agree will be good for
> ongoing Solr development, help prevent accidental performance regressions,
> etc.  Net/net that's a big positive for Solr, in addition to having a
> stronger independent identity (https://solr.apache.org).
>
>
>> Question: When Lucene no longer has the Solr test suite to help catch
>> bugs, how long time would it take from a Lucene commit, before Solr/ES
>> Jenkins instances would have had time to produce a build and run tests?
>> Would it be possible to setup a trigger in Solr Jenkins?
>>
>
> That's a great question!
>
> Maybe Elasticsearch developers could chime in, since this already happened
> for them many times by now :)  I would think there are technical solutions
> to let the Solr CI build pull the latest Lucene snapshot build, to keep the
> latency lowish, but I do not know the details.
>
> Mike
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Michael McCandless
On Tue, May 5, 2020 at 11:41 AM Jan Høydahl  wrote:

> As it is today, developers have had to do necessary Solr changes at the same
> time when doing changes in Lucene. This is not really fair to the (mainly)
> Lucene developers. It is not fair to Solr either, as such work might be
> done in a hasty fashion and/or in a sub optimal way due to lack of
> familiarity with Solr code base; like we unfortunately have seen a couple
> of times in the past (not trying to blame anyone). With Lucene as a
> dependency, Solr can choose to stay on same Lucene version for a couple of
> releases while taking the time to work out the proper way to adapt to
> changed Lucene APIs or to sort out performance issues.
>

+1, that is a great point, Jan.

This will mean that the (any) necessary Solr source code changes that go
along with a Lucene change will (sometimes) be done with higher quality,
more thought, better expertise, etc., which I agree will be good for
ongoing Solr development, help prevent accidental performance regressions,
etc.  Net/net that's a big positive for Solr, in addition to having a
stronger independent identity (https://solr.apache.org).


> Question: When Lucene no longer has the Solr test suite to help catch
> bugs, how long time would it take from a Lucene commit, before Solr/ES
> Jenkins instances would have had time to produce a build and run tests?
> Would it be possible to setup a trigger in Solr Jenkins?
>

That's a great question!

Maybe Elasticsearch developers could chime in, since this already happened
for them many times by now :)  I would think there are technical solutions
to let the Solr CI build pull the latest Lucene snapshot build, to keep the
latency lowish, but I do not know the details.

Mike


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Jan Høydahl
Thanks for bringing it up Dawid.

I’ve asked myself the same question several times over the last couple of 
years, and have kind of been waiting for someone to make the proposal :)
In my head, Solr has outgrown being a subproject of Lucene, like Hadoop,
Mahout, Nutch and Tika before it.

The move will promote Solr as a separate TLP with better visibility and more 
autonomy.
Simply put Solr will go from https://lucene.apache.org/solr/ to 
https://solr.apache.org/

Splitting will be lots of work for sure, but I am not worried about the future 
relationship between the two. The last couple of years most of us have already
done LUCENE and SOLR changes in separate Jiras and separate patches, first 
committing changes to LUCENE before the related SOLR change. It will be more or 
less the same approach after the split, just that there will be a couple of 
days between the Lucene release and the next Solr release depending on it.

As it is today, developers have had to do necessary Solr changes at the same
time when doing changes in Lucene. This is not really fair to the (mainly)
Lucene developers. It is not fair to Solr either, as such work might be done in
a hasty fashion and/or in a suboptimal way due to lack of familiarity with
the Solr code base, as we unfortunately have seen a couple of times in the past
(not trying to blame anyone). With Lucene as a dependency, Solr can choose to
stay on the same Lucene version for a couple of releases while taking the time
to work out the proper way to adapt to changed Lucene APIs or to sort out
performance issues.

Question: When Lucene no longer has the Solr test suite to help catch bugs, how
long would it take from a Lucene commit before Solr/ES Jenkins instances
would have had time to produce a build and run tests? Would it be possible to
set up a trigger in Solr Jenkins?

Jan

> On 4 May 2020 at 11:10, Dawid Weiss wrote:
> 
> Dear Lucene and Solr developers!





Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Doug Turnbull
Personally I feel the burden of proof should not be why they should be
split up, but the other way - "what arguments can be made for keeping them
together?"

I would be curious if people can make the argument for keeping them
together...

-Doug

On Tue, May 5, 2020 at 10:29 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Mon, May 4, 2020 at 5:28 PM Gézapeti Cseh  wrote:
>
> I think separating the git repository and even the release schedules could
>> be done under the same TLP.
>>
> It would solve most of the technical issues reflected in the first mail
>> and there would be more time and data to
>>
>
> Hmm that is technically true, and in fact that is the way it was before 10
> years ago: Solr was a sub-project of Apache Lucene.
>
> But that is not the proposal here.
>
> Lucene and Solr have become such major efforts, in developers and users
> eyes and keyboard effort/time, that they really are very different entities
> now.  TLP makes sense to me for each project.
>
>>
>
>> see if creating Apache Solr again is something the PMC would want to do
>>
>
> Hmm, just to clarify, this is not an "again" sort of situation: Solr was
> not a top-level project before.  It was and still is a sub-project of
> Apache Lucene.
>
> And the proposal is to now split it out as its own (new) top-level
> project, Apache Solr.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
*Doug Turnbull* | CTO | OpenSource Connections, LLC | 240.476.9983
Author: Relevant Search; Contributor: *AI Powered Search*


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Michael McCandless
On Mon, May 4, 2020 at 2:13 PM Dawid Weiss  wrote:

> This sounds like a decision has already been made.
>
> No. I plan to send a VOTE thread nonetheless. A vote thread is just
> that -- a vote. If majority decides both projects
> should stay together it's still a decision. A discussion without any
> resolution is going to dissolve over time into no resolution at all.
>

No decision has been made.

The point of a DISCUSS thread, prior to a VOTE thread, is for all
interested parties to voice their diverse reactions to this proposal, and
help the binding voters (Lucene/Solr committers) make up their minds about
how to vote on the VOTE thread.  We have a delightfully diverse community
here who will all contribute in choosing our path forward.


> Separately from that I think Solr has become older, larger and is an
> industry standard search component. It is time for it to mature and
> just be a top-level Apache project even from public-relations point of
> view.
>

+1

I feel Solr as its own Apache top-level project is actually long overdue:
Solr has clearly been a leading standard open-source distributed search
engine for quite some time already, with its own strong user and developer
identities and culture.  We long ago achieved the goals (paying down
open-source tech debt) that motivated merging the two projects a decade ago.

Mike McCandless

http://blog.mikemccandless.com


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Michael McCandless
On Mon, May 4, 2020 at 5:28 PM Gézapeti Cseh  wrote:

> I think separating the git repository and even the release schedules could
> be done under the same TLP.
>
> It would solve most of the technical issues reflected in the first mail and
> there would be more time and data to
>

Hmm, that is technically true, and in fact that is the way it was until 10
years ago: Solr was a sub-project of Apache Lucene.

But that is not the proposal here.

Lucene and Solr have become such major efforts, in developers' and users'
eyes and keyboard effort/time, that they really are very different entities
now.  TLP makes sense to me for each project.

> see if creating Apache Solr again is something the PMC would want to do
>

Hmm, just to clarify, this is not an "again" sort of situation: Solr was
not a top-level project before.  It was and still is a sub-project of
Apache Lucene.

And the proposal is to now split it out as its own (new) top-level project,
Apache Solr.

Mike McCandless

http://blog.mikemccandless.com


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Gézapeti Cseh
I think separating the git repository and even the release schedules could
be done under the same TLP.
It would solve most of the technical issues reflected in the first mail and
there would be more time and data to see if creating Apache Solr again is
something the PMC would want to do

gp


On Mon, May 4, 2020 at 8:20 PM Dawid Weiss  wrote:

> Perhaps I didn't clarify this so far: my own interests (personal and
> business) are shared equally between Solr and Lucene (we have products
> that have plain Lucene underneath and we maintain products and systems
> that use Solr).  So I am going to have a foot in both worlds no matter
> the outcome. I really do feel confident both Lucene and Solr would
> have a breath of fresh air if they were independent (smaller).
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Dawid Weiss
Perhaps I didn't clarify this so far: my own interests (personal and
business) are shared equally between Solr and Lucene (we have products
that have plain Lucene underneath and we maintain products and systems
that use Solr).  So I am going to have a foot in both worlds no matter
the outcome. I really do feel confident both Lucene and Solr would
have a breath of fresh air if they were independent (smaller).

Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Dawid Weiss
> This sounds like a decision has already been made.

No. I plan to send a VOTE thread nonetheless. A vote thread is just
that -- a vote. If the majority decides both projects
should stay together, it's still a decision. A discussion without any
resolution is going to dissolve over time into no resolution at all.

> Additionally, all of the counterarguments presented come with rebuttals 
> attached, so I'm not sure if this is supposed to be a persuasive case or an 
> expositional one.

This thread is for discussion; please expand the counterarguments with
any point of view you like. I did include counterarguments I collected
from the private mailing list.

> I think I have an initial reaction that I'm opposed to a split, but I'm not 
> yet concretely sure why.

Like I said multiple times I see this as a reasonable technical
decision and I don't think the community (communities?) will suffer
much because of this. This is not a hostile code fork or an attempt to
hijack developers. Whoever has interest in Solr and Lucene will still
be a Solr and Lucene developer. I really don't think that much will
change.

My point of view crystallised because of the build system work - I
admit this freely. The ant one is hair-bending. The gradle one is
inconvenient like hell when you have effectively two "top-level"
projects to handle within the same configuration. When I started
looking at other aspects I became convinced this is the right way to
go.

Separately from that, I think Solr has become older, larger and is an
industry standard search component. It is time for it to mature and
just be a top-level Apache project, even from a public-relations point of
view.

> This seems like an argument for fixing the tests and making them faster, I'm 
> not sure how we get to splitting the projects from here. If you're doing Solr 
> only changes, it's pretty easy to run "./gradlew -p solr test" and skip the 
> lucene tests, similar for lucene only development.

Nah, this isn't true. All CI jobs, GitHub checks, etc. - everything is
checked and verified, which makes things take twice as long as they should.

> > Mailing lists, build servers
> This is probably a good idea and I think this is easy enough to do without 
> splitting the project as well.

They are already separated to a large degree. The only thing in common
is the dev list, and even there threads are really split between discussions
concerning Solr and Lucene functionality.

> > Solr tests are the first “battlefield” test zone for Lucene changes
> I think https://issues.apache.org/jira/browse/SOLR-14428 is a great example 
> of the kind of collaboration that we can see, and a good hint of what to 
> expect if the projects are split. To summarize, there was a Lucene change 
> which caused some issues in Solr. The fix is likely going to end up being 
> another Lucene change, but just as easily could have been a kind of ugly 
> workaround on the Solr side.

Maybe. There are a lot of maybes. I still think a split would make
things easier. For example the ugly workaround could go into an
immediate bugfix release for Solr, followed by a patch to Lucene and a
proper fix later on. Now you can't do an immediate bugfix/ workaround
Solr release without a corresponding Lucene release (which doesn't
make sense to me at all).

Oh, and don't get me wrong - I understand you can have doubts. I am
prepared to defend my position because it's been growing in me for a
few months now; I have been digesting this for a longer time and it
probably makes a difference.

Dawid




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Mike Drob
This is an interesting approach, Michael. I took it a bit further by
excluding all authors with only a single commit[1], since I think GitHub
PRs tend to highlight that kind of contribution more. Since 2012 I found 24
Lucene-only authors, 31 Solr-only authors, and 77 (about 58%) contributing
to both. Since 2018, again excluding authors with a single commit, the share
of authors with commits to both projects went down to 51%. Still, I think
that speaks to a very high degree of collaboration.
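
The filtering step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the commands actually used: the function names (`split_counts`, `both_share`) and the input shapes are hypothetical. It assumes one author name per commit for the whole repository (similar in spirit to what `git shortlog -s -n` counts) plus the sets of authors seen under each top-level folder.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def split_counts(
    all_commit_authors: Iterable[str],
    lucene_authors: Set[str],
    solr_authors: Set[str],
    min_commits: int = 2,
) -> Tuple[int, int, int]:
    """Return (lucene_only, solr_only, both) after dropping rare authors.

    all_commit_authors holds one entry per commit (hypothetical input);
    authors with fewer than min_commits commits overall are excluded.
    """
    counts = Counter(all_commit_authors)
    keep = {author for author, n in counts.items() if n >= min_commits}
    lucene = lucene_authors & keep
    solr = solr_authors & keep
    return len(lucene - solr), len(solr - lucene), len(lucene & solr)


def both_share(lucene_only: int, solr_only: int, both: int) -> float:
    """Percentage of authors who committed to both projects."""
    total = lucene_only + solr_only + both
    return 100.0 * both / total if total else 0.0
```

Plugging in the figures quoted above, `both_share(24, 31, 77)` gives 77/132, which is where "about 58%" comes from.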


Dawid, thank you for putting this together. It has obviously been carefully
thought over, and there's a lot of content, so I'm not going to try to
comment on everything, but will highlight a few things that caught my
attention.


> This is a DISCUSS thread and it will be followed next week by a VOTE
> thread.
This sounds like a decision has already been made. Additionally, all of the
counterarguments presented come with rebuttals attached, so I'm not sure if
this is supposed to be a persuasive case or an expositional one.
I think I have an initial reaction that I'm opposed to a split, but I'm not
yet concretely sure why.

> Precommit/ test times. These are crazy high.
This seems like an argument for fixing the tests and making them faster,
I'm not sure how we get to splitting the projects from here. If you're
doing Solr only changes, it's pretty easy to run "./gradlew -p solr test"
and skip the lucene tests, similar for lucene only development.

> Mailing lists, build servers
This is probably a good idea and I think this is easy enough to do without
splitting the project as well.

> Solr should have its own cadence of releases driven by features, not
> sub-component changes
Yea, I think this is very likely to happen, where new Lucene versions may
not immediately get integrated into the next Solr version, or perhaps not
at all, unless somebody is specifically interested in a feature that it
offers. I think developers are busy, and incrementing a dependency version
is not something that happens unless there is a tangible reason. Which
leads directly into the next point...

> Solr tests are the first “battlefield” test zone for Lucene changes
I think https://issues.apache.org/jira/browse/SOLR-14428 is a great example
of the kind of collaboration that we can see, and a good hint of what to
expect if the projects are split. To summarize, there was a Lucene change
which caused some issues in Solr. The fix is likely going to end up being
another Lucene change, but just as easily could have been a kind of ugly
workaround on the Solr side.

I think the points and counterpoints are essentially correct, but the
opening statement appears to undersell the counterarguments as a matter of
degree, in my view. I'll continue to think on this, and post more as ideas
solidify in my head.

[1]: git shortlog -s -n --since=2018 | grep -v '\s1\s' | cut -c7-

On Mon, May 4, 2020 at 9:49 AM Michael Sokolov  wrote:

> I always like to look at data when making a big decision, so I
> gathered some statistics about authors and commits to git over the
> history of the project. I wanted to see what these statistics could
> tell us about the degree of overlap between the two projects and
> whether it has changed over time. Using commands like
>
>  git log --pretty=%an --since=2012 -- lucene
>  git log --pretty=%an --since=2012 -- solr
>
> I looked at the authors of commits in the lucene and solr top-level
> folders of the project. I think this makes a reasonable proxy for
> contributors to the two projects. From there I found that since 2012,
> there are 60 Lucene-only authors, 71 Solr-only authors, and 101
> authors (or 43%) contributing at least one commit to each project.
> Since 2018, the percentage of both-project authors is somewhat lower:
> 36%.
>
> I also looked at commits spanning both projects. I'm not sure this
> captures all the work that touches both projects, but it's a window
> into that, at least. I found that since 2012, 1387/19063 (6.8%) of
> commits spanned both project folders. Since 2018, 7.4% did.
>
> I don't think you can really draw very many meaningful conclusions
> from this, but a few things jump out: First, it is clear that these
> projects are not completely separate today. A substantial number of
> people commit to both, over time, although most people do not. Also,
> relatively few commits span both projects. Some do though, and it's
> certainly worth considering what the workflow for such changes would
> be like in the split world. Maybe a majority of these are
> build-related; it's hard to tell from this coarse analysis.
>
>
> On Mon, May 4, 2020 at 5:11 AM Dawid Weiss  wrote:
> >
> > Dear Lucene and Solr developers!
> >
> > A few days ago, I initiated a discussion among PMC members about
> > potential pros and cons of splitting the project into separate Lucene
> > and Solr entities by promoting Solr to its own top-level Apache
> > project (TLP). Let me share with you the motivation for such an action
> > and some 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Shai Erera
Interesting data, Michael. I am not sure, though, that the shared commits tell
us that there are people who contribute to both projects. Eventually, an
API change/update in Lucene will require a change in Solr (but not vice
versa). Those commits will still occur in both projects; only on the Solr
side they will occur when Solr upgrades to the respective Lucene
version.

I wonder if we can tell, out of the shared commits, how many started in
Lucene and ended in Solr because of the shared build (i.e. an API change
required Solr code changes for the build to pass), vs how many started in
Solr, and ended in Lucene because a core change was needed to support the
Solr feature/update. The first case does not indicate, IMO, a shared
contribution (whoever changes a Lucene API will not then go and update Solr
and Elasticsearch if the projects were split), while the second case is a
stronger indication of a shared contribution.

Maybe if we could "label" committers as mostly Lucene/Solr, we could tell
more about the shared commits?
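
One rough way to try this labelling idea is sketched below. It is an assumption-laden illustration, not an established method: the `Commit` tuple, the function names, and the rule "label an author by whichever project has more of their single-project commits" are all hypothetical choices. The author's label is then used as a crude proxy for which side a shared commit started on.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Commit(NamedTuple):
    author: str
    lucene: bool  # commit touches the lucene/ folder
    solr: bool    # commit touches the solr/ folder


def label_authors(commits: Iterable[Commit]) -> Dict[str, str]:
    """Label each author 'lucene', 'solr', or 'mixed' (hypothetical rule)."""
    commits = list(commits)
    lucene_only = Counter(c.author for c in commits if c.lucene and not c.solr)
    solr_only = Counter(c.author for c in commits if c.solr and not c.lucene)
    labels = {}
    for author in {c.author for c in commits}:
        l, s = lucene_only[author], solr_only[author]
        labels[author] = "lucene" if l > s else "solr" if s > l else "mixed"
    return labels


def shared_by_label(commits: Iterable[Commit]) -> Counter:
    """Count commits touching both folders, grouped by the author's label."""
    commits = list(commits)
    labels = label_authors(commits)
    return Counter(labels[c.author] for c in commits if c.lucene and c.solr)
```

A large "lucene" bucket would hint that shared commits are mostly Lucene changes dragging Solr fixes along, while a large "solr" bucket would point the other way. The label is only a heuristic, so the result should be read as a hint rather than a measurement.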

Anyway, data is good, I agree.

Shai

On Mon, May 4, 2020 at 5:49 PM Michael Sokolov  wrote:

> I always like to look at data when making a big decision, so I
> gathered some statistics about authors and commits to git over the
> history of the project. I wanted to see what these statistics could
> tell us about the degree of overlap between the two projects and
> whether it has changed over time. Using commands like
>
>  git log --pretty=%an --since=2012 -- lucene
>  git log --pretty=%an --since=2012 -- solr
>
> I looked at the authors of commits in the lucene and solr top-level
> folders of the project. I think this makes a reasonable proxy for
> contributors to the two projects. From there I found that since 2012,
> there are 60 Lucene-only authors, 71 Solr-only authors, and 101
> authors (or 43%) contributing at least one commit to each project.
> Since 2018, the percentage of both-project authors is somewhat lower:
> 36%.
>
> I also looked at commits spanning both projects. I'm not sure this
> captures all the work that touches both projects, but it's a window
> into that, at least. I found that since 2012, 1387/19063 (6.8%) of
> commits spanned both project folders. Since 2018, 7.4% did.
>
> I don't think you can really draw very many meaningful conclusions
> from this, but a few things jump out: First, it is clear that these
> projects are not completely separate today. A substantial number of
> people commit to both, over time, although most people do not. Also,
> relatively few commits span both projects. Some do though, and it's
> certainly worth considering what the workflow for such changes would
> be like in the split world. Maybe a majority of these are
> build-related; it's hard to tell from this coarse analysis.
>
>
> On Mon, May 4, 2020 at 5:11 AM Dawid Weiss  wrote:
> >
> > Dear Lucene and Solr developers!
> >
> > A few days ago, I initiated a discussion among PMC members about
> > potential pros and cons of splitting the project into separate Lucene
> > and Solr entities by promoting Solr to its own top-level Apache
> > project (TLP). Let me share with you the motivation for such an action
> > and some follow-up thoughts I heard from other PMC members so far.
> >
> > Please read this e-mail carefully. Both the PMC and I look forward to
> > hearing your opinion. This is a DISCUSS thread and it will be followed
> > next week by a VOTE thread. This is our shared project and we should
> > all shape its future responsibly.
> >
> > The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
> >
> > Here are several technical considerations that drove me to ask the
> > question above (in no order of priorities):
> >
> > 1) Precommit/ test times. These are crazy high. If we split into two
> > projects we can pretty much cut all of Lucene testing out of Solr (and
> > likewise), making development a bit more fun again.
> >
> > 2) Build system itself and source release packaging. The current
> > combined codebase is a *beast* to maintain. Working with gradle on
> > both projects at once made me realise how little the two have in
> > common. The code layout, the dependencies, even the workflow of people
> > working on these projects... The build (both ant and gradle) is full
> > of Solr and Lucene-specific exceptions and hooks that could be more
> > elegantly solved if moved to each project independently.
> >
> > 3) Packaging. There is no single source distribution package for
> > Solr+Lucene. They are already "independent" there. Why should Lucene
> > and Solr always be released at the same pace? Does it always make
> > sense?
> >
> > 4) Solr is essentially taking in Lucene and its dependencies as a
> > whole (so is Elasticsearch and many other projects). In my opinion
> > this makes Lucene eligible for refactoring and
> > maintenance as a separate component. The learning curve for people
> > coming to each 

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Michael Sokolov
I always like to look at data when making a big decision, so I
gathered some statistics about authors and commits to git over the
history of the project. I wanted to see what these statistics could
tell us about the degree of overlap between the two projects and
whether it has changed over time. Using commands like

 git log --pretty=%an --since=2012 -- lucene
 git log --pretty=%an --since=2012 -- solr

I looked at the authors of commits in the lucene and solr top-level
folders of the project. I think this makes a reasonable proxy for
contributors to the two projects. From there I found that since 2012,
there are 60 Lucene-only authors, 71 Solr-only authors, and 101
authors (or 43%) contributing at least one commit to each project.
Since 2018, the percentage of both-project authors is somewhat lower:
36%.
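
A minimal sketch of the author-overlap calculation described above, written in Python as an illustration rather than the script actually used. The helper names (`authors_for_path`, `overlap`), the `Overlap` class, and the `2012-01-01` date string are assumptions; the only part taken from the text is the idea of listing commit authors per top-level folder with `git log --pretty=%an` restricted to a path.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class Overlap:
    lucene_only: int
    solr_only: int
    both: int

    @property
    def both_pct(self) -> float:
        """Share of authors with at least one commit in each folder."""
        total = self.lucene_only + self.solr_only + self.both
        return 100.0 * self.both / total if total else 0.0


def authors_for_path(repo: str, path: str, since: str = "2012-01-01") -> set[str]:
    """Distinct author names of commits touching `path` in the repo at `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=%an", f"--since={since}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def overlap(lucene: set[str], solr: set[str]) -> Overlap:
    """Split two author sets into lucene-only, solr-only, and both."""
    return Overlap(len(lucene - solr), len(solr - lucene), len(lucene & solr))
```

Something like `overlap(authors_for_path(".", "lucene"), authors_for_path(".", "solr"))` would produce the three counts for a local checkout. With the counts quoted above (60, 71 and 101), the both-project share is 101/232, which is where the 43% comes from.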

I also looked at commits spanning both projects. I'm not sure this
captures all the work that touches both projects, but it's a window
into that, at least. I found that since 2012, 1387/19063 (6.8%) of
commits spanned both project folders. Since 2018, 7.4% did.
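
The spanning-commit check can be sketched in the same spirit. This is a hedged illustration with hypothetical names (`spans_both`, `spanning_share`): it assumes the changed file paths of each commit are already available as a mapping from commit id to paths, for example parsed from `git log --name-only` output, and that parsing step is left out here.

```python
from typing import Dict, Iterable, List


def top_level(path: str) -> str:
    """First path component, e.g. 'lucene' for 'lucene/core/Foo.java'."""
    return path.split("/", 1)[0]


def spans_both(paths: Iterable[str]) -> bool:
    """True if a commit's changed files include both lucene/ and solr/."""
    tops = {top_level(p) for p in paths}
    return "lucene" in tops and "solr" in tops


def spanning_share(commits: Dict[str, List[str]]) -> float:
    """Percentage of commits that touch both top-level folders."""
    if not commits:
        return 0.0
    spanning = sum(1 for paths in commits.values() if spans_both(paths))
    return 100.0 * spanning / len(commits)
```

Whether commits that touch neither folder (top-level build files, for instance) belong in the denominator is a choice this sketch makes implicitly by counting every commit it is given.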

I don't think you can really draw very many meaningful conclusions
from this, but a few things jump out: First, it is clear that these
projects are not completely separate today. A substantial number of
people commit to both, over time, although most people do not. Also,
relatively few commits span both projects. Some do though, and it's
certainly worth considering what the workflow for such changes would
be like in the split world. Maybe a majority of these are
build-related; it's hard to tell from this coarse analysis.


On Mon, May 4, 2020 at 5:11 AM Dawid Weiss  wrote:
>
> Dear Lucene and Solr developers!
>
> A few days ago, I initiated a discussion among PMC members about
> potential pros and cons of splitting the project into separate Lucene
> and Solr entities by promoting Solr to its own top-level Apache
> project (TLP). Let me share with you the motivation for such an action
> and some follow-up thoughts I heard from other PMC members so far.
>
> Please read this e-mail carefully. Both the PMC and I look forward to
> hearing your opinion. This is a DISCUSS thread and it will be followed
> next week by a VOTE thread. This is our shared project and we should
> all shape its future responsibly.
>
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
>
> Here are several technical considerations that drove me to ask the
> question above (in no order of priorities):
>
> 1) Precommit/ test times. These are crazy high. If we split into two
> projects we can pretty much cut all of Lucene testing out of Solr (and
> likewise), making development a bit more fun again.
>
> 2) Build system itself and source release packaging. The current
> combined codebase is a *beast* to maintain. Working with gradle on
> both projects at once made me realise how little the two have in
> common. The code layout, the dependencies, even the workflow of people
> working on these projects... The build (both ant and gradle) is full
> of Solr and Lucene-specific exceptions and hooks that could be more
> elegantly solved if moved to each project independently.
>
> 3) Packaging. There is no single source distribution package for
> Solr+Lucene. They are already "independent" there. Why should Lucene
> and Solr always be released at the same pace? Does it always make
> sense?
>
> 4) Solr is essentially taking in Lucene and its dependencies as a
> whole (so is Elasticsearch and many other projects). In my opinion
> this makes Lucene eligible for refactoring and
> maintenance as a separate component. The learning curve for people
> coming to each project separately is going to be gentler than trying
> to dive into the combined codebase.
>
> 5) Mailing lists, build servers. Mailing lists for users are already
> separated. I think this is yet another indication that Solr is
> something more than a component within Lucene. It is perceived as an
> independent entity and used as an independent product. I would really
> like to have separate mailing lists for these two projects (this
> includes build and test results) as it would make life easier: if your
> focus is more on Lucene (or Solr), you would only need to track half
> of the current traffic.
>
>
> As I already mentioned, the discussion among PMC members highlighted
> some initial concerns and reasons why the project should perhaps
> remain glued together. These are outlined below with some of the
> counter-arguments presented under each concern to avoid repetition of
> the same content from the PMC mailing list (they’re copied from the
> private discussion list).
>
> 1) Both projects may gradually split their ways after the separation
> and even develop “against” each other like it used to be before the
> merge.
>
> Whether this is a legitimate concern is hard to tell. If Solr goes TLP
> then all existing Lucene committers will automatically become Solr
>