Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread David Smiley
I agree with Doug that the burden of proof is on keeping the codebases
together instead of the reverse.  I liken it to a marriage; it has to work
well for both parties.  It seems to be mostly beneficial for Solr but
much less so for Lucene.

BTW an even better example than the huge FuzzyQuery case was the loss of an
entire postings format that Solr was using -- LUCENE-9116.  That one was caught
thanks to Solr tests, which prevented its release.  The huge FuzzyQuery, on
the other hand, was released.  I hope that with a split project, we'll be able
to run Solr-side tests quickly enough before Lucene does its releases.  I
wonder if ElasticSearch tries to do this on their side too; does it?

An idea just occurred to me that may help make a split nicer for Solr than
it is today.  Solr could use its own branch of the Lucene project, which is
impossible today due to the single codebase.  This affords the possibility of
changes that are not endorsed on the Lucene side (i.e. that would not make it
into a real Lucene release).  Examples of this are API changes like
LUCENE-8159, or perhaps making some classes public so that Solr can access
them without awkward hacks.  Put differently, just as some companies maintain
forks of Lucene/Solr, in the future Solr should likewise be able to have its
own fork of Lucene.  Should
this approach be adopted, Solr would want to keep this to a minimum to keep
upkeep of the branch low, and the branch _would_ need upkeep (e.g. running
tests), so it's not a total panacea.  On the other hand, if Solr strictly
only releases with released Lucene versions, then this is way nicer from a
versioning and artifact management (i.e. publishing to Maven) point of
view.  It's nice to have options.

~ David


On Thu, May 7, 2020 at 1:07 PM Bram Van Dam  wrote:

> > The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
>
> Sounds like there are quite a few tasks to complete to get this done.
> Splitting the build and codebase. Presumably a bunch of administration
> within Apache/the PMC. Setting up infrastructure etc.
>
> These are the costs, to be paid up front in the currency of someone's
> time. The benefits are less clear. Faster build times and easier
> maintenance sound attractive, but when will those benefits be visible?
> Next month? Or in a year?
>
> Whoever will be doing this work should probably ask themselves the
> questions: is this the best use of their time?
>
>  - Bram
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Adrien Grand
On Sun, May 10, 2020 at 8:20 AM David Smiley 
wrote:

> I wonder if ElasticSearch tries to do this on their side too; does it?
>

Yes, Elasticsearch regularly upgrades to new snapshots of Lucene[1][2],
often multiple times per minor version. It helps give Lucene more test and
performance coverage, and also makes it easier for us to identify which
particular Lucene change contributed to an improvement or regression in
Elasticsearch.

[1]
https://github.com/elastic/elasticsearch/search?o=desc=lucene+snapshot=author-date=Commits
[2] https://github.com/elastic/elasticsearch/pull/56175

-- 
Adrien


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Bram Van Dam
On 10/05/2020 08:20, David Smiley wrote:
> An idea just occurred to me that may help make a split nicer for Solr
> than it is today.  Solr could use a branch of the Lucene project that's
> used for the Solr project.

Maybe I'm alone in this, but (better) Lucene compatibility is one of the
reasons why our company chose Solr over ElasticSearch.




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Mike Drob
Solr maintaining a fork of Lucene sounds like exactly the situation that
led to the original merge, where there are two sets of divergent development.



Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Shawn Heisey

On 5/10/2020 3:41 PM, Michael McCandless wrote:
> I think the costs (I agree: they are high) are a one-time thing, while
> the benefits are long term, and accrue/multiply with time.  We should
> make decisions like this with the long-term benefits in mind.
>
> I expect Lucene and Solr to have long healthy lives ahead, and that
> means this one-time cost will eventually be amortized and made
> minuscule/negligible compared to the long-term benefits to both projects.


+1

I think that those with a primary interest in Lucene would be strongly 
in favor of this split.  I think the presence of Solr in the codebase 
has diluted Lucene's releases, making them come far too quickly.  I 
would bet that without Solr, Lucene would probably be somewhere in 6.x, 
not 8.x.


My personal interests are with Solr.  I have little interest in Lucene 
code.  I think it would be beneficial in both directions to have them be 
separate.


To protect against the divergence which prompted the joining of the two 
codebases, I do think it would be a good idea for a few committers to 
remain with both projects, but I can say unequivocally that if the 
projects split, I will only want to keep those privileges on a new Solr TLP.


Thanks,
Shawn




Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Thu, May 7, 2020 at 1:07 PM Bram Van Dam  wrote:

> > The big question is this: “Is this the right time to split Solr and
> > Lucene into two independent projects?”.
>
> Sounds like there are quite a few tasks to complete to get this done.
> Splitting the build and codebase. Presumably a bunch of administration
> within Apache/the PMC. Setting up infrastructure etc.
>

+1

> These are the costs, to be paid up front in the currency of someone's
> time. The benefits are less clear. Faster build times and easier
> maintenance sound attractive, but when will those benefits be visible?
> Next month? Or in a year?
>

I think the costs (I agree: they are high) are a one-time thing, while the
benefits are long term, and accrue/multiply with time.  We should make
decisions like this with the long-term benefits in mind.

I expect Lucene and Solr to have long healthy lives ahead, and that means
this one-time cost will eventually be amortized and made
minuscule/negligible compared to the long-term benefits to both projects.


> Whoever will be doing this work should probably ask themselves the
> questions: is this the best use of their time?
>

+1

Also, since we "just" completed the Gradle migration in master, hopefully
that is still fresh on people's minds, and separating the Lucene and Solr
builds will then be easier.

Mike McCandless

http://blog.mikemccandless.com


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Wed, May 6, 2020 at 4:24 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Offtopic: Gus, I'm looking at Vespa core for Solr someday
>

+1, Vespa looks really fascinating!  Plus it is released under ASL 2 as
well.  And the world clearly needs more open-source search engines.

Mike McCandless

http://blog.mikemccandless.com



Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Tue, May 5, 2020 at 4:59 PM Tomás Fernández Löbbe 
wrote:

> On Tue, May 5, 2020 at 12:37 PM Dawid Weiss  wrote:
>
>> > I read “promotion to TLP” as if this was some achievement that needs to
>> > be celebrated now.
>>
>> I honestly believe it is an achievement for a project to receive
>> top-level status. It's a sign of having a community of users,
>> committers and processes mature enough to empower its further
>> development.
>>
>
+1


> My point is that this is not something new. Solr is a mature product and
> has had the community and process in place for a long time.
>

I agree it's not new.  But that, to me, means that we have already waited
too long to promote Solr up to its own top-level Apache project.

This should have been done long ago.

Mike McCandless

http://blog.mikemccandless.com


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Jan Høydahl
Better to stick to Lucene snapshot versions. Commit the Lucene change first, then
Solr. If a Lucene change is not mature enough to commit to Lucene, it is
probably not mature enough for Solr either. Avoid hacks or forks; take the extra
time to get it right.

If things get removed from Lucene and we want to support them in Solr for another
major version, consider moving the code to the Solr git repository, under the
Lucene package namespace. But it should not be something we do often.

Jan

> On 10 May 2020, at 21:23, Gus Heck wrote:
>
>> On Sun, May 10, 2020 at 11:55 AM Mike Drob  wrote:
>> Solr maintaining a fork of Lucene sounds like exactly the situation that led
>> to the original merge, where there are two sets of divergent development.
>
> Exactly


Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Gus Heck
On Sun, May 10, 2020 at 11:55 AM Mike Drob  wrote:

> Solr maintaining a fork of Lucene sounds like exactly the situation that
> led to the original merge, where there are two sets of divergent development.
>

Exactly


Getting warnings out

2020-05-10 Thread Erick Erickson
I’m really struggling with what to do with compiler warnings, particularly all 
the rawtypes and unchecked warnings.

On the one hand, the simple mechanical thing to do would be to add @SuppressWarnings
to each one that exists presently. Frankly that feels pretty useless; it would
preserve poor code forever.

OTOH, actually _fixing_ the issues to get rid of, say, rawtypes is going to be
time-consuming and error-prone, especially since I don’t really understand all
the nuances yet, and learning them one by one will no doubt introduce serious
errors along the way.

So here’s what I propose. Even though it feels useless, just add @SuppressWarnings
to anything that’s not a simple fix. Then start failing builds on these
warnings to catch any new ones that come in. At least that way there’ll be
some incentive to keep the code from getting _worse_, although I suppose people
will still be able to just add @SuppressWarnings to the mix.

The number of raw NamedList member variables we have is overwhelming all by 
itself….
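
A minimal sketch of the two options above, using a hypothetical `RawTypesDemo` class rather than Solr code; a plain `java.util.List` field stands in for a raw `NamedList` member, so nothing here assumes anything about `NamedList`'s actual API. `LegacyHolder` keeps the raw type and silences javac, while `TypedHolder` shows the generic fix, which also removes the cast at the call site.

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypesDemo {

    // Option 1: keep the raw type and silence javac. Suppression is scoped to the
    // field and the one method that needs it, not the whole class.
    static final class LegacyHolder {
        @SuppressWarnings("rawtypes")
        final List values = new ArrayList();    // raw type: element type unknown

        @SuppressWarnings("unchecked")
        void add(String s) {
            values.add(s);                       // unchecked call on a raw List
        }

        String first() {
            return (String) values.get(0);       // cast needed; a wrong type only fails at runtime
        }
    }

    // Option 2: fix the declaration instead. No suppression is needed, and code that
    // tries to store a non-String in 'values' no longer compiles.
    static final class TypedHolder {
        final List<String> values = new ArrayList<>();

        void add(String s) {
            values.add(s);
        }

        String first() {
            return values.get(0);                // no cast
        }
    }

    public static void main(String[] args) {
        LegacyHolder legacy = new LegacyHolder();
        legacy.add("solr");
        TypedHolder typed = new TypedHolder();
        typed.add("lucene");
        System.out.println(legacy.first() + " " + typed.first());
    }
}
```

Placing @SuppressWarnings on the narrowest declaration that needs it (a field or method, rather than the whole class) keeps new raw-type code elsewhere in the class visible to the compiler, which fits the goal of failing builds on new warnings.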

Comments?





Re: Getting warnings out

2020-05-10 Thread David Smiley
Can't we customize the linting to disregard entire categories of certain
warnings for now?  This makes your task manageable.
https://discuss.gradle.org/t/recompile-with-xlint-parameters/25279
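
David's suggestion maps onto javac's -Xlint option, which takes a comma-separated list of categories where a leading `-` disables one, combined with -Werror to turn the remaining warnings into build failures. The sketch below assumes a JDK rather than a bare JRE, uses hypothetical class names (`LintCategories`, `Legacy`), and drives javac through the standard `javax.tools` API to compile a small raw-typed class twice: once with every category enabled, and once with `rawtypes` and `unchecked` excluded. In a Gradle build, flags like these would typically be appended to each `JavaCompile` task's `options.compilerArgs`; how the Lucene/Solr build actually wires them up is not assumed here.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class LintCategories {

    // Holds Java source in memory so the demo needs no input files.
    static final class InMemorySource extends SimpleJavaFileObject {
        private final String code;

        InMemorySource(String className, String code) {
            super(URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles one class with the given lint options; returns true if compilation succeeded.
    static boolean compiles(String className, String code, List<String> lintOptions) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("requires a JDK (javac), not just a JRE");
        }
        Path outDir = Files.createTempDirectory("lint-demo");   // class files go here
        List<String> options = new ArrayList<>(lintOptions);
        // No annotation processing, and an explicit existing classpath so that
        // -Xlint:path has nothing unrelated to complain about.
        options.addAll(List.of("-proc:none", "-classpath", outDir.toString(), "-d", outDir.toString()));
        // Collect diagnostics instead of printing them to stderr.
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        return javac.getTask(null, null, diagnostics, options, null,
                List.of(new InMemorySource(className, code))).call();
    }

    public static void main(String[] args) throws Exception {
        String legacy = "import java.util.*;\n"
                + "public class Legacy { List names = new ArrayList(); }\n";

        // Every lint category enabled and warnings fatal: the raw List breaks the build.
        boolean strict = compiles("Legacy", legacy, List.of("-Xlint:all", "-Werror"));

        // Same gate with rawtypes/unchecked excluded for now; other warnings would still fail.
        boolean relaxed = compiles("Legacy", legacy,
                List.of("-Xlint:all,-rawtypes,-unchecked", "-Werror"));

        System.out.println("strict=" + strict + ", relaxed=" + relaxed);
    }
}
```

The first compile should fail and the second should succeed. Other lint categories stay enforced while the rawtypes/unchecked backlog is cleaned up, and dropping `-rawtypes,-unchecked` later re-enables those checks without any other build change.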

~ David

