Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-16 Thread Dennis Gove
There’s something enticing when thinking of Lucene and Solr as independent codebases. I’ve always thought of Lucene as core search (indexing, analysis, tokenization, etc…) and Solr as a search experience. Lucene is more a library (or set of libraries) used by applications providing search

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-15 Thread Dawid Weiss
Hi Christine! > * After a while (perhaps with Lucene 10.0 or perhaps at some other natural > point) we re-arrive at the "together or separate" question. If splitting > worked well then Solr promotion to TLP could be a natural next step My whole point is that I think the split is by large

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Doug Turnbull
Perhaps Christine! That's a nice idea! On naming, it would have to be probably something snazzier than "Search" as you get at. It would probably not be a good trademark, and would imply that Lucene & Solr are the only things in ASF that could be "Search". Who knows, one day Vespa or something

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Anshum Gupta
Thanks Christine! I genuinely like this idea. This actually gets us what we want without having to handle everything at the same time, and also giving us time to see if the split is working or not. This process also ensures that both Lucene and Solr maintain the symbiotic relationship at least

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Christine Poerschke
Perhaps a bit of a wildcard question or thought ... would any split out top-level project necessarily be called "Apache Solr" or could the split out project be called "Apache " with "Apache Solr" as its initial sub-project and over time there may be other sub-projects added? No particular name

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-14 Thread Christine Poerschke
Hello. The discussion subject here has two parts i.e. "Lucene-Solr split" and "Solr promoted to TLP" and I'd be curious what doing the former separately ahead of the latter might look like and/or if consensus around that would be different? Thinking aloud, as a hypothetical scenario like. *

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jason Gerlowski
> Would this not be eased to some extent if the initial committer base of both > the projects was the same? "Who has commit karma to a project" is a separate question from "Who will make commits in practice". Having Lucene committers retain their status as Solr committers only helps if they're

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Dawid Weiss
> This might sound a bit harsh, but maybe Lucene devs helping with Solr has let > Solr off the hook a bit too much? I actually like the fact that the split > causes Solr to figure out its own situation and focus on its problems. Well said. > Take our ongoing test flakiness woes and SolrCloud

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Dawid Weiss
> Would this not be eased to some extent if the initial committer base > of both the projects was the same? This is what I originally suggested; somebody (can't remember who) said it should be voluntary. I'm really open to either option. D.

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Atri Sharma
Would this not be eased to some extent if the initial committer base of both the projects was the same? On Wed, May 13, 2020 at 10:44 PM Jason Gerlowski wrote: > > There's nothing wrong with a harsh "sink or swim" approach if the > risks are bearable. If the worst case risk here is that we have

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jason Gerlowski
There's nothing wrong with a harsh "sink or swim" approach if the risks are bearable. If the worst case risk here is that we have a few rough releases as we smooth out the process, I'm all on board with "sink or swim". But by the same token - "sink or swim" gets less appealing as the risks

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jan Høydahl
I choose to be more optimistic wrt «Solr committers» ability to integrate new and changed Lucene APIs in Solr. You do not need to be a Lucene committer in order to learn how to USE the Lucene APIs, and I believe there are several «Solr committers» who already possess those skills and are pretty

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Doug Turnbull
Jason, I hear your arguments and think of them FOR a split. This might sound a bit harsh, but maybe Lucene devs helping with Solr has let Solr off the hook a bit too much? I actually like the fact that the split causes Solr to figure out its own situation and focus on its problems. Regardless of

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-13 Thread Jason Gerlowski
Wanted to add my two cents to the mix, though I'm a little late as the vote has already progressed pretty far. I'm against a split. From the points raised, I agree that Lucene has much to gain. But Solr has a lot to lose. Lucene devs would be freed from keeping Solr usage up to date. That's a

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-12 Thread Namgyu Kim
It's hard to make a decision because it seems to have pros and cons. Basically, I agree to separate but there are some questions. So I will not vote right now. 1) Release version Currently, versions of Lucene and Solr are aligned, how will they be managed in the future? Other people took

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-12 Thread Doug Turnbull
I'll give a perspective that comes more from the user's / "market" point of view as at OSC we onboard lots of new organizations into Solr. - Most new users incorrectly think of Solr as an independent Apache project, and many will have little knowledge or awareness of Lucene itself until given the

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-12 Thread Alan Woodward
One advantage I find with the way Elasticsearch and Lucene interact is that ES doesn’t depend on the master branch. We upgrade our master branch frequently to keep up to date with the latest release branch, and that lets us find regressions or API problems pretty quickly, but it also insulates

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Atri Sharma
My two cents: As a Lucene-heavy developer, I have several times found maintaining Solr dependencies while making large changes a bit cumbersome. I believe Lucene and Solr should exist in a symbiotic relationship but not tightly coupled with each other. On Mon, May 11, 2020 at 7:22 PM Erik Hatcher

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Erik Hatcher
Without reading much or replying to any specific points made on this thread, here's my raw thoughts on this age-old topic (finally coming out of my cocoon after taking things in for a bit) Solr is a search -server- with distributed capabilities, that leverages the magic of Lucene

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Bram Van Dam
On 11/05/2020 15:09, Dawid Weiss wrote: >> Maybe I'm alone in this, but (better) Lucene compatibility is one of the >> reasons why our company chose Solr over ElasticSearch. > > I fail to see anything supporting superior Lucene > compatibility of one vs. another. Yeah you're right. It's since

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Dawid Weiss
> Maybe I'm alone in this, but (better) Lucene compatibility is one of the > reasons why our company chose Solr over ElasticSearch. There are a number of Elasticsearch developers working on Lucene core (or maybe rather Lucene developers working at Elasticsearch?). And there are Solr developers

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Simon Willnauer
On Sun, May 10, 2020 at 3:41 PM Bram Van Dam wrote: > > On 10/05/2020 08:20, David Smiley wrote: > > An idea just occurred to me that may help make a split nicer for Solr > > than it is today. Solr could use a branch of the Lucene project that's > > used for the Solr project. > > Maybe I'm alone

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-11 Thread Adrien Grand
On Mon, May 11, 2020 at 1:17 AM Shawn Heisey wrote: > I think the presence of Solr in the codebase > has diluted Lucene's releases, making them come far too quickly. I > would bet that without Solr, Lucene would probably be somewhere in 6.x, > not 8.x. > Actually I think that Lucene would be

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Shawn Heisey
On 5/10/2020 3:41 PM, Michael McCandless wrote: I think the costs (I agree: they are high) are a one-time thing, while the benefits are long term, and accrue/multiply with time.  We should make decisions like this with the long-term benefits in mind. I expect Lucene and Solr to have long

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Tue, May 5, 2020 at 4:59 PM Tomás Fernández Löbbe wrote: On Tue, May 5, 2020 at 12:37 PM Dawid Weiss wrote: > >> > I read “promotion to TLP” as if this was some achievement that needs to >> be celebrated now. >> >> I honestly believe it is an achievement for a project to receive >> top-level

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Wed, May 6, 2020 at 4:24 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: Offtopic: Gus, I'm looking at Vespa core for Solr someday > +1, Vespa looks really fascinating! Plus it is released under ASL 2 as well. And the world clearly needs more open-source search engines. Mike

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Michael McCandless
On Thu, May 7, 2020 at 1:07 PM Bram Van Dam wrote: > The big question is this: “Is this the right time to split Solr and > > Lucene into two independent projects?”. > > Sounds like there are quite a few tasks to complete to get this done. > Splitting the build and codebase. Presumably a bunch of

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Jan Høydahl
Better stick to Lucene snapshot versions. Commit the Lucene change first, then Solr. If the Lucene change is not mature enough to commit to Lucene, it is probably not mature enough for Solr either. Avoid hacks or forks, spend some longer time to get it right. If things get removed from Lucene

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Gus Heck
On Sun, May 10, 2020 at 11:55 AM Mike Drob wrote: > Solr maintaining a fork of Lucene sounds like exactly the situation that > led to the original merge, where there are two sets of divergent development > Exactly

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Mike Drob
Solr maintaining a fork of Lucene sounds like exactly the situation that led to the original merge, where there are two sets of divergent development On Sun, May 10, 2020 at 1:20 AM David Smiley wrote: > I agree with Doug that the burden of proof is on keeping the codebases > together instead

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Bram Van Dam
On 10/05/2020 08:20, David Smiley wrote: > An idea just occurred to me that may help make a split nicer for Solr > than it is today.  Solr could use a branch of the Lucene project that's > used for the Solr project. Maybe I'm alone in this, but (better) Lucene compatibility is one of the reasons

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread Adrien Grand
On Sun, May 10, 2020 at 8:20 AM David Smiley wrote: > I wonder if ElasticSearch tries to do this on their side too; does it? > Yes, Elasticsearch regularly upgrades to new snapshots of Lucene[1][2], often multiple times per minor version. It helps give Lucene more test and performance coverage,

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-10 Thread David Smiley
I agree with Doug that the burden of proof is on keeping the codebases together instead of the reverse. I liken it to a marriage; it has to work well for both parties. It seems to be mostly beneficial for Solr but much less so for Lucene. BTW an even better example than the huge FuzzyQuery

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-07 Thread Bram Van Dam
> The big question is this: “Is this the right time to split Solr and > Lucene into two independent projects?”. Sounds like there are quite a few tasks to complete to get this done. Splitting the build and codebase. Presumably a bunch of administration within Apache/the PMC. Setting up

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-07 Thread Adrien Grand
There are definitely pros and cons of splitting vs. being a single project. The bigger pains for me until now have been the following ones: Digging Solr failures The theory is that Solr failures can help find Lucene bugs that Lucene tests wouldn't catch, and while this occurred a couple times, I

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Anshum Gupta
I personally feel that all the current issues can be solved by actually working on those problems instead of splitting and calling it a day. I don't really think that splitting provides any benefit to the Solr side of things at all, however I also completely agree that it would make things easier

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Ishan Chattopadhyaya
Offtopic: Gus, I'm looking at Vespa core for Solr someday. I used to work at Yahoo! and I was a strong advocate of replacing Vespa with Solr for our team. A quick benchmark back then proved to me that Vespa was faster than Solr for our usecase and I rested my case. Both Lucene and Vespa have

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Gus Heck
> > IMO, if we need to say “we can’t release X because it breaks Y”, or “we > need to release X to be able to release Y”, the projects are not really > independent, and “the PMCs will overlap” won’t take us very far. > This. I don't think the two really can be separated. Any separation will

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Simon Willnauer
I can speak from experience that working with a snapshot is much cleaner than working with submodules. We have done this in Elasticsearch for a very long time now and our process here works just fine. It has a bunch of advantages over a direct / source dependency like Solr has right now. I recall that

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Ishan Chattopadhyaya
Except for the logistics of enacting the split, I see no valid reason for keeping the projects together. Git submodule is the magic that we have to ease any potential discomfort. However, the effort needed to split feels absolutely massive, so I'm not sure if it is worth the hassle. On Wed, 6 May,

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-06 Thread Dawid Weiss
> If you go to lucene.apache.org, you'll see three things: Lucene Core (Lucene > with all its modules), Solr and PyLucene. That's what I mean. Hmm... Maybe I'm dim but that's essentially what I want to do. Look: 1. Lucene Core (Lucene with all its modules) 2. Solr 3. PyLucene The thing is:

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Tomás Fernández Löbbe
On Tue, May 5, 2020 at 12:37 PM Dawid Weiss wrote: > > I read “promotion to TLP” as if this was some achievement that needs to > be celebrated now. > > I honestly believe it is an achievement for a project to receive > top-level status. It's a sign of having a community of users, > committers

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Dawid Weiss
> I read “promotion to TLP” as if this was some achievement that needs to be > celebrated now. I honestly believe it is an achievement for a project to receive top-level status. It's a sign of having a community of users, committers and processes mature enough to empower its further development.

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Dawid Weiss
> Question: When Lucene no longer has the Solr test suite to help catch bugs, > how long time would it take from a Lucene commit, before Solr/ES Jenkins > instances would have had time to produce a build and run tests? Would it be > possible to setup a trigger in Solr Jenkins? It depends how

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Tomás Fernández Löbbe
I don’t agree with the argument “Solr outgrew being a subproject of Lucene”. I read “promotion to TLP” as if this was some achievement that needs to be celebrated now. Solr didn’t become a TLP years ago because the decision then was to merge with Lucene development, thinking they would progress

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Michael McCandless
On Tue, May 5, 2020 at 11:41 AM Jan Høydahl wrote: As it is today, developers have had to do necessary Solr changes at the same > time when doing changes in Lucene. This is not really fair to the (mainly) > Lucene developers. It is not fair to Solr either, as such work might be > done in a hasty

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Jan Høydahl
Thanks for bringing it up Dawid. I’ve asked myself the same question several times over the last couple of years, and have kind of been waiting for someone to make the proposal :) In my head, Solr has out-grown being a sub project of Lucene, like hadoop, mahout, nutch and tika before it. The

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Doug Turnbull
Personally I feel the burden of proof should not be why they should be split up, but the other way - "what arguments can be made for keeping them together?" I would be curious if people can make the argument for keeping them together... -Doug On Tue, May 5, 2020 at 10:29 AM Michael McCandless <

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Michael McCandless
On Mon, May 4, 2020 at 2:13 PM Dawid Weiss wrote: > This sounds like a decision has already been made. > > No. I plan to send a VOTE thread nonetheless. A vote thread is just > that -- a vote. If majority decides both projects > should stay together it's still a decision. A discussion without

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-05 Thread Michael McCandless
On Mon, May 4, 2020 at 5:28 PM Gézapeti Cseh wrote: I think separating the git repository and even the release schedules could > be done under the same TLP. > It would solve most of the technical issues reflected in the first mail and > there would be more time and data to > Hmm that is

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Gézapeti Cseh
I think separating the git repository and even the release schedules could be done under the same TLP. It would solve most of the technical issues reflected in the first mail and there would be more time and data to see if creating Apache Solr again is something the PMC would want to do gp On

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Dawid Weiss
Perhaps I didn't clarify this so far: my own interests (personal and business) are shared equally between Solr and Lucene (we have products that have plain Lucene underneath and we maintain products and systems that use Solr). So I am going to have a foot in both worlds no matter the outcome. I

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Dawid Weiss
> This sounds like a decision has already been made. No. I plan to send a VOTE thread nonetheless. A vote thread is just that -- a vote. If majority decides both projects should stay together it's still a decision. A discussion without any resolution is going to dissolve over time into no

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Mike Drob
This is an interesting approach, Michael. I took it a bit further by excluding all authors with only a single commit[1], since I think GitHub PRs tend to highlight that kind of contribution more. Since 2012 I found 24 lucene-only, 31 solr-only, and 77 (about 58%) contributing to both. Since 2018,

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Shai Erera
Interesting data Michael. I am not sure though that the shared commits tell us that there are people that contribute to both projects. Eventually, an API change/update in Lucene will require a change in Solr (but not vice versa). Those commits will still occur in both projects, only on the Solr

Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)

2020-05-04 Thread Michael Sokolov
I always like to look at data when making a big decision, so I gathered some statistics about authors and commits to git over the history of the project. I wanted to see what these statistics could tell us about the degree of overlap between the two projects and whether it has changed over time.
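
The author-overlap statistics mentioned in the two messages above (authors who commit to Lucene only, to Solr only, or to both, with single-commit authors excluded) can be illustrated with a small sketch. This is not the script either poster used; the function names, the input shape (pairs of author and touched paths), and the sample data are all hypothetical, and the only assumption about the repository is that Lucene and Solr code live under top-level `lucene/` and `solr/` directories.

```python
from collections import defaultdict


def classify_authors(commits, min_commits=2):
    """Label each author 'lucene-only', 'solr-only' or 'both'.

    commits: iterable of (author, [paths]) pairs (hypothetical input shape).
    Authors with fewer than min_commits commits are dropped, mirroring the
    "exclude authors with only a single commit" filter from the thread.
    """
    commit_counts = defaultdict(int)
    areas_touched = defaultdict(set)
    for author, paths in commits:
        commit_counts[author] += 1
        for path in paths:
            top_level = path.split("/", 1)[0]
            if top_level in ("lucene", "solr"):
                areas_touched[author].add(top_level)

    labels = {}
    for author, count in commit_counts.items():
        if count < min_commits:
            continue
        areas = areas_touched[author]
        if areas == {"lucene", "solr"}:
            labels[author] = "both"
        elif areas == {"lucene"}:
            labels[author] = "lucene-only"
        elif areas == {"solr"}:
            labels[author] = "solr-only"
        # Authors who touched neither directory are left unlabeled.
    return labels


def share_of_both(labels):
    """Fraction of labeled authors who contributed to both codebases."""
    if not labels:
        return 0.0
    both = sum(1 for label in labels.values() if label == "both")
    return both / len(labels)


# Made-up sample data, for illustration only.
sample_commits = [
    ("alice", ["lucene/core/A.java"]),
    ("alice", ["solr/core/B.java"]),
    ("bob", ["lucene/core/C.java"]),
    ("bob", ["lucene/core/D.java"]),
    ("carol", ["solr/core/E.java"]),  # single commit, filtered out
]
labels = classify_authors(sample_commits)
print(labels, share_of_both(labels))
```

As a check on the figures quoted in the thread: 24 lucene-only, 31 solr-only and 77 shared authors give 77 / (24 + 31 + 77) = 77 / 132, which is about 58%.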