date:20200811

Re: First Issue Label

2020-08-11 Thread Tomoko Uchida

JFYI: I opened an issue with the "newdev" label. It's mainly about
documentation and requires a bit of knowledge about our build system
(gradle).
https://issues.apache.org/jira/browse/LUCENE-9459

Thanks,
Tomoko


On Sun, Aug 9, 2020 at 5:00 Eric Pugh wrote:

> I’d be interested in shepherding “newdev” style contributions to being
> commits. I’m not comfortable making any deeper changes in Solr, but if
> it’s a “newdev” labeled feature, well then it’s probably “newcommitter”
> friendly as well ;-).
>
> Feel free to tag me on any issues that have patches etc….
>
> On Aug 8, 2020, at 8:19 AM, Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
> Thanks for the reminder, Marcus. I just added a "newdev" label to this:
> https://issues.apache.org/jira/browse/SOLR-13438.
>
> On Fri, Aug 7, 2020 at 4:55 AM Jan Høydahl  wrote:
>
>> I have tagged some of the issues I have filed but not had bandwidth to
>> tackle immediately as ’newdev’, but could probably have done it far more
>> often.
>> If all of us browse through the issues we have created and tag those we
>> think are simple and important, then there would suddenly be a bunch!
>> Great reminder, Marcus! Having a clear focus on new devs is important!
>>
>> Jan
>>
>> On Aug 6, 2020, at 06:12, Anshum Gupta wrote:
>>
>> There used to be a 'newdev' label in the past that fell through the
>> cracks.
>> https://cwiki.apache.org/confluence/display/LUCENE/HowToContribute mentions
>> the label, but of course, not a ton of JIRAs exist with that label for new
>> devs to pick up and run with.
>>
>> We could start using the label. I personally tagged a bunch of JIRAs once
>> upon a time with that label and also remember that as something we did at
>> one of the committer meetings, but then the lower hanging JIRAs were really
>> created and resolved without much delay, not leaving much on the table for
>> new developers.
>> We can certainly get back to using the label again.
>>
>> -Anshum
>>
>> On Wed, Aug 5, 2020 at 8:04 PM Marcus Eagan 
>> wrote:
>>
>>> Community,
>>>
>>> In the vein of being more developer friendly, I think we should create a
>>> first issue label. In my experience, that label has been a great way to get
>>> newcomers involved in projects new to them.
>>>
>>> I've seen it in a number of Apache projects that I have contributed to,
>>> proprietary projects, and in CNCF projects.
>>>
>>> Please let me know what you think about a first issue label. It would make
>>> it easier for people outside the community who want to join to do so in
>>> the future.
>>>
>>> Thanks,
>>> --
>>> Marcus Eagan
>>>
>>>
>>
>> --
>> Anshum Gupta
>>
>>
>>
> ___
> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> 
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> 
>
>

Re: hybrid document routing

2020-08-11 Thread David Smiley

Cool!
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 11, 2020 at 9:55 AM Joel Bernstein  wrote:

> SOLR-14728 supports sub-second performance on joins with more than 1
> million values from the from index. Nice for access control.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Aug 11, 2020 at 9:49 AM Joel Bernstein  wrote:
>
>> This ticket will shed some light:
>>
>> https://issues.apache.org/jira/browse/SOLR-14728
>>
>>
>> I think I'm planning on using a different approach to distribute the ACLs
>> to all shards.
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Tue, Aug 11, 2020 at 1:18 AM Gus Heck  wrote:
>>
>>> Sounds like complex ACLs based on group memberships that use graph
>>> queries? That would require local ACLs...
>>>
>>> On Mon, Aug 10, 2020 at 5:56 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
 This seems like an XY problem. Would it be possible to describe the
 original problem that led you to this solution (in the prototype)? Also, do
 you think folks at solr-users@ list would have more ideas related to
 this usecase and cross posting there would help?

 On Tue, 11 Aug, 2020, 1:43 am David Smiley,  wrote:

> Are you sure you need the docs in the same shard when maybe you could
> assume a core exists on each node and then do a query-time join?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein 
> wrote:
>
>> I have a situation where I'd like to have the standard compositeId
>> router in place for a collection. But, I'd like certain documents (ACL
>> documents) to be duplicated on each shard in the collection. To achieve
>> the level of access control performance and scalability I'm looking for,
>> I need the ACL records to be in the same core as the main documents.
>>
>> I put together a prototype where the compositeId router accepted
>> implicit routing parameters and it worked in my testing. Before I open a
>> ticket suggesting this approach, I wonder what other people think the
>> best approach would be to accomplish this goal.
>>
>>
>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
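
A minimal sketch of the hybrid routing idea from the first message in this
thread: ordinary documents are placed by hashing their id, while ACL-style
documents are sent once per shard with an explicit target. The function
names, shard names, and the CRC32-modulo hash are assumptions made for
illustration only. Solr's compositeId router hashes with MurmurHash3 over
shard hash ranges, and this is not the prototype's code.

```python
import zlib


def hashed_shard(doc_id, shards):
    """Pick one shard from the document id (stand-in for compositeId hashing)."""
    # CRC32 modulo the shard count is an assumption made for this sketch only.
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def route(doc_id, shards, route_to=None):
    """Honor an explicit target shard if given, else fall back to hashing."""
    if route_to is not None:
        if route_to not in shards:
            raise ValueError(f"unknown shard: {route_to}")
        return route_to
    return hashed_shard(doc_id, shards)


def index_plan(doc_id, shards, replicate=False):
    """Return the list of shards that should receive the document."""
    if replicate:
        # ACL-style documents: one explicitly routed copy per shard.
        return [route(doc_id, shards, route_to=s) for s in shards]
    return [route(doc_id, shards)]


if __name__ == "__main__":
    shards = ["shard1", "shard2", "shard3"]
    print(index_plan("acl-42", shards, replicate=True))
    print(index_plan("doc-7", shards))
```

With this shape the main documents keep their hashed placement, and only the
documents flagged for replication pay the cost of being indexed on every
shard.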

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread David Smiley

"classic mode" anyone? Meh.
FWIW I think "Standalone mode" is fine because it refers to the node
itself, not the cluster.  A cluster of standalone mode Solrs vs. a
SolrCloud cluster.

The SolrCloud name seems too entrenched to try to re-brand it.  And that's
fine with me; it's not a terrible or confusing name (IMO).

I agree with Jan's concern that "self-managed" is ambiguous as to who/what
is managing Solr.  It reminds me of the debacle of the "implicit router" --
boy was that a bad choice!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 11, 2020 at 11:13 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> I like Cassandra's original suggestion: uncoordinated vs coordinated (or
> non-coordinated vs coordinated).
>
> On Tue, Aug 11, 2020 at 8:19 PM Jan Høydahl  wrote:
>
>> Hehe, «self» can be self as in user or self as in Solr :)
>>
>> Legacy feels like something that is going away, and so far the
>> «standalone» mode is not going anywhere.
>> Cassandra, feel free to propose your best shot, and then I don’t
>> think we need a poll for it; a bunch of +1s on this thread should suffice.
>>
>> Managed Cluster vs Non-managed Cluster?
>> Managed Cluster vs User Managed Cluster?
>>
>> Jan
>>
>> On Aug 11, 2020, at 16:21, Cassandra Targett wrote:
>>
>> OK, fair point about self-managed. But I object to "leaving it" as
>> Legacy, as I've previously explained (I put that in quotes because it’s not
>> always called that at all - it has at least 3 names right now).
>>
>> The reality is someone can come up with an objection to every single
>> possibility. Someday we have to live with something that’s good enough and
>> move forward, or we’ll end up just living with the total mash of things we
>> have today. Which maybe is fine with everyone.
>>
>> I’ve tried to put real mental work into thinking about a good name, and
>> have tried to compromise based on feedback. At this point, though, unless
>> someone else comes up with something I’m likely done here. We’ll just
>> “leave it” all as it is now.
>>
>> Cassandra
>> On Aug 11, 2020, 9:11 AM -0500, Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com>, wrote:
>>
>> I object to "self managed". It gives the impression that Solr manages
>> itself, whereas it is the other way around: users need to manage the
>> standalone mode with lots of manual effort, as opposed to SolrCloud which
>> is in spirit self managed (solr manages itself using zk).
>>
>> I'm +1 with Legacy replication and SolrCloud replication for now. Later,
>> we can get rid of "SolrCloud" and call it something else. Also, once
>> SolrCloud is stable enough, we can get rid of legacy mode altogether. We
>> can discuss that elsewhere.
>>
>> On Tue, 11 Aug, 2020, 7:16 pm Cassandra Targett, 
>> wrote:
>>
>>> I don’t feel there is a consensus for me to move forward confidently,
>>> but the docs need to be fixed before 8.7. I’ve thought about Ilan’s
>>> suggestion, and like calling the non-SolrCloud cluster “self-managed”. It
>>> avoids the currently awkward phrasing and any misinterpretation of my
>>> original suggestion with clumsiness as Gus pointed out. Can everyone live
>>> with that?
>>>
>>> If so, that leaves what we might eventually call SolrCloud as the
>>> remaining sticking point. It’s not a problem that needs to be solved today
>>> as the term isn’t going anywhere yet since there aren’t any patches or PRs
>>> to change it at a code level.
>>>
>>> Barring further objections, then, I think I will go ahead with mostly
>>> leaving “SolrCloud” as it is, and replacing/modifying “Legacy Scaling”,
>>> “leader/follower mode”, some cases of “Standalone mode”, and similar
>>> constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., as
>>> appropriate.
>>>
>>> Cassandra
>>> On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett ,
>>> wrote:
>>>
>>> The suggestion to use “managed” and maybe “self-managed” is an
>>> interesting one. Do you think it’s possible some might confuse that with
>>> the other ways we use managed - like the “managed-schema”, and “managed
>>> resources” (synonyms and stop words)? Neither of those are
>>> cluster-specific, and I wonder if the overlap in terminology would cause
>>> them to be conflated.
>>>
>>> Cassandra
>>> On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg ,
>>> wrote:
>>>
>>> Both "legacy" and "SolrCloud" clusters are search server clusters. Seen
>>> from far enough, they look the same.
>>>
>>> In "legacy" the management code is elsewhere (developed by the client
>>> operating the cluster, running on other machines using a different logic and
>>> potentially another DB than Zookeeper) whereas in "SolrCloud" the
>>> management code is embedded in the search server(s) code and it happens
>>> that (currently) this code relies on Zookeeper.
>>>
>>> I see SolrCloud as a "managed cluster" vs. legacy that would be "Self
>>> managed" by the client, or "U manage" (non managed when looking at it from
>>> the

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

> Maybe if we have a common benchmarking suite, such efforts will be less
> effort and can actually be contributed back so that we can potentially
> monitor the matter.

I am +1 to contributing this to an Apache repository, the moment this is
stable. The moment periodic numbers start getting published, the risk of
the suite being abandoned is reduced. Two more things to do before this
happens: 1. identifying datasets and queries (I'm making progress) and 2. a
web UI that plots charts based on those numbers. Help welcome.

> Whatever we do or not do is imperfect.  I hope some "mandate" doesn't
> stop progress.
> We don't go changing code just for the heck of it; we do it for a variety
> of matters.

We sometimes do: https://issues.apache.org/jira/browse/SOLR-12845. I don't
want to stop progress, but I want to avoid situations where someone commits
an issue (e.g. SOLR-12845), it causes a massive regression (SOLR-14665),
and others have to come and fix the situation (
https://issues.apache.org/jira/browse/SOLR-14706 and releases) with very
little help or support from the original committer. Just because there was
no mandate in place, hours and hours of effort have already been wasted on
that issue, not to mention the users who are suffering as well.

Requesting performance testing for all features affecting critical code
paths seemed like the most constructive way to tackle this situation, but
if any other solution comes to mind to address this situation, please
suggest it.
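
As a rough illustration of what "performance testing numbers" for a change
could boil down to, here is a minimal sketch that compares timings taken
before and after a commit and flags a slowdown. The function names, the 10%
tolerance, and the sample timings are all made up for this example; none of
this is how solr-bench reports results.

```python
from statistics import median


def slowdown(baseline_ms, candidate_ms):
    """Relative change of the candidate median versus the baseline median."""
    base = median(baseline_ms)
    cand = median(candidate_ms)
    return (cand - base) / base


def is_regression(baseline_ms, candidate_ms, tolerance=0.10):
    """True if the candidate is more than `tolerance` slower than the baseline."""
    return slowdown(baseline_ms, candidate_ms) > tolerance


if __name__ == "__main__":
    before = [100.0, 102.0, 98.0]   # hypothetical query latencies (ms)
    after = [130.0, 128.0, 131.0]   # hypothetical latencies after a change
    print(is_regression(before, after))
```

Using the median rather than the mean keeps a single slow outlier run from
deciding the verdict; the tolerance would need tuning per benchmark.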

> If those things are blocked, we'll be trading the opportunity cost of the
> change for the performance risk.  Each issue is different -- has its own
> risk-reward trade-off.  Just keep this in mind, Ishan.

I totally understand.

On Wed, Aug 12, 2020 at 10:18 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> > I don't think that the problem is nobody cares, more likely the problem
> > is it's hard and there's always a tug of war between getting things done
> > and out there where people can benefit from the feature/fix etc vs the risk
> > that they stall out waiting for one more thing to do.
> I have tried desperately to stay constructive in this effort and in my
> intention, so I will not repeat what I have said in the past.
>
> > If the time to complete a task grows the likelihood that real life, and
> > jobs interrupt it grows, and the chance it lingers indefinitely or is
> > abandoned goes up.
> I'm afraid that shouldn't be an excuse to not do the due diligence. It is
> better to not commit something that is not performance tested (and affects
> default code paths for every user) than to commit it, cause a regression
> and have other people come clean up the performance mess after you.
>
>
>
> On Wed, Aug 12, 2020 at 10:03 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> > I was going to use the data set that Mike uses for the Lucene nightly
>> > benchmarks
>> I've gone with the same in the suite to begin with:
>> https://github.com/TheSearchStack/solr-bench/blob/master/small-data/small-enwiki.tsv.gz
>> The larger file can be downloaded and used as well.
>>
>> The suite is also capable of using .jsonl files, and I'm building another
>> dataset (based on Hacker News articles) for that at the moment.
>>
>> On Wed, Aug 12, 2020 at 10:00 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Here's the local mode example:
>>> https://github.com/TheSearchStack/solr-bench/blob/master/config-local.json
>>> (Here, please ignore the JDK URL, it is downloaded but the system JDK is
>>> used)
>>>
>>> A pre-built Solr can be used as per
>>> https://github.com/TheSearchStack/solr-bench/blob/master/config-prebuilt.json
>>> (I just added this).
>>> In this example, Solr is downloaded from the given URL and used.
>>> Alternatively, you can build a Solr tarball, place it in the solr-bench
>>> directory, and specify its name (not the full path) in the "solr-package"
>>> parameter.
>>>
>>> When both "solr-package" and "repository" are specified, the former is
>>> used and the latter is ignored. If only the latter is specified
>>> ("repository"), Solr is compiled/built using the specified commit point.
>>>
>>>
>>>
>>>
>>> On Wed, Aug 12, 2020 at 6:17 AM Mike Drob  wrote:
>>>
 Can you give examples of this? I don’t see them in the repo.

 On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya <
 ichattopadhy...@gmail.com> wrote:

> Local mode uses the installed JDK. GCP mode can pick up a JDK URL as
> configured. It is just a configuration, one among many, that can be
> changed as per needs of the benchmark. The benchmarks can be used with
> almost any branch (just specify the commit sha in the repository section,
> or alternatively build Solr tgz separately and refer to it in the
> solr-package parameter).
>
>
> On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:
>
>> Hi Ishan,
>>
>> Thanks for starting this conversation! I think it's important to pay

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

> I don't think that the problem is nobody cares, more likely the problem
> is it's hard and there's always a tug of war between getting things done
> and out there where people can benefit from the feature/fix etc vs the risk
> that they stall out waiting for one more thing to do.
I have tried desperately to stay constructive in this effort and in my
intention, so I will not repeat what I have said in the past.

> If the time to complete a task grows the likelihood that real life, and
> jobs interrupt it grows, and the chance it lingers indefinitely or is
> abandoned goes up.
I'm afraid that shouldn't be an excuse to not do the due diligence. It is
better to not commit something that is not performance tested (and affects
default code paths for every user) than to commit it, cause a regression
and have other people come clean up the performance mess after you.



On Wed, Aug 12, 2020 at 10:03 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> > I was going to use the data set that Mike uses for the Lucene nightly
> > benchmarks
> I've gone with the same in the suite to begin with:
> https://github.com/TheSearchStack/solr-bench/blob/master/small-data/small-enwiki.tsv.gz
> The larger file can be downloaded and used as well.
>
> The suite is also capable of using .jsonl files, and I'm building another
> dataset (based on Hacker News articles) for that at the moment.
>
> On Wed, Aug 12, 2020 at 10:00 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Here's the local mode example:
>> https://github.com/TheSearchStack/solr-bench/blob/master/config-local.json
>> (Here, please ignore the JDK URL, it is downloaded but the system JDK is
>> used)
>>
>> A pre-built Solr can be used as per
>> https://github.com/TheSearchStack/solr-bench/blob/master/config-prebuilt.json
>> (I just added this).
>> In this example, Solr is downloaded from the given URL and used.
>> Alternatively, you can build a Solr tarball, place it in the solr-bench
>> directory, and specify its name (not the full path) in the "solr-package"
>> parameter.
>>
>> When both "solr-package" and "repository" are specified, the former is
>> used and the latter is ignored. If only the latter is specified
>> ("repository"), Solr is compiled/built using the specified commit point.
>>
>>
>>
>>
>> On Wed, Aug 12, 2020 at 6:17 AM Mike Drob  wrote:
>>
>>> Can you give examples of this? I don’t see them in the repo.
>>>
>>> On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
 Local mode uses the installed JDK. GCP mode can pick up a JDK url as
 configured. It is just a configuration, one among many, that can be changed
 as per needs of the benchmark. The benchmarks can be used with almost any
 branch (just specify the commit sha in the repository section, or
 alternatively build Solr tgz separately and refer to it in the solr-package
 parameter).


 On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:

> Hi Ishan,
>
> Thanks for starting this conversation! I think it's important to pay
> attention to performance, but I also have some concerns with coming out
> with such a strong mandate. In the repository, I'm looking at how to run
> in local mode, and see that it looks like it will try to download a JDK
> from some university website? That seems overly restrictive to me, why
> can't we use the already installed JDK?
>
> Is the benchmark suite designed for master? Or for branch_8x?
>
> Mike
>
> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Hi Everyone!
>> From now on, I intend to request/nag/demand/veto code changes,
>> which affect default code paths for most users, be accompanied by
>> performance testing numbers for it (e.g. [1]). Opt in features are fine,
>> I won't personally bother about them (but if you'd like to perf test
>> them, it would set a great precedent anyway).
>>
>> I will also work on setting up automated performance and stress
>> testing [2], but in the absence of that, let us do performance tests
>> manually and report them in the JIRA. Unless we hold ourselves to a
>> high standard, performance will be a joke whereby performance
>> regressions can creep in without the committer(s) taking any
>> responsibility towards those users affected by it (SOLR-14665).
>>
>> A benchmarking suite that I am working on is at
>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
>> test suite is under development (SOLR-13933). If you wish to use either
>> of these, I shall offer help and support (please ping me on Slack
>> directly or #solr-dev, or open a GitHub Issue on that repo).
>>
>> Regards,
>> Ishan
>>
>> [1] -
>>

Re: Performance testing is necessary now

2020-08-11 Thread David Smiley

I haven't tried "solr-bench"
https://github.com/thesearchstack/solr-bench closely
but I sure hope we can rally around something that's pretty good; maybe
this is it.  I really need to give this one a shot.  I've noticed on
occasion some of us will throw together dedicated utilities to do
benchmarking for a specific matter and then throw it away.  Maybe if we
have a common benchmarking suite, such efforts will be less effort and can
actually be contributed back so that we can potentially monitor the matter.

Whatever we do or not do is imperfect.  I hope some "mandate" doesn't stop
progress.  We don't go changing code just for the heck of it; we do it for
a variety of matters.  If those things are blocked, we'll be trading the
opportunity cost of the change for the performance risk.  Each issue is
different -- has its own risk-reward trade-off.  Just keep this in mind,
Ishan.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Aug 11, 2020 at 8:47 PM Mike Drob  wrote:

> Can you give examples of this? I don’t see them in the repo.
>
> On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Local mode uses the installed JDK. GCP mode can pick up a JDK url as
>> configured. It is just a configuration, one among many, that can be changed
>> as per needs of the benchmark. The benchmarks can be used with almost any
>> branch (just specify the commit sha in the repository section, or
>> alternatively build Solr tgz separately and refer to it in the solr-package
>> parameter).
>>
>>
>> On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:
>>
>>> Hi Ishan,
>>>
>>> Thanks for starting this conversation! I think it's important to pay
>>> attention to performance, but I also have some concerns with coming out
>>> with such a strong mandate. In the repository, I'm looking at how to run in
>>> local mode, and see that it looks like it will try to download a jdk from
>>> some university website? That seems overly restrictive to me, why can't we
>>> use the already installed JDK?
>>>
>>> Is the benchmark suite designed for master? Or for branch_8x?
>>>
>>> Mike
>>>
>>> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
 Hi Everyone!
 From now on, I intend to request/nag/demand/veto code changes, which
 affect default code paths for most users, be accompanied by performance
 testing numbers for it (e.g. [1]). Opt in features are fine, I won't
 personally bother about them (but if you'd like to perf test them, it would
 set a great precedent anyway).

 I will also work on setting up automated performance and stress testing
 [2], but in the absence of that, let us do performance tests manually and
 report them in the JIRA. Unless we hold ourselves to a high
 standard, performance will be a joke whereby performance regressions can
 creep in without the committer(s) taking any responsibility towards those
 users affected by it (SOLR-14665).

 A benchmarking suite that I am working on is at
 https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
 test suite is under development (SOLR-13933). If you wish to use either of
 these, I shall offer help and support (please ping me on Slack directly or
 #solr-dev, or open a Github Issue on that repo).

 Regards,
 Ishan

 [1] -
 https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
 [2] -
 https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
 (edited)

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

> I was going to use the data set that Mike uses for the Lucene nightly
> benchmarks
I've gone with the same in the suite to begin with:
https://github.com/TheSearchStack/solr-bench/blob/master/small-data/small-enwiki.tsv.gz
The larger file can be downloaded and used as well.

The suite is also capable of using .jsonl files, and I'm building another
dataset (based on Hacker News articles) for that at the moment.

On Wed, Aug 12, 2020 at 10:00 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Here's the local mode example:
> https://github.com/TheSearchStack/solr-bench/blob/master/config-local.json
> (Here, please ignore the JDK URL, it is downloaded but the system JDK is
> used)
>
> A pre-built Solr can be used as per
> https://github.com/TheSearchStack/solr-bench/blob/master/config-prebuilt.json
> (I just added this).
> In this example, Solr is downloaded from the given URL and used.
> Alternatively, you can build a Solr tarball, place it in the solr-bench
> directory, and specify its name (not the full path) in the "solr-package"
> parameter.
>
> When both "solr-package" and "repository" are specified, the former is
> used and the latter is ignored. If only the latter is specified
> ("repository"), Solr is compiled/built using the specified commit point.
>
>
>
>
> On Wed, Aug 12, 2020 at 6:17 AM Mike Drob  wrote:
>
>> Can you give examples of this? I don’t see them in the repo.
>>
>> On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Local mode uses the installed JDK. GCP mode can pick up a JDK url as
>>> configured. It is just a configuration, one among many, that can be changed
>>> as per needs of the benchmark. The benchmarks can be used with almost any
>>> branch (just specify the commit sha in the repository section, or
>>> alternatively build Solr tgz separately and refer to it in the solr-package
>>> parameter).
>>>
>>>
>>> On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:
>>>
 Hi Ishan,

 Thanks for starting this conversation! I think it's important to pay
 attention to performance, but I also have some concerns with coming out
 with such a strong mandate. In the repository, I'm looking at how to run in
 local mode, and see that it looks like it will try to download a jdk from
 some university website? That seems overly restrictive to me, why can't we
 use the already installed JDK?

 Is the benchmark suite designed for master? Or for branch_8x?

 Mike

 On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
 ichattopadhy...@gmail.com> wrote:

> Hi Everyone!
> From now on, I intend to request/nag/demand/veto code changes,
> which affect default code paths for most users, be accompanied by
> performance testing numbers for it (e.g. [1]). Opt in features are fine, I
> won't personally bother about them (but if you'd like to perf test them,
> it would set a great precedent anyway).
>
> I will also work on setting up automated performance and stress
> testing [2], but in the absence of that, let us do performance tests
> manually and report them in the JIRA. Unless we hold ourselves to a
> high standard, performance will be a joke whereby performance regressions
> can creep in without the committer(s) taking any responsibility towards
> those users affected by it (SOLR-14665).
>
> A benchmarking suite that I am working on is at
> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
> test suite is under development (SOLR-13933). If you wish to use either of
> these, I shall offer help and support (please ping me on Slack directly or
> #solr-dev, or open a Github Issue on that repo).
>
> Regards,
> Ishan
>
> [1] -
> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
> [2] -
> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
> (edited)
>
>
>

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

Here's the local mode example:
https://github.com/TheSearchStack/solr-bench/blob/master/config-local.json
(Here, please ignore the JDK URL, it is downloaded but the system JDK is
used)

A pre-built Solr can be used as per
https://github.com/TheSearchStack/solr-bench/blob/master/config-prebuilt.json
(I just added this).
In this example, Solr is downloaded from the given URL and used.
Alternatively, you can build a Solr tarball, place it in the solr-bench
directory, and specify its name (not the full path) in the "solr-package"
parameter.

When both "solr-package" and "repository" are specified, the former is used
and the latter is ignored. If only the latter is specified ("repository"),
Solr is compiled/built using the specified commit point.
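
The precedence rule above can be sketched as follows. This is only an
illustration of the rule as described in this message, not code from
solr-bench: the function name, the return values, the "commit-id" key, the
sample file name, and the error raised when neither key is present are all
assumptions.

```python
import json


def resolve_solr_source(config):
    """Decide where Solr comes from, per the rule described in the message.

    "solr-package" wins when both keys are present; "repository" is only
    used when no package is given.
    """
    package = config.get("solr-package")
    repository = config.get("repository")
    if package:
        return ("prebuilt", package)
    if repository:
        return ("build", repository)
    # Behaviour with neither key set is not stated in the thread (assumption).
    raise ValueError("config needs 'solr-package' or 'repository'")


if __name__ == "__main__":
    # Hypothetical config fragment; only the two top-level key names come
    # from the message above.
    text = """
    {
      "solr-package": "solr-snapshot.tgz",
      "repository": {"commit-id": "abc123"}
    }
    """
    print(resolve_solr_source(json.loads(text)))
```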

On Wed, Aug 12, 2020 at 6:17 AM Mike Drob  wrote:

> Can you give examples of this? I don’t see them in the repo.
>
> On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Local mode uses the installed JDK. GCP mode can pick up a JDK url as
>> configured. It is just a configuration, one among many, that can be changed
>> as per needs of the benchmark. The benchmarks can be used with almost any
>> branch (just specify the commit sha in the repository section, or
>> alternatively build Solr tgz separately and refer to it in the solr-package
>> parameter).
>>
>>
>> On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:
>>
>>> Hi Ishan,
>>>
>>> Thanks for starting this conversation! I think it's important to pay
>>> attention to performance, but I also have some concerns with coming out
>>> with such a strong mandate. In the repository, I'm looking at how to run in
>>> local mode, and see that it looks like it will try to download a jdk from
>>> some university website? That seems overly restrictive to me, why can't we
>>> use the already installed JDK?
>>>
>>> Is the benchmark suite designed for master? Or for branch_8x?
>>>
>>> Mike
>>>
>>> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
 Hi Everyone!
 From now on, I intend to request/nag/demand/veto code changes, which
 affect default code paths for most users, be accompanied by performance
 testing numbers for it (e.g. [1]). Opt in features are fine, I won't
 personally bother about them (but if you'd like to perf test them, it would
 set a great precedent anyway).

 I will also work on setting up automated performance and stress testing
 [2], but in the absence of that, let us do performance tests manually and
 report them in the JIRA. Unless we hold ourselves to a high
 standard, performance will be a joke whereby performance regressions can
 creep in without the committer(s) taking any responsibility towards those
 users affected by it (SOLR-14665).

 A benchmarking suite that I am working on is at
 https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
 test suite is under development (SOLR-13933). If you wish to use either of
 these, I shall offer help and support (please ping me on Slack directly or
 #solr-dev, or open a Github Issue on that repo).

 Regards,
 Ishan

 [1] -
 https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
 [2] -
 https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
 (edited)

Re: Performance testing is necessary now

2020-08-11 Thread Mike Drob

Can you give examples of this? I don’t see them in the repo.

On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Local mode uses the installed JDK. GCP mode can pick up a JDK url as
> configured. It is just a configuration, one among many, that can be changed
> as per needs of the benchmark. The benchmarks can be used with almost any
> branch (just specify the commit sha in the repository section, or
> alternatively build Solr tgz separately and refer to it in the solr-package
> parameter).
>
>
> On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:
>
>> Hi Ishan,
>>
>> Thanks for starting this conversation! I think it's important to pay
>> attention to performance, but I also have some concerns with coming out
>> with such a strong mandate. In the repository, I'm looking at how to run in
>> local mode, and see that it looks like it will try to download a jdk from
>> some university website? That seems overly restrictive to me, why can't we
>> use the already installed JDK?
>>
>> Is the benchmark suite designed for master? Or for branch_8x?
>>
>> Mike
>>
>> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Hi Everyone!
>>> From now on, I intend to request/nag/demand/veto code changes, which
>>> affect default code paths for most users, be accompanied by performance
>>> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
>>> personally bother about them (but if you'd like to perf test them, it would
>>> set a great precedent anyway).
>>>
>>> I will also work on setting up automated performance and stress testing
>>> [2], but in the absence of that, let us do performance tests manually and
>>> report them in the JIRA. Unless we hold ourselves to a high
>>> standard, performance will be a joke whereby performance regressions can
>>> creep in without the committer(s) taking any responsibility towards those
>>> users affected by it (SOLR-14665).
>>>
>>> A benchmarking suite that I am working on is at
>>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
>>> test suite is under development (SOLR-13933). If you wish to use either of
>>> these, I shall offer help and support (please ping me on Slack directly or
>>> #solr-dev, or open a GitHub Issue on that repo).
>>>
>>> Regards,
>>> Ishan
>>>
>>> [1] -
>>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
>>> [2] -
>>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
>>> (edited)
>>>
>>>
>>>

Re: Performance testing is necessary now

2020-08-11 Thread Đạt Cao Mạnh

Another note here is that problems come and go in unpredictable ways.
Before SOLR-14665 I never thought about doing a performance test of
creating thousands of collections. The problem here is the same as with our
tests: even though Solr has a huge number of tests, bugs still happen here
and there, and sometimes they are serious. So no matter how good a
performance tool we have (not to mention that we do not have a consistent,
unified way to do that), performance degradation can still happen in
unpredictable ways.

Moreover, if a commit changes a particular code path, like the PeerSync class,
we do not have an available tool for that. Requiring people to write a tool
just to measure their changes for a simple commit (and likely throw it
away afterwards) seems a big -1 to me, not to mention that bugs can arise in
those performance tools, so the numbers can hardly be trusted.

Re: Survey on ManagedResources feature

2020-08-11 Thread Noble Paul

The endpoint is served by Restlet, so your rules are not going to be
honored. The rules work only if it is served by a Solr request handler.

On Wed, Aug 12, 2020, 12:46 AM Jason Gerlowski 
wrote:

> Hey Noble,
>
> Can you explain what you mean when you say it's not secured?  Just for
> those of us who haven't been following the discussion so far?  On the
> surface of things users taking advantage of our RuleBasedAuth plugin
> can secure this API like they can any other HTTP API.  Or are you
> talking about some other security aspect here?
>
> Jason
>
> On Tue, Aug 11, 2020 at 9:55 AM Noble Paul  wrote:
> >
> > Hi all,
> > The end-point for Managed resources is not secured. So it needs to be
> > fixed/eliminated.
> >
> > I would like to know what is the level of adoption for that feature
> > and if it is a critical feature for users.
> >
> > Another possibility is to offer a replacement for the feature using a
> > different API
> >
> > Your feedback will help us decide on what a potential solution should be
> >
> > --
> > -
> > Noble Paul
>

Re: Performance testing is necessary now

2020-08-11 Thread Gus Heck

Not going to agree with your analogy. The difference is that everyone knows
murder is wrong; degradations in performance are inadvertent and happen
while folks are attempting to capture other benefits for the good of the
project. I don't see how you should take the blame (any more than the
rest of us). I've had the idea in the past too, but not found time to act
on it. Specifically I wanted to start by getting a consistent periodic
indexing benchmark going for my JesterJ project which conveniently then
provides a consistent data set in a consistent cluster against which to run
query benchmarks... I was going to use the data set that Mike uses for the
Lucene nightly benchmarks... I too think change is needed, but I don't
think anyone out there is malicious or knowingly causing slowdowns, so it's
not the culture, or at least that's only a minor part of it. I suspect if
we spend a lot of energy on haranguing people to "be better" we'll get much
less out of our effort than working towards tooling (the caveat being that
the effort has to make a concrete step that can be built on, which none of
my efforts have yet). We may disagree, but I think we both want benchmarks
so let's focus on that :)

-Gus

On Tue, Aug 11, 2020 at 5:56 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> I mostly disagree, Gus. Barring isolated efforts, no one ever stepped up
> to wrap up automated performance benchmarks (I take the blame on this). And
> after every major regression, we hear the same excuse: there are no
> automated performance benchmarks. Is that a valid excuse? It is like
> saying, there were no surveillance cameras, so I committed a murder. We
> need to change the culture.
>
> I am going to set up automated benchmarks, but it will take a while (weeks,
> not months). The suite is functional, I'm just identifying the right
> datasets and queries at the moment. Help welcome (SOLR-10317). But until
> that happens, are you comfortable letting problems like SOLR-14665
> happen?
>
> On Wed, 12 Aug, 2020, 3:11 am Gus Heck,  wrote:
>
>> I think we need to get the system for measuring performance in place
>> before we can issue a mandate. The analogy is "test the application
>> functionality carefully before checking in" vs "run these unit tests before
>> checking in." Even if everyone does their own microbenchmarks they likely
>> won't be comparable (some will have errors, some will measure
>> different things, the data will vary widely etc). Once we have a clear
>> metric that everyone can use, I'm all for a requirement to document changes
>> to that metric. I'm 100% behind careful scrutiny of things in the main code
>> paths until then, but I don't think we can issue a mandate that is
>> essentially "do your own thing". That will quickly revert to the current
>> situation. I don't think that the problem is nobody cares, more likely the
>> problem is it's hard and there's always a tug of war between getting things
>> done and out there where people can benefit from the feature/fix etc vs the
>> risk that they stall out waiting for one more thing to do. If the time to
>> complete a task grows the likelihood that real life, and jobs interrupt it
>> grows, and the chance it lingers indefinitely or is abandoned goes up.
>>
>> On Tue, Aug 11, 2020 at 5:09 PM Mike Drob  wrote:
>>
>>> Hi Ishan,
>>>
>>> Thanks for starting this conversation! I think it's important to pay
>>> attention to performance, but I also have some concerns with coming out
>>> with such a strong mandate. In the repository, I'm looking at how to run in
>>> local mode, and see that it looks like it will try to download a jdk from
>>> some university website? That seems overly restrictive to me, why can't we
>>> use the already installed JDK?
>>>
>>> Is the benchmark suite designed for master? Or for branch_8x?
>>>
>>> Mike
>>>
>>> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> Hi Everyone!
>>>> From now on, I intend to request/nag/demand/veto code changes, which
>>>> affect default code paths for most users, be accompanied by performance
>>>> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
>>>> personally bother about them (but if you'd like to perf test them, it would
>>>> set a great precedent anyway).
>>>>
>>>> I will also work on setting up automated performance and stress testing
>>>> [2], but in the absence of that, let us do performance tests manually and
>>>> report them in the JIRA. Unless we hold ourselves to a high
>>>> standard, performance will be a joke whereby performance regressions can
>>>> creep in without the committer(s) taking any responsibility towards those
>>>> users affected by it (SOLR-14665).
>>>>
>>>> A benchmarking suite that I am working on is at
>>>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
>>>> test suite is under development (SOLR-13933). If you wish to use either of
>>>> these, I shall offer help and

Re: [VOTE] Release Lucene/Solr 8.6.1 RC2

2020-08-11 Thread Houston Putman

I have also run Ishan's performance test for creating collections on a
local 15 node cluster.

The time for creating additional collections is stable, so this RC fixes the
performance degradation introduced in 8.6.0.

[image: Screen Shot 2020-08-11 at 5.57.01 PM.png]

- Houston

On Tue, Aug 11, 2020 at 5:54 PM Gus Heck  wrote:

> SUCCESS! [0:54:03.106188]
>
> And installed the tarball as a 4 node cluster, created a collection and
> added a document - success :)
>
> +1
>
> On Tue, Aug 11, 2020 at 12:13 PM Timothy Potter 
> wrote:
>
>> Thanks Houston.
>>
>> SUCCESS! [1:34:35.219332]
>>
>> +1
>>
>> On Mon, Aug 10, 2020 at 1:02 PM Houston Putman 
>> wrote:
>>
>>> Please vote for release candidate 2 for Lucene/Solr 8.6.1
>>>
>>> The artifacts can be downloaded from:
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99
>>>
>>> You can run the smoke tester directly with this command:
>>>
>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99
>>>
>>> The vote will be open for at least 72 hours i.e. until 2020-08-13 20:00
>>> UTC.
>>>
>>> [ ] +1  approve
>>> [ ] +0  no opinion
>>> [ ] -1  disapprove (and reason why)
>>>
>>> Here is my +1
>>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

I mostly disagree, Gus. Barring isolated efforts, no one ever stepped up to
wrap up automated performance benchmarks (I take the blame on this). And
after every major regression, we hear the same excuse: there are no
automated performance benchmarks. Is that a valid excuse? It is like
saying, there were no surveillance cameras, so I committed a murder. We
need to change the culture.

I am going to set up automated benchmarks, but it will take a while (weeks,
not months). The suite is functional, I'm just identifying the right
datasets and queries at the moment. Help welcome (SOLR-10317). But until
that happens, are you comfortable letting problems like SOLR-14665
happen?

On Wed, 12 Aug, 2020, 3:11 am Gus Heck,  wrote:

> I think we need to get the system for measuring performance in place
> before we can issue a mandate. The analogy is "test the application
> functionality carefully before checking in" vs "run these unit tests before
> checking in." Even if everyone does their own microbenchmarks they likely
> won't be comparable (some will have errors, some will measure
> different things, the data will vary widely etc). Once we have a clear
> metric that everyone can use, I'm all for a requirement to document changes
> to that metric. I'm 100% behind careful scrutiny of things in the main code
> paths until then, but I don't think we can issue a mandate that is
> essentially "do your own thing". That will quickly revert to the current
> situation. I don't think that the problem is nobody cares, more likely the
> problem is it's hard and there's always a tug of war between getting things
> done and out there where people can benefit from the feature/fix etc vs the
> risk that they stall out waiting for one more thing to do. If the time to
> complete a task grows the likelihood that real life, and jobs interrupt it
> grows, and the chance it lingers indefinitely or is abandoned goes up.
>
> On Tue, Aug 11, 2020 at 5:09 PM Mike Drob  wrote:
>
>> Hi Ishan,
>>
>> Thanks for starting this conversation! I think it's important to pay
>> attention to performance, but I also have some concerns with coming out
>> with such a strong mandate. In the repository, I'm looking at how to run in
>> local mode, and see that it looks like it will try to download a jdk from
>> some university website? That seems overly restrictive to me, why can't we
>> use the already installed JDK?
>>
>> Is the benchmark suite designed for master? Or for branch_8x?
>>
>> Mike
>>
>> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Hi Everyone!
>>> From now on, I intend to request/nag/demand/veto code changes, which
>>> affect default code paths for most users, be accompanied by performance
>>> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
>>> personally bother about them (but if you'd like to perf test them, it would
>>> set a great precedent anyway).
>>>
>>> I will also work on setting up automated performance and stress testing
>>> [2], but in the absence of that, let us do performance tests manually and
>>> report them in the JIRA. Unless we hold ourselves to a high
>>> standard, performance will be a joke whereby performance regressions can
>>> creep in without the committer(s) taking any responsibility towards those
>>> users affected by it (SOLR-14665).
>>>
>>> A benchmarking suite that I am working on is at
>>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
>>> test suite is under development (SOLR-13933). If you wish to use either of
>>> these, I shall offer help and support (please ping me on Slack directly or
>>> #solr-dev, or open a GitHub Issue on that repo).
>>>
>>> Regards,
>>> Ishan
>>>
>>> [1] -
>>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
>>> [2] -
>>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
>>> (edited)
>>>
>>>
>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: [VOTE] Release Lucene/Solr 8.6.1 RC2

2020-08-11 Thread Gus Heck

SUCCESS! [0:54:03.106188]

And installed the tarball as a 4 node cluster, created a collection and
added a document - success :)

+1

On Tue, Aug 11, 2020 at 12:13 PM Timothy Potter 
wrote:

> Thanks Houston.
>
> SUCCESS! [1:34:35.219332]
>
> +1
>
> On Mon, Aug 10, 2020 at 1:02 PM Houston Putman 
> wrote:
>
>> Please vote for release candidate 2 for Lucene/Solr 8.6.1
>>
>> The artifacts can be downloaded from:
>>
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99
>>
>> You can run the smoke tester directly with this command:
>>
>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99
>>
>> The vote will be open for at least 72 hours i.e. until 2020-08-13 20:00
>> UTC.
>>
>> [ ] +1  approve
>> [ ] +0  no opinion
>> [ ] -1  disapprove (and reason why)
>>
>> Here is my +1
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

> but I also have some concerns with coming out with such a strong mandate

If you have any alternate suggestions to prevent situations like
SOLR-14665, please let us know. I'm open to any suggestion that can enable
us collectively to prevent such regressions. Noble and I built that perf
tool (after initial work by Vivek), and there are several gaps and missing
features, but it is the closest solution I think we have towards automated
periodic perf testing. If we adopt it, or adopt any other solution, I'm
willing to co-operate in making it effective and easy to use so that it
becomes part of our dev processes, just like Jenkins based testing is today.


On Wed, 12 Aug, 2020, 3:00 am Ishan Chattopadhyaya, <
ichattopadhy...@gmail.com> wrote:

> Local mode uses the installed JDK. GCP mode can pick up a JDK url as
> configured. It is just a configuration, one among many, that can be changed
> as per needs of the benchmark. The benchmarks can be used with almost any
> branch (just specify the commit sha in the repository section, or
> alternatively build Solr tgz separately and refer to it in the solr-package
> parameter).
>
> On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:
>
>> Hi Ishan,
>>
>> Thanks for starting this conversation! I think it's important to pay
>> attention to performance, but I also have some concerns with coming out
>> with such a strong mandate. In the repository, I'm looking at how to run in
>> local mode, and see that it looks like it will try to download a jdk from
>> some university website? That seems overly restrictive to me, why can't we
>> use the already installed JDK?
>>
>> Is the benchmark suite designed for master? Or for branch_8x?
>>
>> Mike
>>
>> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Hi Everyone!
>>> From now on, I intend to request/nag/demand/veto code changes, which
>>> affect default code paths for most users, be accompanied by performance
>>> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
>>> personally bother about them (but if you'd like to perf test them, it would
>>> set a great precedent anyway).
>>>
>>> I will also work on setting up automated performance and stress testing
>>> [2], but in the absence of that, let us do performance tests manually and
>>> report them in the JIRA. Unless we hold ourselves to a high
>>> standard, performance will be a joke whereby performance regressions can
>>> creep in without the committer(s) taking any responsibility towards those
>>> users affected by it (SOLR-14665).
>>>
>>> A benchmarking suite that I am working on is at
>>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress
>>> test suite is under development (SOLR-13933). If you wish to use either of
>>> these, I shall offer help and support (please ping me on Slack directly or
>>> #solr-dev, or open a GitHub Issue on that repo).
>>>
>>> Regards,
>>> Ishan
>>>
>>> [1] -
>>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
>>> [2] -
>>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
>>> (edited)
>>>
>>>
>>>

Re: Performance testing is necessary now

2020-08-11 Thread Gus Heck

I think we need to get the system for measuring performance in place before
we can issue a mandate. The analogy is "test the application functionality
carefully before checking in" vs "run these unit tests before checking in."
Even if everyone does their own microbenchmarks they likely won't be
comparable (some will have errors, some will measure different things, the
data will vary widely etc). Once we have a clear metric that everyone can
use, I'm all for a requirement to document changes to that metric. I'm 100%
behind careful scrutiny of things in the main code paths until then, but I
don't think we can issue a mandate that is essentially "do your own thing".
That will quickly revert to the current situation. I don't think that the
problem is nobody cares, more likely the problem is it's hard and there's
always a tug of war between getting things done and out there where people
can benefit from the feature/fix etc vs the risk that they stall out
waiting for one more thing to do. If the time to complete a task grows, the
likelihood that real life and jobs interrupt it grows too, and the chance
that it lingers indefinitely or is abandoned goes up.

On Tue, Aug 11, 2020 at 5:09 PM Mike Drob  wrote:

> Hi Ishan,
>
> Thanks for starting this conversation! I think it's important to pay
> attention to performance, but I also have some concerns with coming out
> with such a strong mandate. In the repository, I'm looking at how to run in
> local mode, and see that it looks like it will try to download a jdk from
> some university website? That seems overly restrictive to me, why can't we
> use the already installed JDK?
>
> Is the benchmark suite designed for master? Or for branch_8x?
>
> Mike
>
> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Hi Everyone!
>> From now on, I intend to request/nag/demand/veto code changes, which
>> affect default code paths for most users, be accompanied by performance
>> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
>> personally bother about them (but if you'd like to perf test them, it would
>> set a great precedent anyway).
>>
>> I will also work on setting up automated performance and stress testing
>> [2], but in the absence of that, let us do performance tests manually and
>> report them in the JIRA. Unless we hold ourselves to a high
>> standard, performance will be a joke whereby performance regressions can
>> creep in without the committer(s) taking any responsibility towards those
>> users affected by it (SOLR-14665).
>>
>> A benchmarking suite that I am working on is at
>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress test
>> suite is under development (SOLR-13933). If you wish to use either of
>> these, I shall offer help and support (please ping me on Slack directly or
>> #solr-dev, or open a GitHub Issue on that repo).
>>
>> Regards,
>> Ishan
>>
>> [1] -
>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
>> [2] -
>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
>> (edited)
>>
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

Local mode uses the installed JDK. GCP mode can pick up a JDK URL as
configured. It is just a configuration, one among many, that can be changed
as per the needs of the benchmark. The benchmarks can be used with almost any
branch (just specify the commit SHA in the repository section, or
alternatively build the Solr tgz separately and refer to it in the solr-package
parameter).

On Wed, 12 Aug, 2020, 2:39 am Mike Drob,  wrote:

> Hi Ishan,
>
> Thanks for starting this conversation! I think it's important to pay
> attention to performance, but I also have some concerns with coming out
> with such a strong mandate. In the repository, I'm looking at how to run in
> local mode, and see that it looks like it will try to download a jdk from
> some university website? That seems overly restrictive to me, why can't we
> use the already installed JDK?
>
> Is the benchmark suite designed for master? Or for branch_8x?
>
> Mike
>
> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Hi Everyone!
>> From now on, I intend to request/nag/demand/veto code changes, which
>> affect default code paths for most users, be accompanied by performance
>> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
>> personally bother about them (but if you'd like to perf test them, it would
>> set a great precedent anyway).
>>
>> I will also work on setting up automated performance and stress testing
>> [2], but in the absence of that, let us do performance tests manually and
>> report them in the JIRA. Unless we hold ourselves to a high
>> standard, performance will be a joke whereby performance regressions can
>> creep in without the committer(s) taking any responsibility towards those
>> users affected by it (SOLR-14665).
>>
>> A benchmarking suite that I am working on is at
>> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress test
>> suite is under development (SOLR-13933). If you wish to use either of
>> these, I shall offer help and support (please ping me on Slack directly or
>> #solr-dev, or open a GitHub Issue on that repo).
>>
>> Regards,
>> Ishan
>>
>> [1] -
>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
>> [2] -
>> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
>> (edited)
>>
>>
>>

Re: Performance testing is necessary now

2020-08-11 Thread Mike Drob

Hi Ishan,

Thanks for starting this conversation! I think it's important to pay
attention to performance, but I also have some concerns with coming out
with such a strong mandate. In the repository, I'm looking at how to run in
local mode, and see that it looks like it will try to download a jdk from
some university website? That seems overly restrictive to me, why can't we
use the already installed JDK?

Is the benchmark suite designed for master? Or for branch_8x?

Mike

On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Hi Everyone!
> From now on, I intend to request/nag/demand/veto code changes, which
> affect default code paths for most users, be accompanied by performance
> testing numbers for it (e.g. [1]). Opt in features are fine, I won't
> personally bother about them (but if you'd like to perf test them, it would
> set a great precedent anyway).
>
> I will also work on setting up automated performance and stress testing
> [2], but in the absence of that, let us do performance tests manually and
> report them in the JIRA. Unless we hold ourselves to a high
> standard, performance will be a joke whereby performance regressions can
> creep in without the committer(s) taking any responsibility towards those
> users affected by it (SOLR-14665).
>
> A benchmarking suite that I am working on is at
> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress test
> suite is under development (SOLR-13933). If you wish to use either of
> these, I shall offer help and support (please ping me on Slack directly or
> #solr-dev, or open a GitHub Issue on that repo).
>
> Regards,
> Ishan
>
> [1] -
> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
> [2] -
> https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
> (edited)
>
>
>

Re: Badapple report

2020-08-11 Thread Atri Sharma

Merged (thanks Mike D!).

Atri

On Tue, Aug 11, 2020 at 5:32 PM Erick Erickson  wrote:
>
> Great, thanks! Let me know when you push it, I can beast the test again.
>
> > On Aug 11, 2020, at 3:48 AM, Atri Sharma  wrote:
> >
> > I investigated testRequestRateLimiters and hardened the tests up:
> >
> > https://github.com/apache/lucene-solr/pull/1736
> >
> > This will stop testConcurrentRequests from failing and should
> > hopefully stop testSlotBorrowing as well. If testSlotBorrowing
> > continues to fail, I will have to rethink the test.
> >
> > On Mon, Aug 10, 2020 at 8:17 PM Erick Erickson  
> > wrote:
> >>
> >> We’re backsliding some. I encourage people to look at: 
> >> http://fucit.org/solr-jenkins-reports/failure-report.html, we have a 
> >> number of ill-behaved tests, particularly TestRequestRateLimiter, 
> >> TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and 
> >> TestIndexingSequenceNumbers…
> >>
> >>
> >> Raw fail count by week totals, most recent week first (corresponds to 
> >> bits):
> >> Week: 0  had  100 failures
> >> Week: 1  had  82 failures
> >> Week: 2  had  94 failures
> >> Week: 3  had  502 failures
> >>
> >>
> >> Failures in Hoss' reports for the last 4 rollups.
> >>
> >> There were 585 unannotated tests that failed in Hoss' rollups. Ordered by 
> >> the date I downloaded the rollup file, newest->oldest. See above for the 
> >> dates the files were collected
> >> These tests were NOT BadApple'd or AwaitsFix'd
> >>
> >> Failures in the last 4 reports..
> >>   Report    Pct  runs  fails  test
> >>   0123      4.4  1583     37  BasicDistributedZkTest.test
> >>   0123      4.3  1727     77  CloudExitableDirectoryReaderTest.test
> >>   0123      2.5  8598    248  CloudExitableDirectoryReaderTest.testCreepThenBite
> >>   0123      1.9  1712     36  CloudExitableDirectoryReaderTest.testWhitebox
> >>   0123      0.5  1587     11  DocValuesNotIndexedTest.testGroupingDVOnlySortLast
> >>   0123      2.2  1679     82  HttpPartitionOnCommitTest.test
> >>   0123      0.5  1592     16  HttpPartitionTest.test
> >>   0123      1.0  1578      9  HttpPartitionWithTlogReplicasTest.test
> >>   0123      1.3  1569     13  LeaderFailoverAfterPartitionTest.test
> >>   0123      7.4  1643     59  MultiThreadedOCPTest.test
> >>   0123      0.3  1567      8  ReplaceNodeTest.test
> >>   0123      0.2  1588      6  ShardSplitTest.testSplitShardWithRule
> >>   0123    100.0    38     33  SharedFSAutoReplicaFailoverTest.test
> >>   0123      2.1   818     19  TestCircuitBreaker.testBuildingMemoryPressure
> >>   0123      2.6   818     13  TestCircuitBreaker.testResponseWithCBTiming
> >>   0123      6.2  1848    104  TestContainerPlugin.testApiFromPackage
> >>   0123      2.5  1662     33  TestDistributedGrouping.test
> >>   0123      0.4  1448      6  TestDynamicLoading.testDynamicLoading
> >>   0123      6.4  1614     74  TestExportWriter.testExpr
> >>   0123      8.6  1356     70  TestHdfsCloudBackupRestore.test
> >>   0123      9.1  1697    136  TestLocalFSCloudBackupRestore.test
> >>   0123      0.5  1607     26  TestPackages.testPluginLoading
> >>   0123      0.7  1596     15  TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
> >>   0123      1.5  1610     59  TestReRankQParserPlugin.testMinExactCount
> >>   0123      0.3  1552      4  TestReplicaProperties.test
> >>   0123      0.3  1556      5  TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
> >>   0123      0.3  1565      9  TestSolrConfigHandlerCloud.test
> >> 
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
> >
>
>
>


-- 
Regards,

Atri
Apache Concerted


Re: RoadMap?

2020-08-11 Thread Marcus Eagan

+1 (non-binding)

On Tue, Aug 11, 2020 at 09:52 Gus Heck  wrote:

> I was thinking that level of detail is in the Jira... I don't see any
> reason for things to disappear (in fact rejected should go in a rejected
> list for future reference.)
>
> On Tue, Aug 11, 2020 at 12:04 PM Ilan Ginzburg  wrote:
>
>> Maybe also add “in progress”? So items do not disappear suddenly from the
>> page when work really starts on them?
>>
>> On Tue 11 Aug 2020 at 17:15, Gus Heck  wrote:
>>
>>> Cool, since I brought it up, I can volunteer to help manage the page. We
>>> should get jira issue links in there wherever possible. Do we want to build
>>> an initial list and have some sort of Proposed/Planned workflow so readers
>>> can have confidence (or appropriate lack of confidence) in what they see
>>> there? voting on things seems like too much but maybe folks who care watch
>>> the page, and if something is on there for a week without objection it can
>>> be called accepted? If a discussion starts here it can be marked
>>> "Considering" so... something like this:
>>>
>>> 4 states: Proposed, Considering, Planned, Rejected
>>>
>>> Workflow like this:
>>> Proposed ---(no objection 1 wk) --> Planned
>>> Proposed ---(discussion)--> Considering
>>> Considering (agreement) --> Planned
>>> Considering (deferred) ---> Proposed (later release)
>>> Considering (unsuitable) -> Rejected
>>> Considering (promoted) ---> Proposed (earlier release)
>>> Planned (difficulty found) ---> Considering
>>>
>>> Anything in "Considering" should have an active dev list thread, and if
>>> it didn't happen on the list it didn't happen :). Any of that (or
>>> differences of opinion during Considering) can be overridden by a formal
>>> vote of course
>>>
>>> -Gus
>>>
>>>
>>>
>>>
>>> On Tue, Aug 11, 2020 at 10:29 AM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> I've created a placeholder document here:
>>>> https://cwiki.apache.org/confluence/display/SOLR/Roadmap
>>>> Let us put in all our items there.
>>>>
>>>> On Tue, Aug 11, 2020 at 4:45 PM Jan Høydahl wrote:
>>>>
>>>>> Let’s revive this email thread about Roadmap.
>>>>>
>>>>> With so many large initiatives going on, and the TLP split also, I
>>>>> think it makes perfect sense with a Roadmap.
>>>>> I know we’re not used to that kind of thing - we tend to just let
>>>>> things play out as it happens to land in various releases, but this time is
>>>>> special, and I think we’d benefit from more coordination. I don’t know how
>>>>> to enforce such coordination though, other than appealing to all committers
>>>>> to endorse the roadmap and respect it when they merge things. We may not be
>>>>> able to set a release date for 9.0 right now, but we may be able to define
>>>>> preconditions and scope certain features to 9.0 or 9.1 rather than 8.7 or
>>>>> 8.8 - that kind of coarse-grained decisions. We also may need a person that
>>>>> «owns» the Roadmap confluence page and actively promotes it, tries to keep
>>>>> it up to date and reminds the rest of us about its existence. A roadmap
>>>>> must NOT be a brake slowing us down, but a tool helping us avoid silly
>>>>> mistakes.
>>>>>
>>>>> Jan
>>>>>
>>>>> > 5. jul. 2020 kl. 02:39 skrev Noble Paul :
>>>>> >
>>>>> > I think the logical thing to do today is completely rip out all
>>>>> > autoscaling code as it exists today.
>>>>> > Let's deprecate that in 8.7 and build something for "assign-strategy".
>>>>> > Autoscaling, if required, should not be a part of Solr
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl wrote:
>>>>> >>
>>>>> >> +1
>>>>> >>
>>>>> >> Why don’t we make a Roadmap wiki page as Cassandra suggests, and
>>>>> indicate what major things needs to happen when.
>>>>> >> Perhaps if we can get the Solr TLP and git-split ball rolling as a
>>>>> pre-9.0 task, then perhaps 8.8 could be the last joint release (6.6, 7.7,
>>>>> 8.8 hehe)?
>>>>> >> That would enable Lucene to ship 9.0 without waiting for a ton of
>>>>> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
>>>>> >>
>>>>> >> Jan
>>>>> >>
>>>>> >> 3. jul. 2020 kl. 09:19 skrev Dawid Weiss :
>>>>> >>
>>>>> >>
>>>>> >>> I totally expect some things to bubble up when we try to release
>>>>> with Gradle, the tarball being one. I don’t think that’s a very big issue,
>>>>> but if you have lots of “not very big” issues they do add up.
>>>>> >>
>>>>> >>
>>>>> >> Adding a tarball is literally 3-5 lines of code (you add a task
>>>>> that builds

Re: RoadMap?

2020-08-11 Thread Gus Heck

I was thinking that level of detail is in the Jira... I don't see any
reason for things to disappear (in fact rejected should go in a rejected
list for future reference.)

On Tue, Aug 11, 2020 at 12:04 PM Ilan Ginzburg  wrote:

> Maybe also add “in progress”? So items do not disappear suddenly from the
> page when work really starts on them?
>
> On Tue 11 Aug 2020 at 17:15, Gus Heck  wrote:
>
>> Cool, since I brought it up, I can volunteer to help manage the page. We
>> should get jira issue links in there wherever possible. Do we want to build
>> an initial list and have some sort of Proposed/Planned workflow so readers
>> can have confidence (or appropriate lack of confidence) in what they see
>> there? voting on things seems like too much but maybe folks who care watch
>> the page, and if something is on there for a week without objection it can
>> be called accepted? If a discussion starts here it can be marked
>> "Considering" so... something like this:
>>
>> 4 states: Proposed, Considering, Planned, Rejected
>>
>> Workflow like this:
>> Proposed ---(no objection 1 wk) --> Planned
>> Proposed ---(discussion)--> Considering
>> Considering (agreement) --> Planned
>> Considering (deferred) ---> Proposed (later release)
>> Considering (unsuitable) -> Rejected
>> Considering (promoted) ---> Proposed (earlier release)
>> Planned (difficulty found) ---> Considering
>>
>> Anything in "Considering" should have an active dev list thread, and if
>> it didn't happen on the list it didn't happen :). Any of that (or
>> differences of opinion during Considering) can be overridden by a formal
>> vote of course
>>
>> -Gus
>>
>>
>>
>>
>> On Tue, Aug 11, 2020 at 10:29 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> I've created a placeholder document here:
>>> https://cwiki.apache.org/confluence/display/SOLR/Roadmap
>>> Let us put in all our items there.
>>>
>>> On Tue, Aug 11, 2020 at 4:45 PM Jan Høydahl 
>>> wrote:
>>>
>>>> Let’s revive this email thread about Roadmap.
>>>>
>>>> With so many large initiatives going on, and the TLP split also, I
>>>> think it makes perfect sense with a Roadmap.
>>>> I know we’re not used to that kind of thing - we tend to just let
>>>> things play out as it happens to land in various releases, but this time is
>>>> special, and I think we’d benefit from more coordination. I don’t know how
>>>> to enforce such coordination though, other than appealing to all committers
>>>> to endorse the roadmap and respect it when they merge things. We may not be
>>>> able to set a release date for 9.0 right now, but we may be able to define
>>>> preconditions and scope certain features to 9.0 or 9.1 rather than 8.7 or
>>>> 8.8 - that kind of coarse-grained decisions. We also may need a person that
>>>> «owns» the Roadmap confluence page and actively promotes it, tries to keep
>>>> it up to date and reminds the rest of us about its existence. A roadmap
>>>> must NOT be a brake slowing us down, but a tool helping us avoid silly
>>>> mistakes.
>>>>
>>>> Jan
>>>>
>>>> > 5. jul. 2020 kl. 02:39 skrev Noble Paul :
>>>> >
>>>> > I think the logical thing to do today is completely rip out all
>>>> > autoscaling code as it exists today.
>>>> > Let's deprecate that in 8.7 and build something for "assign-strategy".
>>>> > Autoscaling, if required, should not be a part of Solr
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl wrote:
>>>> >>
>>>> >> +1
>>>> >>
>>>> >> Why don’t we make a Roadmap wiki page as Cassandra suggests, and
>>>> indicate what major things needs to happen when.
>>>> >> Perhaps if we can get the Solr TLP and git-split ball rolling as a
>>>> pre-9.0 task, then perhaps 8.8 could be the last joint release (6.6, 7.7,
>>>> 8.8 hehe)?
>>>> >> That would enable Lucene to ship 9.0 without waiting for a ton of
>>>> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
>>>> >>
>>>> >> Jan
>>>> >>
>>>> >> 3. jul. 2020 kl. 09:19 skrev Dawid Weiss :
>>>> >>
>>>> >>
>>>> >>> I totally expect some things to bubble up when we try to release
>>>> with Gradle, the tarball being one. I don’t think that’s a very big issue,
>>>> but if you have lots of “not very big” issues they do add up.
>>>> >>
>>>> >>
>>>> >> Adding a tarball is literally 3-5 lines of code (you add a task that
>>>> builds a tarball or a zip file from the outputs of solr/packaging toDir
>>>> task)... The bigger issue with gradle is that somebody has to step up and
>>>> try to identify any other issues and/or missing bits when trying to do a
>>>> full release cycle.
>>>> >>

Re: [VOTE] Release Lucene/Solr 8.6.1 RC2

2020-08-11 Thread Timothy Potter

Thanks Houston.

SUCCESS! [1:34:35.219332]

+1

On Mon, Aug 10, 2020 at 1:02 PM Houston Putman 
wrote:

> Please vote for release candidate 2 for Lucene/Solr 8.6.1
>
> The artifacts can be downloaded from:
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99
>
> The vote will be open for at least 72 hours i.e. until 2020-08-13 20:00
> UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>

Re: RoadMap?

2020-08-11 Thread Ilan Ginzburg

Maybe also add “in progress”? So items do not disappear suddenly from the
page when work really starts on them?

On Tue 11 Aug 2020 at 17:15, Gus Heck  wrote:

> Cool, since I brought it up, I can volunteer to help manage the page. We
> should get jira issue links in there wherever possible. Do we want to build
> an initial list and have some sort of Proposed/Planned workflow so readers
> can have confidence (or appropriate lack of confidence) in what they see
> there? voting on things seems like too much but maybe folks who care watch
> the page, and if something is on there for a week without objection it can
> be called accepted? If a discussion starts here it can be marked
> "Considering" so... something like this:
>
> 4 states: Proposed, Considering, Planned, Rejected
>
> Workflow like this:
> Proposed ---(no objection 1 wk) --> Planned
> Proposed ---(discussion)--> Considering
> Considering (agreement) --> Planned
> Considering (deferred) ---> Proposed (later release)
> Considering (unsuitable) -> Rejected
> Considering (promoted) ---> Proposed (earlier release)
> Planned (difficulty found) ---> Considering
>
> Anything in "Considering" should have an active dev list thread, and if it
> didn't happen on the list it didn't happen :). Any of that (or differences
> of opinion during Considering) can be overridden by a formal vote of course
>
> -Gus
>
>
>
>
> On Tue, Aug 11, 2020 at 10:29 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> I've created a placeholder document here:
>> https://cwiki.apache.org/confluence/display/SOLR/Roadmap
>> Let us put in all our items there.
>>
>> On Tue, Aug 11, 2020 at 4:45 PM Jan Høydahl 
>> wrote:
>>
>>> Let’s revive this email thread about Roadmap.
>>>
>>> With so many large initiatives going on, and the TLP split also, I think
>>> it makes perfect sense with a Roadmap.
>>> I know we’re not used to that kind of thing - we tend to just let things
>>> play out as it happens to land in various releases, but this time is
>>> special, and I think we’d benefit from more coordination. I don’t know how
>>> to enforce such coordination though, other than appealing to all committers
>>> to endorse the roadmap and respect it when they merge things. We may not be
>>> able to set a release date for 9.0 right now, but we may be able to define
>>> preconditions and scope certain features to 9.0 or 9.1 rather than 8.7 or
>>> 8.8 - that kind of coarse-grained decisions. We also may need a person that
>>> «owns» the Roadmap confluence page and actively promotes it, tries to keep
>>> it up to date and reminds the rest of us about its existence. A roadmap
>>> must NOT be a brake slowing us down, but a tool helping us avoid silly
>>> mistakes.
>>>
>>> Jan
>>>
>>> > 5. jul. 2020 kl. 02:39 skrev Noble Paul :
>>> >
>>> > I think the logical thing to do today is completely rip out all
>>> > autoscaling code as it exists today.
>>> > Let's deprecate that in 8.7 and build something for "assign-strategy".
>>> > Autoscaling, if required, should not be a part of Solr
>>> >
>>> >
>>> >
>>> > On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> Why don’t we make a Roadmap wiki page as Cassandra suggests, and
>>> indicate what major things needs to happen when.
>>> >> Perhaps if we can get the Solr TLP and git-split ball rolling as a
>>> pre-9.0 task, then perhaps 8.8 could be the last joint release (6.6, 7.7,
>>> 8.8 hehe)?
>>> >> That would enable Lucene to ship 9.0 without waiting for a ton of
>>> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
>>> >>
>>> >> Jan
>>> >>
>>> >> 3. jul. 2020 kl. 09:19 skrev Dawid Weiss :
>>> >>
>>> >>
>>> >>> I totally expect some things to bubble up when we try to release
>>> with Gradle, the tarball being one. I don’t think that’s a very big issue,
>>> but if you have lots of “not very big” issues they do add up.
>>> >>
>>> >>
>>> >> Adding a tarball is literally 3-5 lines of code (you add a task that
>>> builds a tarball or a zip file from the outputs of solr/packaging toDir
>>> task)... The bigger issue with gradle is that somebody has to step up and
>>> try to identify any other issues and/or missing bits when trying to do a
>>> full release cycle.
>>> >>
>>> >> D.
>>> >>
>>> >>
>>> >
>>> >
>>> > --
>>> > -
>>> > Noble Paul
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail:

Re: RoadMap?

2020-08-11 Thread Gus Heck

Cool, since I brought it up, I can volunteer to help manage the page. We
should get jira issue links in there wherever possible. Do we want to build
an initial list and have some sort of Proposed/Planned workflow so readers
can have confidence (or appropriate lack of confidence) in what they see
there? voting on things seems like too much but maybe folks who care watch
the page, and if something is on there for a week without objection it can
be called accepted? If a discussion starts here it can be marked
"Considering" so... something like this:

4 states: Proposed, Considering, Planned, Rejected

Workflow like this:
Proposed ---(no objection 1 wk) --> Planned
Proposed ---(discussion)--> Considering
Considering (agreement) --> Planned
Considering (deferred) ---> Proposed (later release)
Considering (unsuitable) -> Rejected
Considering (promoted) ---> Proposed (earlier release)
Planned (difficulty found) ---> Considering

Anything in "Considering" should have an active dev list thread, and if it
didn't happen on the list it didn't happen :). Any of that (or differences
of opinion during Considering) can be overridden by a formal vote of course
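
For anyone who finds a table easier to read than arrows, here is a minimal sketch of the workflow above as a transition lookup in Python. The names State, TRANSITIONS and advance are made up purely for illustration; nothing like this exists on the wiki page or in the codebase, and the event strings are just the labels from the arrows. The "later release" / "earlier release" notes are kept only as comments, since the sketch does not model release targets.

```python
# Hypothetical sketch of the proposed roadmap workflow; all names are illustrative.
from enum import Enum


class State(Enum):
    PROPOSED = "Proposed"
    CONSIDERING = "Considering"
    PLANNED = "Planned"
    REJECTED = "Rejected"


# (current state, event) -> next state, one entry per arrow in the workflow.
TRANSITIONS = {
    (State.PROPOSED, "no objection 1 wk"): State.PLANNED,
    (State.PROPOSED, "discussion"): State.CONSIDERING,
    (State.CONSIDERING, "agreement"): State.PLANNED,
    (State.CONSIDERING, "deferred"): State.PROPOSED,    # re-proposed for a later release
    (State.CONSIDERING, "unsuitable"): State.REJECTED,
    (State.CONSIDERING, "promoted"): State.PROPOSED,    # re-proposed for an earlier release
    (State.PLANNED, "difficulty found"): State.CONSIDERING,
}


def advance(state, event):
    """Return the next state, or raise ValueError if the workflow has no such arrow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None


if __name__ == "__main__":
    item = advance(State.PROPOSED, "discussion")
    print(item.value)
    print(advance(item, "agreement").value)
```

Note that Rejected has no outgoing arrows in this sketch, which matches the list above; a formal vote would sit outside the table entirely.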

-Gus




On Tue, Aug 11, 2020 at 10:29 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> I've created a placeholder document here:
> https://cwiki.apache.org/confluence/display/SOLR/Roadmap
> Let us put in all our items there.
>
> On Tue, Aug 11, 2020 at 4:45 PM Jan Høydahl  wrote:
>
>> Let’s revive this email thread about Roadmap.
>>
>> With so many large initiatives going on, and the TLP split also, I think
>> it makes perfect sense with a Roadmap.
>> I know we’re not used to that kind of thing - we tend to just let things
>> play out as it happens to land in various releases, but this time is
>> special, and I think we’d benefit from more coordination. I don’t know how
>> to enforce such coordination though, other than appealing to all committers
>> to endorse the roadmap and respect it when they merge things. We may not be
>> able to set a release date for 9.0 right now, but we may be able to define
>> preconditions and scope certain features to 9.0 or 9.1 rather than 8.7 or
>> 8.8 - that kind of coarse-grained decisions. We also may need a person that
>> «owns» the Roadmap confluence page and actively promotes it, tries to keep
>> it up to date and reminds the rest of us about its existence. A roadmap
>> must NOT be a brake slowing us down, but a tool helping us avoid silly
>> mistakes.
>>
>> Jan
>>
>> > 5. jul. 2020 kl. 02:39 skrev Noble Paul :
>> >
>> > I think the logical thing to do today is completely rip out all
>> > autoscaling code as it exists today.
>> > Let's deprecate that in 8.7 and build something for "assign-strategy".
>> > Autoscaling, if required, should not be a part of Solr
>> >
>> >
>> >
>> > On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> Why don’t we make a Roadmap wiki page as Cassandra suggests, and
>> indicate what major things needs to happen when.
>> >> Perhaps if we can get the Solr TLP and git-split ball rolling as a
>> pre-9.0 task, then perhaps 8.8 could be the last joint release (6.6, 7.7,
>> 8.8 hehe)?
>> >> That would enable Lucene to ship 9.0 without waiting for a ton of
>> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
>> >>
>> >> Jan
>> >>
>> >> 3. jul. 2020 kl. 09:19 skrev Dawid Weiss :
>> >>
>> >>
>> >>> I totally expect some things to bubble up when we try to release with
>> Gradle, the tarball being one. I don’t think that’s a very big issue, but
>> if you have lots of “not very big” issues they do add up.
>> >>
>> >>
>> >> Adding a tarball is literally 3-5 lines of code (you add a task that
>> builds a tarball or a zip file from the outputs of solr/packaging toDir
>> task)... The bigger issue with gradle is that somebody has to step up and
>> try to identify any other issues and/or missing bits when trying to do a
>> full release cycle.
>> >>
>> >> D.
>> >>
>> >>
>> >
>> >
>> > --
>> > -
>> > Noble Paul
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread Ishan Chattopadhyaya

I like Cassandra's original suggestion: uncoordinated vs coordinated (or
non-coordinated vs coordinated).

On Tue, Aug 11, 2020 at 8:19 PM Jan Høydahl  wrote:

> Hehe, «self» can be self as in user or self as in Solr :)
>
> Legacy feels like something that is going away, and so far the
> «standalone» mode is not going anywhere.
> Cassandra, feel free to propose what is your best shot and then I don’t
> think we need a poll for it, but suffice a bunch of +1 on this thread.
>
> Managed Cluster vs Non-managed Cluster?
> Managed Cluster vs User Managed Cluster?
>
> Jan
>
> 11. aug. 2020 kl. 16:21 skrev Cassandra Targett :
>
> OK, fair point about self-managed. But I object to "leaving it" as Legacy,
> as I've previously explained (I put that in quotes because it’s not always
> called that at all - it has at least 3 names right now).
>
> The reality is someone can come up with an objection to every single
> possibility. Someday we have to live with something that’s good enough and
> move forward, or we’ll end up just living with the total mash of things we
> have today. Which maybe is fine with everyone.
>
> I’ve tried to put real mental work into thinking about a good name, and
> have tried to compromise based on feedback. At this point, though, unless
> someone else comes up with something I’m likely done here. We’ll just
> “leave it” all as it is now.
>
> Cassandra
> On Aug 11, 2020, 9:11 AM -0500, Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com>, wrote:
>
> I object to "self managed". It gives the impression that Solr manages
> itself, whereas it is the other way around: users need to manage the
> standalone mode with lots of manual effort, as opposed to SolrCloud which
> is in spirit self managed (solr manages itself using zk).
>
> I'm +1 with Legacy replication and SolrCloud replication for now. Later,
> we can get rid of "SolrCloud" and call it something else. Also, once
> SolrCloud is stable enough, we can get rid of legacy mode altogether. We
> can discuss that elsewhere.
>
> On Tue, 11 Aug, 2020, 7:16 pm Cassandra Targett, 
> wrote:
>
>> I don’t feel there is a consensus for me to move forward confidently, but
>> the docs need to be fixed before 8.7. I’ve thought about Ilan’s suggestion,
>> and like calling the non-SolrCloud cluster “self-managed”. It avoids the
>> currently awkward phrasing and any misinterpretation of my original
>> suggestion with clumsiness as Gus pointed out. Can everyone live with that?
>>
>> If so, that leaves what we might eventually call SolrCloud is the
>> remaining sticking point. It’s not a problem that needs to be solved today
>> as the term isn’t going anywhere yet since there aren’t any patches or PRs
>> to change it at a code level.
>>
>> Barring further objections, then, I think I will go ahead with mostly
>> leaving “SolrCloud” as it is, and replacing/modifying “Legacy Scaling”,
>> “leader/follower mode”, some cases of “Standalone mode”, and similar
>> constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., as
>> appropriate.
>>
>> Cassandra
>> On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett ,
>> wrote:
>>
>> The suggestion to use “managed” and maybe “self-managed” is an
>> interesting one. Do you think it’s possible some might confuse that with
>> the other ways we use managed - like the “managed-schema”, and “managed
>> resources” (synonyms and stop words)? Neither of those are
>> cluster-specific, and I wonder if the overlap in terminology would cause
>> them to be conflated.
>>
>> Cassandra
>> On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg ,
>> wrote:
>>
>> Both "legacy" and "SolrCloud" clusters are search server clusters. Seen
>> from far enough, they look the same.
>>
>> In "legacy" the management code is elsewhere (developed by the client
>> operating the cluster, running on other machines using a different logic and
>> potentially another DB than Zookeeper) whereas in "SolrCloud" the
>> management code is embedded in the search server(s) code and it happens
>> that (currently) this code relies on Zookeeper.
>>
>> I see SolrCloud as a "managed cluster" vs. legacy that would be "Self
>> managed" by the client, or "U manage" (non managed when looking at it from
>> the Solr codebase perspective).
>>
>> Same idea as coordinated vs uncoordinated basically. I don't know why but
>> I prefer "managed".
>>
>> Ilan
>>
>> On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett 
>> wrote:
>>
>>> On Aug 6, 2020, 10:22 AM -0500, Gus Heck , wrote:
>>>
>>> WRT the name "uncoordinated mode" I fear it could be read (or even
>>> become known as) as "clumsy mode" which is humorous but possibly not what
>>> we're going for :)
>>>
>>>
>>> I had also considered “non-coordinated”, and prefer it but couldn’t
>>> articulate why. The association of “uncoordinated" with clumsiness might be
>>> what was bugging me.
>>>
>>>  I'd perhaps suggest Cluster mode for SolrCloud though I'm not entirely
>>> sure if Legacy Solr (in current parlance) is not a "cluster" too, cluster
>>>

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread Jan Høydahl

Hehe, «self» can be self as in user or self as in Solr :)

Legacy feels like something that is going away, and so far the «standalone» 
mode is not going anywhere.
Cassandra, feel free to propose what is your best shot and then I don’t think 
we need a poll for it, but suffice a bunch of +1 on this thread.

Managed Cluster vs Non-managed Cluster?
Managed Cluster vs User Managed Cluster?

Jan

> 11. aug. 2020 kl. 16:21 skrev Cassandra Targett :
> 
> OK, fair point about self-managed. But I object to "leaving it" as Legacy, as 
> I've previously explained (I put that in quotes because it’s not always 
> called that at all - it has at least 3 names right now).
> 
> The reality is someone can come up with an objection to every single 
> possibility. Someday we have to live with something that’s good enough and 
> move forward, or we’ll end up just living with the total mash of things we 
> have today. Which maybe is fine with everyone.
> 
> I’ve tried to put real mental work into thinking about a good name, and have 
> tried to compromise based on feedback. At this point, though, unless someone 
> else comes up with something I’m likely done here. We’ll just “leave it” all 
> as it is now.
> 
> Cassandra
> On Aug 11, 2020, 9:11 AM -0500, Ishan Chattopadhyaya, wrote:
>> I object to "self managed". It gives the impression that Solr manages 
>> itself, whereas it is the other way around: users need to manage the 
>> standalone mode with lots of manual effort, as opposed to SolrCloud which is 
>> in spirit self managed (solr manages itself using zk).
>> 
>> I'm +1 with Legacy replication and SolrCloud replication for now. Later, we 
>> can get rid of "SolrCloud" and call it something else. Also, once SolrCloud 
>> is stable enough, we can get rid of legacy mode altogether. We can discuss 
>> that elsewhere.
>> 
>> On Tue, 11 Aug, 2020, 7:16 pm Cassandra Targett wrote:
>> I don’t feel there is a consensus for me to move forward confidently, but 
>> the docs need to be fixed before 8.7. I’ve thought about Ilan’s suggestion, 
>> and like calling the non-SolrCloud cluster “self-managed”. It avoids the 
>> currently awkward phrasing and any misinterpretation of my original 
>> suggestion with clumsiness as Gus pointed out. Can everyone live with that?
>> 
>> If so, that leaves what we might eventually call SolrCloud is the remaining 
>> sticking point. It’s not a problem that needs to be solved today as the term 
>> isn’t going anywhere yet since there aren’t any patches or PRs to change it 
>> at a code level.
>> 
>> Barring further objections, then, I think I will go ahead with mostly 
>> leaving “SolrCloud” as it is, and replacing/modifying “Legacy Scaling”, 
>> “leader/follower mode”, some cases of “Standalone mode”, and similar 
>> constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., as 
>> appropriate.
>> 
>> Cassandra
>> On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett, wrote:
>>> The suggestion to use “managed” and maybe “self-managed” is an interesting 
>>> one. Do you think it’s possible some might confuse that with the other ways 
>>> we use managed - like the “managed-schema”, and “managed resources” 
>>> (synonyms and stop words)? Neither of those are cluster-specific, and I 
>>> wonder if the overlap in terminology would cause them to be conflated.
>>> 
>>> Cassandra
>>> On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg, wrote:
>>>> Both "legacy" and "SolrCloud" clusters are search server clusters. Seen
>>>> from far enough, they look the same.
>>>>
>>>> In "legacy" the management code is elsewhere (developed by the client
>>>> operating the cluster, running on other machines using a different logic
>>>> and potentially another DB than Zookeeper) whereas in "SolrCloud" the
>>>> management code is embedded in the search server(s) code and it happens
>>>> that (currently) this code relies on Zookeeper.
>>>>
>>>> I see SolrCloud as a "managed cluster" vs. legacy that would be "Self
>>>> managed" by the client, or "U manage" (non managed when looking at it from
>>>> the Solr codebase perspective).
>>>>
>>>> Same idea as coordinated vs uncoordinated basically. I don't know why but
>>>> I prefer "managed".
>>>>
>>>> Ilan
>>>>
>>>> On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett wrote:
>>>> On Aug 6, 2020, 10:22 AM -0500, Gus Heck, wrote:
>>>> WRT the name "uncoordinated mode" I fear it could be read (or even become
>>>> known as) as "clumsy mode" which is humorous but possibly not what we're
>>>> going for :)
>>>>
>>>> I had also considered “non-coordinated”, and prefer it but couldn’t
>>>> articulate why. The association of “uncoordinated" with clumsiness might
>>>> be what was bugging me.
>>>>  I'd perhaps suggest Cluster mode for SolrCloud though I'm not entirely
>>>> sure if Legacy Solr (in current

Re: Survey on ManagedResources feature

2020-08-11 Thread Jason Gerlowski

Hey Noble,

Can you explain what you mean when you say it's not secured?  Just for
those of us who haven't been following the discussion so far?  On the
surface of things users taking advantage of our RuleBasedAuth plugin
can secure this API like they can any other HTTP API.  Or are you
talking about some other security aspect here?

Jason

On Tue, Aug 11, 2020 at 9:55 AM Noble Paul  wrote:
>
> Hi all,
> The end-point for Managed resources is not secured. So it needs to be
> fixed/eliminated.
>
> I would like to know what is the level of adoption for that feature
> and if it is a critical feature for users.
>
> Another possibility is to offer a replacement for the feature using a
> different API
>
> Your feedback will help us decide on what a potential solution should be
>
> --
> -
> Noble Paul

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread Ishan Chattopadhyaya

How about we collect all the reasonable options and conduct a poll on
solr-users list?

On Tue, Aug 11, 2020 at 7:52 PM Cassandra Targett 
wrote:

> OK, fair point about self-managed. But I object to "leaving it" as Legacy,
> as I've previously explained (I put that in quotes because it’s not always
> called that at all - it has at least 3 names right now).
>
> The reality is someone can come up with an objection to every single
> possibility. Someday we have to live with something that’s good enough and
> move forward, or we’ll end up just living with the total mash of things we
> have today. Which maybe is fine with everyone.
>
> I’ve tried to put real mental work into thinking about a good name, and
> have tried to compromise based on feedback. At this point, though, unless
> someone else comes up with something I’m likely done here. We’ll just
> “leave it” all as it is now.
>
> Cassandra
> On Aug 11, 2020, 9:11 AM -0500, Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com>, wrote:
>
> I object to "self managed". It gives the impression that Solr manages
> itself, whereas it is the other way around: users need to manage the
> standalone mode with lots of manual effort, as opposed to SolrCloud which
> is in spirit self managed (solr manages itself using zk).
>
> I'm +1 with Legacy replication and SolrCloud replication for now. Later,
> we can get rid of "SolrCloud" and call it something else. Also, once
> SolrCloud is stable enough, we can get rid of legacy mode altogether. We
> can discuss that elsewhere.
>
> On Tue, 11 Aug, 2020, 7:16 pm Cassandra Targett, 
> wrote:
>
>> I don’t feel there is a consensus for me to move forward confidently, but
>> the docs need to be fixed before 8.7. I’ve thought about Ilan’s suggestion,
>> and like calling the non-SolrCloud cluster “self-managed”. It avoids the
>> currently awkward phrasing and any misinterpretation of my original
>> suggestion with clumsiness as Gus pointed out. Can everyone live with that?
>>
>> If so, that leaves what we might eventually call SolrCloud is the
>> remaining sticking point. It’s not a problem that needs to be solved today
>> as the term isn’t going anywhere yet since there aren’t any patches or PRs
>> to change it at a code level.
>>
>> Barring further objections, then, I think I will go ahead with mostly
>> leaving “SolrCloud” as it is, and replacing/modifying “Legacy Scaling”,
>> “leader/follower mode”, some cases of “Standalone mode”, and similar
>> constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., as
>> appropriate.
>>
>> Cassandra
>> On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett ,
>> wrote:
>>
>> The suggestion to use “managed” and maybe “self-managed” is an
>> interesting one. Do you think it’s possible some might confuse that with
>> the other ways we use managed - like the “managed-schema”, and “managed
>> resources” (synonyms and stop words)? Neither of those are
>> cluster-specific, and I wonder if the overlap in terminology would cause
>> them to be conflated.
>>
>> Cassandra
>> On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg ,
>> wrote:
>>
>> Both "legacy" and "SolrCloud" clusters are search server clusters. Seen
>> from far enough, they look the same.
>>
>> In "legacy" the management code is elsewhere (developed by the client
> >> operating the cluster, running on other machines using a different logic and
>> potentially another DB than Zookeeper) whereas in "SolrCloud" the
>> management code is embedded in the search server(s) code and it happens
>> that (currently) this code relies on Zookeeper.
>>
>> I see SolrCloud as a "managed cluster" vs. legacy that would be "Self
>> managed" by the client, or "U manage" (non managed when looking at it from
>> the Solr codebase perspective).
>>
>> Same idea as coordinated vs uncoordinated basically. I don't know why but
>> I prefer "managed".
>>
>> Ilan
>>
>> On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett 
>> wrote:
>>
>>> On Aug 6, 2020, 10:22 AM -0500, Gus Heck , wrote:
>>>
>>> WRT the name "uncoordinated mode" I fear it could be read (or even
>>> become known as) as "clumsy mode" which is humorous but possibly not what
>>> we're going for :)
>>>
>>>
>>> I had also considered “non-coordinated”, and prefer it but couldn’t
>>> articulate why. The association of “uncoordinated" with clumsiness might be
>>> what was bugging me.
>>>
>>>  I'd perhaps suggest Cluster mode for SolrCloud though I'm not entirely
> >>> sure if Legacy Solr (in current parlance) is not a "cluster" too, cluster
>>> being a somewhat vague term. However Clustered Mode and Legacy Mode seem
>>> more on target. I think "Legacy" could be changed since we're not really
>>> planning on abandoning it (are we?), but
>>>
>>>
>>> One can have a cluster and not run SolrCloud. I think from an operations
>>> perspective, several servers all running Solr is considered a cluster, no
>>> matter what tools are being used to get them to talk to each other.
>>>
>>> I think “Legacy” (also used today already in

Re: RoadMap?

2020-08-11 Thread Ishan Chattopadhyaya

I've created a placeholder document here:
https://cwiki.apache.org/confluence/display/SOLR/Roadmap
Let us put in all our items there.

On Tue, Aug 11, 2020 at 4:45 PM Jan Høydahl  wrote:

> Let’s revive this email thread about Roadmap.
>
> With so many large initiatives going on, and the TLP split also, I think
> it makes perfect sense to have a Roadmap.
> I know we’re not used to that kind of thing - we tend to just let things
> play out as they happen to land in various releases, but this time is
> special, and I think we’d benefit from more coordination. I don’t know how
> to enforce such coordination though, other than appealing to all committers
> to endorse the roadmap and respect it when they merge things. We may not be
> able to set a release date for 9.0 right now, but we may be able to define
> preconditions and scope certain features to 9.0 or 9.1 rather than 8.7 or
> 8.8 - those kinds of coarse-grained decisions. We also may need a person who
> «owns» the Roadmap confluence page and actively promotes it, tries to keep
> it up to date and reminds the rest of us about its existence. A roadmap
> must NOT be a brake slowing us down, but a tool helping us avoid silly
> mistakes.
>
> Jan
>
> > 5. jul. 2020 kl. 02:39 skrev Noble Paul :
> >
> > I think the logical thing to do today is completely rip out all
> > autoscaling code as it exists today.
> > Let's deprecate that in 8.7 and build something for "assign-strategy".
> > Autoscaling, if required, should not be a part of Solr
> >
> >
> >
> > On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl 
> wrote:
> >>
> >> +1
> >>
> >> Why don’t we make a Roadmap wiki page as Cassandra suggests, and
> indicate what major things need to happen when.
> >> Perhaps if we can get the Solr TLP and git-split ball rolling as a
> pre-9.0 task, then perhaps 8.8 could be the last joint release (6.6, 7.7,
> 8.8 hehe)?
> >> That would enable Lucene to ship 9.0 without waiting for a ton of
> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
> >>
> >> Jan
> >>
> >> 3. jul. 2020 kl. 09:19 skrev Dawid Weiss :
> >>
> >>
> >>> I totally expect some things to bubble up when we try to release with
> Gradle, the tarball being one. I don’t think that’s a very big issue, but
> if you have lots of “not very big” issues they do add up.
> >>
> >>
> >> Adding a tarball is literally 3-5 lines of code (you add a task that
> builds a tarball or a zip file from the outputs of solr/packaging toDir
> task)... The bigger issue with gradle is that somebody has to step up and
> try to identify any other issues and/or missing bits when trying to do a
> full release cycle.
> >>
> >> D.
> >>
> >>
> >
> >
> > --
> > -
> > Noble Paul
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>
>
>

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread Cassandra Targett

OK, fair point about self-managed. But I object to "leaving it" as Legacy, as 
I've previously explained (I put that in quotes because it’s not always called 
that at all - it has at least 3 names right now).

The reality is someone can come up with an objection to every single 
possibility. Someday we have to live with something that’s good enough and move 
forward, or we’ll end up just living with the total mash of things we have 
today. Which maybe is fine with everyone.

I’ve tried to put real mental work into thinking about a good name, and have 
tried to compromise based on feedback. At this point, though, unless someone 
else comes up with something I’m likely done here. We’ll just “leave it” all as 
it is now.

Cassandra
On Aug 11, 2020, 9:11 AM -0500, Ishan Chattopadhyaya 
, wrote:
> I object to "self managed". It gives the impression that Solr manages itself, 
> whereas it is the other way around: users need to manage the standalone mode 
> with lots of manual effort, as opposed to SolrCloud which is in spirit self 
> managed (solr manages itself using zk).
>
> I'm +1 with Legacy replication and SolrCloud replication for now. Later, we 
> can get rid of "SolrCloud" and call it something else. Also, once SolrCloud 
> is stable enough, we can get rid of legacy mode altogether. We can discuss 
> that elsewhere.
>
> > On Tue, 11 Aug, 2020, 7:16 pm Cassandra Targett,  
> > wrote:
> > > I don’t feel there is a consensus for me to move forward confidently, but 
> > > the docs need to be fixed before 8.7. I’ve thought about Ilan’s 
> > > suggestion, and like calling the non-SolrCloud cluster “self-managed”. It 
> > > avoids the currently awkward phrasing and any misinterpretation of my 
> > > original suggestion with clumsiness as Gus pointed out. Can everyone live 
> > > with that?
> > >
> > > If so, that leaves what we might eventually call SolrCloud as the 
> > > remaining sticking point. It’s not a problem that needs to be solved 
> > > today as the term isn’t going anywhere yet since there aren’t any patches 
> > > or PRs to change it at a code level.
> > >
> > > Barring further objections, then, I think I will go ahead with mostly 
> > > leaving “SolrCloud” as it is, and replacing/modifying “Legacy Scaling”, 
> > > “leader/follower mode”, some cases of “Standalone mode”, and similar 
> > > constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., 
> > > as appropriate.
> > >
> > > Cassandra
> > > On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett , 
> > > wrote:
> > > > The suggestion to use “managed” and maybe “self-managed” is an 
> > > > interesting one. Do you think it’s possible some might confuse that 
> > > > with the other ways we use managed - like the “managed-schema”, and 
> > > > “managed resources” (synonyms and stop words)? Neither of those are 
> > > > cluster-specific, and I wonder if the overlap in terminology would 
> > > > cause them to be conflated.
> > > >
> > > > Cassandra
> > > > On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg , 
> > > > wrote:
> > > > > Both "legacy" and "SolrCloud" clusters are search server clusters. 
> > > > > Seen from far enough, they look the same.
> > > > >
> > > > > In "legacy" the management code is elsewhere (developed by the client 
> > > > > > operating the cluster, running on other machines using a different 
> > > > > logic and potentially another DB than Zookeeper) whereas in 
> > > > > "SolrCloud" the management code is embedded in the search server(s) 
> > > > > code and it happens that (currently) this code relies on Zookeeper.
> > > > >
> > > > > I see SolrCloud as a "managed cluster" vs. legacy that would be "Self 
> > > > > managed" by the client, or "U manage" (non managed when looking at it 
> > > > > from the Solr codebase perspective).
> > > > >
> > > > > Same idea as coordinated vs uncoordinated basically. I don't know why 
> > > > > but I prefer "managed".
> > > > >
> > > > > Ilan
> > > > >
> > > > > > On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett 
> > > > > >  wrote:
> > > > > > > On Aug 6, 2020, 10:22 AM -0500, Gus Heck , 
> > > > > > > wrote:
> > > > > > > > WRT the name "uncoordinated mode" I fear it could be read (or 
> > > > > > > > even become known as) as "clumsy mode" which is humorous but 
> > > > > > > > possibly not what we're going for :)
> > > > > > >
> > > > > > > I had also considered “non-coordinated”, and prefer it but 
> > > > > > > couldn’t articulate why. The association of “uncoordinated" with 
> > > > > > > clumsiness might be what was bugging me.
> > > > > > > >  I'd perhaps suggest Cluster mode for SolrCloud though I'm not 
> > > > > > > > > entirely sure if Legacy Solr (in current parlance) is not a 
> > > > > > > > "cluster" too, cluster being a somewhat vague term. However 
> > > > > > > > Clustered Mode and Legacy Mode seem more on target. I think 
> > > > > > > > "Legacy" could be changed since we're not really planning on 
> > > > > > > > abandoning it (are we?), but
> > > > > > >
> > > > > >

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread Ishan Chattopadhyaya

I object to "self managed". It gives the impression that Solr manages
itself, whereas it is the other way around: users need to manage the
standalone mode with lots of manual effort, as opposed to SolrCloud which
is in spirit self managed (solr manages itself using zk).

I'm +1 with Legacy replication and SolrCloud replication for now. Later, we
can get rid of "SolrCloud" and call it something else. Also, once SolrCloud
is stable enough, we can get rid of legacy mode altogether. We can discuss
that elsewhere.

On Tue, 11 Aug, 2020, 7:16 pm Cassandra Targett, 
wrote:

> I don’t feel there is a consensus for me to move forward confidently, but
> the docs need to be fixed before 8.7. I’ve thought about Ilan’s suggestion,
> and like calling the non-SolrCloud cluster “self-managed”. It avoids the
> currently awkward phrasing and any misinterpretation of my original
> suggestion with clumsiness as Gus pointed out. Can everyone live with that?
>
> If so, that leaves what we might eventually call SolrCloud as the
> remaining sticking point. It’s not a problem that needs to be solved today
> as the term isn’t going anywhere yet since there aren’t any patches or PRs
> to change it at a code level.
>
> Barring further objections, then, I think I will go ahead with mostly
> leaving “SolrCloud” as it is, and replacing/modifying “Legacy Scaling”,
> “leader/follower mode”, some cases of “Standalone mode”, and similar
> constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., as
> appropriate.
>
> Cassandra
> On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett ,
> wrote:
>
> The suggestion to use “managed” and maybe “self-managed” is an interesting
> one. Do you think it’s possible some might confuse that with the other ways
> we use managed - like the “managed-schema”, and “managed resources”
> (synonyms and stop words)? Neither of those are cluster-specific, and I
> wonder if the overlap in terminology would cause them to be conflated.
>
> Cassandra
> On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg , wrote:
>
> Both "legacy" and "SolrCloud" clusters are search server clusters. Seen
> from far enough, they look the same.
>
> In "legacy" the management code is elsewhere (developed by the client
> operating the cluster, running on other machines using a different logic and
> potentially another DB than Zookeeper) whereas in "SolrCloud" the
> management code is embedded in the search server(s) code and it happens
> that (currently) this code relies on Zookeeper.
>
> I see SolrCloud as a "managed cluster" vs. legacy that would be "Self
> managed" by the client, or "U manage" (non managed when looking at it from
> the Solr codebase perspective).
>
> Same idea as coordinated vs uncoordinated basically. I don't know why but
> I prefer "managed".
>
> Ilan
>
> On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett 
> wrote:
>
>> On Aug 6, 2020, 10:22 AM -0500, Gus Heck , wrote:
>>
>> WRT the name "uncoordinated mode" I fear it could be read (or even become
>> known as) as "clumsy mode" which is humorous but possibly not what we're
>> going for :)
>>
>>
>> I had also considered “non-coordinated”, and prefer it but couldn’t
>> articulate why. The association of “uncoordinated" with clumsiness might be
>> what was bugging me.
>>
>>  I'd perhaps suggest Cluster mode for SolrCloud though I'm not entirely
>> sure if Legacy Solr (in current parlance) is not a "cluster" too, cluster
>> being a somewhat vague term. However Clustered Mode and Legacy Mode seem
>> more on target. I think "Legacy" could be changed since we're not really
>> planning on abandoning it (are we?), but
>>
>>
>> One can have a cluster and not run SolrCloud. I think from an operations
>> perspective, several servers all running Solr is considered a cluster, no
>> matter what tools are being used to get them to talk to each other.
>>
>> I think “Legacy” (also used today already in some contexts) is
>> problematic because there aren’t plans to abandon it. Also “Legacy
>> replication” is pretty close to exactly what PULL replicas use to poll
>> leaders and pull new index segments when needed. IOW, it’s not “legacy”,
>> it’s very actively being used in a growing number of clusters. That might
>> be an implementation detail users aren’t aware of, but I feel the term is
>> really lacking mostly in that it just doesn’t say anything besides “it’s
>> older”.
>>
>> the adjective there SHOULD communicate reduced functionality because
>> there are plenty of features that are cloud (cluster) only.
>>
>>
>> In my view, the reduced functionality of non-SolrCloud clusters is mostly
>> around coordination of requests, leader election, configs, and other
>> similar automated activities one does manually otherwise. So, I feel that
>> sort of proves my point - a word that conveys lack of coordination is a
>> good option for what it’s called. If there is a better antonym for
>> “coordinated”, I’m all for considering it but haven’t yet been able to
>> think of/find one.
>>
>> I think

Performance testing is necessary now

2020-08-11 Thread Ishan Chattopadhyaya

Hi Everyone!
   From now on, I intend to request/nag/demand/veto code changes, which
affect default code paths for most users, be accompanied by performance
testing numbers for it (e.g. [1]). Opt in features are fine, I won't
personally bother about them (but if you'd like to perf test them, it would
set a great precedent anyway).

I will also work on setting up automated performance and stress testing
[2], but in the absence of that, let us run performance tests manually and
report the results in the JIRA. Unless we hold ourselves to a high
standard, performance will be a joke whereby performance regressions can
creep in without the committer(s) taking any responsibility towards those
users affected by them (SOLR-14665).

A benchmarking suite that I am working on is at
https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress test
suite is under development (SOLR-13933). If you wish to use either of
these, I shall offer help and support (please ping me on Slack directly or
#solr-dev, or open a Github Issue on that repo).

Regards,
Ishan

[1] -
https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
[2] -
https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
(edited)

Re: hybrid document routing

2020-08-11 Thread Joel Bernstein

SOLR-14728 supports sub-second performance on joins with more than 1
million values from the from index. Nice for access control.



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Aug 11, 2020 at 9:49 AM Joel Bernstein  wrote:

> This ticket will shed some light:
>
> https://issues.apache.org/jira/browse/SOLR-14728
>
>
> I think I'm planning on using a different approach to distribute the ACLs to
> all shards.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Aug 11, 2020 at 1:18 AM Gus Heck  wrote:
>
>> Sounds like complex ACLs based on group memberships that use graph
>> queries ? that would require local ACL's...
>>
>> On Mon, Aug 10, 2020 at 5:56 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> This seems like an XY problem. Would it be possible to describe the
>>> original problem that led you to this solution (in the prototype)? Also, do
>>> you think folks at solr-users@ list would have more ideas related to
>>> this usecase and cross posting there would help?
>>>
>>> On Tue, 11 Aug, 2020, 1:43 am David Smiley,  wrote:
>>>
>>>> Are you sure you need the docs in the same shard when maybe you could
>>>> assume a core exists on each node and then do a query-time join?
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein
>>>> wrote:
>>>>
>>>>> I have a situation where I'd like to have the standard compositeId
>>>>> router in place for a collection. But, I'd like certain documents (ACL
>>>>> documents) to be duplicated on each shard in the collection. To achieve
>>>>> the level of access control performance and scalability I'm looking for
>>>>> I need the ACL records to be in the same core as the main documents.
>>>>>
>>>>> I put together a prototype where the compositeId router accepted
>>>>> implicit routing parameters and it worked in my testing. Before I open a
>>>>> ticket suggesting this approach I wonder what other people thought the
>>>>> best approach would be to accomplish this goal.
>>>>>
>>>>>
>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

Survey on ManagedResources feature

2020-08-11 Thread Noble Paul

Hi all,
The end-point for Managed resources is not secured. So it needs to be
fixed/eliminated.

I would like to know what is the level of adoption for that feature
and if it is a critical feature for users.

Another possibility is to offer a replacement for the feature using a
different API

Your feedback will help us decide on what a potential solution should be

-- 
-
Noble Paul


Re: hybrid document routing

2020-08-11 Thread Joel Bernstein

This ticket will shed some light:

https://issues.apache.org/jira/browse/SOLR-14728


I think I'm planning on using a different approach to distribute the ACLs to
all shards.




Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Aug 11, 2020 at 1:18 AM Gus Heck  wrote:

> Sounds like complex ACLs based on group memberships that use graph queries
> ? that would require local ACL's...
>
> On Mon, Aug 10, 2020 at 5:56 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> This seems like an XY problem. Would it be possible to describe the
>> original problem that led you to this solution (in the prototype)? Also, do
>> you think folks at solr-users@ list would have more ideas related to
>> this usecase and cross posting there would help?
>>
>> On Tue, 11 Aug, 2020, 1:43 am David Smiley,  wrote:
>>
>>> Are you sure you need the docs in the same shard when maybe you could
>>> assume a core exists on each node and then do a query-time join?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein 
>>> wrote:
>>>
>>>> I have a situation where I'd like to have the standard compositeId
>>>> router in place for a collection. But, I'd like certain documents (ACL
>>>> documents) to be duplicated on each shard in the collection. To achieve the
>>>> level of access control performance and scalability I'm looking for I need
>>>> the ACL records to be in the same core as the main documents.
>>>>
>>>> I put together a prototype where the compositeId router accepted
>>>> implicit routing parameters and it worked in my testing. Before I open a
>>>> ticket suggesting this approach I wonder what other people thought the best
>>>> approach would be to accomplish this goal.
>>>>
>>>>
>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
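
The routing idea discussed in this thread (ordinary documents land on one
shard, ACL documents are copied to every shard so joins can stay core-local)
can be sketched roughly as below. This is only an illustration under stated
assumptions, not the prototype from the thread or Solr's actual router: the
function name target_shards, the is_acl flag, and the use of crc32 as the
hash are all hypothetical stand-ins.

```python
import zlib
from typing import List


def target_shards(doc_id: str, num_shards: int, is_acl: bool) -> List[int]:
    """Return the indexes of the shards a document should be sent to.

    Hypothetical helper for illustration; not part of Solr.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    if is_acl:
        # ACL documents are duplicated on every shard, so a join against
        # them never has to leave the local core.
        return list(range(num_shards))
    # Ordinary documents go to exactly one shard chosen by a stable hash.
    # crc32 is a stand-in here; it is not the hash Solr's router uses.
    return [zlib.crc32(doc_id.encode("utf-8")) % num_shards]


if __name__ == "__main__":
    print(target_shards("acl-group-1", 4, is_acl=True))
    print(target_shards("doc-42", 4, is_acl=False))
```

The trade-off in this sketch is the one raised in the replies: duplicating ACL
documents costs index space and write fan-out on every shard, in exchange for
avoiding cross-core lookups at query time.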

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-11 Thread Cassandra Targett

I don’t feel there is a consensus for me to move forward confidently, but the 
docs need to be fixed before 8.7. I’ve thought about Ilan’s suggestion, and 
like calling the non-SolrCloud cluster “self-managed”. It avoids the currently 
awkward phrasing and any misinterpretation of my original suggestion with 
clumsiness as Gus pointed out. Can everyone live with that?

If so, that leaves what we might eventually call SolrCloud as the remaining 
sticking point. It’s not a problem that needs to be solved today as the term 
isn’t going anywhere yet since there aren’t any patches or PRs to change it at 
a code level.

Barring further objections, then, I think I will go ahead with mostly leaving 
“SolrCloud” as it is, and replacing/modifying “Legacy Scaling”, 
“leader/follower mode”, some cases of “Standalone mode”, and similar 
constructions with “Self-Managed Mode” or “Self-Managed Cluster”, etc., as 
appropriate.

Cassandra
On Aug 7, 2020, 9:05 AM -0500, Cassandra Targett , wrote:
> The suggestion to use “managed” and maybe “self-managed” is an interesting 
> one. Do you think it’s possible some might confuse that with the other ways 
> we use managed - like the “managed-schema”, and “managed resources” (synonyms 
> and stop words)? Neither of those are cluster-specific, and I wonder if the 
> overlap in terminology would cause them to be conflated.
>
> Cassandra
> On Aug 6, 2020, 10:51 AM -0500, Ilan Ginzburg , wrote:
> > Both "legacy" and "SolrCloud" clusters are search server clusters. Seen 
> > from far enough, they look the same.
> >
> > In "legacy" the management code is elsewhere (developed by the client 
> > operating the cluster, running on other machines using a different logic and 
> > potentially another DB than Zookeeper) whereas in "SolrCloud" the 
> > management code is embedded in the search server(s) code and it happens 
> > that (currently) this code relies on Zookeeper.
> >
> > I see SolrCloud as a "managed cluster" vs. legacy that would be "Self 
> > managed" by the client, or "U manage" (non managed when looking at it from 
> > the Solr codebase perspective).
> >
> > Same idea as coordinated vs uncoordinated basically. I don't know why but I 
> > prefer "managed".
> >
> > Ilan
> >
> > > On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett  
> > > wrote:
> > > > On Aug 6, 2020, 10:22 AM -0500, Gus Heck , wrote:
> > > > > WRT the name "uncoordinated mode" I fear it could be read (or even 
> > > > > become known as) as "clumsy mode" which is humorous but possibly not 
> > > > > what we're going for :)
> > > >
> > > > I had also considered “non-coordinated”, and prefer it but couldn’t 
> > > > articulate why. The association of “uncoordinated" with clumsiness 
> > > > might be what was bugging me.
> > > > >  I'd perhaps suggest Cluster mode for SolrCloud though I'm not 
> > > > > > entirely sure if Legacy Solr (in current parlance) is not a "cluster" 
> > > > > too, cluster being a somewhat vague term. However Clustered Mode and 
> > > > > Legacy Mode seem more on target. I think "Legacy" could be changed 
> > > > > since we're not really planning on abandoning it (are we?), but
> > > >
> > > > One can have a cluster and not run SolrCloud. I think from an 
> > > > operations perspective, several servers all running Solr is considered 
> > > > a cluster, no matter what tools are being used to get them to talk to 
> > > > each other.
> > > >
> > > > I think “Legacy” (also used today already in some contexts) is 
> > > > problematic because there aren’t plans to abandon it. Also “Legacy 
> > > > replication” is pretty close to exactly what PULL replicas use to poll 
> > > > leaders and pull new index segments when needed. IOW, it’s not 
> > > > “legacy”, it’s very actively being used in a growing number of 
> > > > clusters. That might be an implementation detail users aren’t aware of, 
> > > > but I feel the term is really lacking mostly in that it just doesn’t 
> > > > say anything besides “it’s older”.
> > > > > the adjective there SHOULD communicate reduced functionality because 
> > > > > there are plenty of features that are cloud (cluster) only.
> > > >
> > > > In my view, the reduced functionality of non-SolrCloud clusters is 
> > > > mostly around coordination of requests, leader election, configs, and 
> > > > other similar automated activities one does manually otherwise. So, I 
> > > > feel that sort of proves my point - a word that conveys lack of 
> > > > coordination is a good option for what it’s called. If there is a 
> > > > better antonym for “coordinated”, I’m all for considering it but 
> > > > haven’t yet been able to think of/find one.
> > > >
> > > > I think it’s important to think about what differentiates the two ways 
> > > > of managing a Solr cluster and derive the naming from that. What 
> > > > features of SolrCloud don’t exist in the non-SolrCloud approach? What 
> > > > words help us generalize those gaps and can any of them be an 
> > > > appropriate name?
> > > > >
>

Re: Badapple report

2020-08-11 Thread Erick Erickson

Great, thanks! Let me know when you push it, I can beast the test again.

> On Aug 11, 2020, at 3:48 AM, Atri Sharma  wrote:
> 
> I investigated testRequestRateLimiters and hardened the tests up:
> 
> https://github.com/apache/lucene-solr/pull/1736
> 
> This will stop testConcurrentRequests from failing and should
> hopefully stop testSlotBorrowing as well. If testSlotBorrowing
> continues to fail, I will have to rethink the test.
> 
> On Mon, Aug 10, 2020 at 8:17 PM Erick Erickson  
> wrote:
>> 
>> We’re backsliding some. I encourage people to look at: 
>> http://fucit.org/solr-jenkins-reports/failure-report.html, we have a number 
>> of ill-behaved tests, particularly TestRequestRateLimiter, 
>> TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and 
>> TestIndexingSequenceNumbers…
>> 
>> 
>> Raw fail count by week totals, most recent week first (corresponds to bits):
>> Week: 0  had  100 failures
>> Week: 1  had  82 failures
>> Week: 2  had  94 failures
>> Week: 3  had  502 failures
>> 
>> 
>> Failures in Hoss' reports for the last 4 rollups.
>> 
>> There were 585 unannotated tests that failed in Hoss' rollups. Ordered by 
>> the date I downloaded the rollup file, newest->oldest. See above for the 
>> dates the files were collected
>> These tests were NOT BadApple'd or AwaitsFix'd
>> 
>> Failures in the last 4 reports..
>>   Report    Pct   runs  fails  test
>>   0123      4.4   1583     37  BasicDistributedZkTest.test
>>   0123      4.3   1727     77  CloudExitableDirectoryReaderTest.test
>>   0123      2.5   8598    248  CloudExitableDirectoryReaderTest.testCreepThenBite
>>   0123      1.9   1712     36  CloudExitableDirectoryReaderTest.testWhitebox
>>   0123      0.5   1587     11  DocValuesNotIndexedTest.testGroupingDVOnlySortLast
>>   0123      2.2   1679     82  HttpPartitionOnCommitTest.test
>>   0123      0.5   1592     16  HttpPartitionTest.test
>>   0123      1.0   1578      9  HttpPartitionWithTlogReplicasTest.test
>>   0123      1.3   1569     13  LeaderFailoverAfterPartitionTest.test
>>   0123      7.4   1643     59  MultiThreadedOCPTest.test
>>   0123      0.3   1567      8  ReplaceNodeTest.test
>>   0123      0.2   1588      6  ShardSplitTest.testSplitShardWithRule
>>   0123    100.0     38     33  SharedFSAutoReplicaFailoverTest.test
>>   0123      2.1    818     19  TestCircuitBreaker.testBuildingMemoryPressure
>>   0123      2.6    818     13  TestCircuitBreaker.testResponseWithCBTiming
>>   0123      6.2   1848    104  TestContainerPlugin.testApiFromPackage
>>   0123      2.5   1662     33  TestDistributedGrouping.test
>>   0123      0.4   1448      6  TestDynamicLoading.testDynamicLoading
>>   0123      6.4   1614     74  TestExportWriter.testExpr
>>   0123      8.6   1356     70  TestHdfsCloudBackupRestore.test
>>   0123      9.1   1697    136  TestLocalFSCloudBackupRestore.test
>>   0123      0.5   1607     26  TestPackages.testPluginLoading
>>   0123      0.7   1596     15  TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
>>   0123      1.5   1610     59  TestReRankQParserPlugin.testMinExactCount
>>   0123      0.3   1552      4  TestReplicaProperties.test
>>   0123      0.3   1556      5  TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
>>   0123      0.3   1565      9  TestSolrConfigHandlerCloud.test
>> 
>> 
>> 
> 
> -- 
> Regards,
> 
> Atri
> Apache Concerted
> 



Re: RoadMap?

2020-08-11 Thread Jan Høydahl

Let’s revive this email thread about Roadmap.

With so many large initiatives going on, and the TLP split also, I think it 
makes perfect sense to have a Roadmap.
I know we’re not used to that kind of thing - we tend to just let things play 
out as they happen to land in various releases, but this time is special, and I 
think we’d benefit from more coordination. I don’t know how to enforce such 
coordination though, other than appealing to all committers to endorse the 
roadmap and respect it when they merge things. We may not be able to set a 
release date for 9.0 right now, but we may be able to define preconditions and 
scope certain features to 9.0 or 9.1 rather than 8.7 or 8.8 - those kinds of 
coarse-grained decisions. We also may need a person who «owns» the Roadmap 
confluence page and actively promotes it, tries to keep it up to date and 
reminds the rest of us about its existence. A roadmap must NOT be a brake 
slowing us down, but a tool helping us avoid silly mistakes.

Jan

> 5. jul. 2020 kl. 02:39 skrev Noble Paul :
> 
> I think the logical thing to do today is to completely rip out all
> autoscaling code as it exists today.
> Let's deprecate that in 8.7 and build something for "assign-strategy".
> Autoscaling, if required, should not be a part of Solr.
> 
> 
> 
> On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl  wrote:
>> 
>> +1
>> 
>> Why don’t we make a Roadmap wiki page as Cassandra suggests, and indicate 
>> what major things need to happen when?
>> If we can get the Solr TLP and git-split ball rolling as a pre-9.0 
>> task, then perhaps 8.8 could be the last joint release (6.6, 7.7, 8.8 hehe)?
>> That would enable Lucene to ship 9.0 without waiting for a ton of 
>> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
>> 
>> Jan
>> 
>> On 3 Jul 2020, at 09:19, Dawid Weiss wrote:
>> 
>> 
>>> I totally expect some things to bubble up when we try to release with 
>>> Gradle, the tarball being one. I don’t think that’s a very big issue, but 
>>> if you have lots of “not very big” issues they do add up.
>> 
>> 
>> Adding a tarball is literally 3-5 lines of code (you add a task that builds 
>> a tarball or a zip file from the outputs of solr/packaging toDir task)... 
>> The bigger issue with gradle is that somebody has to step up and try to 
>> identify any other issues and/or missing bits when trying to do a full 
>> release cycle.
>> 
>> D.
>> 
>> 
> 
> 
> -- 
> -
> Noble Paul
> 
> 


Re: Badapple report

2020-08-11 Thread Atri Sharma

I investigated testRequestRateLimiters and hardened the tests up:

https://github.com/apache/lucene-solr/pull/1736

This will stop testConcurrentRequests from failing and should
hopefully stop testSlotBorrowing as well. If testSlotBorrowing
continues to fail, I will have to rethink the test.

On Mon, Aug 10, 2020 at 8:17 PM Erick Erickson  wrote:
>
> We’re backsliding some. I encourage people to look at 
> http://fucit.org/solr-jenkins-reports/failure-report.html. We have a number 
> of ill-behaved tests, particularly TestRequestRateLimiter, 
> TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and 
> TestIndexingSequenceNumbers…
>
>
> Raw fail count by week totals, most recent week first (corresponds to bits):
> Week: 0  had  100 failures
> Week: 1  had  82 failures
> Week: 2  had  94 failures
> Week: 3  had  502 failures
>
>
> Failures in Hoss' reports for the last 4 rollups.
>
> There were 585 unannotated tests that failed in Hoss' rollups, ordered by the 
> date I downloaded the rollup file, newest->oldest. See above for the dates 
> the files were collected.
> These tests were NOT BadApple'd or AwaitsFix'd.
>
> Failures in the last 4 reports..
> Report    Pct   runs  fails  test
>   0123    4.4   1583     37  BasicDistributedZkTest.test
>   0123    4.3   1727     77  CloudExitableDirectoryReaderTest.test
>   0123    2.5   8598    248  CloudExitableDirectoryReaderTest.testCreepThenBite
>   0123    1.9   1712     36  CloudExitableDirectoryReaderTest.testWhitebox
>   0123    0.5   1587     11  DocValuesNotIndexedTest.testGroupingDVOnlySortLast
>   0123    2.2   1679     82  HttpPartitionOnCommitTest.test
>   0123    0.5   1592     16  HttpPartitionTest.test
>   0123    1.0   1578      9  HttpPartitionWithTlogReplicasTest.test
>   0123    1.3   1569     13  LeaderFailoverAfterPartitionTest.test
>   0123    7.4   1643     59  MultiThreadedOCPTest.test
>   0123    0.3   1567      8  ReplaceNodeTest.test
>   0123    0.2   1588      6  ShardSplitTest.testSplitShardWithRule
>   0123  100.0     38     33  SharedFSAutoReplicaFailoverTest.test
>   0123    2.1    818     19  TestCircuitBreaker.testBuildingMemoryPressure
>   0123    2.6    818     13  TestCircuitBreaker.testResponseWithCBTiming
>   0123    6.2   1848    104  TestContainerPlugin.testApiFromPackage
>   0123    2.5   1662     33  TestDistributedGrouping.test
>   0123    0.4   1448      6  TestDynamicLoading.testDynamicLoading
>   0123    6.4   1614     74  TestExportWriter.testExpr
>   0123    8.6   1356     70  TestHdfsCloudBackupRestore.test
>   0123    9.1   1697    136  TestLocalFSCloudBackupRestore.test
>   0123    0.5   1607     26  TestPackages.testPluginLoading
>   0123    0.7   1596     15  TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
>   0123    1.5   1610     59  TestReRankQParserPlugin.testMinExactCount
>   0123    0.3   1552      4  TestReplicaProperties.test
>   0123    0.3   1556      5  TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
>   0123    0.3   1565      9  TestSolrConfigHandlerCloud.test
> 
>
>

-- 
Regards,

Atri
Apache Concerted
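
For anyone who wants to slice the quoted rollup rows by script, here is a
rough sketch of a row parser. It is not part of Hoss' or Erick's tooling.
The function name parse_row and the returned keys are made up for
illustration, and the layout is an assumption inferred from the rows in this
thread: the raw report seems to print "fails" in a three-character field
directly after "runs", so wide values can abut (for example 8598248 for
8598 runs and 248 fails). Rows whose test name wrapped onto the next line
need to be joined before parsing.

```python
import re

# Assumed row layout (inferred from this thread, not from the report's source):
#   <report bits> <pct> <runs><fails, right-aligned in 3 chars>  <test name>
ROW = re.compile(r"^\s*(\d+)\s+(\d+\.\d+)\s+(\d[\d ]*\d)\s+(\S+)\s*$")


def parse_row(line):
    """Parse one rollup row; return a dict, or None if the line is not a row."""
    line = line.lstrip("> ")  # drop e-mail quote markers and leading spaces
    match = ROW.match(line)
    if match is None:
        return None
    bits, pct, counts, test = match.groups()
    if len(counts) < 4:
        return None  # too short to hold both a runs value and a 3-char fails field
    return {
        "report": bits,
        "pct": float(pct),
        "runs": int(counts[:-3]),    # everything before the last 3 characters
        "fails": int(counts[-3:]),   # assumed fixed 3-character field
        "test": test,
    }


if __name__ == "__main__":
    sample = [
        ">  0123   4.4 1583 37  BasicDistributedZkTest.test",
        ">  0123   2.5 8598248  CloudExitableDirectoryReaderTest.testCreepThenBite",
        "> Week: 0  had  100 failures",  # not a table row, so it is skipped
    ]
    for raw in sample:
        row = parse_row(raw)
        if row is not None:
            print(row["test"], row["runs"], row["fails"])
```

Splitting on whitespace alone would misread the abutting columns, which is
why the sketch takes the last three characters of the counts field as
"fails". If the report ever widens that field, the slice needs to change.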

