Re: using S3 as the Directory for Solr

2020-04-24 Thread Ilan Ginzburg
Hi Rahul, I don't have a direct answer to your question as I don't know of any S3 based Directory implementation. Such an implementation would likely be more complex than an HDFS one. Reason is S3 has eventual consistency. When an S3 file is updated you might still read the old content for a

Re: Overseer documentation

2020-04-24 Thread Ilan Ginzburg
stuff Ilan. Thank you for writing and sharing with us. I intend > to take a deeper look at this next week. > > On Wed, Apr 22, 2020 at 2:36 AM Ilan Ginzburg wrote: >> >> Hello Solr devs, >> >> This is my first post here. I work at Salesforce in France, we're >>

Overseer documentation

2020-04-21 Thread Ilan Ginzburg
Hello Solr devs, This is my first post here. I work at Salesforce in France, we're adopting SolrCloud and we need it to scale more than it currently does. I've looked at Overseer and documented my understanding. I'm sharing the result, it might help others and is a way to get feedback (I might

Re: [jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 8.0

2020-05-12 Thread Ilan Ginzburg
I could work on a PR to remove it in Solr 9, unless you think it's super tricky work. Ilan On Tue, May 12, 2020 at 3:32 PM Erick Erickson wrote: > > Definitely +1, but I won’t have the bandwidth to help. > > > On May 12, 2020, at 9:03 AM, Jan Høydahl (Jira) wrote: > > > > > >[ > >

Question on changes to /admin/zookeeper handler

2020-05-17 Thread Ilan Ginzburg
I'm in the process of removing everything /clusterstate.json from Solr 9.0 (https://issues.apache.org/jira/browse/SOLR-12823). There's a choice to be made regarding /admin/zookeeper endpoint (ZookeeperInfoHandler). Looking at http://localhost:8985/solr/admin/zookeeper?path=/clusterstate.json

Re: Question on changes to /admin/zookeeper handler

2020-05-17 Thread Ilan Ginzburg
Answering myself: this is used from the Admin UI - Cloud - Graph page. It therefore needs to stay consistent (including the "Json within Json" format for collection data), and any URL changes must be reflected in services.js. On Sun, May 17, 2020 at 4:30 PM Ilan Ginzburg wrote: > > I'

Gradle precommit checks

2020-05-20 Thread Ilan Ginzburg
This might have been discussed previously but since I'm seeing this behavior... Gradle precommit check does not allow code such as: log.warn("Only in tree one: " + t1); And forces changing it into: log.warn("Only in tree one: {}", t1); I do understand such constraints for debug level logs to
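
A minimal sketch of the cost difference behind this precommit rule. MiniLogger and Tree are hypothetical stand-ins written for this illustration (they are not the SLF4J API and not Solr code); MiniLogger only mimics the "{}" placeholder idea. With concatenation the argument's toString() runs before the logger is even called; with a placeholder it runs only if the level is enabled.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LoggingCostSketch {
    /** Hypothetical stand-in logger; NOT SLF4J, it only imitates "{}" substitution. */
    static final class MiniLogger {
        private final boolean warnEnabled;

        MiniLogger(boolean warnEnabled) {
            this.warnEnabled = warnEnabled;
        }

        // Pre-built message: the caller has already paid for the concatenation.
        String warn(String message) {
            return warnEnabled ? message : null;
        }

        // Parameterized message: the argument is formatted only if the level is enabled.
        String warn(String pattern, Object arg) {
            if (!warnEnabled) {
                return null;
            }
            return pattern.replace("{}", String.valueOf(arg));
        }
    }

    /** Hypothetical object that counts how often toString() is invoked. */
    static final class Tree {
        final AtomicInteger toStringCalls = new AtomicInteger();

        @Override
        public String toString() {
            toStringCalls.incrementAndGet();
            return "t1";
        }
    }

    // Concatenation style: toString() is evaluated at the call site, whatever the level.
    static int callsWithConcatenation(boolean enabled) {
        Tree t1 = new Tree();
        new MiniLogger(enabled).warn("Only in tree one: " + t1);
        return t1.toStringCalls.get();
    }

    // Placeholder style: toString() is deferred to the logger.
    static int callsWithPlaceholder(boolean enabled) {
        Tree t1 = new Tree();
        new MiniLogger(enabled).warn("Only in tree one: {}", t1);
        return t1.toStringCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("concatenation, level off: " + callsWithConcatenation(false));
        System.out.println("placeholder, level off:   " + callsWithPlaceholder(false));
    }
}
```

For warn-level messages that are almost always enabled, the saving is small, which is the point the message raises; the sketch only shows where the cost sits.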

Re: Gradle precommit checks

2020-05-20 Thread Ilan Ginzburg
ode. Especially given how obscure logging costs are. The > difference between 'log.trace("message {}", object.toString())' and > 'log.trace("message {}", object)' for instance is unknown to a _lot_ of > developers. Including me before I started looking at logging in general ;) > > Best, > Erick >

Re: Solr configuration options

2020-09-03 Thread Ilan Ginzburg
totally > > > happy. Why don't you all go back, and work on your own internal fork > > > of Solr if that is all that you guys want. Why even pretend that > > > something is pluggable & offer some value to our users. > > > > > > The existing

Placement plugin PR commit - soon

2020-09-14 Thread Ilan Ginzburg
Advance notice: I plan to commit to master/9.0 coming Wednesday September 16th the "Placement plugin" PR corresponding to SOLR-14613. This will be the first drop of code for the replacement of the

Re: Solr configuration options

2020-09-03 Thread Ilan Ginzburg
ld live somewhere in ZK. File > system access should not be required to add/remove capacity. If multiple > node configurations need to be supported we should have nodeTypes directory > in zk (similar to configsets for collections), possible node specific > configs there and a

Re: Solr Alpha (EA) release of Reference Branch

2020-10-07 Thread Ilan Ginzburg
TBH, a PR with more than 1400 changed files is hard to look at. How many of us will invest a few weeks at least to really understand it? We should assume that if we don't bring these changes piece by piece, we risk having an unstable version of SolrCloud for a while. When I look at some ongoing

Removing Overseer

2020-10-05 Thread Ilan Ginzburg
I'm sharing the initial drop of a proposal to remove the Overseer from SolrCloud. https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/ This is a structural change that I believe requires a large consensus to be successful or even started. Feedback is most welcome and

Re: Solr Alpha (EA) release of Reference Branch

2020-10-06 Thread Ilan Ginzburg
Another option to integrate this work into the main code line would be to understand what changes have been made and where (Mark's descriptions in Slack go in the right way but are still too high level), and then port or even redo them in main, one by one. I think the danger is high to treat this

Re: Index documents in async way

2020-10-09 Thread Ilan Ginzburg
I like the idea. Two (main) points are not clear to me: - Order of updates: If the current leader fails (its tlog becoming inaccessible) and another leader is elected and indexes some more, what happens when the first leader comes back? What does it do with its tlog and how to know which part

Re: Solr Alpha (EA) release of Reference Branch

2020-10-06 Thread Ilan Ginzburg
ests >> > have gotten extensive improvements of their own) and also observe the >> > effect of the improvement. IIUC, every improvement to Solr seemed to >> > require many iterations to get the tests happy. I remember Mark telling me >> > that it may not even

Solr configuration options

2020-08-28 Thread Ilan Ginzburg
I want to ramp-up/discuss/inventory configuration options in Solr. Here's my understanding of what exists and what could/should be used depending on the need. Please correct/complete as needed (or point to documentation I might have missed). *There are currently 3 sources of general

Re: Solr configuration options

2020-08-28 Thread Ilan Ginzburg
ops.json > (or other ZK node) and read it locally for stand-alone. The API could even > be used to change it if it was stored locally. > > > > That still leaves the chicken-and-egg problem if connecting to ZK in the > first place. > > > >> On Aug 28, 2020, at 7:

Re: Solr configuration options

2020-08-28 Thread Ilan Ginzburg
e we've loaded (without reference to disk or zk). They aren't >> that big and in most cases don't change that fast, so caching a simple copy >> as a string in memory (but only if THAT node loaded it) for verification >> would seem smart. Having a file on disk doesn't tell y

What is "Solr core"?

2020-10-01 Thread Ilan Ginzburg
In code review/design discussions I've seen a few times comments made about a feature or piece of code: "it doesn't belong in [Solr] core". What's the definition of Solr "core" other than it being an IntelliJ module? Does core have access to things that can't be accessed from elsewhere? (like an

Re: Backward compatability handling across major versions

2020-10-01 Thread Ilan Ginzburg
In my opinion, when we really need to break backward compatibility (be it a change of API or of how features are made available, for example Autoscaling), I think the friendly way to do it is to introduce the new implementation first (co-existing with the old one!), deprecate but keep the old way

Re: Solr Alpha (EA) release of Reference Branch

2020-10-03 Thread Ilan Ginzburg
Thanks Ishan for the initiative! I think that’s a good idea if it allows testing that branch, assuming some are ready to invest what it takes and run this in production (maybe not with user facing prod traffic?). I do not think naming it Solr 10 is a good idea though, as it is likely very

Re: BadApple report

2020-05-25 Thread Ilan Ginzburg
Where are the test failure details? On Mon, May 25, 2020 at 4:47 PM Erick Erickson wrote: > Here’s the summary: > > Raw fail count by week totals, most recent week first (corresponds to > bits): > Week: 0 had 113 failures > Week: 1 had 103 failures > Week: 2 had 102 failures > Week: 3

Re: BadApple report

2020-05-25 Thread Ilan Ginzburg
6dc98b018ad > > It’s all complicated by the fact that the failures are intermittent. > > Best, > Erick > > > On May 25, 2020, at 11:22 AM, Ilan Ginzburg wrote: > > > > Where are the test failure details? > > > > On Mon, May 25, 2020 at 4:47 PM Erick Erickson

Why MultiThreadedOCPTest (sometimes) fails

2020-05-30 Thread Ilan Ginzburg
Following Erick’s BadApple report, I looked at MultiThreadedOCPTest.test(). I've found a failure in testFillWorkQueue() in Jenkins logs (not able to reproduce locally). This test enqueues a large number of tasks (115, more than the 100 Collection API parallel executors) to the Collection API queue

Re: Why MultiThreadedOCPTest (sometimes) fails

2020-05-30 Thread Ilan Ginzburg
your fix, it provides some reassurance that your fix is working. Not > totally certain of course. Otherwise, we’ll just commit your fixes and see > if Hoss’ rollups stop showing it. > > Thanks again! > Erick > > > > > On May 30, 2020, at 1:42 PM, Ilan Ginzburg wro

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-06 Thread Ilan Ginzburg
Both "legacy" and "SolrCloud" clusters are search server clusters. Seen from far enough, they look the same. In "legacy" the management code is elsewhere (developed by the client operating the cluster, running on other machines using a different logic and potentially another DB than Zookeeper)

Re: RoadMap?

2020-08-11 Thread Ilan Ginzburg
Maybe also add “in progress”? So items do not disappear suddenly from the page when work really starts on them? On Tue 11 Aug 2020 at 17:15, Gus Heck wrote: > Cool, since I brought it up, I can volunteer to help manage the page. We > should get jira issue links in there wherever possible. Do we

Re: SolrCloud upgrade process

2020-06-30 Thread Ilan Ginzburg
If there could be a way to force the new version to continue writing in the previous format for a while, that would allow switching to writing the new format once all nodes have been upgraded (or more likely when the cluster admin decides so). Ilan Le mar. 30 juin 2020 à 21:34, David Smiley a

Re: The moment you've all been waiting for PLEASE READ, Gradle builds will start failing on warnings on 9x!

2020-06-24 Thread Ilan Ginzburg
Thank you Erick! This is useful and saves time (I was able to set up gradle with the assistance you gave me a while ago). I guess that also means Gradle precommit is no longer optional and likely the text initializing PR's descriptions should mention that in some way... On Wed, Jun 24, 2020 at

Re: The moment you've all been waiting for PLEASE READ, Gradle builds will start failing on warnings on 9x!

2020-06-24 Thread Ilan Ginzburg
done that at one point or another. I have to say that Gradle is > much > faster, and just being able to do “gradle check” and go do something else for a > while has > made it much more likely that I’ll run it more often. > > Erick > > > On Jun 24, 2020, at 2:32 PM, Ilan Ginzb

Re: [VOTE] Lucene logo contest

2020-06-16 Thread Ilan Ginzburg
A is cleaner and more modern but C is a lot friendlier and "warmer" (and less pretentious). Depending on what the logo is expected to convey, A or C. On Tue, Jun 16, 2020 at 3:12 PM Dawid Weiss wrote: > A is nice and modern... but I still like the current logo better, so > for me it's "C". > >

Re: Welcome Ilan Ginzburg as Lucene/Solr committer

2020-06-22 Thread Ilan Ginzburg
oelsolr.blogspot.com/ > > > On Mon, Jun 22, 2020 at 9:11 AM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> Welcome Ilan! >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> >> On Sun, Jun 21, 2020

SolrCloud Autoscaling implementation docs?

2020-06-18 Thread Ilan Ginzburg
Are there any docs, notes or Jiras with actual discussions of Autoscaling internal implementation classes such as Suggestion, Suggester and subclasses, Violation, Variable and its implementations, Clause, ComputedType and the like? I did get how Policy.Session works, now I need to understand how

Re: Review your squash-merge message to remove duplicate text

2020-06-24 Thread Ilan Ginzburg
I could only git show the last id in your email David. That means that for most squash and merge the dialog box should be left empty, as the PR title should already have the relevant info (Jira ID + short description), right? And when the PR title does not contain this info, we should edit it

Re: Should Solr expose OS specific file paths?

2020-06-26 Thread Ilan Ginzburg
I believe output from Solr (logs or returned strings representing paths) should conform to the host platform convention. Solr should accept either convention as input regardless of the platform it's running on. Ilan On Fri, Jun 26, 2020 at 6:11 PM David Smiley wrote: > I started a conversation

Re: StringBuffer usage

2020-06-06 Thread Ilan Ginzburg
From https://stackoverflow.com/questions/355089/difference-between-stringbuilder-and-stringbuffer “ StringBuffer is synchronized, StringBuilder is not.”
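
A small sketch of what the quoted difference means in practice, using only standard JDK classes. The class and method names (BufferVsBuilderSketch, appendIsSynchronized, concurrentAppendLength) are made up for this illustration. It checks the synchronized modifier on append(String) by reflection, then appends to one shared StringBuffer from several threads; the final length is exact because each append holds the buffer's lock. The same loop on a shared StringBuilder carries no such guarantee.

```java
import java.lang.reflect.Modifier;

public class BufferVsBuilderSketch {
    // Reports whether the public append(String) method of the given class is declared synchronized.
    static boolean appendIsSynchronized(Class<?> type) throws NoSuchMethodException {
        return Modifier.isSynchronized(type.getMethod("append", String.class).getModifiers());
    }

    // Appends one character at a time to a single shared StringBuffer from several threads.
    static int concurrentAppendLength(int threads, int appendsPerThread) throws InterruptedException {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < appendsPerThread; j++) {
                    shared.append("x"); // synchronized: no append is lost
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return shared.length();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("StringBuffer.append synchronized:  " + appendIsSynchronized(StringBuffer.class));
        System.out.println("StringBuilder.append synchronized: " + appendIsSynchronized(StringBuilder.class));
        System.out.println("shared StringBuffer length: " + concurrentAppendLength(4, 1000));
    }
}
```

For a buffer confined to one thread, which is the usual case for a local variable, StringBuilder avoids the locking overhead and is generally the preferred choice.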

Re: Welcome Mike Drob to the PMC

2020-07-24 Thread Ilan Ginzburg
Congratulations Mike, happy to hear that! Ilan On Fri, Jul 24, 2020 at 9:56 PM Anshum Gupta wrote: > I am pleased to announce that Mike Drob has accepted the PMC's invitation > to join. > > Congratulations and welcome, Mike! > > -- > Anshum Gupta >

Re: Approach for a new Autoscaling framework

2020-07-25 Thread Ilan Ginzburg
ve concerns to >> detail - and try convince other peers. >> It’s hard for me as a spectator to know whether to agree with Noble >> without a clear picture of what the alternative API or approach would look >> like. >> I’m often a fan of loosely typed APIs since they tend t

Re: Approach for a new Autoscaling framework

2020-07-26 Thread Ilan Ginzburg
e defining the interfaces for > creating policies > > What's not clear to me is how will existing collection APIs like > create-collections/add-replica etc make use of it? Is that something that > has been discussed somewhere that I could read up on? > > > > On Sat, Jul 2

Re: 8.6.1 Release

2020-07-22 Thread Ilan Ginzburg
I didn't look at the issue, but if it is due to a default inefficient policy, instead of a new release (that as Houston points out will not even solve the issue), can't we communicate a workaround, namely a way to reset the default policy to some other value after 8.6 deploy that would make the

Re: Approach for a new Autoscaling framework

2020-07-23 Thread Ilan Ginzburg
>> context and inline comments are possible. Having this discussion in 4 >> places (jira, pr, slack and dev list) is very hard to keep track of. >> >> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, wrote: >> >>> [I’m moving a discussion from the PR >>

Approach for a new Autoscaling framework

2020-07-23 Thread Ilan Ginzburg
[I’m moving a discussion from the PR for SOLR-14613 to the dev list for a wider audience. This is about replacing the now (in master) gone Autoscaling framework with a way for clients to write

Re: 8.6.1 Release

2020-07-22 Thread Ilan Ginzburg
Shouldn't we add a note right away to 8.6 notifying of the issue? On Wed, 22 Jul 2020 at 20:08, Atri Sharma wrote: > +1, thanks Houston. > > On Wed, Jul 22, 2020 at 10:51 PM Houston Putman > wrote: > > > > If we agree that this warrants a patch release, I volunteer to do the > release. > >

Re: Welcome Houston Putman to the PMC

2020-12-01 Thread Ilan Ginzburg
Congratulations Houston! On Wed 2 Dec 2020 at 00:17, Munendra S N wrote: > Congratulations and welcome, Houston > > On Wed, Dec 2, 2020 at 3:37 AM Timothy Potter > wrote: > >> Welcome Houston! >> >> On Tue, Dec 1, 2020 at 2:43 PM Tomás Fernández Löbbe < >> tomasflo...@gmail.com> wrote: >> >>>

Re: [DISCUSS] Cross Data-Center Replication in Apache Solr

2020-12-05 Thread Ilan Ginzburg
That's an interesting initiative Anshum! I can see at least two different approaches here, your mention of SolrJ seems to hint at the first one: 1. Get the data as it comes from the client and fork it to local and remote data centers, 2. Create (an asynchronous) stream replicating local data

Re: Welcome Julie Tibshirani as Lucene/Solr committer

2020-11-18 Thread Ilan Ginzburg
Welcome Julie and congrats! On Thu, Nov 19, 2020 at 3:51 AM Julie Tibshirani wrote: > Thank you for the warm welcome! It’s a big honor for me -- I’ve been a > Lucene fan since the start of my software career. I’m excited to contribute > to such a great project. > > I’m a developer at Elastic

Re: Old programmers do fade away

2020-12-30 Thread Ilan Ginzburg
Hey Eric, Sad and happy to read your message. You've been a clear voice in the Lucene Solr community and I was always AMAZED how willing you are to help and explain, over and over again when needed. That's the sad part. The happy part is that those squirrels do need to learn and the electric

Re: [DISCUSS] ConfigSet ZK to file system fallback

2021-01-24 Thread Ilan Ginzburg
An aspect that would be interesting to consider IMO is upgrade and configuration changes. For example a collection in use across Solr version upgrade might require different configuration (config set) with the old and new Solr versions. Solr itself can require changes in config across updates.

Re: Welcome Greg Miller as Lucene committer

2021-05-31 Thread Ilan Ginzburg
Congrats Greg! On Sun, May 30, 2021 at 4:35 PM Greg Miller wrote: > Thanks everyone! I'm honored to have been nominated and look forward > to continuing to work with all of you on Lucene! I'm incredibly > grateful for everyone that has helped me so far. There's a lot to > learn in Lucene and

Re: Separate git repo(s) for Solr modules

2021-05-04 Thread Ilan Ginzburg
As with any dependency on any project, you update the dependency project first then consume the updated dependency in Solr. If the idea is to be able to modify Lucene and Solr in parallel, then the project split is counterproductive. From the Solr perspective, Lucene and ZooKeeper are really

Re: Welcome Peter Gromov as Lucene committer

2021-04-06 Thread Ilan Ginzburg
Welcome Peter! On Tue, Apr 6, 2021 at 7:48 PM Robert Muir wrote: > I'm pleased to announce that Peter Gromov has accepted the PMC's > invitation to become a committer. > > Peter, the tradition is that new committers introduce themselves with a > brief bio. > > Congratulations and welcome! > >

Re: Branch cleaning/ archiving

2021-03-10 Thread Ilan Ginzburg
Any risk in the script that command: git push ${REMOTE} cominvent/$BRANCH:refs/tags/history/branches/lucene-solr/$BRANCH errors out in some exotic way (?) but the script continues anyway and proceeds with the delete: git push ${REMOTE} --delete $BRANCH On Wed, Mar 10, 2021 at 10:35 PM Jan

Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-20 Thread Ilan Ginzburg
Congratulations, Mike! - Ilan On Wed, Feb 17, 2021 at 10:32 PM Anshum Gupta wrote: > Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice > President position. > > This year we nominated and elected Michael Sokolov as the Chair, a > decision that the board approved in its

Re: OverseerStatusTest recent failures

2021-02-20 Thread Ilan Ginzburg
to the Collection API. I will make sure to skip returning only the stats that are related to cluster state updater and restore returning collection api stats (when running in distributed cluster updates mode, otherwise all stats are returned). Tomorrow... Ilan On Sun, Feb 21, 2021 at 12:22 AM Ilan

Re: OverseerStatusTest recent failures

2021-02-21 Thread Ilan Ginzburg
~5% and > not 100% reproducible? > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg wrote: > >> Indeed the issue is due to my changes. >> >> In Overseer

Re: OverseerStatusTest recent failures

2021-02-20 Thread Ilan Ginzburg
Thank you David for reporting this. Seems due to my recent changes. I reproduce the failure locally and will look at this tomorrow. With the distributed cluster state updates I've introduced a randomization for using either Overseer based cluster state updates or distributed cluster state

Re: OverseerStatusTest recent failures

2021-02-21 Thread Ilan Ginzburg
t far from the expected 50% failure rate. I believe the ratio in the graph you sent David (currently at 5.7%) is averaged over a week, and includes failures from all branches (did some other stats on jenkins emails that tend to confirm this assumption). On Sun, Feb 21, 2021 at 10:53 AM Ilan Ginzb

Re: OverseerStatusTest recent failures

2021-02-21 Thread Ilan Ginzburg
I have fixed the issue. A PR is out https://github.com/apache/lucene-solr/pull/2410/files. Most of the work was documenting what stats are actually returned. Now OverseerStatusCmd has more comment lines than code lines. Will merge it shortly. Ilan On Sun, Feb 21, 2021 at 6:05 PM Ilan Ginzburg

Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Ilan Ginzburg
Congratulations Jan! On Thu, Feb 18, 2021 at 7:56 PM Anshum Gupta wrote: > > Hi everyone, > > I’d like to inform everyone that the newly formed Apache Solr PMC nominated > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice > President. This decision was approved by the board

Re: javadoc fails with no message

2021-08-06 Thread Ilan Ginzburg
Did you create the javadoc package file? Usually its absence leads to cryptic errors... On Fri, Aug 6, 2021 at 6:44 PM Michael Sokolov wrote: > Hi all, does anybody have helpful tips about how to chase down > javadoc build failures? I made some new stuff, and ./gradlew test > passes, but

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Ilan Ginzburg
Welcome Patrick and congrats! On Sun, Dec 19, 2021 at 10:36 PM Michael Sokolov wrote: > > Welcome Patrick! > > On Sun, Dec 19, 2021 at 3:27 PM Xi Chen wrote: > > > > Congratulations and welcome Haoyu! > > > > Best, > > Zach > > > > On Dec 19, 2021, at 12:05 PM, Patrick Zhai wrote: > > > > >

Re: Lucene PMC Chair Bruno Roustant

2022-03-24 Thread Ilan Ginzburg
Congrats Bruno! Ilan On Wed, Mar 23, 2022 at 10:47 PM Michael McCandless < luc...@mikemccandless.com> wrote: > Yes thank you Mike for handling all the fun PMC and Board issues for the > past year!! And thank you Bruno for the next year!! > > A year is a long time in a human life but it

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-09 Thread Ilan Ginzburg
Jira: ilan GitHub: murblanc (used to have murblanc as Jira id as well and changed to Ilan when I became Solr/Lucene committer). On Tue, Aug 9, 2022, 5:36 PM Michael McCandless wrote: > OK, added! Thanks: > > >

Re: GDPR compliance

2023-11-28 Thread Ilan Ginzburg
Are larger and older segments even certain to ever be merged in practice? I was assuming that if there is not a lot of new indexed content and not a lot of older documents being deleted, large older segments might never have to be merged. On Tue 28 Nov 2023 at 20:53, Robert Muir wrote: > I

Re: GDPR compliance

2023-11-29 Thread Ilan Ginzburg
To the valid point Robert makes above about the underlying data still on the disk (old news): https://news.sophos.com/en-us/2022/09/23/morgan-stanley-fined-millions-for-selling-off-devices-full-of-customer-pii/ On Wed, Nov 29, 2023 at 11:01 AM Michael Sokolov wrote: > Another way is to ensure