Re: [VOTE] Release Lucene/Solr 8.8.0 RC2
Thanks for handling the release, Noble! +1 (binding)

SUCCESS! [0:56:12.016387]

Ran the smoke tester, a demo app, and checked the change log. All of that looks good.

On Mon, Jan 25, 2021 at 2:22 AM Noble Paul wrote:
> Please vote for release candidate 2 for Lucene/Solr 8.8.0
>
> The artifacts can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.0-RC2-revb10659f0fc18b58b90929cfdadde94544d202c4a/
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>   https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.0-RC2-revb10659f0fc18b58b90929cfdadde94544d202c4a/
>
> The vote will be open for at least 72 hours
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Here is my +1
>
> --
> Noble Paul
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

--
Anshum Gupta
Re: [VOTE] Release Lucene/Solr 8.8.0 RC2
+1 (binding)

SUCCESS! [1:24:26.38423]

On Mon, Jan 25, 2021 at 3:52 PM Noble Paul wrote:
> Please vote for release candidate 2 for Lucene/Solr 8.8.0
>
> The artifacts can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.0-RC2-revb10659f0fc18b58b90929cfdadde94544d202c4a/
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>   https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.0-RC2-revb10659f0fc18b58b90929cfdadde94544d202c4a/
>
> The vote will be open for at least 72 hours
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Here is my +1
>
> --
> Noble Paul

--
Regards,
Atri
Apache Concerted
Re: [VOTE] Release Lucene/Solr 8.8.0 RC2
+1 (non-binding)

Tested the Lucene part of RC1 on our service; since that part is unchanged, still +1.

Patrick

On Wed, Jan 27, 2021 at 10:26 AM, Namgyu Kim wrote:
> +1 (binding)
>
> SUCCESS! [1:30:27.376324]
>
> On Tue, Jan 26, 2021 at 10:19 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>> +1 (binding)
>>
>> SUCCESS! [0:43:40.201461]
>>
>> However, the first time I ran the smoke tester, it failed with this:
>>
>>   [junit4] Tests with failures [seed: D3F97A1F3602195A]:
>>   [junit4]   - org.apache.solr.cloud.LeaderTragicEventTest.testLeaderFailsOver
>>
>>   [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=LeaderTragicEventTest -Dtests.method=testLeaderFailsOver -Dtests.seed=D3F97A1F3602195A -Dtests.locale=ar-LB -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
>>
>>   [junit4] ERROR 10.9s J1 | LeaderTragicEventTest.testLeaderFailsOver <<<
>>   [junit4]  > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:33003/solr: Underlying core creation failed while creating collection: testLeaderFailsOver
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1171)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866)
>>   [junit4]  >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
>>   [junit4]  >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
>>   [junit4]  >   at org.apache.solr.cloud.LeaderTragicEventTest.testLeaderFailsOver(LeaderTragicEventTest.java:80)
>>   [junit4]  >   at java.lang.Thread.run(Thread.java:748)
>>   [junit4]  > Throwable #2: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:33003/solr: Could not find collection: testLeaderFailsOver
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1171)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
>>   [junit4]  >   at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866)
>>   [junit4]  >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
>>   [junit4]  >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
>>   [junit4]  >   at org.apache.solr.cloud.LeaderTragicEventTest.tearDown(LeaderTragicEventTest.java:73)
>>   [junit4]  >   at java.lang.Thread.run(Thread.java:748)
>>
>> I guess it was a transient failure -- I re-ran the smoke tester and it passed the 2nd time. Is this a known Bad Apple test?
>>
>> Mike McCandless
>> http://blog.mikemccandless.com
>>
>> On Tue, Jan 26, 2021 at 4:53 AM Ignacio Vera wrote:
>>> +1 (binding)
>>>
>>> SUCCESS! [0:53:01.546134]
>>>
>>> On Tue, Jan 26, 2021 at 1:51 AM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
>>>> Thanks Noble! And thanks for fixing that concurrency issue; I'd hit it but didn't have time to investigate it.
>>>>
>>>> +1
>>>> SUCCESS! [0:58:32.036482]
>>>>
>>>> On Mon, Jan 25, 2021 at 10:19 AM Timothy Potter wrote:
>>>>> Thanks Noble!
>>>>>
>>>>> +1 SUCCESS! [1:24:28.212370] (my internet is super slow today)
>>>>>
>>>>> Re-ran all the Solr operator tests and verified the Cloud graph UI renders correctly now.
Re: Is it Time to Deprecate the Legacy Facets API
It's worth investigating deprecating the stats component also. I believe JSON facets cover that functionality as well. It will be painful for users to switch over, though, unfortunately.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jan 22, 2021 at 1:14 PM Jason Gerlowski wrote:
> Personally I'd love to see us stop maintaining the duplicated code of the underlying implementations. I wouldn't mind losing the legacy syntax as well - I'll take a clear, verbose API over a less-clear, concise one any day. But I'm probably a minority there.
>
> Either way I agree with Michael when he said above that the first step would have to be a parity investigation for features and performance.
>
> Best,
> Jason
>
> On Fri, Jan 22, 2021 at 10:05 AM Michael Gibney wrote:
>> I agree it would make long-term sense to consolidate the backend implementation. I think leaving the "classic" user-facing facet API (with the JSON Facet module as a backend) would be a good idea. Either way, I think a first step would be checking for parity between existing backend implementations -- possibly in terms of features [1], but certainly in terms of performance for common use cases [2].
>>
>> I think removal of the "classic" user-facing API would cause a lot of consternation in the user community. I can even see a non-backward-compatibility argument for preserving the "classic" user-facing API: it's simpler for simple use cases. _If_ the ultimate goal is removal of the "classic" user-facing API (not presuming that it is), that approach could be facilitated in the short term by enticing users towards the "JSON Facet" API ... basically with a "feature freeze" on the legacy implementation. No new features [3], no new optimizations [4] for "classic"; concentrate such efforts on JSON Facet. This seems to already be the de facto case, but it could be a more intentional decision -- e.g. in [3] it's straightforward to extend the proposed "facet cache" to the "classic" impl ... but I could see an argument for intentionally not doing so.
>>
>> Robert, I think your concerns about UninvertedField could be addressed by the `uninvertible="false"` property (currently defaults to "true" for backward compatibility iiuc; but could default to "false", or at least provide the ability to set the default for all fields to "false" at node level solr.xml? -- I know I've wished for the latter!). Also fwiw I'm not aware of any JSON Facet processors that work with string values in RAM ... I do think all JSON Facet processors use OrdinalMap now, where relevant.
>>
>> [1] https://issues.apache.org/jira/browse/SOLR-14921
>> [2] https://issues.apache.org/jira/browse/SOLR-14764
>> [3] https://issues.apache.org/jira/browse/SOLR-13807
>> [4] https://issues.apache.org/jira/browse/SOLR-10732
>>
>> On Fri, Jan 22, 2021 at 12:46 AM Robert Muir wrote:
>>> Do these two options conflate concerns of input format vs. actual algorithm? That was always my disappointment.
>>>
>>> I feel like the java apis are off here at the lower level, and it hurts the user. I don't talk about the input format from the user, instead I mean the execution of the faceting query.
>>>
>>> IMO: building top-level caches (e.g. uninvertedfield) or on-the-fly caches (e.g. fieldcache) is totally trappy already. But with the uninvertedfield of json facets it does its own thing, even if you went thru the trouble to enable docvalues at index time: that's sad.
>>>
>>> the code by default should not give the user jvm heap/garbage-collector hell. If you want to do that to yourself, for a totally static index, IMO that should be opt-in.
>>>
>>> But for the record, it is no longer just two shitty choices like "top-level vs per-segment". There are different field types, e.g. numeric types where the per-segment approach works efficiently. Then you have the strings, but there is a newish middle ground for Strings: OrdinalMap (lucene Multi* interfaces do it), which builds top-level integer structures to speed up string faceting but doesn't need *string values* in ram. It is just integers and mostly compresses as deltas. Adrien compresses the shit out of it.
>>>
>>> So I'd hate for the user to lose the option here of using docvalues to keep faceting out of heap memory, which should not be hassling them already in 2021. Maybe better to refactor the code such that all these concerns aren't unexpectedly tied together.
>>>
>>> On Thu, Jan 21, 2021 at 10:08 PM David Smiley wrote:
>>>> There's a JIRA issue about this from 5 years ago: https://issues.apache.org/jira/browse/SOLR-7296
>>>> I don't recall seeing any resistance to the idea of having the JSON Faceting module act as a back-end to the front-end (API surface) of Solr's common/classic/original/whatever faceting API. I don't think that simple
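The `uninvertible="false"` property discussed above is a per-field (or per-field-type) schema setting. As a rough illustration, a minimal schema.xml fragment might look like the following -- the field and type names here are hypothetical, not taken from the thread; the intent is a docValues-backed string field that refuses on-heap uninversion:

```xml
<!-- Hypothetical fragment: faceting on "category" must use docValues;
     attempting to uninvert the indexed terms onto the heap is disallowed. -->
<fieldType name="string_dv" class="solr.StrField"
           docValues="true" uninvertible="false"/>
<field name="category" type="string_dv" indexed="true" stored="true"/>
```

With this in place, facet requests on such a field are served from the docValues (off-heap-friendly) representation rather than a FieldCache/UninvertedField structure, which is the behavior Robert is arguing should be the default.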
Solr 7.7.2 IndexWriter is closed as a result of NullPointerException at SchemaSimilarityFactory.SchemaSimilarity.get
Issue: The IndexWriter of a specific core is closed as a result of a NullPointerException at SchemaSimilarityFactory.SchemaSimilarity.get when updating one of the documents. After this exception, Solr stops indexing subsequent documents to this core, and we have to restart the Solr process in order to reopen the core's IndexWriter.

Cause: Race condition between multiple threads that read and write the same object -- in this case, the *volatile SolrCore core* field in SchemaSimilarityFactory. One thread calls inform() of SchemaSimilarityFactory with a new SolrCore object that is still under initialization (write operation), while at the same time another thread calls core.getLatestSchema() (at SchemaSimilarityFactory.SchemaSimilarity.get) on that new SolrCore object, which has not yet been fully initialized (read operation).

inform() of SchemaSimilarityFactory is called (write operation):
• During creation of a new core
• During uploading of a transient core to the transient cache
• During loading of a non-transient core into memory (at Solr startup, by the coreLoadExecutor thread)

SchemaSimilarityFactory.SchemaSimilarity.get is called (read operation):
• During document indexing

It looks like a bug in SolrCore.setLatestSchema(): inform() is called before initialization of the schema.

Stack of the thread that performs inform():

at org.apache.solr.search.similarities.SchemaSimilarityFactory.inform(SchemaSimilarityFactory.java:97)
at org.apache.solr.core.SolrCore.setLatestSchema(SolrCore.java:319)
at org.apache.solr.core.SolrCore.initSchema(SolrCore.java:1139)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:947)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:870)
at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1189)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1721)
at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)

Stack of the thread that performs get():

org.apache.solr.common.SolrException: Exception writing document id *** to the index; possible analysis error.
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:250)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:1002)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:1233)
at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$2(DistributedUpdateProcessor.java:1082)
at org.apache.solr.update.processor.DistributedUpdateProcessor$$Lambda$344/A4008F60.apply(Unknown Source)
at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1082)
at
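The ordering bug described above can be sketched in plain Java. This is a minimal, hypothetical reproduction -- the class names below (Core, Schema, SimilarityFactory) are simplified stand-ins for SolrCore, IndexSchema, and SchemaSimilarityFactory, not Solr's actual code. The point is that inform() publishes a reference to the core before its schema field is assigned, so a reader that arrives inside that window dereferences null:

```java
// Sketch of the unsafe-publication race: the factory receives the core
// reference (step 1) before the core's schema is initialized (step 3),
// so a concurrent reader at step 2 hits a NullPointerException.
public class UnsafePublicationSketch {

    static class Schema {
        final String name = "managed-schema";
    }

    static class Core {
        volatile Schema schema; // null until initialization completes
        Schema getLatestSchema() { return schema; }
    }

    static class SimilarityFactory {
        volatile Core core;

        // Called during core construction -- BEFORE the schema is assigned,
        // mirroring setLatestSchema() invoking inform() too early.
        void inform(Core core) { this.core = core; }

        // Called from the indexing path; NPE-prone inside the race window.
        String get() { return core.getLatestSchema().name; }
    }

    public static void main(String[] args) {
        SimilarityFactory factory = new SimilarityFactory();
        Core core = new Core();

        factory.inform(core);       // step 1: core published, schema still null

        try {
            factory.get();          // step 2: reader arrives too early
        } catch (NullPointerException e) {
            System.out.println("reader in the race window: NPE");
        }

        core.schema = new Schema(); // step 3: initialization finally completes
        System.out.println("reader after init: " + factory.get());
    }
}
```

The `volatile` on the field does not help here: it makes the write visible, but visibility of a reference to a half-built object is exactly the problem. The fix direction implied above is to finish schema initialization before handing the core to inform().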
Re: Merging segment parts concurrently (SegmentMerger)
LOL, Mike did use http://jirasearch.mikemccandless.com, our dog-food Lucene search application demonstrating many of Lucene's features (http://blog.mikemccandless.com/2016/10/jiraseseach-20-dog-food-using-lucene-to.html), but it was NOT easy to find! I think I had one lonely brain cell still insisting we had indeed talked about this somewhat recently :)

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jan 27, 2021 at 6:43 AM Michael Sokolov wrote:
> I thought I remembered the discussion, searched for the issue in jira, but could not find it. Probably Mike used his souped-up search?
>
> On Wed, Jan 27, 2021, 3:07 AM Dawid Weiss wrote:
>> Darn... I swear sometimes, when I try hard enough, I can hear my brain cells giving up to atrophy... Sigh.
>>
>> D.
>>
>> On Wed, Jan 27, 2021 at 4:44 AM David Smiley wrote:
>>> LOL and it was Dawid :-) Having amnesia, Dawid?
>>> I think I've re-explored my own ideas before too.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>> On Tue, Jan 26, 2021 at 5:39 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>>>> Oh, I found this long-ago (well, ~2 years) issue exploring this: https://issues.apache.org/jira/browse/LUCENE-8580
>>>>
>>>> Mike McCandless
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Tue, Jan 26, 2021 at 3:38 PM Dawid Weiss wrote:
>>>>> > +1 to make a single merge concurrent! It is horribly frustrating to watch that last merge running on a single core :) I have lost many hours of my life to this frustration.
>>>>>
>>>>> > Yeah... it is, isn't it? Especially on new machines where you have super-fast SSDs, countless cores, etc... That last merge consumes so few resources that the computer feels practically idle... it's hard to explain to people using our software (who invested in hardware) why we're basically doing nothing... :)
>>>>>
>>>>> > I do think we need to explore concurrency within terms/postings across fields in one segment to really see gains in the common case where merge time is dominated by postings.
>>>>>
>>>>> Yeah, probably.
>>>>>
>>>>> > if you want to experiment with something like that, you can hackishly simulate it today to quickly see the overhead, correct? its a small hack to PerFieldPostingsFormat to force it to emit "files-per-field" and then CFS will combine it all together.
>>>>>
>>>>> Good idea, Robert. I'll try this.
>>>>>
>>>>> > By default merging stored fields is super fast because Lucene can copy compressed data directly, but if there are deletes or index sorting is enabled this optimization is not applicable anymore and I wouldn't be surprised if stored fields started taking non-negligible time.
>>>>>
>>>>> In this case these segments are essentially made from scratch but with lots and lots of term vectors and postings... But the more parallel stages we can introduce, the better.
>>>>>
>>>>> I have some other stuff on my plate before I can dive deep into this but I eventually will. Thanks for the pointers, everyone - helpful!
>>>>>
>>>>> D.