[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925451#comment-16925451 ]

Tim Owen commented on SOLR-13240:
---------------------------------

Great! Thanks for all your work on this, Christine.

> UTILIZENODE action results in an exception
> ------------------------------------------
>
>                 Key: SOLR-13240
>                 URL: https://issues.apache.org/jira/browse/SOLR-13240
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 7.6
>            Reporter: Hendrik Haddorp
>            Assignee: Christine Poerschke
>            Priority: Major
>             Fix For: master (9.0), 8.3
>
>         Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it moved a few replicas:
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":40220},
>   "Operation utilizenode caused exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException: Comparison method violates its general contract!",
>   "exception":{
>     "msg":"Comparison method violates its general contract!",
>     "rspCode":-1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"Comparison method violates its general contract!",
>     "trace":"org.apache.solr.common.SolrException: Comparison method violates its general contract!
>       at org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)
>       at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)
>       at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>       at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
>       at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>       at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>       at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>       at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>       at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>       at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>       at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>       at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>       at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>       at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>       at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>       at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>       at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>       at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>       at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>       at org.eclipse.jetty.server.Server.handle(Server.java:531)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>       at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>       at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>       at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
[jira] [Commented] (SOLR-13539) Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary
[ https://issues.apache.org/jira/browse/SOLR-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900810#comment-16900810 ]

Tim Owen commented on SOLR-13539:
---------------------------------

Thanks Thomas, yes we're using 7.7.2 and having trouble, as we use AtomicUpdates heavily. We applied the patch from SOLR-13538 and then your patch from your GitHub PR (although we excluded the unit tests from your patch), which fixes most things (thank you).

I have attached a further patch we had to apply locally to make removeregex work (it looks like it was fixed for the single-value case, but multiple values were still failing). Perhaps you could fold that further fix into your larger change; if not, I can raise a separate ticket.

To be honest, this whole situation with the javabin change is getting confusing, with various partial fixes, and it's not clear to me which fixes are on the 7.x branch. Right now, 7.7.2 standard is effectively broken. Thanks for your efforts to get this stable again.

> Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary
> ------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13539
>                 URL: https://issues.apache.org/jira/browse/SOLR-13539
>             Project: Solr
>          Issue Type: Bug
>          Components: UpdateRequestProcessors
>    Affects Versions: 7.7.2, 8.1, 8.1.1
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>         Attachments: SOLR-13539.patch
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> When using JavaBinCodec the values of collections are of type ByteArrayUtf8CharSequence; existing field values are Strings, so the remove operation does not have any effect.
> This is related to the following field types: UUID, Enums, Bool and Binary.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
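The root cause quoted above - incoming javabin values arriving as a CharSequence wrapper (ByteArrayUtf8CharSequence) while stored field values are plain Strings - can be illustrated with a generic sketch. This is not Solr's actual code; StringBuilder stands in for any non-String CharSequence:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: why a "remove" can silently be a no-op when the incoming value's
// runtime type differs from the stored values' type. List.remove relies on
// equals(), and no CharSequence implementation other than String itself
// compares equal to a String.
public class RemoveMismatchDemo {
    // Stand-in for a non-String CharSequence such as ByteArrayUtf8CharSequence.
    static CharSequence wrap(String s) {
        return new StringBuilder(s);
    }

    // Returns the list size after attempting to remove `incoming` from ["a", "b"].
    static int sizeAfterRemove(Object incoming) {
        List<Object> stored = new ArrayList<>(List.of("a", "b"));
        stored.remove(incoming); // silently does nothing if equals() never matches
        return stored.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRemove(wrap("a")));            // 2: no effect
        System.out.println(sizeAfterRemove(wrap("a").toString())); // 1: removed
    }
}
```

Normalizing incoming values to String (via toString()) before comparing is one way a fix can restore the old behaviour, which appears to be the spirit of the patches discussed here.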
[jira] [Updated] (SOLR-13539) Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary
[ https://issues.apache.org/jira/browse/SOLR-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Owen updated SOLR-13539:
----------------------------
    Attachment: SOLR-13539.patch
[jira] [Commented] (SOLR-13539) Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary
[ https://issues.apache.org/jira/browse/SOLR-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900197#comment-16900197 ]

Tim Owen commented on SOLR-13539:
---------------------------------

Not sure if it's a similar issue or not, but we're seeing this problem with the {{removeregex}} atomic update operation (a ClassCastException when it tries to turn the javabin values into a String, inside the doRemoveRegex method).

Btw, I previously raised a Jira with more tests for these operations, https://issues.apache.org/jira/browse/SOLR-9505, although I don't know if those would have caught the javabin problem.
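The ClassCastException mentioned in this comment is the other failure mode of the same type mismatch: a blind cast to String fails outright instead of silently doing nothing. A hypothetical sketch (these helpers are illustrative, not Solr's doRemoveRegex; StringBuilder again stands in for the javabin wrapper):

```java
// Sketch: casting an arbitrary CharSequence to String throws, while
// converting via toString() is safe for any CharSequence implementation.
public class RegexRemoveDemo {
    // Mirrors the reported bug: assumes the value is already a String.
    static boolean unsafeMatches(Object value, String regex) {
        return ((String) value).matches(regex); // ClassCastException for wrappers
    }

    // Type-safe alternative: normalize first, then match.
    static boolean safeMatches(Object value, String regex) {
        return value.toString().matches(regex);
    }

    public static void main(String[] args) {
        Object javabinValue = new StringBuilder("abc"); // stand-in wrapper
        System.out.println(safeMatches(javabinValue, "a.c")); // true
        try {
            unsafeMatches(javabinValue, "a.c");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as reported");
        }
    }
}
```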
[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.
[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890751#comment-16890751 ]

Tim Owen commented on SOLR-9961:
--------------------------------

Thanks... yes indeed, we had to cherry-pick the patch for that into our build. Finally everything is working!

> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
>                 Key: SOLR-9961
>                 URL: https://issues.apache.org/jira/browse/SOLR-9961
>             Project: Solr
>          Issue Type: Improvement
>          Components: Backup/Restore
>    Affects Versions: 6.2.1
>            Reporter: Timothy Potter
>            Assignee: Mikhail Khludnev
>            Priority: Major
>         Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google Cloud Storage in this case, but I think this is a general problem) takes 8 minutes ... the restore of the same core takes hours. The restore loop in RestoreCore is serial and doesn't allow me to parallelize the expensive part of this operation (the IO from the remote cloud storage service). We need the option to parallelize the download (like distcp).
> Also, I tried downloading the same directory using gsutil and it was very fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to consider a two-step approach: 1) download in parallel to a temp dir, 2) perform all of the checksum validation against the local temp dir. That will save round trips to the remote cloud storage.
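The parallelization idea in the quoted issue - replace a serial per-file download loop with a pool of workers - can be sketched as follows. This is a hedged illustration, not the actual RestoreCore patch; the `Fetcher` interface and file names are placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: fetch several files concurrently instead of one at a time, so the
// restore is bounded by aggregate bandwidth rather than per-file latency.
public class ParallelDownloadSketch {
    interface Fetcher {
        void fetch(String file) throws Exception; // placeholder for a remote read
    }

    static void downloadAll(List<String> files, Fetcher fetcher, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String f : files) {
                futures.add(pool.submit(() -> {
                    fetcher.fetch(f);
                    return null;
                }));
            }
            // Wait for every download; get() rethrows any worker failure.
            for (Future<?> fu : futures) {
                fu.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical index file names; the "fetch" just records the name.
        Set<String> done = ConcurrentHashMap.newKeySet();
        downloadAll(List.of("_0.cfs", "_0.si", "segments_1"), done::add, 3);
        System.out.println(done.size()); // 3
    }
}
```

The two-step variant mentioned in the issue (download everything to a temp dir first, then checksum locally) would slot in naturally: the pool handles step 1, and validation runs afterwards against local files only.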
[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.
[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890015#comment-16890015 ]

Tim Owen commented on SOLR-9961:
--------------------------------

Thanks Mikhail, we're interested in your findings too, as we do backups to HDFS and to S3 (via S3A) and are currently profiling performance of backups and restores in particular.
[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.
[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889969#comment-16889969 ]

Tim Owen commented on SOLR-9961:
--------------------------------

Just curious if you tried increasing the copy buffer size as per SOLR-13029 to speed up restores? It would be good to compare the performance of making that change vs the extra complexity of parallelisation.
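The buffer-size point above is easy to demonstrate in isolation: with a small copy buffer, the number of read calls (each a potential remote round trip) grows inversely with the buffer size. A minimal sketch using in-memory streams (illustrative only; not Solr's copy code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch: count how many reads a stream copy needs for a given buffer size.
// Over a high-latency remote store, each read costs a round trip, so a larger
// buffer can speed up a restore without any concurrency changes.
public class CopyBufferDemo {
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long reads = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            reads++;
        }
        return reads;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // a 1 MiB "file"
        long small = copy(new ByteArrayInputStream(data), new ByteArrayOutputStream(), 4096);
        long large = copy(new ByteArrayInputStream(data), new ByteArrayOutputStream(), 1 << 20);
        System.out.println(small + " reads vs " + large); // 256 reads vs 1
    }
}
```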
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886282#comment-16886282 ]

Tim Owen commented on SOLR-13240:
---------------------------------

Yes, it looks like the code fix has shown up other (autoscaling) tests that now fail; perhaps, as you suggest, they were relying on the previous sorting order.
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881143#comment-16881143 ]

Tim Owen commented on SOLR-13240:
---------------------------------

Thanks for following up on this, Christine... the updated patch looks good to me. Disclaimer: we've not tried this patch against the master branch, as we're using 7.x in production.
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827054#comment-16827054 ]

Tim Owen commented on SOLR-13240:
---------------------------------

Good to hear the patch worked for you! I'm not entirely sure in what circumstances it happens, or not, because you'd think it would never have worked and been a blocker for the previous release of this functionality. Clearly it must work sometimes. Maybe it depends how many different shards have replicas on a given node, i.e. when it's sorting a List of replicas from many different shards, there's likely to be more than one leader among those.
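The "Comparison method violates its general contract!" error in this issue is thrown by TimSort when it detects an inconsistent Comparator. A minimal, hypothetical sketch (not the actual Solr comparator) of the asymmetry that triggers it - for example, if two replicas of equal precedence are each ranked below the other instead of comparing as equal:

```java
import java.util.Comparator;

// Sketch: a broken comparator that returns -1 for ties instead of 0,
// so sgn(compare(a, b)) != -sgn(compare(b, a)) when a equals b.
public class ContractDemo {
    static final Comparator<Integer> BROKEN = (a, b) -> (a <= b) ? -1 : 1;

    // The Comparator contract requires this to hold for every pair.
    static boolean isSymmetric(int a, int b) {
        return Integer.signum(BROKEN.compare(a, b))
            == -Integer.signum(BROKEN.compare(b, a));
    }

    public static void main(String[] args) {
        System.out.println(isSymmetric(3, 7)); // true: distinct values are fine
        System.out.println(isSymmetric(5, 5)); // false: ties break the contract
    }
}
```

Sorting a large enough list with such a comparator makes java.util.TimSort throw IllegalArgumentException("Comparison method violates its general contract!"), but the detection is best-effort and small inputs often sort without tripping it. That would be consistent with the intermittent behaviour discussed in this comment: a node holding replicas of many shards yields a longer list (and more "equal" leader pairs) to sort, making detection more likely.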
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826964#comment-16826964 ]

Tim Owen commented on SOLR-13240:
---------------------------------

The one jar you're changing is solr-solrj... you should find the new jar that Ant built under this path:

/home/ubuntu/solrbuild/solr-7.5.0/solr/build/solr-solrj/

If you compare the jar file in that dir with the one in your other two paths listed above, you'll see whether it's been deployed. The /opt path is what Solr is actually running from, I would guess. So if your patched jar is in there too, you should get that fix after a restart.
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826788#comment-16826788 ] Tim Owen commented on SOLR-13240: - At first glance, I think you should have used {{ant jar}} instead of {{ant compile}}, otherwise it may not have actually built the jar file from the code changes. I don't know much about the install script, but what I tend to do for testing small fixes to the code is just use Ant to build the jar and then drop that new jar in place of the existing one, inside the installation - somewhere it will have unpacked the Solr distro war file, and the library jars are in e.g. /server/solr-webapp/webapp/WEB-INF/lib/ ... you can replace the jar in there, and restart Solr. In this case, you're changing the solr-solrj package, and ant compile should have built the new jar into {{solr/build/solr-solrj/}}
[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752109#comment-16752109 ] Tim Owen commented on SOLR-13029: - Not sure - I can see someone might want parallelised file copies as well, so that ticket is still valid I think. It probably depends on how many collections you have to restore, if (like us) you have many collections to do, we just kick them off in parallel and let each one work through its files in series. But if you had 1 or 2 large collections it might be better done with the proposed change there. > Allow HDFS backup/restore buffer size to be configured > -- > > Key: SOLR-13029 > URL: https://issues.apache.org/jira/browse/SOLR-13029 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore, hdfs >Affects Versions: 7.5, 8.0 >Reporter: Tim Owen >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.0, 7.7, master (9.0) > > Attachments: SOLR-13029.patch, SOLR-13029.patch, SOLR-13029.patch > > > There's a default hardcoded buffer size setting of 4096 in the HDFS code > which means in particular that restoring a backup from HDFS takes a long > time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes > is very inefficient. We changed this in our local build used in production to > 256kB and saw a 10x speed improvement when restoring a backup. Attached patch > simply makes this size configurable using a command line setting, much like > several other buffer size values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
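The 4096-vs-262144 effect described above comes down to the number of read round trips per file. As a minimal, self-contained sketch of the copy loop in question (illustrative names, not Solr's actual HDFS code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal sketch: a stream-to-stream copy whose cost against a remote
// filesystem is dominated by the number of read() calls, which is inversely
// proportional to the buffer size. Names here are illustrative only.
public class BufferedCopy {
    public static byte[] copy(byte[] src, int bufferSize) throws IOException {
        InputStream in = new ByteArrayInputStream(src);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize]; // 4096 is the hardcoded default; 262144 in the patch
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // each iteration is one round trip to the source
        }
        return out.toByteArray();
    }
}
```

Going from a 4 kB to a 256 kB buffer cuts the number of read round trips per file by a factor of 64, which matters far more when each read crosses the network to HDFS than when copying local memory.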
[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752062#comment-16752062 ] Tim Owen commented on SOLR-13029: - Thanks Mikhail!
[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749806#comment-16749806 ] Tim Owen commented on SOLR-13029: - hah, I wasn't suggesting automating that.. just how I manually tested it. I've attached a newer patch, containing some unit tests for the various situations
[jira] [Updated] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-13029: Attachment: SOLR-13029.patch
[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746174#comment-16746174 ] Tim Owen commented on SOLR-13029: - Sure - there's not a huge amount of code logic paths to test, but I can take a look. In practice, I used a heap dump to confirm that the buffer really was the size I set in the configuration.
[jira] [Updated] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-13029: Summary: Allow HDFS backup/restore buffer size to be configured (was: Allow HDFS buffer size to be configured)
[jira] [Commented] (SOLR-13029) Allow HDFS buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711366#comment-16711366 ] Tim Owen commented on SOLR-13029: - Updated to be specific to the copying of index files to/from HDFS during backups and restores. Would be configured in solr.xml using e.g. {noformat} .. 262144 {noformat} There is another method in {{HdfsBackupRepository}}, {{openInput}} that is only used for opening small metadata files or getting the checksum during restores, so I have left that using the default buffer size. Only the bulk whole-file copying uses the larger buffer.
[jira] [Updated] (SOLR-13029) Allow HDFS buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-13029: Attachment: SOLR-13029.patch
[jira] [Commented] (SOLR-13029) Allow HDFS buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706976#comment-16706976 ] Tim Owen commented on SOLR-13029: - Yes that's a fair point, I will change the patch so that it allows HdfsBackupRepository to pass a different value (via its xml config) instead of changing the shared constant. It does make me wonder if index-on-hdfs has similar issues with the small buffer size, perhaps it's less of a problem due to random seeks rather than bulk copying. We no longer use indexes on hdfs, so I can't compare - we're only using the hdfs functionality for backups and restores now.
[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.
[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705257#comment-16705257 ] Tim Owen commented on SOLR-9961: We considered using this patch locally, but actually found the problem was in slow HDFS restores because of an undersized copy buffer. See SOLR-13029 for our change to alleviate that. Since we had lots of collections to restore, we did those in parallel instead of making the file restore parallelised. But the buffer patch made each file restore about 10x faster, with a 256kB buffer instead of 4k. > RestoreCore needs the option to download files in parallel. > --- > > Key: SOLR-9961 > URL: https://issues.apache.org/jira/browse/SOLR-9961 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 6.2.1 >Reporter: Timothy Potter >Priority: Major > Attachments: SOLR-9961.patch, SOLR-9961.patch > > > My backup to cloud storage (Google cloud storage in this case, but I think > this is a general problem) takes 8 minutes ... the restore of the same core > takes hours. The restore loop in RestoreCore is serial and doesn't allow me > to parallelize the expensive part of this operation (the IO from the remote > cloud storage service). We need the option to parallelize the download (like > distcp). > Also, I tried downloading the same directory using gsutil and it was very > fast, like 2 minutes. So I know it's not the pipe that's limiting perf here. > Here's a very rough patch that does the parallelization. We may also want to > consider a two-step approach: 1) download in parallel to a temp dir, 2) > perform all of the checksum validation against the local temp dir. That > will save round trips to the remote cloud storage.
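The serial-loop versus parallel-download trade-off discussed above can be sketched as follows. This is a hypothetical illustration, not RestoreCore's real API: submit one copy task per file to a thread pool and wait on the futures, so the expensive remote IO overlaps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

// Illustrative sketch of parallelising a restore: one task per file instead
// of a serial loop. Class and method names are hypothetical.
public class ParallelRestore {
    // "Downloads" each named file by invoking copyFn; returns completed file names in order.
    public static List<String> restoreAll(List<String> files, int threads,
                                          UnaryOperator<String> copyFn)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String f : files) {
                futures.add(pool.submit(() -> copyFn.apply(f))); // expensive IO runs concurrently
            }
            List<String> done = new ArrayList<>();
            for (Future<String> fut : futures) {
                done.add(fut.get()); // blocks; propagates any copy failure
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }
}
```

With many collections to restore, running whole-collection restores like this in parallel (as the comment describes) gets most of the benefit even while each core's file loop stays serial.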
[jira] [Updated] (SOLR-13029) Allow HDFS buffer size to be configured
[ https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-13029: Attachment: SOLR-13029.patch
[jira] [Created] (SOLR-13029) Allow HDFS buffer size to be configured
Tim Owen created SOLR-13029: --- Summary: Allow HDFS buffer size to be configured Key: SOLR-13029 URL: https://issues.apache.org/jira/browse/SOLR-13029 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: Backup/Restore, hdfs Affects Versions: 7.5, master (8.0) Reporter: Tim Owen There's a default hardcoded buffer size setting of 4096 in the HDFS code which means in particular that restoring a backup from HDFS takes a long time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes is very inefficient. We changed this in our local build used in production to 256kB and saw a 10x speed improvement when restoring a backup. Attached patch simply makes this size configurable using a command line setting, much like several other buffer size values.
[jira] [Commented] (LUCENE-7394) Make MemoryIndex immutable
[ https://issues.apache.org/jira/browse/LUCENE-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646482#comment-16646482 ] Tim Owen commented on LUCENE-7394: -- Related to this (although I am happy to raise a separate Jira as a bug report): if you mutate a MemoryIndex by calling addField, you can end up with a corrupt internal state (and an ArrayIndexOutOfBoundsException) if you've done a search on the index beforehand, e.g. call addField, then search, then addField again, then search. This appears to be because the sortedTerms internal state gets built when the first search happens, and isn't invalidated/nulled when the next addField happens. So the second search sees a state where sortedTerms and terms are out of sync, and fails. The documentation doesn't say this is a bad sequence of usage (or prevent it), so making it immutable with a Builder would fix that situation. Alternatively, calling search could implicitly call freeze, or addField could null out sortedTerms. > Make MemoryIndex immutable > -- > > Key: LUCENE-7394 > URL: https://issues.apache.org/jira/browse/LUCENE-7394 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Martijn van Groningen >Priority: Major > > The MemoryIndex itself should just be a builder that constructs an > IndexReader instance. The whole notion of freezing a memory index should be > removed. > While we change this we should also clean this class up. There are many > methods to add a field; we should just have a single method that accepts an > `IndexableField`. > The `keywordTokenStream(...)` method is unused and untested and should be > removed; it doesn't belong with the memory index. > The `setSimilarity(...)`, `createSearcher(...)` and `search(...)` methods > should be removed, because the MemoryIndex should just be responsible for > creating an IndexReader instance.
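The add-search-add-search failure described in the comment above is a classic stale-cache bug. Here is a self-contained sketch of the pattern (a generic illustration, not Lucene's actual MemoryIndex internals), with the invalidation fix marked:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic illustration of the bug pattern: a lazily built sorted view that
// must be invalidated on every mutation. Without the invalidation, a search
// after add-search-add consults a stale sorted array that is out of sync
// with the underlying term list.
public class LazySortedTerms {
    private final List<String> terms = new ArrayList<>();
    private String[] sortedTerms; // built lazily on first search

    public void addField(String term) {
        terms.add(term);
        sortedTerms = null; // the fix: invalidate; removing this line reproduces the bug
    }

    public boolean search(String term) {
        if (sortedTerms == null) { // rebuild only when stale
            sortedTerms = terms.toArray(new String[0]);
            Arrays.sort(sortedTerms);
        }
        return Arrays.binarySearch(sortedTerms, term) >= 0;
    }
}
```

An immutable Builder design, as the ticket proposes, removes the whole class of bug because there is no mutation after the searchable view exists.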
[jira] [Commented] (SOLR-7830) topdocs facet function
[ https://issues.apache.org/jira/browse/SOLR-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495223#comment-16495223 ] Tim Owen commented on SOLR-7830: I've attached a new patch, I took your original patch and updated it for the 7x branch, then added distributed search support (the merging and re-sorting). We wanted this functionality as it's really useful to fetch 1 or 2 sample documents with each bucket for some of our use-cases, and this approach of using the topdocs aggregate function works really nicely. The only limitation is that the sorting for distributed searches can only work with field sorting, not with functional sorting, and you can only sort by fields that are included in the results (otherwise it would need to include the sort values in shard responses - this could be done, but it was more complex and we didn't need that for our use-case). Also, the offset parameter isn't used, but we felt pagination of these topdocs was quite niche (but it could be added to this patch). > topdocs facet function > -- > > Key: SOLR-7830 > URL: https://issues.apache.org/jira/browse/SOLR-7830 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Yonik Seeley >Priority: Major > Attachments: ALT-SOLR-7830.patch, SOLR-7830.patch, SOLR-7830.patch > > > A topdocs() facet function would return the top N documents per facet bucket. > This would be a big step toward unifying grouping and the new facet module.
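The merging and re-sorting step mentioned in the comment above can be sketched roughly as follows (hypothetical names, not the actual patch code): each shard returns its per-bucket top docs already sorted by a field, and the coordinator concatenates them, re-sorts on that field's value, and truncates to the limit. This also shows why the patch can only sort on fields present in the results: the merge comparator reads the sort value straight out of each returned document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Rough sketch of a coordinator-side merge of per-shard topdocs results.
public class TopDocsMerge {
    public static class Doc {
        public final String id;
        public final long sortValue; // a sort field value carried in the result itself
        public Doc(String id, long sortValue) { this.id = id; this.sortValue = sortValue; }
    }

    public static List<Doc> merge(List<List<Doc>> shardResults, int limit) {
        List<Doc> all = new ArrayList<>();
        for (List<Doc> shard : shardResults) {
            all.addAll(shard); // each shard contributes at most `limit` docs per bucket
        }
        all.sort(Comparator.comparingLong((Doc d) -> d.sortValue)); // ascending field sort
        return all.subList(0, Math.min(limit, all.size()));
    }
}
```

Supporting functional sorting would mean shipping computed sort values alongside each shard's documents, which is the extra complexity the comment alludes to.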
[jira] [Updated] (SOLR-7830) topdocs facet function
[ https://issues.apache.org/jira/browse/SOLR-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-7830: --- Attachment: ALT-SOLR-7830.patch
[jira] [Updated] (SOLR-11765) Ability to Facet on a Function
[ https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11765: Description: This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. The output is in the same format as field facets, but with a bucket per distinct value produced by the function. Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket -and functional sorting (currently it's only sortable by the bucket label or by volume)- Example usage: {noformat} { facet : { dayOfWeek : { type : function, f : "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } {noformat} I did some refactoring in the facet parser, to hoist some common code for sort and pagination parsing. was: This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. 
The output is in the same format as field facets, but with a bucket per distinct value produced by the function. Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket and functional sorting (currently it's only sortable by the bucket label or by volume) Example usage: {noformat} { facet : { dayOfWeek : { type : function, f : "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } {noformat} I did some refactoring in the facet parser, to hoist some common code for sort and pagination parsing. > Ability to Facet on a Function > -- > > Key: SOLR-11765 > URL: https://issues.apache.org/jira/browse/SOLR-11765 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Tim Owen >Priority: Major > Attachments: SOLR-11765.patch, SOLR-11765.patch > > > This is an extension to the JSON facet functionality, to support faceting on > a function. I have extended the parsing of json.facet to allow a 4th facet > type (function) and you provide a function expression. You can also provide > sort, limit and mincount, as it behaves similarly to faceting on a field. > Subfacets work as normal - you can nest function facets anywhere you can use > other types. > The output is in the same format as field facets, but with a bucket per > distinct value produced by the function. Hence the usage of this is most > appropriate for situations where your function only produces a relatively > small number of possible values. 
It's also recommended to have docValues on > any field used by the function. > Our initial use-case for this is with a function that extracts a given part > from a date field's value e.g. day of week, or hour of day, where the > possible range of output values is very low. > Still TODO: documentation, unit tests, and possible extensions to support a > missing bucket -and functional sorting (currently it's only sortable by the > bucket label or by volume)- > Example usage: > {noformat} > { facet : { dayOfWeek : { type : function, f : > "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } > {noformat} > I did some refactoring in the facet parser, to hoist some common code for > sort and pagination parsing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
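The {noformat} example above can be sketched as a client-side payload. This is a minimal illustration only: the {{function}} facet type and the {{chronofield()}} function come from the attached patch description, not from any released Solr API, and the field name is invented.

```python
import json

# Sketch of a json.facet request body using the proposed "function" facet
# type. The type name ("function") and chronofield() are assumptions taken
# from the patch description in this ticket, not a released Solr feature.
facet_request = {
    "facet": {
        "dayOfWeek": {
            "type": "function",                             # proposed 4th facet type
            "f": "chronofield(my_date_field,DAY_OF_WEEK)",  # function expression
            "sort": "count desc",                           # same options as field facets
            "limit": 7,                                     # day-of-week yields few buckets
            "mincount": 1,
        }
    }
}

# The facet object would be sent as the json.facet parameter of a search request.
payload = json.dumps(facet_request["facet"])
print(payload)
```

Because each distinct function output becomes a bucket, a bounded function like day-of-week keeps the response small, which matches the recommendation in the description.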
[jira] [Updated] (SOLR-11765) Ability to Facet on a Function
[ https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11765: Attachment: SOLR-11765.patch > Ability to Facet on a Function > -- > > Key: SOLR-11765 > URL: https://issues.apache.org/jira/browse/SOLR-11765 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Tim Owen >Priority: Major > Attachments: SOLR-11765.patch, SOLR-11765.patch > > > This is an extension to the JSON facet functionality, to support faceting on > a function. I have extended the parsing of json.facet to allow a 4th facet > type (function) and you provide a function expression. You can also provide > sort, limit and mincount, as it behaves similarly to faceting on a field. > Subfacets work as normal - you can nest function facets anywhere you can use > other types. > The output is in the same format as field facets, but with a bucket per > distinct value produced by the function. Hence the usage of this is most > appropriate for situations where your function only produces a relatively > small number of possible values. It's also recommended to have docValues on > any field used by the function. > Our initial use-case for this is with a function that extracts a given part > from a date field's value e.g. day of week, or hour of day, where the > possible range of output values is very low. > Still TODO: documentation, unit tests, and possible extensions to support a > missing bucket and functional sorting (currently it's only sortable by the > bucket label or by volume) > Example usage: > {noformat} > { facet : { dayOfWeek : { type : function, f : > "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } > {noformat} > I did some refactoring in the facet parser, to hoist some common code for > sort and pagination parsing. 
[jira] [Commented] (SOLR-11832) Restore from backup creates old format collections
[ https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321976#comment-16321976 ] Tim Owen commented on SOLR-11832: - [~varunthacker] .. yes it does appear to be the same issue as SOLR-11586 .. sorry I had searched Jira for {{backup}} and {{restore}} and didn't see your ticket before! I'd agree the default needs changing too (as my patch is doing) but I'm less familiar with what other code paths might end up invoking that {{ClusterStateMutator}} code I've changed (it might be others as well as restored backups). Feel free to close this as a duplicate then > Restore from backup creates old format collections > -- > > Key: SOLR-11832 > URL: https://issues.apache.org/jira/browse/SOLR-11832 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 7.2, 6.6.2 >Reporter: Tim Owen >Assignee: Varun Thacker >Priority: Minor > Attachments: SOLR-11832.patch > > > Restoring a collection from a backup always creates the new collection using > the old format state json (format 1), as a global clusterstate.json file at > top level of ZK. All new collections should be defaulting to use the newer > per-collection (format 2) in /collections/.../state.json > As we're running clusters with many collections, the old global state format > isn't good for us, so as a workaround for now we're calling > MIGRATESTATEFORMAT immediately after the RESTORE call. > This bug was mentioned in the comments of SOLR-5750 and also recently > mentioned by [~varunthacker] in SOLR-11560 > Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this > means at least 1 test class doesn't succeed anymore. 
From what I can tell, > the BasicDistributedZk2Test fails because it's not using the official > collection API to create a collection, it seems to be bypassing that and > manually creating cores using the core admin api instead, which I think is > not enough to ensure the correct ZK nodes are created. The test superclass > has some methods to create a collection which do use the collection api so I > could try fixing the tests (I'm just not that familiar with why those > BasicDistributed*Test classes aren't using the collection api). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
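The workaround described above (RESTORE immediately followed by MIGRATESTATEFORMAT) can be sketched as follows. Only the two Collections API action names come from the ticket; the base URL, collection name, backup name, and location are invented for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Sketch of the workaround from this ticket: restore the collection, then
# immediately migrate its state format. Base URL, collection/backup names,
# and location are hypothetical; RESTORE and MIGRATESTATEFORMAT are the
# Collections API actions named in the ticket.
base = "http://localhost:8983/solr/admin/collections"

restore_params = urlencode({
    "action": "RESTORE",
    "name": "nightly_backup",       # hypothetical backup name
    "collection": "my_collection",  # hypothetical collection name
    "location": "/backups/solr",    # hypothetical backup location
})

# Without the patch, the restored collection lands in the old global
# clusterstate.json (format 1), so convert it to per-collection state.json:
migrate_params = urlencode({
    "action": "MIGRATESTATEFORMAT",
    "collection": "my_collection",
})

restore_url = f"{base}?{restore_params}"
migrate_url = f"{base}?{migrate_params}"
print(restore_url)
print(migrate_url)
```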
[jira] [Commented] (SOLR-11832) Restore from backup creates old format collections
[ https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318139#comment-16318139 ] Tim Owen commented on SOLR-11832: - You're quite right Erick, my mistake.. that test class has been fixed in the master branch (but is still broken in branch_6x) so with this patch the tests do complete successfully. Hence this patch can be merged to master and to branch_7x but it can't be backported to branch_6x as it stands. We're running 6.6.2 in production, so we'll just use the workaround for now until we get around to upgrading to 7. > Restore from backup creates old format collections > -- > > Key: SOLR-11832 > URL: https://issues.apache.org/jira/browse/SOLR-11832 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 7.2, 6.6.2 >Reporter: Tim Owen >Assignee: Varun Thacker >Priority: Minor > Attachments: SOLR-11832.patch > > > Restoring a collection from a backup always creates the new collection using > the old format state json (format 1), as a global clusterstate.json file at > top level of ZK. All new collections should be defaulting to use the newer > per-collection (format 2) in /collections/.../state.json > As we're running clusters with many collections, the old global state format > isn't good for us, so as a workaround for now we're calling > MIGRATESTATEFORMAT immediately after the RESTORE call. > This bug was mentioned in the comments of SOLR-5750 and also recently > mentioned by [~varunthacker] in SOLR-11560 > Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this > means at least 1 test class doesn't succeed anymore. 
From what I can tell, > the BasicDistributedZk2Test fails because it's not using the official > collection API to create a collection, it seems to be bypassing that and > manually creating cores using the core admin api instead, which I think is > not enough to ensure the correct ZK nodes are created. The test superclass > has some methods to create a collection which do use the collection api so I > could try fixing the tests (I'm just not that familiar with why those > BasicDistributed*Test classes aren't using the collection api). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11832) Restore from backup creates old format collections
[ https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11832: Priority: Minor (was: Major) > Restore from backup creates old format collections > -- > > Key: SOLR-11832 > URL: https://issues.apache.org/jira/browse/SOLR-11832 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 7.2, 6.6.2 >Reporter: Tim Owen >Priority: Minor > Attachments: SOLR-11832.patch > > > Restoring a collection from a backup always creates the new collection using > the old format state json (format 1), as a global clusterstate.json file at > top level of ZK. All new collections should be defaulting to use the newer > per-collection (format 2) in /collections/.../state.json > As we're running clusters with many collections, the old global state format > isn't good for us, so as a workaround for now we're calling > MIGRATESTATEFORMAT immediately after the RESTORE call. > This bug was mentioned in the comments of SOLR-5750 and also recently > mentioned by [~varunthacker] in SOLR-11560 > Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this > means at least 1 test class doesn't succeed anymore. From what I can tell, > the BasicDistributedZk2Test fails because it's not using the official > collection API to create a collection, it seems to be bypassing that and > manually creating cores using the core admin api instead, which I think is > not enough to ensure the correct ZK nodes are created. The test superclass > has some methods to create a collection which do use the collection api so I > could try fixing the tests (I'm just not that familiar with why those > BasicDistributed*Test classes aren't using the collection api). 
[jira] [Updated] (SOLR-11832) Restore from backup creates old format collections
[ https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11832: Attachment: SOLR-11832.patch > Restore from backup creates old format collections > -- > > Key: SOLR-11832 > URL: https://issues.apache.org/jira/browse/SOLR-11832 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 7.2, 6.6.2 >Reporter: Tim Owen > Attachments: SOLR-11832.patch > > > Restoring a collection from a backup always creates the new collection using > the old format state json (format 1), as a global clusterstate.json file at > top level of ZK. All new collections should be defaulting to use the newer > per-collection (format 2) in /collections/.../state.json > As we're running clusters with many collections, the old global state format > isn't good for us, so as a workaround for now we're calling > MIGRATESTATEFORMAT immediately after the RESTORE call. > This bug was mentioned in the comments of SOLR-5750 and also recently > mentioned by [~varunthacker] in SOLR-11560 > Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this > means at least 1 test class doesn't succeed anymore. From what I can tell, > the BasicDistributedZk2Test fails because it's not using the official > collection API to create a collection, it seems to be bypassing that and > manually creating cores using the core admin api instead, which I think is > not enough to ensure the correct ZK nodes are created. The test superclass > has some methods to create a collection which do use the collection api so I > could try fixing the tests (I'm just not that familiar with why those > BasicDistributed*Test classes aren't using the collection api). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11832) Restore from backup creates old format collections
Tim Owen created SOLR-11832: --- Summary: Restore from backup creates old format collections Key: SOLR-11832 URL: https://issues.apache.org/jira/browse/SOLR-11832 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Backup/Restore Affects Versions: 6.6.2, 7.2 Reporter: Tim Owen Restoring a collection from a backup always creates the new collection using the old format state json (format 1), as a global clusterstate.json file at top level of ZK. All new collections should be defaulting to use the newer per-collection (format 2) in /collections/.../state.json As we're running clusters with many collections, the old global state format isn't good for us, so as a workaround for now we're calling MIGRATESTATEFORMAT immediately after the RESTORE call. This bug was mentioned in the comments of SOLR-5750 and also recently mentioned by [~varunthacker] in SOLR-11560 Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this means at least 1 test class doesn't succeed anymore. From what I can tell, the BasicDistributedZk2Test fails because it's not using the official collection API to create a collection, it seems to be bypassing that and manually creating cores using the core admin api instead, which I think is not enough to ensure the correct ZK nodes are created. The test superclass has some methods to create a collection which do use the collection api so I could try fixing the tests (I'm just not that familiar with why those BasicDistributed*Test classes aren't using the collection api). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11765) Ability to Facet on a Function
[ https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11765: Description: This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. The output is in the same format as field facets, but with a bucket per distinct value produced by the function. Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket and functional sorting (currently it's only sortable by the bucket label or by volume) Example usage: {noformat} { facet : { dayOfWeek : { type : function, f : "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } {noformat} I did some refactoring in the facet parser, to hoist some common code for sort and pagination parsing. was: This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. 
The output is in the same format as field facets, but with a bucket per distinct value produced by the function. Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket and functional sorting (currently it's only sortable by the bucket label or by volume) Example usage: { facet : { dayOfWeek : { type : function, f : "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } I did some refactoring in the facet parser, to hoist some common code for sort and pagination parsing. > Ability to Facet on a Function > -- > > Key: SOLR-11765 > URL: https://issues.apache.org/jira/browse/SOLR-11765 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Tim Owen > Attachments: SOLR-11765.patch > > > This is an extension to the JSON facet functionality, to support faceting on > a function. I have extended the parsing of json.facet to allow a 4th facet > type (function) and you provide a function expression. You can also provide > sort, limit and mincount, as it behaves similarly to faceting on a field. > Subfacets work as normal - you can nest function facets anywhere you can use > other types. > The output is in the same format as field facets, but with a bucket per > distinct value produced by the function. Hence the usage of this is most > appropriate for situations where your function only produces a relatively > small number of possible values. It's also recommended to have docValues on > any field used by the function. 
> Our initial use-case for this is with a function that extracts a given part > from a date field's value e.g. day of week, or hour of day, where the > possible range of output values is very low. > Still TODO: documentation, unit tests, and possible extensions to support a > missing bucket and functional sorting (currently it's only sortable by the > bucket label or by volume) > Example usage: > {noformat} > { facet : { dayOfWeek : { type : function, f : > "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } > {noformat} > I did some refactoring in the facet parser, to hoist some common code for > sort and pagination parsing. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
[jira] [Updated] (SOLR-11765) Ability to Facet on a Function
[ https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11765: Description: This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. The output is in the same format as field facets, but with a bucket per distinct value produced by the function. Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket and functional sorting (currently it's only sortable by the bucket label or by volume) Example usage: { facet : { dayOfWeek : { type : function, f : "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } I did some refactoring in the facet parser, to hoist some common code for sort and pagination parsing. was: This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. The output is in the same format as field facets, but with a bucket per distinct value produced by the function. 
Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket and functional sorting (currently it's only sortable by the bucket label or by volume) > Ability to Facet on a Function > -- > > Key: SOLR-11765 > URL: https://issues.apache.org/jira/browse/SOLR-11765 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Tim Owen > Attachments: SOLR-11765.patch > > > This is an extension to the JSON facet functionality, to support faceting on > a function. I have extended the parsing of json.facet to allow a 4th facet > type (function) and you provide a function expression. You can also provide > sort, limit and mincount, as it behaves similarly to faceting on a field. > Subfacets work as normal - you can nest function facets anywhere you can use > other types. > The output is in the same format as field facets, but with a bucket per > distinct value produced by the function. Hence the usage of this is most > appropriate for situations where your function only produces a relatively > small number of possible values. It's also recommended to have docValues on > any field used by the function. > Our initial use-case for this is with a function that extracts a given part > from a date field's value e.g. day of week, or hour of day, where the > possible range of output values is very low. 
> Still TODO: documentation, unit tests, and possible extensions to support a > missing bucket and functional sorting (currently it's only sortable by the > bucket label or by volume) > Example usage: > { facet : { dayOfWeek : { type : function, f : > "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } } > I did some refactoring in the facet parser, to hoist some common code for > sort and pagination parsing. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11765) Ability to Facet on a Function
[ https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-11765: Attachment: SOLR-11765.patch > Ability to Facet on a Function > -- > > Key: SOLR-11765 > URL: https://issues.apache.org/jira/browse/SOLR-11765 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Tim Owen > Attachments: SOLR-11765.patch > > > This is an extension to the JSON facet functionality, to support faceting on > a function. I have extended the parsing of json.facet to allow a 4th facet > type (function) and you provide a function expression. You can also provide > sort, limit and mincount, as it behaves similarly to faceting on a field. > Subfacets work as normal - you can nest function facets anywhere you can use > other types. > The output is in the same format as field facets, but with a bucket per > distinct value produced by the function. Hence the usage of this is most > appropriate for situations where your function only produces a relatively > small number of possible values. It's also recommended to have docValues on > any field used by the function. > Our initial use-case for this is with a function that extracts a given part > from a date field's value e.g. day of week, or hour of day, where the > possible range of output values is very low. > Still TODO: documentation, unit tests, and possible extensions to support a > missing bucket and functional sorting (currently it's only sortable by the > bucket label or by volume) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11765) Ability to Facet on a Function
Tim Owen created SOLR-11765: --- Summary: Ability to Facet on a Function Key: SOLR-11765 URL: https://issues.apache.org/jira/browse/SOLR-11765 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: Facet Module, JSON Request API Reporter: Tim Owen This is an extension to the JSON facet functionality, to support faceting on a function. I have extended the parsing of json.facet to allow a 4th facet type (function) and you provide a function expression. You can also provide sort, limit and mincount, as it behaves similarly to faceting on a field. Subfacets work as normal - you can nest function facets anywhere you can use other types. The output is in the same format as field facets, but with a bucket per distinct value produced by the function. Hence the usage of this is most appropriate for situations where your function only produces a relatively small number of possible values. It's also recommended to have docValues on any field used by the function. Our initial use-case for this is with a function that extracts a given part from a date field's value e.g. day of week, or hour of day, where the possible range of output values is very low. Still TODO: documentation, unit tests, and possible extensions to support a missing bucket and functional sorting (currently it's only sortable by the bucket label or by volume) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168150#comment-16168150 ] Tim Owen commented on SOLR-10826: - Hi - can this be backported to the 6.x branch? We're using it built locally on top of 6.6 in the meantime. I thought it might be included in 6.6.1 but didn't notice it. > CloudSolrClient using unsplit collection list when expanding aliases > > > Key: SOLR-10826 > URL: https://issues.apache.org/jira/browse/SOLR-10826 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 6.4, 6.5.1, 6.6 >Reporter: Tim Owen >Assignee: Varun Thacker > Fix For: 7.0 > > Attachments: SOLR-10826.patch, SOLR-10826.patch, SOLR-10826.patch > > > Some recent refactoring seems to have introduced a bug in SolrJ's > CloudSolrClient, when it's expanding a collection list and resolving aliases, > it's using the wrong local variable for the alias lookup. This leads to an > exception because the value is not an alias. > E.g. suppose you made a request with {{collection=x,y}} where either or both > of {{x}} and {{y}} are not real collection names but valid aliases. This will > fail, incorrectly, because the lookup is using {{x,y}} as a potential alias > name lookup. > Patch to fix this attached, which was tested locally and fixed the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
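The bug described in this ticket can be illustrated with a small sketch of the resolution logic. The alias map and helper functions here are invented for illustration; the real code lives in SolrJ's CloudSolrClient. The point is that the comma-joined list must be split before each name is checked against the alias map.

```python
# Illustrative sketch only (names invented): aliases map one name to one or
# more comma-separated collection names; a request may pass several names
# joined by commas, e.g. collection=x,y.
aliases = {"x": "coll_x", "y": "coll_y_1,coll_y_2"}

def resolve_buggy(collection_param: str) -> list[str]:
    # Buggy behaviour: the whole unsplit string is used as the alias-map key,
    # so "x,y" is treated as a single (nonexistent) alias and resolution fails.
    resolved = aliases.get(collection_param)
    if resolved is None:
        raise ValueError(f"Collection not found: {collection_param}")
    return resolved.split(",")

def resolve_fixed(collection_param: str) -> list[str]:
    # Fixed behaviour: split on commas first, then resolve each name
    # individually, falling back to the name itself when it is not an alias.
    result = []
    for name in collection_param.split(","):
        result.extend(aliases.get(name, name).split(","))
    return result

print(resolve_fixed("x,y"))  # each alias expands to its target collections
```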
[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-10826: Attachment: SOLR-10826.patch OK I've expanded the test a bit, it now creates a second collection, and alias for it, and a combined alias spanning both. Then it tests the various combinations of {{collection=...}} values work as expected. Again, these tests do fail without the code fix. > CloudSolrClient using unsplit collection list when expanding aliases > > > Key: SOLR-10826 > URL: https://issues.apache.org/jira/browse/SOLR-10826 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 6.4, 6.5.1, 6.6 >Reporter: Tim Owen >Assignee: Varun Thacker > Attachments: SOLR-10826.patch, SOLR-10826.patch, SOLR-10826.patch > > > Some recent refactoring seems to have introduced a bug in SolrJ's > CloudSolrClient, when it's expanding a collection list and resolving aliases, > it's using the wrong local variable for the alias lookup. This leads to an > exception because the value is not an alias. > E.g. suppose you made a request with {{=x,y}} where either or both > of {{x}} and {{y}} are not real collection names but valid aliases. This will > fail, incorrectly, because the lookup is using {{x,y}} as a potential alias > name lookup. > Patch to fix this attached, which was tested locally and fixed the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066740#comment-16066740 ] Tim Owen commented on SOLR-10826: - Updated patch with some extra assertions. Without the code fix, those extra lines fail the test as expected, but pass with the fix. I had a look at the AliasIntegrationTest but it essentially does the same kind of thing the CloudSolrClientTest is doing. > CloudSolrClient using unsplit collection list when expanding aliases > > > Key: SOLR-10826 > URL: https://issues.apache.org/jira/browse/SOLR-10826 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 6.4, 6.5.1, 6.6 >Reporter: Tim Owen >Assignee: Varun Thacker > Attachments: SOLR-10826.patch, SOLR-10826.patch > > > Some recent refactoring seems to have introduced a bug in SolrJ's > CloudSolrClient, when it's expanding a collection list and resolving aliases, > it's using the wrong local variable for the alias lookup. This leads to an > exception because the value is not an alias. > E.g. suppose you made a request with {{collection=x,y}} where either or both > of {{x}} and {{y}} are not real collection names but valid aliases. This will > fail, incorrectly, because the lookup is using {{x,y}} as a potential alias > name lookup. > Patch to fix this attached, which was tested locally and fixed the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-10826: Attachment: SOLR-10826.patch
[jira] [Commented] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061084#comment-16061084 ] Tim Owen commented on SOLR-10826: - Hi Varun, yes, good point - I will add some more tests for this code next week. Do you think it doesn't affect the master branch, as you've removed that from the Affects field? The code is still the same in master too.
[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-10826: Affects Version/s: 6.5.1
[jira] [Created] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
Tim Owen created SOLR-10826: --- Summary: CloudSolrClient using unsplit collection list when expanding aliases Key: SOLR-10826 URL: https://issues.apache.org/jira/browse/SOLR-10826 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrJ Affects Versions: master (7.0) Reporter: Tim Owen Attachments: SOLR-10826.patch
[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases
[ https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-10826: Attachment: SOLR-10826.patch
[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections
[ https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925821#comment-15925821 ] Tim Owen commented on SOLR-7191: Admittedly not thousands of collections, but another anecdote. Each of our clusters is 12 hosts running 6 nodes each, with 165 collections of 16 shards each and 3x replication, so around 7900 cores spread over 72 nodes (roughly 100 each). To get stable restarts we throttle the recovery thread pool size - see SOLR-9936, the ticket I raised with our patch; without that, the amount of recovery just kills the network and disks and the cluster status never settles. We also avoid restarting all nodes at once: we bring up a few at a time and wait for their recovery to finish before starting more. We need to automate this, e.g. using a ZooKeeper lock pool so that nodes will wait to start up. > Improve stability and startup performance of SolrCloud with thousands of > collections > > > Key: SOLR-7191 > URL: https://issues.apache.org/jira/browse/SOLR-7191 > Project: Solr > Issue Type: Bug > Components: SolrCloud > Affects Versions: 5.0 > Reporter: Shawn Heisey > Assignee: Noble Paul > Labels: performance, scalability > Fix For: 6.3 > > Attachments: lots-of-zkstatereader-updates-branch_5x.log, > SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, > SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch > > > A user on the mailing list with thousands of collections (5000 on 4.10.3, > 4000 on 5.0) is having severe problems with getting Solr to restart. > I tried as hard as I could to duplicate the user setup, but I ran into many > problems myself even before I was able to get 4000 collections created on a > 5.0 example cloud setup. Restarting Solr takes a very long time, and it is > not very stable once it's up and running. > This kind of setup is very much pushing the envelope on SolrCloud performance > and scalability.
It doesn't help that I'm running both Solr nodes on one > machine (I started with 'bin/solr -e cloud') and that ZK is embedded.
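The ZooKeeper lock pool idea mentioned in the comment above can be sketched with a local counting semaphore standing in for the ZK-backed pool (in a real deployment something like Apache Curator's distributed semaphore recipe would play this role); all names and numbers here are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StaggeredStartup {
    // Simulate `nodes` Solr nodes restarting, but only `permits` may recover at
    // once. Returns the peak number of nodes that were recovering concurrently.
    public static int run(int nodes, int permits) {
        Semaphore pool = new Semaphore(permits);   // stand-in for a ZK lock pool
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService es = Executors.newFixedThreadPool(nodes);
        for (int i = 0; i < nodes; i++) {
            es.submit(() -> {
                pool.acquire();                    // wait for a startup slot
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20);              // simulate core recovery work
                } finally {
                    active.decrementAndGet();
                    pool.release();                // let the next node start
                }
                return null;
            });
        }
        es.shutdown();
        try {
            es.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // With 12 nodes and 3 permits, at most 3 ever recover at the same time.
        System.out.println(StaggeredStartup.run(12, 3) <= 3); // true
    }
}
```

The design point is that the permit count caps recovery load on the shared network and disks, regardless of how many nodes restart at once.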
[jira] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size
Tim Owen updated an issue Solr / SOLR-9936 Allow configuration for recoveryExecutor thread pool size Just uploaded a replacement patch that builds against the master branch (the previous one was a patch against 6.3 and wouldn't merge to master because of all the changes to metrics) Change By: Tim Owen Attachment: SOLR-9936.patch
[jira] [Commented] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size
[ https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811200#comment-15811200 ] Tim Owen commented on SOLR-9936: Thanks - given the comment with the updateExecutor code and Yonik's reply in ticket SOLR-8205 I was wary of changing this, but I couldn't see a scenario where it could deadlock. Would certainly appreciate some further input from people who've worked on the recovery code e.g. [~shalinmangar] in SOLR-7280. > Allow configuration for recoveryExecutor thread pool size > - > > Key: SOLR-9936 > URL: https://issues.apache.org/jira/browse/SOLR-9936 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 6.3 >Reporter: Tim Owen > Attachments: SOLR-9936.patch > > > There are two executor services in {{UpdateShardHandler}}, the > {{updateExecutor}} whose size is unbounded for reasons explained in the code > comments. There is also the {{recoveryExecutor}} which was added later, and > is the one that executes the {{RecoveryStrategy}} code to actually fetch > index files and store to disk, eventually calling an {{fsync}} thread to > ensure the data is written. > We found that with a fast network such as 10GbE it's very easy to overload > the local disk storage when doing a restart of Solr instances after some > downtime, if they have many cores to load. Typically we have each physical > server containing 6 SSDs and 6 Solr instances, so each Solr has its home dir > on a dedicated SSD. With 100+ cores (shard replicas) on each instance, > startup can really hammer the SSD as it's writing in parallel from as many > cores as Solr is recovering. This made recovery time bad enough that replicas > were down for a long time, and even shards marked as down if none of its > replicas have recovered (usually when many machines have been restarted). 
The > very slow IO times (10s of seconds or worse) also made the JVM pause, causing > disconnects from ZK, which didn't help recovery either. > This patch allowed us to throttle how much parallelism there would be writing > to a disk - in practice we're using a pool size of 4 threads, to prevent the > SSD getting overloaded, and that worked well enough to make recovery of all > cores in reasonable time. > Due to the comment on the other thread pool size, I'd like some comments on > whether it's OK to do this for the {{recoveryExecutor}} though? > It's configured in solr.xml with e.g. > {noformat} > > ${solr.recovery.threads:4} > > {noformat}
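The throttling idea in the patch can be sketched as below (this is not the actual UpdateShardHandler code, just the shape of it): size the recovery pool from the solr.recovery.threads system property with a bounded default, instead of an unbounded pool.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class RecoveryExecutorConfig {
    // Bounded recovery pool: reads "solr.recovery.threads" (the property shown
    // in the solr.xml snippet above), defaulting to 4 when unset.
    public static ThreadPoolExecutor newRecoveryExecutor() {
        int threads = Integer.getInteger("solr.recovery.threads", 4);
        return (ThreadPoolExecutor) Executors.newFixedThreadPool(threads);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor recovery = newRecoveryExecutor();
        // Only `threads` replicas can be fetching/writing index files at once,
        // so a fast network can no longer saturate the local SSD.
        System.out.println(recovery.getMaximumPoolSize());
        recovery.shutdown();
    }
}
```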
[jira] [Updated] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size
[ https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9936: --- Description: (edited)
[jira] [Updated] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size
[ https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9936: --- Attachment: SOLR-9936.patch
[jira] [Created] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size
Tim Owen created SOLR-9936: -- Summary: Allow configuration for recoveryExecutor thread pool size Key: SOLR-9936 URL: https://issues.apache.org/jira/browse/SOLR-9936 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: replication (java) Affects Versions: 6.3 Reporter: Tim Owen
[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804336#comment-15804336 ] Tim Owen commented on SOLR-9918: OK I see what you mean, I can explain our use-case if that helps to understand why we developed this processor, and when it might prove useful. We have a Kafka queue of messages, which are a mixture of Create, Update and Delete operations, and these are consumed and fed into two different storage systems - Solr and an RDBMS. We want the behaviour to be consistent, so that the two systems are in sync, and the way the Database storage app works is that Create operations are implemented as effectively {{INSERT IF NOT EXISTS ...}} and Update operations are the typical SQL {{UPDATE .. WHERE id = ..}} that quietly do nothing if there is no row for {{id}}. So we want the Solr storage to behave in the same way. There can occasionally be duplicate messages that Create the same {{id}} due to the hundreds of instances of the app that adds messages to Kafka, and small race conditions that mean two or more of them will do some duplicate work. We chose to accept this situation and de-dupe downstream by having both storage apps behave as above. Another scenario is that, since we have the Kafka queue as a buffer, if there are any problems downstream we can always stop the storage apps, restore last night's backup, rewind the Kafka consumer offset (slightly beyond the backup point) and then replay. In this situation we don't want a lot of index churn for the overlapping Create messages. With updates, the apps which add Update messages only have best-effort knowledge of which document/row {{id}}s are relevant to the field/column being changed by the update message. So we quite commonly have messages that are optimistic updates, for a document that doesn't in fact exist (now). The database storage handles this quietly, so we wanted the same behaviour in Solr.
Initially what happened in Solr was we'd get newly-created documents containing only the fields changed in the AtomicUpdate, so we added a required field to avoid that happening, which works but is noisy as we get a Solr exception each time (and then batch updates are messy because we have to split and retry). I looked at {{DocBasedVersionConstraintsProcessor}} but we don't have explicitly-managed versioning for our documents in Solr. Then I looked at {{SignatureUpdateProcessor}} but that does churn the index and overwrites documents, which we didn't want. Also considered {{TolerantUpdateProcessor}} but that isn't really solving the issue for inserts, it just would make some update batches less noisy. I'd say this processor is useful in situations where you have documents that don't have any concept of multiple versions that can be assigned by the app, and don't have any kind of fuzzy-ness about similar documents i.e. each document has a strong identity, akin to what a Database unique key is. > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. 
In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. > * When applying AtomicUpdate documents, if a document being updated does not > exist, quietly do nothing - do not create a new partially-populated document > and do not throw a noisy exception about missing required fields. By analogy > with SQL, {{update where id = ..}}. Our use-case relies on this because we > apply updates optimistically and have best-effort knowledge about what > documents will exist, so it's easiest to skip the updates (in the same way a > Database would). > I would have kept this in our own package hierarchy but it relies on some > package-scoped methods, and seems like it could be useful to others if they > choose to configure it. Some bits of the code were borrowed from
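The two behaviours described above can be illustrated with a plain map standing in for the index; this mirrors the SQL analogy in the description, not the actual UpdateRequestProcessor code:

```java
import java.util.HashMap;
import java.util.Map;

public class SkipSemantics {
    public final Map<String, Map<String, Object>> index = new HashMap<>();

    // "Insert if not exists": an existing document is left untouched, so there
    // is no index churn and no exception breaking the batch.
    public boolean insert(String id, Map<String, Object> doc) {
        return index.putIfAbsent(id, doc) == null;   // false means the insert was skipped
    }

    // Atomic update of one field: quietly a no-op when the document is missing,
    // so no partially-populated document is ever created.
    public boolean atomicUpdate(String id, String field, Object value) {
        Map<String, Object> doc = index.get(id);
        if (doc == null) return false;               // skipped, like UPDATE .. WHERE id = ..
        doc.put(field, value);
        return true;
    }

    public static void main(String[] args) {
        SkipSemantics s = new SkipSemantics();
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "first");
        System.out.println(s.insert("doc1", doc));                 // true: created
        System.out.println(s.insert("doc1", new HashMap<>()));     // false: duplicate skipped
        System.out.println(s.atomicUpdate("missing", "views", 1)); // false: no doc, no-op
        System.out.println(s.atomicUpdate("doc1", "views", 1));    // true: applied
    }
}
```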
[jira] [Updated] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules
[ https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9503: --- Attachment: SOLR-9503.patch I went through the tests and found that if I added another rule to the existing test for the overseer-role, it would fail as expected with the previous code. That test now passes with the fix, so I've updated my patch with that test change. > NPE in Replica Placement Rules when using Overseer Role with other rules > > > Key: SOLR-9503 > URL: https://issues.apache.org/jira/browse/SOLR-9503 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Rules, SolrCloud > Affects Versions: 6.2, master (7.0) > Reporter: Tim Owen > Assignee: Noble Paul > Attachments: SOLR-9503.patch, SOLR-9503.patch > > > The overseer role introduced in SOLR-9251 works well if there's only a single > Rule for replica placement, e.g. {code}rule=role:!overseer{code} but when > combined with another rule, e.g. > {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result > in a NullPointerException (in Rule.tryAssignNodeToShard). > This happens because the code builds up a nodeVsTags map, but it only has > entries for nodes that have values for *all* tags used among the rules. This > means not enough information is available to other rules when they are being > checked during replica assignment. In the example rules above, if we have a > cluster of 12 nodes and only 3 are given the Overseer role, the others do not > have any entry in the nodeVsTags map because they only have the host tag > value and not the role tag value. > Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only > keeping entries that fulfil the constraint of having values for all tags used > in the rules.
Possibly this constraint was suitable when rules were > originally introduced, but the Role tag (used for Overseers) is unlikely to > be present for all nodes in the cluster, and similarly for sysprop tags which > may or may not be set for a node. > My patch removes this constraint, so the nodeVsTags map contains everything > known about all nodes, even if they have no value for a given tag. This > allows the rule combination above to work, and doesn't appear to cause any > problems with the code paths that use the nodeVsTags map. They handle null > values quite well, and the tests pass.
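A sketch of the nodeVsTags change described above, with illustrative names rather than Solr's actual ReplicaAssigner code: keep every node in the map, storing null for a tag the node has no value for, instead of dropping the node entirely.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class NodeTags {
    // known: node -> (tag -> value), possibly sparse (e.g. only 3 of 12 nodes
    // carry a "role" value). The old behaviour dropped any node missing *any*
    // rule tag, which starved the other rules of information and caused the NPE.
    public static Map<String, Map<String, String>> tagsForNodes(
            Collection<String> nodes,
            Collection<String> ruleTags,
            Map<String, Map<String, String>> known) {
        Map<String, Map<String, String>> nodeVsTags = new HashMap<>();
        for (String node : nodes) {
            Map<String, String> tags = new HashMap<>();
            for (String tag : ruleTags) {
                // keep the node even when the tag value is unknown (null)
                tags.put(tag, known.getOrDefault(node, Map.of()).get(tag));
            }
            nodeVsTags.put(node, tags);
        }
        return nodeVsTags;
    }

    public static void main(String[] args) {
        var m = tagsForNodes(
                java.util.List.of("n1", "n2"),
                java.util.List.of("role", "host"),
                Map.of("n1", Map.of("role", "overseer", "host", "h1"),
                       "n2", Map.of("host", "h2")));
        System.out.println(m.get("n2")); // n2 is kept, with a null role entry
    }
}
```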
[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798916#comment-15798916 ] Tim Owen commented on SOLR-9918: Fair points Koji - I have updated the patch with a bit more documentation. I've also added the example configuration in the Javadoc comment. Probably the [Confluence page|https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors#UpdateRequestProcessors-UpdateRequestProcessorFactories] is the best place to put that kind of guideline notes on which processors to choose for different situations. In the particular case of the SignatureUpdateProcessor, that class will cause the new document to overwrite/replace any existing document, not skip it, which is why I didn't use it for our use-case. > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. 
> * When applying AtomicUpdate documents, if a document being updated does not > exist, quietly do nothing - do not create a new partially-populated document > and do not throw a noisy exception about missing required fields. By analogy > with SQL, {{update where id = ..}}. Our use-case relies on this because we > apply updates optimistically and have best-effort knowledge about what > documents will exist, so it's easiest to skip the updates (in the same way a > Database would). > I would have kept this in our own package hierarchy but it relies on some > package-scoped methods, and seems like it could be useful to others if they > choose to configure it. Some bits of the code were borrowed from > {{DocBasedVersionConstraintsProcessorFactory}}. > Attached patch has unit tests to confirm the behaviour. > This class can be used by configuring solrconfig.xml like so.. > {noformat}
> <updateRequestProcessorChain name="skipexisting">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>     <bool name="skipInsertIfExists">true</bool>
>     <bool name="skipUpdateIfMissing">false</bool>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> {noformat} > and initParams defaults of > {noformat}
> <str name="update.chain">skipexisting</str>
> {noformat}
[jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9918: --- Attachment: SOLR-9918.patch
[jira] [Commented] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules
[ https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798416#comment-15798416 ] Tim Owen commented on SOLR-9503: Is anyone able to take a look at this fix - maybe [~noble.paul]? I hope the assumptions I've made in the diff are correct. We've been using it in production for a few months, in our custom build of Solr. Would be nice to roll it in upstream. > NPE in Replica Placement Rules when using Overseer Role with other rules > > > Key: SOLR-9503 > URL: https://issues.apache.org/jira/browse/SOLR-9503 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Rules, SolrCloud >Affects Versions: 6.2, master (7.0) >Reporter: Tim Owen > Attachments: SOLR-9503.patch > > > The overseer role introduced in SOLR-9251 works well if there's only a single > Rule for replica placement e.g. {code}rule=role:!overseer{code} but when > combined with another rule, e.g. > {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result > in a NullPointerException (in Rule.tryAssignNodeToShard) > This happens because the code builds up a nodeVsTags map, but it only has > entries for nodes that have values for *all* tags used among the rules. This > means not enough information is available to other rules when they are being > checked during replica assignment. In the example rules above, if we have a > cluster of 12 nodes and only 3 are given the Overseer role, the others do not > have any entry in the nodeVsTags map because they only have the host tag > value and not the role tag value. > Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only > keeping entries that fulfil the constraint of having values for all tags used > in the rules. 
Possibly this constraint was > suitable when rules were originally introduced, but the Role tag (used for > Overseers) is unlikely to be present for all nodes in the cluster, and > similarly for sysprop tags which may or may not be set for a node. > My patch removes this constraint, so the nodeVsTags map contains everything > known about all nodes, even if they have no value for a given tag. This > allows the rule combination above to work, and doesn't appear to cause any > problems with the code paths that use the nodeVsTags map. They handle null > values quite well, and the tests pass.
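The constraint being removed can be illustrated with a small model of the nodeVsTags construction (illustrative Python, not the actual ReplicaAssigner code; `require_all` mirrors the pre-patch filtering):

```python
# Illustrative model of ReplicaAssigner.getTagsForNodes-style map building.
# Not Solr code: `require_all=True` mimics the pre-patch behaviour of dropping
# any node that lacks a value for even one tag used by the rules.

def tags_for_nodes(node_values, rule_tags, require_all=False):
    """Return node -> {tag: value} for the tags referenced by placement rules."""
    result = {}
    for node, values in node_values.items():
        tags = {t: values[t] for t in rule_tags if t in values}
        if require_all and len(tags) != len(rule_tags):
            continue  # pre-patch: node silently dropped, so later rule checks find nothing
        result[node] = tags  # post-patch: keep partial info; consumers handle missing tags
    return result
```

With a 12-node cluster where only 3 nodes carry the overseer role tag, the pre-patch map simply has no entry for the other 9 nodes, which is what the host-based rule then trips over.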
[jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9918: --- Attachment: SOLR-9918.patch
[jira] [Created] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
Tim Owen created SOLR-9918: -- Summary: An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs Key: SOLR-9918 URL: https://issues.apache.org/jira/browse/SOLR-9918 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: update Reporter: Tim Owen
[jira] [Updated] (SOLR-9915) PeerSync alreadyInSync check is not backwards compatible
[ https://issues.apache.org/jira/browse/SOLR-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9915: --- Attachment: SOLR-9915.patch > PeerSync alreadyInSync check is not backwards compatible > > > Key: SOLR-9915 > URL: https://issues.apache.org/jira/browse/SOLR-9915 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 6.3 >Reporter: Tim Owen > Attachments: SOLR-9915.patch > > > The fingerprint check added to PeerSync in SOLR-9446 works fine when all > servers are running 6.3 but this means it's hard to do a rolling upgrade from > e.g. 6.2.1 to 6.3 because the 6.3 server sends a request to a 6.2.1 server to > get a fingerprint and then gets an NPE because the older server doesn't return > the expected field in its response. > This leads to the PeerSync completely failing, and results in a full index > replication from scratch, copying all index files over the network. We > noticed this happening when we tried to do a rolling upgrade on one of our > 6.2.1 clusters to 6.3. Unfortunately this amount of replication was hammering > our disks and network, so we had to do a full shutdown, upgrade all to 6.3 > and restart, which was not ideal for a production cluster. > The attached patch should behave more gracefully in this situation, as it > will typically return false for alreadyInSync() and then carry on doing the > normal re-sync based on versions.
[jira] [Created] (SOLR-9915) PeerSync alreadyInSync check is not backwards compatible
Tim Owen created SOLR-9915: -- Summary: PeerSync alreadyInSync check is not backwards compatible Key: SOLR-9915 URL: https://issues.apache.org/jira/browse/SOLR-9915 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: replication (java) Affects Versions: 6.3 Reporter: Tim Owen
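The graceful fallback the patch describes can be sketched as follows (an illustrative Python model, not Solr's actual PeerSync code):

```python
# Illustrative model of the backwards-compatible check (not Solr code):
# if the peer's response lacks the fingerprint field (pre-6.3 server),
# report "not in sync" and let the caller fall through to the normal
# version-based PeerSync, instead of dereferencing a missing field.

def already_in_sync(peer_response, our_fingerprint):
    their_fingerprint = peer_response.get("fingerprint")
    if their_fingerprint is None:
        return False  # older peer: no fingerprint field, do a normal re-sync
    return their_fingerprint == our_fingerprint
```

Returning False here is cheap (a version-based sync), whereas the NPE escalated all the way to a full index replication over the network.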
[jira] [Commented] (SOLR-8793) Fix stale commit files' size computation in LukeRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701538#comment-15701538 ] Tim Owen commented on SOLR-8793: We get this using Solr 6.3.0 because it's still logged at WARN level, which seems a bit alarmist to me. For indexes that are changing rapidly, it happens a lot. We're going to increase our logging threshold for that class to ERROR, because these messages are just filling up the logs and there's no action we can actually take to prevent it, because they're expected to happen sometimes. Personally I would make this message INFO level. > Fix stale commit files' size computation in LukeRequestHandler > -- > > Key: SOLR-8793 > URL: https://issues.apache.org/jira/browse/SOLR-8793 > Project: Solr > Issue Type: Bug > Components: Server >Affects Versions: 5.5 >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Minor > Fix For: 5.5.1, 6.0 > > Attachments: SOLR-8793.patch > > > SOLR-8587 added segments file information and its size to core admin status > API. However in case of stale commits, calling that API may result in > {{FileNotFoundException}} or {{NoSuchFileException}}, if the segments file no > longer exists due to a new commit. We should fix that by returning a proper > value for the file's length in this case, maybe -1.
[jira] [Commented] (SOLR-9490) BoolField always returning false for non-DV fields when javabin involved (via solrj, or intra node communication)
[ https://issues.apache.org/jira/browse/SOLR-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644382#comment-15644382 ] Tim Owen commented on SOLR-9490: Just to add to this, if anyone was using 6.2.0 and doing document updates, this bug affected Atomic Updates and will have reset all boolean fields in the document to false when updating other fields of the document i.e. the actually-stored and indexed values are changed. We discovered this just recently and noticed some documents had lost their original boolean value, because we had been doing Atomic updates during the period we were running 6.2.0 and that had reset the values in the document itself. Even though we've now upgraded to 6.2.1 so the displayed values are shown correctly, the stored values have now been changed. > BoolField always returning false for non-DV fields when javabin involved (via > solrj, or intra node communication) > - > > Key: SOLR-9490 > URL: https://issues.apache.org/jira/browse/SOLR-9490 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 6.2 >Reporter: Hoss Man >Assignee: Hoss Man >Priority: Critical > Fix For: 6.2.1, 6.3, master (7.0) > > Attachments: SOLR-9490.patch, SOLR-9490.patch, Solr9490.java > > > 2 diff users posted comments in SOLR-9187 indicating that changes introduced > in that issue have broken BoolFields that do *not* use DocValues... > [~cjcowie]... > {quote} > Hi, I've just picked up 6.2.0. It seems that the change to toExternal() in > BoolField now means that booleans without DocValues return null, which then > turns into Boolean.FALSE in toObject() regardless of whether the value is > true or false. > e.g. with this schema, facet counts are correct, the returned values are > wrong. 
> {code} > required="false" multiValued="false"/> > > {code} > {code} > "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[ > { > "id":"124", > "f_EVE64":false, > "_version_":1544828487600177152}, > { > "id":"123", > "f_EVE64":false, > "_version_":1544828492458229760}] > }, > "facet_counts":{ > "facet_queries":{}, > "facet_fields":{ > "f_EVE64":[ > "false",1, > "true",1]}, > {code} > Could toExternal() perhaps fallback to how it originally behaved? e.g. > {code} > if (f.binaryValue() == null) { > return indexedToReadable(f.stringValue()); > } > {code} > {quote} > [~pavan_shetty]... > {quote} > I downloaded solr version 6.2.0 (6.2.0 > 764d0f19151dbff6f5fcd9fc4b2682cf934590c5 - mike - 2016-08-20 05:41:37) and > installed my core. > In my schema.xml i have an field like following : > multiValued="false"/> > Now i am accessing this field using SolrJ (6.1.0). But i am always getting > false value for above field even though it contains true boolean value. This > is happening for all boolean fields. > http://localhost:8983/solr...wt=javabin=2 HTTP/1.1 > It is working fine in other response writer. > If i change the solr version to 6.1.0, with same SolrJ, it starts working. So > clearly this is a bug in version 6.2.0. > {quote}
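The failure mode, and the fallback suggested in the quoted comment, can be modeled in a few lines (illustrative Python; the real code paths are BoolField's toExternal()/toObject() in Java):

```python
# Illustrative model of the SOLR-9490 bug (not Solr code): for a stored,
# non-docValues boolean, toExternal() started returning None, and a naive
# toObject() that coerces None to False erases the stored value. The fix
# falls back to reading the indexed/stored string when the external value
# is missing, mirroring the indexedToReadable(f.stringValue()) suggestion.

def to_object_buggy(external):
    return external == "true"  # None -> False, regardless of the stored value

def to_object_fixed(external, stored_string):
    if external is None:
        external = stored_string  # fall back to the stored representation
    return external == "true"
```

This also shows why atomic updates were destructive on 6.2.0: the false read back from the buggy path was re-indexed as the new stored value.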
[jira] [Commented] (SOLR-5750) Backup/Restore API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585826#comment-15585826 ] Tim Owen commented on SOLR-5750: [~dsmiley] you mentioned in the mailing list back in March that you'd fixed the situation where restored collections are created using the old stateFormat=1 but it still seems to be doing that ... did that fix not make it into this ticket before merging? We've been trying out the backup/restore and noticed it's putting the collection's state into the global clusterstate.json instead of where it should be. > Backup/Restore API for SolrCloud > > > Key: SOLR-5750 > URL: https://issues.apache.org/jira/browse/SOLR-5750 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Varun Thacker > Fix For: 6.1 > > Attachments: SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, > SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch > > > We should have an easy way to do backups and restores in SolrCloud. The > ReplicationHandler supports a backup command which can create snapshots of > the index but that is too little. > The command should be able to backup: > # Snapshots of all indexes or indexes from the leader or the shards > # Config set > # Cluster state > # Cluster properties > # Aliases > # Overseer work queue? > A restore should be able to completely restore the cloud i.e. no manual steps > required other than bringing nodes back up or setting up a new cloud cluster. > SOLR-5340 will be a part of this issue.
[jira] [Updated] (SOLR-9505) Extra tests to confirm Atomic Update remove behaviour
[ https://issues.apache.org/jira/browse/SOLR-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9505: --- Attachment: SOLR-9505.patch > Extra tests to confirm Atomic Update remove behaviour > - > > Key: SOLR-9505 > URL: https://issues.apache.org/jira/browse/SOLR-9505 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (7.0) >Reporter: Tim Owen >Priority: Minor > Attachments: SOLR-9505.patch > > > The behaviour of the Atomic Update {{remove}} operation in the code doesn't > match the description in the Confluence documentation, which has been > questioned already. From looking at the source code, and using curl to > confirm, the {{remove}} operation only removes the first occurrence of a > value from a multi-valued field, it does not remove all occurrences. The > {{removeregex}} operation does remove all, however. > There are unit tests for Atomic Updates, but they didn't assert this > behaviour, so I've added some extra assertions to confirm that, and a couple > of extra tests including one that checks that {{removeregex}} does a Regex > match of the whole value, not just a find-anywhere operation. > I think it's the documentation that needs clarifying - the code behaves as > expected (assuming {{remove}} was intended to work that way?)
[jira] [Created] (SOLR-9505) Extra tests to confirm Atomic Update remove behaviour
Tim Owen created SOLR-9505: -- Summary: Extra tests to confirm Atomic Update remove behaviour Key: SOLR-9505 URL: https://issues.apache.org/jira/browse/SOLR-9505 Project: Solr Issue Type: Test Security Level: Public (Default Security Level. Issues are Public) Affects Versions: master (7.0) Reporter: Tim Owen Priority: Minor
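The behaviour the extra tests assert can be modeled like this (illustrative Python, not the Solr implementation; note the whole-value regex match):

```python
import re

# Illustrative model of the two atomic-update operations (not Solr code):
# 'remove' deletes only the FIRST occurrence of a value in a multi-valued
# field; 'removeregex' deletes EVERY value whose whole text matches the
# pattern (a full match, not a find-anywhere search).

def atomic_remove(values, value):
    out = list(values)
    try:
        out.remove(value)  # list.remove drops the first occurrence only
    except ValueError:
        pass  # value absent: nothing to do
    return out

def atomic_removeregex(values, pattern):
    rx = re.compile(pattern)
    return [v for v in values if not rx.fullmatch(v)]  # whole-value match
```

The last assertion below is the subtle case the ticket calls out: a pattern that merely occurs inside a value does not remove it.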
[jira] [Commented] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules
[ https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484252#comment-15484252 ] Tim Owen commented on SOLR-9503: As an aside, I noticed that `Rule.Operand.GREATER_THAN` seems to be missing an override for `public int compare(Object n1Val, Object n2Val)` .. but compare only appears to be used when sorting the live nodes, so maybe it's not a big deal?
[jira] [Updated] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules
[ https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9503: --- Attachment: SOLR-9503.patch
[jira] [Created] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules
Tim Owen created SOLR-9503: -- Summary: NPE in Replica Placement Rules when using Overseer Role with other rules Key: SOLR-9503 URL: https://issues.apache.org/jira/browse/SOLR-9503 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Rules, SolrCloud Affects Versions: 6.2, master (7.0) Reporter: Tim Owen The overseer role introduced in SOLR-9251 works well if there's only a single Rule for replica placement e.g. {code}rule=role:!overseer{code} but when combined with another rule, e.g. {code}rule=role:!overseer=host:*,shard:*,replica:<2{code} it can result in a NullPointerException (in Rule.tryAssignNodeToShard) This happens because the code builds up a nodeVsTags map, but it only has entries for nodes that have values for *all* tags used among the rules. This means not enough information is available to other rules when they are being checked during replica assignment. In the example rules above, if we have a cluster of 12 nodes and only 3 are given the Overseer role, the others do not have any entry in the nodeVsTags map because they only have the host tag value and not the role tag value. Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only keeping entries that fulfil the constraint of having values for all tags used in the rules. Possibly this constraint was suitable when rules were originally introduced, but the Role tag (used for Overseers) is unlikely to be present for all nodes in the cluster, and similarly for sysprop tags which may or not be set for a node. My patch removes this constraint, so the nodeVsTags map contains everything known about all nodes, even if they have no value for a given tag. This allows the rule combination above to work, and doesn't appear to cause any problems with the code paths that use the nodeVsTags map. They handle null values quite well, and the tests pass. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
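The filtering change described in the issue above can be illustrated with a minimal, hypothetical Java sketch. This is not the actual ReplicaAssigner code; the method and variable names are illustrative assumptions. It shows a nodeVsTags map built so that every node is kept, with missing tags simply absent (null on lookup), rather than dropping nodes that lack a value for any tag used in the rules:

```java
import java.util.*;

public class NodeTagsSketch {
    // Hypothetical stand-in for the getTagsForNodes logic: keep every node,
    // copying whatever tag values it has, instead of discarding nodes that
    // are missing a value for any tag used among the rules.
    static Map<String, Map<String, Object>> tagsForNodes(
            Map<String, Map<String, Object>> rawNodeValues,
            Set<String> tagsUsedInRules) {
        Map<String, Map<String, Object>> nodeVsTags = new HashMap<>();
        for (Map.Entry<String, Map<String, Object>> e : rawNodeValues.entrySet()) {
            Map<String, Object> tags = new HashMap<>();
            for (String tag : tagsUsedInRules) {
                Object v = e.getValue().get(tag);
                if (v != null) tags.put(tag, v); // keep partial information
            }
            nodeVsTags.put(e.getKey(), tags);    // keep the node regardless
        }
        return nodeVsTags;
    }

    public static void main(String[] args) {
        Map<String, Object> n1 = new HashMap<>();
        n1.put("host", "h1");
        n1.put("role", "overseer");
        Map<String, Object> n2 = new HashMap<>();
        n2.put("host", "h2");                    // no role tag set on this node
        Map<String, Map<String, Object>> raw = new HashMap<>();
        raw.put("node1", n1);
        raw.put("node2", n2);

        Map<String, Map<String, Object>> tags =
            tagsForNodes(raw, Set.of("host", "role"));
        // Both nodes survive; node2 simply has no "role" entry, so rule
        // checks must tolerate a null lookup (as the patch relies on).
        System.out.println(tags.size());
        System.out.println(tags.get("node2").get("role"));
    }
}
```

With the pre-patch behaviour, node2 would have no entry at all, which is what led to the NullPointerException when a second rule consulted the map.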
[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456179#comment-15456179 ] Tim Owen commented on SOLR-9389: Thanks for the advice David, I'll take a look at the concurrency setting; we'll need to test out using fewer shards and see how that compares for our use-case. Since we create new collections weekly, we always have the option to increase the shard count later if we do hit situations of large merges happening. Although I'm a bit surprised that this model is considered 'truly massive'... I'd have expected many large Solr installations to have thousands of shards across all their collections. > HDFS Transaction logs stay open for writes which leaks Xceivers > --- > > Key: SOLR-9389 > URL: https://issues.apache.org/jira/browse/SOLR-9389 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Hadoop Integration, hdfs >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Mark Miller > Fix For: master (7.0), 6.3 > > Attachments: SOLR-9389.patch > > > The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open > for its whole lifetime, which consumes two threads on the HDFS data node > server (dataXceiver and packetresponder) even once the Solr tlog has finished > being written to. > This means for a cluster with many indexes on HDFS, the number of Xceivers > can keep growing and eventually hit the limit of 4096 on the data nodes. It's > especially likely for indexes that have low write rates, because Solr keeps > enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). > There's also the issue that attempting to write to a finished tlog would be a > major bug, so closing it for writes helps catch that. 
> Our cluster during testing had 100+ collections with 100 shards each, spread > across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x > replication for the tlog files, this meant we hit the xceiver limit fairly > easily and had to use the attached patch to ensure tlogs were closed for > writes once finished. > The patch introduces an extra lifecycle state for the tlog, so it can be > closed for writes and free up the HDFS resources, while still being available > for reading. I've tried to make it as unobtrusive as I could, but there's > probably a better way. I have not changed the behaviour of the local disk > tlog implementation, because it only consumes a file descriptor regardless of > read or write. > nb We have decided not to use Solr-on-HDFS now, we're using local disk (for > various reasons). So I don't have a HDFS cluster to do further testing on > this, I'm just contributing the patch which worked for us.
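The extra lifecycle state the patch describes can be sketched as follows. This is a hypothetical illustration, not the actual HdfsTransactionLog API: a ByteArrayOutputStream stands in for the HDFS FSDataOutputStream, and the state names are assumptions. The point is that closing for writes releases the write-side resources (on real HDFS, the dataXceiver/packetresponder threads) while the log stays readable, and any late write attempt is caught as a bug:

```java
import java.io.*;

// Hypothetical sketch of a tlog with a separate "closed for writes" state.
class SketchTransactionLog implements Closeable {
    enum State { OPEN, WRITES_CLOSED, CLOSED }

    private State state = State.OPEN;
    // Stand-in for the Hadoop FSDataOutputStream held by the real tlog.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void write(byte[] record) throws IOException {
        if (state != State.OPEN)
            // Catches the "write to a finished tlog" bug mentioned above.
            throw new IllegalStateException("tlog is closed for writes");
        out.write(record);
    }

    /** Release write-side resources but keep the log readable. */
    void closeForWrites() throws IOException {
        if (state == State.OPEN) {
            out.close(); // on real HDFS this frees the Xceiver threads
            state = State.WRITES_CLOSED;
        }
    }

    byte[] readAll() {
        if (state == State.CLOSED)
            throw new IllegalStateException("tlog fully closed");
        return out.toByteArray(); // reads remain valid after closeForWrites()
    }

    @Override public void close() { state = State.CLOSED; }

    public static void main(String[] args) throws IOException {
        SketchTransactionLog tlog = new SketchTransactionLog();
        tlog.write("doc1".getBytes());
        tlog.closeForWrites();
        System.out.println(tlog.readAll().length); // still readable
        try {
            tlog.write("doc2".getBytes());
        } catch (IllegalStateException e) {
            System.out.println("write rejected");
        }
        tlog.close();
    }
}
```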
[jira] [Updated] (SOLR-9381) Snitch for freedisk uses root path not Solr home
[ https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9381: --- Attachment: SOLR-9381.patch > Snitch for freedisk uses root path not Solr home > > > Key: SOLR-9381 > URL: https://issues.apache.org/jira/browse/SOLR-9381 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Noble Paul > Attachments: SOLR-9381.patch, SOLR-9381.patch > > > The path used for the freedisk snitch value is hardcoded to / whereas it > should be using Solr home. It's fairly common to use hardware for Solr with > multiple physical disks on different mount points, with multiple Solr > instances running on the box, each pointing its Solr home to a different > disk. In this case, the value reported for the freedisk snitch value is > wrong, because it's based on the root filesystem space. > Patch changes this to use solr home from the CoreContainer.
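The freedisk fix above comes down to which path the free-space query is asked about. A minimal sketch, assuming a solr home path is available (the real snitch reads it from the CoreContainer; the method name here is illustrative):

```java
import java.io.File;

public class FreeDiskSketch {
    // Report usable space for the filesystem that contains the given path.
    // Passing "/" (the old hardcoded behaviour) reports the root filesystem;
    // passing Solr home reports the disk that actually holds the indexes.
    static long freeDiskGb(String solrHome) {
        // File.getUsableSpace() answers for the partition containing this
        // path, so a Solr home mounted on its own disk reports that disk.
        return new File(solrHome).getUsableSpace() / (1024L * 1024 * 1024);
    }

    public static void main(String[] args) {
        // On a multi-disk box these two can differ wildly, which is exactly
        // the wrong-value symptom the issue describes.
        System.out.println(freeDiskGb("/") >= 0);
        System.out.println(freeDiskGb(System.getProperty("user.dir")) >= 0);
    }
}
```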
[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452531#comment-15452531 ] Tim Owen commented on SOLR-9389: We're using Solr 6.1 (on local disk now, as mentioned). The first production cluster we had hoped to get stable was 40 boxes, each running 5 or 6 Solr JVMs, with a dedicated ZK cluster on 3 other boxes, and 100 shards per collection. That was problematic, we had a lot of Zookeeper traffic during normal writes, but especially whenever one or more boxes were deliberately killed as many Solr instances restarted all at once, leading to a large overseer queue and shards in recovery for a long time. Right now we're testing two scaled-down clusters: 24 boxes, and 12 boxes, with correspondingly reduced number of shards, to see at what point it can be stable when we do destructive testing by killing machines and whole racks, to see how it copes. 12 boxes is looking a lot more stable so far. We'll have to consider running multiple of these smaller clusters instead of 1 large one - is that best practice? There was some discussion on SOLR-5872 and SOLR-5475 about scaling the overseer with large numbers of collections and shards, although it's clearly a tricky problem. > HDFS Transaction logs stay open for writes which leaks Xceivers > --- > > Key: SOLR-9389 > URL: https://issues.apache.org/jira/browse/SOLR-9389 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Hadoop Integration, hdfs >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Mark Miller > Fix For: master (7.0), 6.3 > > Attachments: SOLR-9389.patch > > > The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open > for its whole lifetime, which consumes two threads on the HDFS data node > server (dataXceiver and packetresponder) even once the Solr tlog has finished > being written to. 
> This means for a cluster with many indexes on HDFS, the number of Xceivers > can keep growing and eventually hit the limit of 4096 on the data nodes. It's > especially likely for indexes that have low write rates, because Solr keeps > enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). > There's also the issue that attempting to write to a finished tlog would be a > major bug, so closing it for writes helps catch that. > Our cluster during testing had 100+ collections with 100 shards each, spread > across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x > replication for the tlog files, this meant we hit the xceiver limit fairly > easily and had to use the attached patch to ensure tlogs were closed for > writes once finished. > The patch introduces an extra lifecycle state for the tlog, so it can be > closed for writes and free up the HDFS resources, while still being available > for reading. I've tried to make it as unobtrusive as I could, but there's > probably a better way. I have not changed the behaviour of the local disk > tlog implementation, because it only consumes a file descriptor regardless of > read or write. > nb We have decided not to use Solr-on-HDFS now, we're using local disk (for > various reasons). So I don't have a HDFS cluster to do further testing on > this, I'm just contributing the patch which worked for us.
[jira] [Commented] (SOLR-9381) Snitch for freedisk uses root path not Solr home
[ https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452260#comment-15452260 ] Tim Owen commented on SOLR-9381: Thanks, I'll make that change soon and replace the patch. > Snitch for freedisk uses root path not Solr home > > > Key: SOLR-9381 > URL: https://issues.apache.org/jira/browse/SOLR-9381 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Noble Paul > Attachments: SOLR-9381.patch > > > The path used for the freedisk snitch value is hardcoded to / whereas it > should be using Solr home. It's fairly common to use hardware for Solr with > multiple physical disks on different mount points, with multiple Solr > instances running on the box, each pointing its Solr home to a different > disk. In this case, the value reported for the freedisk snitch value is > wrong, because it's based on the root filesystem space. > Patch changes this to use solr home from the CoreContainer.
[jira] [Commented] (SOLR-9381) Snitch for freedisk uses root path not Solr home
[ https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451652#comment-15451652 ] Tim Owen commented on SOLR-9381: This code was last changed by [~andyetitmoves] and [~noble.paul]... do you think this fix is appropriate in all cases? > Snitch for freedisk uses root path not Solr home > > > Key: SOLR-9381 > URL: https://issues.apache.org/jira/browse/SOLR-9381 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen > Attachments: SOLR-9381.patch > > > The path used for the freedisk snitch value is hardcoded to / whereas it > should be using Solr home. It's fairly common to use hardware for Solr with > multiple physical disks on different mount points, with multiple Solr > instances running on the box, each pointing its Solr home to a different > disk. In this case, the value reported for the freedisk snitch value is > wrong, because it's based on the root filesystem space. > Patch changes this to use solr home from the CoreContainer.
[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451626#comment-15451626 ] Tim Owen commented on SOLR-9389: [~dsmiley] We're now running 6 Solr JVMs per box, as the machines in production have 6 SSDs installed, so yeah it works out at around 200 Solr cores being served by each Solr JVM. That seems to run fine, and we've had our staging environment for another Solr installation with hundreds of cores per JVM for several years. The reason for many shards is that we do frequent updates and deletes, and want to keep the Lucene index size below a manageable level e.g. 5GB, to avoid a potentially slow merge that would block writes for too long. With composite routing, our queries never touch all shards in a collection - just a few. The problem still is with SolrCloud and the Overseer/Zookeeper, which become overloaded with traffic once there's any kind of problem e.g. machine failure, or worse an entire rack losing power - this causes a flood of overseer queue events and all the nodes feverishly downloading state.json repeatedly. Happy to talk to anyone who's working on that problem! > HDFS Transaction logs stay open for writes which leaks Xceivers > --- > > Key: SOLR-9389 > URL: https://issues.apache.org/jira/browse/SOLR-9389 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Hadoop Integration, hdfs >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Mark Miller > Fix For: master (7.0), 6.3 > > Attachments: SOLR-9389.patch > > > The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open > for its whole lifetime, which consumes two threads on the HDFS data node > server (dataXceiver and packetresponder) even once the Solr tlog has finished > being written to. 
> This means for a cluster with many indexes on HDFS, the number of Xceivers > can keep growing and eventually hit the limit of 4096 on the data nodes. It's > especially likely for indexes that have low write rates, because Solr keeps > enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). > There's also the issue that attempting to write to a finished tlog would be a > major bug, so closing it for writes helps catch that. > Our cluster during testing had 100+ collections with 100 shards each, spread > across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x > replication for the tlog files, this meant we hit the xceiver limit fairly > easily and had to use the attached patch to ensure tlogs were closed for > writes once finished. > The patch introduces an extra lifecycle state for the tlog, so it can be > closed for writes and free up the HDFS resources, while still being available > for reading. I've tried to make it as unobtrusive as I could, but there's > probably a better way. I have not changed the behaviour of the local disk > tlog implementation, because it only consumes a file descriptor regardless of > read or write. > nb We have decided not to use Solr-on-HDFS now, we're using local disk (for > various reasons). So I don't have a HDFS cluster to do further testing on > this, I'm just contributing the patch which worked for us.
[jira] [Commented] (SOLR-9374) Speed up Jmx MBean retrieval for FieldCache
[ https://issues.apache.org/jira/browse/SOLR-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448437#comment-15448437 ] Tim Owen commented on SOLR-9374: No problem, thanks for merging! > Speed up Jmx MBean retrieval for FieldCache > --- > > Key: SOLR-9374 > URL: https://issues.apache.org/jira/browse/SOLR-9374 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: JMX, web gui >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: master (7.0), 6.3 > > Attachments: SOLR-9374.patch > > > The change made in SOLR-8892 allowed for Jmx requests for MBean info to skip > displaying the full contents of FieldCache entries, and just return the count. > However, it still computes all the field cache entry info but throws it away > and uses only the number of entries. This can make the Jmx MBean retrieval > quite slow which is not ideal for regular polling for monitoring purposes. > We've typically found the Jmx call took over 1 minute to complete, and jstack > output showed that building the stats for this bean was the culprit. > With this patch, the time is much reduced, usually less than 10 seconds. The > response contents are unchanged.
[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448406#comment-15448406 ] Tim Owen commented on SOLR-9389: Great, thanks for reviewing and testing this Mark :) > HDFS Transaction logs stay open for writes which leaks Xceivers > --- > > Key: SOLR-9389 > URL: https://issues.apache.org/jira/browse/SOLR-9389 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Hadoop Integration, hdfs >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Assignee: Mark Miller > Fix For: master (7.0), 6.3 > > Attachments: SOLR-9389.patch > > > The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open > for its whole lifetime, which consumes two threads on the HDFS data node > server (dataXceiver and packetresponder) even once the Solr tlog has finished > being written to. > This means for a cluster with many indexes on HDFS, the number of Xceivers > can keep growing and eventually hit the limit of 4096 on the data nodes. It's > especially likely for indexes that have low write rates, because Solr keeps > enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). > There's also the issue that attempting to write to a finished tlog would be a > major bug, so closing it for writes helps catch that. > Our cluster during testing had 100+ collections with 100 shards each, spread > across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x > replication for the tlog files, this meant we hit the xceiver limit fairly > easily and had to use the attached patch to ensure tlogs were closed for > writes once finished. > The patch introduces an extra lifecycle state for the tlog, so it can be > closed for writes and free up the HDFS resources, while still being available > for reading. I've tried to make it as unobtrusive as I could, but there's > probably a better way. 
I have not changed the behaviour of the local disk > tlog implementation, because it only consumes a file descriptor regardless of > read or write. > nb We have decided not to use Solr-on-HDFS now, we're using local disk (for > various reasons). So I don't have a HDFS cluster to do further testing on > this, I'm just contributing the patch which worked for us.
[jira] [Updated] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9389: --- Description: The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open for its whole lifetime, which consumes two threads on the HDFS data node server (dataXceiver and packetresponder) even once the Solr tlog has finished being written to. This means for a cluster with many indexes on HDFS, the number of Xceivers can keep growing and eventually hit the limit of 4096 on the data nodes. It's especially likely for indexes that have low write rates, because Solr keeps enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). There's also the issue that attempting to write to a finished tlog would be a major bug, so closing it for writes helps catch that. Our cluster during testing had 100+ collections with 100 shards each, spread across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x replication for the tlog files, this meant we hit the xceiver limit fairly easily and had to use the attached patch to ensure tlogs were closed for writes once finished. The patch introduces an extra lifecycle state for the tlog, so it can be closed for writes and free up the HDFS resources, while still being available for reading. I've tried to make it as unobtrusive as I could, but there's probably a better way. I have not changed the behaviour of the local disk tlog implementation, because it only consumes a file descriptor regardless of read or write. nb We have decided not to use Solr-on-HDFS now, we're using local disk (for various reasons). So I don't have a HDFS cluster to do further testing on this, I'm just contributing the patch which worked for us. 
was: The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open for its whole lifetime, which consumes two threads on the HDFS data node server (dataXceiver and packetresponder) even once the Solr tlog has finished being written to. This means for a cluster with many indexes on HDFS, the number of Xceivers can keep growing and eventually hit the limit of 4096 on the data nodes. It's especially likely for indexes that have low write rates, because Solr keeps enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). There's also the issue that attempting to write to a finished tlog would be a major bug, so closing it for writes helps catch that. Our cluster during testing had 100+ collections with 100 shards each, spread across 40 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x replication for the tlog files, this meant we hit the xceiver limit fairly easily and had to use the attached patch to ensure tlogs were closed for writes once finished. The patch introduces an extra lifecycle state for the tlog, so it can be closed for writes and free up the HDFS resources, while still being available for reading. I've tried to make it as unobtrusive as I could, but there's probably a better way. I have not changed the behaviour of the local disk tlog implementation, because it only consumes a file descriptor regardless of read or write. nb We have decided not to use Solr-on-HDFS now, we're using local disk (for various reasons). So I don't have a HDFS cluster to do further testing on this, I'm just contributing the patch which worked for us. > HDFS Transaction logs stay open for writes which leaks Xceivers > --- > > Key: SOLR-9389 > URL: https://issues.apache.org/jira/browse/SOLR-9389 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: Hadoop Integration, hdfs >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen > Attachments: SOLR-9389.patch > > > The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open > for its whole lifetime, which consumes two threads on the HDFS data node > server (dataXceiver and packetresponder) even once the Solr tlog has finished > being written to. > This means for a cluster with many indexes on HDFS, the number of Xceivers > can keep growing and eventually hit the limit of 4096 on the data nodes. It's > especially likely for indexes that have low write rates, because Solr keeps > enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). > There's also the issue that attempting to write to a finished tlog would be a > major bug, so closing it for writes helps catch that. > Our cluster during testing had 100+ collections with 100 shards each, spread > across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x > replication for the tlog files, this meant we hit the xceiver limit fairly > easily and had to use the attached patch to ensure
[jira] [Updated] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9389: --- Attachment: SOLR-9389.patch > HDFS Transaction logs stay open for writes which leaks Xceivers > --- > > Key: SOLR-9389 > URL: https://issues.apache.org/jira/browse/SOLR-9389 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Hadoop Integration, hdfs >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen > Attachments: SOLR-9389.patch > > > The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open > for its whole lifetime, which consumes two threads on the HDFS data node > server (dataXceiver and packetresponder) even once the Solr tlog has finished > being written to. > This means for a cluster with many indexes on HDFS, the number of Xceivers > can keep growing and eventually hit the limit of 4096 on the data nodes. It's > especially likely for indexes that have low write rates, because Solr keeps > enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). > There's also the issue that attempting to write to a finished tlog would be a > major bug, so closing it for writes helps catch that. > Our cluster during testing had 100+ collections with 100 shards each, spread > across 40 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x > replication for the tlog files, this meant we hit the xceiver limit fairly > easily and had to use the attached patch to ensure tlogs were closed for > writes once finished. > The patch introduces an extra lifecycle state for the tlog, so it can be > closed for writes and free up the HDFS resources, while still being available > for reading. I've tried to make it as unobtrusive as I could, but there's > probably a better way. I have not changed the behaviour of the local disk > tlog implementation, because it only consumes a file descriptor regardless of > read or write. 
> nb We have decided not to use Solr-on-HDFS now, we're using local disk (for > various reasons). So I don't have a HDFS cluster to do further testing on > this, I'm just contributing the patch which worked for us.
[jira] [Created] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers
Tim Owen created SOLR-9389: -- Summary: HDFS Transaction logs stay open for writes which leaks Xceivers Key: SOLR-9389 URL: https://issues.apache.org/jira/browse/SOLR-9389 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Hadoop Integration, hdfs Affects Versions: 6.1, master (7.0) Reporter: Tim Owen The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open for its whole lifetime, which consumes two threads on the HDFS data node server (dataXceiver and packetresponder) even once the Solr tlog has finished being written to. This means for a cluster with many indexes on HDFS, the number of Xceivers can keep growing and eventually hit the limit of 4096 on the data nodes. It's especially likely for indexes that have low write rates, because Solr keeps enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). There's also the issue that attempting to write to a finished tlog would be a major bug, so closing it for writes helps catch that. Our cluster during testing had 100+ collections with 100 shards each, spread across 40 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x replication for the tlog files, this meant we hit the xceiver limit fairly easily and had to use the attached patch to ensure tlogs were closed for writes once finished. The patch introduces an extra lifecycle state for the tlog, so it can be closed for writes and free up the HDFS resources, while still being available for reading. I've tried to make it as unobtrusive as I could, but there's probably a better way. I have not changed the behaviour of the local disk tlog implementation, because it only consumes a file descriptor regardless of read or write. nb We have decided not to use Solr-on-HDFS now, we're using local disk (for various reasons). So I don't have a HDFS cluster to do further testing on this, I'm just contributing the patch which worked for us. 
[jira] [Updated] (SOLR-9381) Snitch for freedisk uses root path not Solr home
[ https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9381: --- Attachment: SOLR-9381.patch > Snitch for freedisk uses root path not Solr home > > > Key: SOLR-9381 > URL: https://issues.apache.org/jira/browse/SOLR-9381 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen > Attachments: SOLR-9381.patch > > > The path used for the freedisk snitch value is hardcoded to / whereas it > should be using Solr home. It's fairly common to use hardware for Solr with > multiple physical disks on different mount points, with multiple Solr > instances running on the box, each pointing its Solr home to a different > disk. In this case, the value reported for the freedisk snitch value is > wrong, because it's based on the root filesystem space. > Patch changes this to use solr home from the CoreContainer.
[jira] [Created] (SOLR-9381) Snitch for freedisk uses root path not Solr home
Tim Owen created SOLR-9381: -- Summary: Snitch for freedisk uses root path not Solr home Key: SOLR-9381 URL: https://issues.apache.org/jira/browse/SOLR-9381 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 6.1, master (7.0) Reporter: Tim Owen The path used for the freedisk snitch value is hardcoded to / whereas it should be using Solr home. It's fairly common to use hardware for Solr with multiple physical disks on different mount points, with multiple Solr instances running on the box, each pointing its Solr home to a different disk. In this case, the value reported for the freedisk snitch value is wrong, because it's based on the root filesystem space. Patch changes this to use solr home from the CoreContainer.
[jira] [Updated] (SOLR-9374) Speed up Jmx MBean retrieval for FieldCache
[ https://issues.apache.org/jira/browse/SOLR-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Owen updated SOLR-9374: --- Attachment: SOLR-9374.patch > Speed up Jmx MBean retrieval for FieldCache > --- > > Key: SOLR-9374 > URL: https://issues.apache.org/jira/browse/SOLR-9374 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: JMX, web gui >Affects Versions: 6.1, master (7.0) >Reporter: Tim Owen >Priority: Minor > Attachments: SOLR-9374.patch > > > The change made in SOLR-8892 allowed for Jmx requests for MBean info to skip > displaying the full contents of FieldCache entries, and just return the count. > However, it still computes all the field cache entry info but throws it away > and uses only the number of entries. This can make the Jmx MBean retrieval > quite slow which is not ideal for regular polling for monitoring purposes. > We've typically found the Jmx call took over 1 minute to complete, and jstack > output showed that building the stats for this bean was the culprit. > With this patch, the time is much reduced, usually less than 10 seconds. The > response contents are unchanged.
[jira] [Created] (SOLR-9374) Speed up Jmx MBean retrieval for FieldCache
Tim Owen created SOLR-9374: -- Summary: Speed up Jmx MBean retrieval for FieldCache Key: SOLR-9374 URL: https://issues.apache.org/jira/browse/SOLR-9374 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: JMX, web gui Affects Versions: 6.1, master (7.0) Reporter: Tim Owen Priority: Minor The change made in SOLR-8892 allowed for Jmx requests for MBean info to skip displaying the full contents of FieldCache entries, and just return the count. However, it still computes all the field cache entry info but throws it away and uses only the number of entries. This can make the Jmx MBean retrieval quite slow which is not ideal for regular polling for monitoring purposes. We've typically found the Jmx call took over 1 minute to complete, and jstack output showed that building the stats for this bean was the culprit. With this patch, the time is much reduced, usually less than 10 seconds. The response contents are unchanged.
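The shape of the optimisation described above can be sketched in a few lines of hypothetical Java (illustrative names, not the actual Solr FieldCache/JMX code): when the caller only wants the entry count, return it directly and never run the expensive per-entry stats computation.

```java
import java.util.function.Supplier;

public class FieldCacheStatsSketch {
    // Hypothetical stand-in for building the bean's stats: if the caller
    // did not ask for the full entry listing, skip the costly work entirely
    // instead of computing it and throwing it away.
    static Object cacheStats(int entryCount, boolean listEntries,
                             Supplier<String[]> buildEntryDetails) {
        if (!listEntries) {
            return entryCount;           // cheap path: just the count
        }
        return buildEntryDetails.get();  // expensive: one string per entry
    }

    public static void main(String[] args) {
        final boolean[] computed = {false};
        Object stats = cacheStats(3, false, () -> {
            computed[0] = true;          // would be slow on a large cache
            return new String[] {"e1", "e2", "e3"};
        });
        System.out.println(stats);
        System.out.println(computed[0]); // details were never built
    }
}
```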