[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-09-09 Thread Tim Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925451#comment-16925451
 ] 

Tim Owen commented on SOLR-13240:
-

Great! Thanks for all your work on this, Christine.

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Assignee: Christine Poerschke
>Priority: Major
> Fix For: master (9.0), 8.3
>
> Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> 

[jira] [Commented] (SOLR-13539) Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary

2019-08-06 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900810#comment-16900810
 ] 

Tim Owen commented on SOLR-13539:
-

Thanks Thomas, yes we're using 7.7.2 and having trouble, as we use AtomicUpdates heavily. We applied the patch from SOLR-13538 and then your patch from your GitHub PR (although we excluded the unit tests from your patch), which fixes most things - thank you. I have attached a further patch we had to make locally to get removeregex working (it looks like it was fixed for the single-value case, but multiple values were still failing); perhaps you could fold that further fix into your larger change, or if not I can raise a separate ticket.

To be honest, this whole situation with the javabin change is getting confusing, with various partial fixes, and it's not clear to me which fixes are on the 7.x branch. Right now, stock 7.7.2 is effectively broken. Thanks for your efforts to get this back to a stable state.
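
For reference, a minimal sketch of the kind of atomic remove/removeregex update involved here (field names are made up; the modifier-map form is the standard SolrJ atomic-update syntax):
{code:java}
import java.util.List;
import java.util.Map;

import org.apache.solr.common.SolrInputDocument;

public class AtomicRemoveSketch {
    public static void main(String[] args) {
        // Atomic update: remove two specific values from one multivalued field and
        // strip anything matching a regex from another. Field names are hypothetical.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("tags_ss", Map.of("remove", List.of("obsolete", "draft")));
        doc.addField("labels_ss", Map.of("removeregex", List.of("tmp-.*")));
        // Over javabin the collection values arrive server-side as
        // ByteArrayUtf8CharSequence rather than String, which is where the remove
        // operations reportedly stop matching existing values.
        System.out.println(doc);
    }
}
{code}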

> Atomic Update Multivalue remove does not work for field types UUID, Enums, 
> Bool  and Binary
> ---
>
> Key: SOLR-13539
> URL: https://issues.apache.org/jira/browse/SOLR-13539
> Project: Solr
>  Issue Type: Bug
>  Components: UpdateRequestProcessors
>Affects Versions: 7.7.2, 8.1, 8.1.1
>Reporter: Thomas Wöckinger
>Priority: Critical
> Attachments: SOLR-13539.patch
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> When using JavaBinCodec the values of collections are of type 
> ByteArrayUtf8CharSequence, existing field values are Strings so the remove 
> Operation does not have any effect.
>  This is related to following field types: UUID, Enums, Bool and Binary






[jira] [Updated] (SOLR-13539) Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary

2019-08-06 Thread Tim Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-13539:

Attachment: SOLR-13539.patch

> Atomic Update Multivalue remove does not work for field types UUID, Enums, 
> Bool  and Binary
> ---
>
> Key: SOLR-13539
> URL: https://issues.apache.org/jira/browse/SOLR-13539
> Project: Solr
>  Issue Type: Bug
>  Components: UpdateRequestProcessors
>Affects Versions: 7.7.2, 8.1, 8.1.1
>Reporter: Thomas Wöckinger
>Priority: Critical
> Attachments: SOLR-13539.patch
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> When using JavaBinCodec the values of collections are of type 
> ByteArrayUtf8CharSequence, existing field values are Strings so the remove 
> Operation does not have any effect.
>  This is related to following field types: UUID, Enums, Bool and Binary






[jira] [Commented] (SOLR-13539) Atomic Update Multivalue remove does not work for field types UUID, Enums, Bool and Binary

2019-08-05 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900197#comment-16900197
 ] 

Tim Owen commented on SOLR-13539:
-

Not sure if it's exactly the same issue, but we're seeing this problem with the {{removeregex}} atomic update operation (a ClassCastException when it tries to turn the javabin values into a String, inside the doRemoveRegex method).

By the way, I previously raised a Jira with more tests for these operations, https://issues.apache.org/jira/browse/SOLR-9505, although I don't know whether those would have caught the javabin problem.
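
For what it's worth, a rough sketch of the defensive conversion that seems to be needed - comparing values via toString() rather than casting to String. Method and variable names are illustrative, not the actual AtomicUpdateDocumentMerger code:
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

public class RemoveRegexSketch {
    // Remove every existing value that matches one of the given regex patterns.
    // Values may be String or ByteArrayUtf8CharSequence (javabin), so compare via
    // toString() instead of casting to String.
    static List<Object> removeMatching(Collection<?> existing, Collection<?> patterns) {
        List<Pattern> compiled = new ArrayList<>();
        for (Object p : patterns) {
            compiled.add(Pattern.compile(p.toString()));
        }
        List<Object> kept = new ArrayList<>(existing);
        kept.removeIf(v -> compiled.stream().anyMatch(pat -> pat.matcher(v.toString()).matches()));
        return kept;
    }

    public static void main(String[] args) {
        // Prints [keep-me]
        System.out.println(removeMatching(List.of("tmp-1", "keep-me"), List.of("tmp-.*")));
    }
}
{code}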

> Atomic Update Multivalue remove does not work for field types UUID, Enums, 
> Bool  and Binary
> ---
>
> Key: SOLR-13539
> URL: https://issues.apache.org/jira/browse/SOLR-13539
> Project: Solr
>  Issue Type: Bug
>  Components: UpdateRequestProcessors
>Affects Versions: 7.7.2, 8.1, 8.1.1
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> When using JavaBinCodec the values of collections are of type 
> ByteArrayUtf8CharSequence, existing field values are Strings so the remove 
> Operation does not have any effect.
>  This is related to following field types: UUID, Enums, Bool and Binary






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-23 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890751#comment-16890751
 ] 

Tim Owen commented on SOLR-9961:


Thanks... yes indeed, we had to cherry-pick the patch for that into our build. Finally everything is working!

 

> RestoreCore needs the option to download files in parallel.
> ---
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
>  Issue Type: Improvement
>  Components: Backup/Restore
>Affects Versions: 6.2.1
>Reporter: Timothy Potter
>Assignee: Mikhail Khludnev
>Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, 
> SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think 
> this is a general problem) takes 8 minutes ... the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all the of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-22 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890015#comment-16890015
 ] 

Tim Owen commented on SOLR-9961:


Thanks Mikhail, we're interested in your findings too, as we do backups to HDFS 
and to S3 (via S3A) and are currently profiling performance of backups and 
restores in particular.

 

> RestoreCore needs the option to download files in parallel.
> ---
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
>  Issue Type: Improvement
>  Components: Backup/Restore
>Affects Versions: 6.2.1
>Reporter: Timothy Potter
>Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, 
> SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think 
> this is a general problem) takes 8 minutes ... the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all the of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-22 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889969#comment-16889969
 ] 

Tim Owen commented on SOLR-9961:


Just curious - have you tried increasing the copy buffer size, as per SOLR-13029, to speed up restores? It would be good to compare the performance of that change against the extra complexity of parallelisation.

> RestoreCore needs the option to download files in parallel.
> ---
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
>  Issue Type: Improvement
>  Components: Backup/Restore
>Affects Versions: 6.2.1
>Reporter: Timothy Potter
>Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, 
> SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think 
> this is a general problem) takes 8 minutes ... the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all the of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.






[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-07-16 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886282#comment-16886282
 ] 

Tim Owen commented on SOLR-13240:
-

Yes, it looks like the code fix has exposed other (autoscaling) tests that now fail; perhaps, as you suggest, they were relying on the previous sort order.

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Priority: Major
> Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>  
> 

[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-07-09 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881143#comment-16881143
 ] 

Tim Owen commented on SOLR-13240:
-

Thanks for following up on this, Christine... the updated patch looks good to me.

Disclaimer: we've not tried this patch against the master branch, as we're using 7.x in production.

 

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Priority: Major
> Attachments: SOLR-13240.patch, SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>  
> 

[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-04-26 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827054#comment-16827054
 ] 

Tim Owen commented on SOLR-13240:
-

Good to hear the patch worked for you!

I'm not entirely sure under what circumstances it happens or not, because you'd think it would never have worked and would have been a blocker for the previous release of this functionality. Clearly it must work sometimes. Maybe it depends on how many different shards have replicas on a given node, i.e. when it's sorting a list of replicas from many different shards, there is likely to be more than one leader among them.
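
For anyone curious why it only fails sometimes: the JDK sort only throws the contract error when it actually detects the inconsistency during a merge. A self-contained illustration (not the real Solr comparator) of the kind of leader special-casing that breaks the contract:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class ComparatorContractDemo {
    // Not the real Solr comparator - just an illustration of the failure class:
    // special-casing "leader" like this is non-transitive (two leaders compare as
    // "less than" each other), and TimSort only notices during a merge, so small or
    // mostly-uniform inputs often sort fine while larger mixed ones blow up.
    record Replica(String name, boolean leader, int indexSize) {}

    public static void main(String[] args) {
        Comparator<Replica> broken = (a, b) -> {
            if (a.leader()) return -1;
            if (b.leader()) return 1;
            return Integer.compare(a.indexSize(), b.indexSize());
        };

        Random random = new Random();
        List<Replica> replicas = new ArrayList<>();
        for (int i = 0; i < 64; i++) {
            replicas.add(new Replica("replica" + i, random.nextBoolean(), random.nextInt(10)));
        }
        // May throw java.lang.IllegalArgumentException:
        // "Comparison method violates its general contract!"
        replicas.sort(broken);
        System.out.println("sorted without detection this time");
    }
}
{code}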

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Priority: Major
> Attachments: SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  

[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-04-26 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826964#comment-16826964
 ] 

Tim Owen commented on SOLR-13240:
-

The one jar you're changing is solr-solrj, so you should find the new jar that Ant built in this path:

/home/ubuntu/solrbuild/solr-7.5.0/solr/build/solr-solrj/

If you compare the jar file in that directory with the one in your other two paths listed above, you'll see whether it's been deployed. The /opt path is what Solr is actually running from, I would guess, so if your patched jar is in there too, you should get the fix after a restart.

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Priority: Major
> Attachments: SOLR-13240.patch
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  

[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-04-26 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826788#comment-16826788
 ] 

Tim Owen commented on SOLR-13240:
-

At first glance, I think you should have used {{ant jar}} instead of {{ant compile}}, otherwise it may not have actually built the jar file from the code changes.

I don't know much about the install script, but what I tend to do for testing small fixes is use Ant to build the jar and then drop that new jar in place of the existing one inside the installation - somewhere it will have unpacked the Solr distro war file, and the library jars are in e.g. /server/solr-webapp/webapp/WEB-INF/lib/ ... you can replace the jar in there and restart Solr.

In this case, you're changing the solr-solrj package, so the newly built jar should end up in {{solr/build/solr-solrj/}}.

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Priority: Major
> Attachments: SOLR-13240.patch
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> 

[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured

2019-01-25 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752109#comment-16752109
 ] 

Tim Owen commented on SOLR-13029:
-

Not sure - I can see someone might want parallelised file copies as well, so that ticket is still valid, I think. It probably depends on how many collections you have to restore: if (like us) you have many collections to do, you can just kick them off in parallel and let each one work through its files in series, but if you had one or two large collections it might be better done with the change proposed there.
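
As a sketch of the many-collections-in-parallel approach described above (restoreCollection is a hypothetical stand-in for whatever issues the Collections API RESTORE call, not a real Solr method):
{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelRestoreSketch {
    // Hypothetical stand-in for issuing the Collections API RESTORE call
    // (e.g. /admin/collections?action=RESTORE with a backup name and target collection).
    static void restoreCollection(String collection, String backupName) {
        System.out.println("restoring " + collection + " from " + backupName);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> collections = List.of("coll_a", "coll_b", "coll_c");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String c : collections) {
            // Each restore still copies its own files serially; only the collections run in parallel.
            pool.submit(() -> restoreCollection(c, c + "_backup"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
{code}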

> Allow HDFS backup/restore buffer size to be configured
> --
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, 8.0
>Reporter: Tim Owen
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.0, 7.7, master (9.0)
>
> Attachments: SOLR-13029.patch, SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured

2019-01-25 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752062#comment-16752062
 ] 

Tim Owen commented on SOLR-13029:
-

Thanks Mikhail!

> Allow HDFS backup/restore buffer size to be configured
> --
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, 8.0
>Reporter: Tim Owen
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.0, 7.7, master (9.0)
>
> Attachments: SOLR-13029.patch, SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured

2019-01-23 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749806#comment-16749806
 ] 

Tim Owen commented on SOLR-13029:
-

Hah, I wasn't suggesting automating that... that's just how I manually tested it.

I've attached a newer patch containing some unit tests for the various situations.

> Allow HDFS backup/restore buffer size to be configured
> --
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, 8.0
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch, SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Updated] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured

2019-01-23 Thread Tim Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-13029:

Attachment: SOLR-13029.patch

> Allow HDFS backup/restore buffer size to be configured
> --
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, 8.0
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch, SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Commented] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured

2019-01-18 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746174#comment-16746174
 ] 

Tim Owen commented on SOLR-13029:
-

Sure - there aren't many code paths to test, but I can take a look. In practice, I used a heap dump to confirm that the buffer really was the size I set in the configuration.

> Allow HDFS backup/restore buffer size to be configured
> --
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, 8.0
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Updated] (SOLR-13029) Allow HDFS backup/restore buffer size to be configured

2018-12-10 Thread Tim Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-13029:

Summary: Allow HDFS backup/restore buffer size to be configured  (was: 
Allow HDFS buffer size to be configured)

> Allow HDFS backup/restore buffer size to be configured
> --
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, master (8.0)
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Commented] (SOLR-13029) Allow HDFS buffer size to be configured

2018-12-06 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711366#comment-16711366
 ] 

Tim Owen commented on SOLR-13029:
-

Updated to be specific to the copying of index files to/from HDFS during backups and restores. It would be configured in solr.xml using e.g.
{noformat}
<backup>
  <repository name="hdfs" class="org.apache.solr.core.backup.repository.HdfsBackupRepository" default="false">
    ..
    <int name="solr.hdfs.buffer.size">262144</int>
  </repository>
</backup>
{noformat}
There is another method in {{HdfsBackupRepository}}, {{openInput}}, that is only used for opening small metadata files or getting the checksum during restores, so I have left that using the default buffer size. Only the bulk whole-file copying uses the larger buffer.
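
To illustrate where the buffer size matters (this is just the underlying Hadoop API, not the HdfsBackupRepository code itself):
{code:java}
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyBufferSketch {
    // Copy one file out of HDFS using an explicit buffer size (262144 instead of the
    // old hardcoded 4096). FileSystem.open(Path, int) takes the read buffer size.
    static void copyFromHdfs(FileSystem fs, Path src, OutputStream out, int bufferSize) throws Exception {
        byte[] buffer = new byte[bufferSize];
        try (FSDataInputStream in = fs.open(src, bufferSize)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Paths here are hypothetical.
        FileSystem fs = FileSystem.get(new Configuration());
        try (OutputStream out = Files.newOutputStream(Paths.get("segment.copy"))) {
            copyFromHdfs(fs, new Path("/backups/mycollection/segments_1"), out, 262144);
        }
    }
}
{code}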

> Allow HDFS buffer size to be configured
> ---
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, master (8.0)
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Updated] (SOLR-13029) Allow HDFS buffer size to be configured

2018-12-06 Thread Tim Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-13029:

Attachment: SOLR-13029.patch

> Allow HDFS buffer size to be configured
> ---
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, master (8.0)
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch, SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Commented] (SOLR-13029) Allow HDFS buffer size to be configured

2018-12-03 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706976#comment-16706976
 ] 

Tim Owen commented on SOLR-13029:
-

Yes, that's a fair point - I will change the patch so that it allows HdfsBackupRepository to pass a different value (via its xml config) instead of changing the shared constant.

It does make me wonder whether index-on-HDFS has similar issues with the small buffer size; perhaps it's less of a problem there because access is mostly random seeks rather than bulk copying. We no longer use indexes on HDFS, so I can't compare - we're only using the HDFS functionality for backups and restores now.

> Allow HDFS buffer size to be configured
> ---
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, master (8.0)
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2018-11-30 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705257#comment-16705257
 ] 

Tim Owen commented on SOLR-9961:


We considered using this patch locally, but found the actual problem was slow HDFS restores caused by an undersized copy buffer - see SOLR-13029 for our change to alleviate that. Since we had lots of collections to restore, we restored those in parallel instead of parallelising the per-file restore. The buffer patch alone made each file restore about 10x faster, with a 256kB buffer instead of 4kB.

> RestoreCore needs the option to download files in parallel.
> ---
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 6.2.1
>Reporter: Timothy Potter
>Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think 
> this is a general problem) takes 8 minutes ... the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all the of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.






[jira] [Updated] (SOLR-13029) Allow HDFS buffer size to be configured

2018-11-30 Thread Tim Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-13029:

Attachment: SOLR-13029.patch

> Allow HDFS buffer size to be configured
> ---
>
> Key: SOLR-13029
> URL: https://issues.apache.org/jira/browse/SOLR-13029
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore, hdfs
>Affects Versions: 7.5, master (8.0)
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-13029.patch
>
>
> There's a default hardcoded buffer size setting of 4096 in the HDFS code 
> which means in particular that restoring a backup from HDFS takes a long 
> time. Copying multi-GB files from HDFS using a buffer as small as 4096 bytes 
> is very inefficient. We changed this in our local build used in production to 
> 256kB and saw a 10x speed improvement when restoring a backup. Attached patch 
> simply makes this size configurable using a command line setting, much like 
> several other buffer size values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13029) Allow HDFS buffer size to be configured

2018-11-30 Thread Tim Owen (JIRA)
Tim Owen created SOLR-13029:
---

 Summary: Allow HDFS buffer size to be configured
 Key: SOLR-13029
 URL: https://issues.apache.org/jira/browse/SOLR-13029
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Backup/Restore, hdfs
Affects Versions: 7.5, master (8.0)
Reporter: Tim Owen


There's a hardcoded default buffer size of 4096 bytes in the HDFS code, which 
means in particular that restoring a backup from HDFS takes a long time. 
Copying multi-GB files from HDFS with a buffer as small as 4096 bytes is very 
inefficient. We changed this to 256kB in the local build we run in production 
and saw a 10x speed improvement when restoring a backup. The attached patch 
simply makes the size configurable via a command-line setting, much like 
several other buffer size values.
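
To illustrate the idea, here is a rough sketch using the public Hadoop 
FileSystem API. The system property name below is made up for the example and 
is not the setting added by the attached patch:

{noformat}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopySketch {
  static void copy(FileSystem fs, Path src, OutputStream out) throws IOException {
    // 4096 mirrors the old hardcoded default; a larger value (e.g. 262144 for
    // 256kB) means far fewer read/write calls per multi-GB file.
    int bufferSize = Integer.getInteger("example.hdfs.copy.buffer.bytes", 4096);
    byte[] buffer = new byte[bufferSize];
    try (FSDataInputStream in = fs.open(src, bufferSize)) {
      int n;
      while ((n = in.read(buffer)) > 0) {
        out.write(buffer, 0, n);
      }
    }
  }
}
{noformat}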



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7394) Make MemoryIndex immutable

2018-10-11 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646482#comment-16646482
 ] 

Tim Owen commented on LUCENE-7394:
--

Related to this (although I am happy to raise a separate Jira as a bug report): 
if you mutate a MemoryIndex by calling addField after you've already done a 
search on the index, you can end up with a corrupt internal state (and an 
ArrayIndexOutOfBoundsException) - e.g. call addField, then search, then 
addField again, then search. This appears to be because the sortedTerms 
internal state gets built when the first search happens and isn't 
invalidated/nulled when the next addField happens, so the second search sees a 
state where sortedTerms and terms are out of sync, and fails.

The documentation doesn't say this is a bad sequence of usage (or prevent it) 
so making it immutable with a Builder would fix that situation. Alternatively, 
calling search could implicitly call freeze, or addField could null out 
sortedTerms.
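
For reference, a minimal sketch of the sequence described above (field names, 
text and analyzer are arbitrary, and the exact failure point can vary):

{noformat}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.TermQuery;

public class MemoryIndexReuseSketch {
  public static void main(String[] args) {
    StandardAnalyzer analyzer = new StandardAnalyzer();
    MemoryIndex index = new MemoryIndex();

    index.addField("title", "hello world", analyzer);
    index.search(new TermQuery(new Term("title", "hello"))); // builds sortedTerms

    // addField after a search does not invalidate sortedTerms...
    index.addField("body", "some more text", analyzer);

    // ...so this search can see sortedTerms and terms out of sync and fail
    // (an ArrayIndexOutOfBoundsException in our case).
    index.search(new TermQuery(new Term("body", "text")));
  }
}
{noformat}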

> Make MemoryIndex immutable
> --
>
> Key: LUCENE-7394
> URL: https://issues.apache.org/jira/browse/LUCENE-7394
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Martijn van Groningen
>Priority: Major
>
> The MemoryIndex itself should just be a builder that constructs an 
> IndexReader instance. The whole notion of freezing a memory index should be 
> removed.
> While we change this we should also clean this class up. There are many 
> methods to add a field, we should just have a single method that accepts a 
> `IndexableField`.
> The `keywordTokenStream(...)` method is unused and untested and should be 
> removed and it doesn't belong with the memory index.
> The `setSimilarity(...)`, `createSearcher(...)` and `search(...)` methods 
> should be removed, because the MemoryIndex should just be responsible for 
> creating an IndexReader instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7830) topdocs facet function

2018-05-30 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495223#comment-16495223
 ] 

Tim Owen commented on SOLR-7830:


I've attached a new patch: I took your original patch, updated it for the 7x 
branch, then added distributed search support (the merging and re-sorting).

We wanted this functionality as it's really useful to fetch 1 or 2 sample 
documents with each bucket for some of our use-cases, and this approach of 
using the topdocs aggregate function works really nicely.

The only limitation is that the sorting for distributed searches can only work 
with field sorting, not with functional sorting, and you can only sort by 
fields that are included in the results (otherwise it would need to include the 
sort values in shard responses - this could be done, but it was more complex 
and we didn't need that for our use-case). Also, the offset parameter isn't 
used, but we felt pagination of these topdocs was quite niche (but it could be 
added to this patch).
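
As a rough illustration only (the exact aggregate syntax comes from the patch 
and may differ from what's shown here), a request that returns a couple of 
sample documents per terms bucket might look something like:

{noformat}
{ facet : { byCategory : { type  : terms,
                           field : category,
                           facet : { samples : "topdocs(2)" } } } }
{noformat}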

> topdocs facet function
> --
>
> Key: SOLR-7830
> URL: https://issues.apache.org/jira/browse/SOLR-7830
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Yonik Seeley
>Priority: Major
> Attachments: ALT-SOLR-7830.patch, SOLR-7830.patch, SOLR-7830.patch
>
>
> A topdocs() facet function would return the top N documents per facet bucket.
> This would be a big step toward unifying grouping and the new facet module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7830) topdocs facet function

2018-05-30 Thread Tim Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-7830:
---
Attachment: ALT-SOLR-7830.patch

> topdocs facet function
> --
>
> Key: SOLR-7830
> URL: https://issues.apache.org/jira/browse/SOLR-7830
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Yonik Seeley
>Priority: Major
> Attachments: ALT-SOLR-7830.patch, SOLR-7830.patch, SOLR-7830.patch
>
>
> A topdocs() facet function would return the top N documents per facet bucket.
> This would be a big step toward unifying grouping and the new facet module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11765) Ability to Facet on a Function

2018-02-08 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11765:

Description: 
This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket -and functional sorting (currently it's only sortable by the 
bucket label or by volume)-

Example usage:
{noformat}
{ facet : { dayOfWeek : { type : function, f : 
"chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
{noformat}
I did some refactoring in the facet parser, to hoist some common code for sort 
and pagination parsing.

  was:
This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket and functional sorting (currently it's only sortable by the 
bucket label or by volume)

Example usage:
{noformat}
{ facet : { dayOfWeek : { type : function, f : 
"chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
{noformat}

I did some refactoring in the facet parser, to hoist some common code for sort 
and pagination parsing.


> Ability to Facet on a Function
> --
>
> Key: SOLR-11765
> URL: https://issues.apache.org/jira/browse/SOLR-11765
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module, JSON Request API
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-11765.patch, SOLR-11765.patch
>
>
> This is an extension to the JSON facet functionality, to support faceting on 
> a function. I have extended the parsing of json.facet to allow a 4th facet 
> type (function) and you provide a function expression. You can also provide 
> sort, limit and mincount, as it behaves similarly to faceting on a field. 
> Subfacets work as normal - you can nest function facets anywhere you can use 
> other types.
> The output is in the same format as field facets, but with a bucket per 
> distinct value produced by the function. Hence the usage of this is most 
> appropriate for situations where your function only produces a relatively 
> small number of possible values. It's also recommended to have docValues on 
> any field used by the function.
> Our initial use-case for this is with a function that extracts a given part 
> from a date field's value e.g. day of week, or hour of day, where the 
> possible range of output values is very low.
> Still TODO: documentation, unit tests, and possible extensions to support a 
> missing bucket -and functional sorting (currently it's only sortable by the 
> bucket label or by volume)-
> Example usage:
> {noformat}
> { facet : { dayOfWeek : { type : function, f : 
> "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
> {noformat}
> I did some refactoring in the facet parser, to hoist some common code for 
> sort and pagination parsing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-11765) Ability to Facet on a Function

2018-02-08 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11765:

Attachment: SOLR-11765.patch

> Ability to Facet on a Function
> --
>
> Key: SOLR-11765
> URL: https://issues.apache.org/jira/browse/SOLR-11765
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module, JSON Request API
>Reporter: Tim Owen
>Priority: Major
> Attachments: SOLR-11765.patch, SOLR-11765.patch
>
>
> This is an extension to the JSON facet functionality, to support faceting on 
> a function. I have extended the parsing of json.facet to allow a 4th facet 
> type (function) and you provide a function expression. You can also provide 
> sort, limit and mincount, as it behaves similarly to faceting on a field. 
> Subfacets work as normal - you can nest function facets anywhere you can use 
> other types.
> The output is in the same format as field facets, but with a bucket per 
> distinct value produced by the function. Hence the usage of this is most 
> appropriate for situations where your function only produces a relatively 
> small number of possible values. It's also recommended to have docValues on 
> any field used by the function.
> Our initial use-case for this is with a function that extracts a given part 
> from a date field's value e.g. day of week, or hour of day, where the 
> possible range of output values is very low.
> Still TODO: documentation, unit tests, and possible extensions to support a 
> missing bucket and functional sorting (currently it's only sortable by the 
> bucket label or by volume)
> Example usage:
> {noformat}
> { facet : { dayOfWeek : { type : function, f : 
> "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
> {noformat}
> I did some refactoring in the facet parser, to hoist some common code for 
> sort and pagination parsing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11832) Restore from backup creates old format collections

2018-01-11 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321976#comment-16321976
 ] 

Tim Owen commented on SOLR-11832:
-

[~varunthacker] .. yes it does appear to be the same issue as SOLR-11586 .. 
sorry I had searched Jira for {{backup}} and {{restore}} and didn't see your 
ticket before!

I'd agree the default needs changing too (as my patch is doing) but I'm less 
familiar with what other code paths might end up invoking that 
{{ClusterStateMutator}} code I've changed (it might be others as well as 
restored backups).

Feel free to close this as a duplicate then


> Restore from backup creates old format collections
> --
>
> Key: SOLR-11832
> URL: https://issues.apache.org/jira/browse/SOLR-11832
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 7.2, 6.6.2
>Reporter: Tim Owen
>Assignee: Varun Thacker
>Priority: Minor
> Attachments: SOLR-11832.patch
>
>
> Restoring a collection from a backup always creates the new collection using 
> the old format state json (format 1), as a global clusterstate.json file at 
> top level of ZK. All new collections should be defaulting to use the newer 
> per-collection (format 2) in /collections/.../state.json
> As we're running clusters with many collections, the old global state format 
> isn't good for us, so as a workaround for now we're calling 
> MIGRATESTATEFORMAT immediately after the RESTORE call.
> This bug was mentioned in the comments of SOLR-5750 and also recently 
> mentioned by [~varunthacker] in SOLR-11560
> Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this 
> means at least 1 test class doesn't succeed anymore. From what I can tell, 
> the BasicDistributedZk2Test fails because it's not using the official 
> collection API to create a collection, it seems to be bypassing that and 
> manually creating cores using the core admin api instead, which I think is 
> not enough to ensure the correct ZK nodes are created. The test superclass 
> has some methods to create a collection which do use the collection api so I 
> could try fixing the tests (I'm just not that familiar with why those 
> BasicDistributed*Test classes aren't using the collection api).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11832) Restore from backup creates old format collections

2018-01-09 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318139#comment-16318139
 ] 

Tim Owen commented on SOLR-11832:
-

You're quite right Erick, my mistake.. that test class has been fixed in the 
master branch (but is still broken in branch_6x) so with this patch the tests 
do complete successfully. Hence this patch can be merged to master and to 
branch_7x but it can't be backported to branch_6x as it stands.

We're running 6.6.2 in production, so we'll just use the workaround for now 
until we get around to upgrading to 7.


> Restore from backup creates old format collections
> --
>
> Key: SOLR-11832
> URL: https://issues.apache.org/jira/browse/SOLR-11832
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 7.2, 6.6.2
>Reporter: Tim Owen
>Assignee: Varun Thacker
>Priority: Minor
> Attachments: SOLR-11832.patch
>
>
> Restoring a collection from a backup always creates the new collection using 
> the old format state json (format 1), as a global clusterstate.json file at 
> top level of ZK. All new collections should be defaulting to use the newer 
> per-collection (format 2) in /collections/.../state.json
> As we're running clusters with many collections, the old global state format 
> isn't good for us, so as a workaround for now we're calling 
> MIGRATESTATEFORMAT immediately after the RESTORE call.
> This bug was mentioned in the comments of SOLR-5750 and also recently 
> mentioned by [~varunthacker] in SOLR-11560
> Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this 
> means at least 1 test class doesn't succeed anymore. From what I can tell, 
> the BasicDistributedZk2Test fails because it's not using the official 
> collection API to create a collection, it seems to be bypassing that and 
> manually creating cores using the core admin api instead, which I think is 
> not enough to ensure the correct ZK nodes are created. The test superclass 
> has some methods to create a collection which do use the collection api so I 
> could try fixing the tests (I'm just not that familiar with why those 
> BasicDistributed*Test classes aren't using the collection api).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11832) Restore from backup creates old format collections

2018-01-08 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11832:

Priority: Minor  (was: Major)

> Restore from backup creates old format collections
> --
>
> Key: SOLR-11832
> URL: https://issues.apache.org/jira/browse/SOLR-11832
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 7.2, 6.6.2
>Reporter: Tim Owen
>Priority: Minor
> Attachments: SOLR-11832.patch
>
>
> Restoring a collection from a backup always creates the new collection using 
> the old format state json (format 1), as a global clusterstate.json file at 
> top level of ZK. All new collections should be defaulting to use the newer 
> per-collection (format 2) in /collections/.../state.json
> As we're running clusters with many collections, the old global state format 
> isn't good for us, so as a workaround for now we're calling 
> MIGRATESTATEFORMAT immediately after the RESTORE call.
> This bug was mentioned in the comments of SOLR-5750 and also recently 
> mentioned by [~varunthacker] in SOLR-11560
> Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this 
> means at least 1 test class doesn't succeed anymore. From what I can tell, 
> the BasicDistributedZk2Test fails because it's not using the official 
> collection API to create a collection, it seems to be bypassing that and 
> manually creating cores using the core admin api instead, which I think is 
> not enough to ensure the correct ZK nodes are created. The test superclass 
> has some methods to create a collection which do use the collection api so I 
> could try fixing the tests (I'm just not that familiar with why those 
> BasicDistributed*Test classes aren't using the collection api).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11832) Restore from backup creates old format collections

2018-01-08 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11832:

Attachment: SOLR-11832.patch

> Restore from backup creates old format collections
> --
>
> Key: SOLR-11832
> URL: https://issues.apache.org/jira/browse/SOLR-11832
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 7.2, 6.6.2
>Reporter: Tim Owen
> Attachments: SOLR-11832.patch
>
>
> Restoring a collection from a backup always creates the new collection using 
> the old format state json (format 1), as a global clusterstate.json file at 
> top level of ZK. All new collections should be defaulting to use the newer 
> per-collection (format 2) in /collections/.../state.json
> As we're running clusters with many collections, the old global state format 
> isn't good for us, so as a workaround for now we're calling 
> MIGRATESTATEFORMAT immediately after the RESTORE call.
> This bug was mentioned in the comments of SOLR-5750 and also recently 
> mentioned by [~varunthacker] in SOLR-11560
> Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this 
> means at least 1 test class doesn't succeed anymore. From what I can tell, 
> the BasicDistributedZk2Test fails because it's not using the official 
> collection API to create a collection, it seems to be bypassing that and 
> manually creating cores using the core admin api instead, which I think is 
> not enough to ensure the correct ZK nodes are created. The test superclass 
> has some methods to create a collection which do use the collection api so I 
> could try fixing the tests (I'm just not that familiar with why those 
> BasicDistributed*Test classes aren't using the collection api).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-11832) Restore from backup creates old format collections

2018-01-08 Thread Tim Owen (JIRA)
Tim Owen created SOLR-11832:
---

 Summary: Restore from backup creates old format collections
 Key: SOLR-11832
 URL: https://issues.apache.org/jira/browse/SOLR-11832
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Backup/Restore
Affects Versions: 6.6.2, 7.2
Reporter: Tim Owen


Restoring a collection from a backup always creates the new collection using 
the old state json format (format 1), as a global clusterstate.json file at the 
top level of ZK. All new collections should default to the newer per-collection 
format (format 2) in /collections/.../state.json.

As we're running clusters with many collections, the old global state format 
isn't good for us, so as a workaround for now we're calling MIGRATESTATEFORMAT 
immediately after the RESTORE call.
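
For reference, the workaround is just the two Collections API calls back to 
back (the host, backup name, location and collection name below are 
placeholders):

{noformat}
# restore the collection from the backup
curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=mybackup&location=/backups&collection=mycollection'

# then immediately migrate it to the per-collection state format
curl 'http://localhost:8983/solr/admin/collections?action=MIGRATESTATEFORMAT&collection=mycollection'
{noformat}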

This bug was mentioned in the comments of SOLR-5750 and also recently mentioned 
by [~varunthacker] in SOLR-11560

Code patch attached, but as per [~dsmiley]'s comment in the code, fixing this 
means at least one test class doesn't succeed anymore. From what I can tell, 
BasicDistributedZk2Test fails because it doesn't use the official collection 
API to create a collection; it seems to bypass that and manually create cores 
using the core admin API instead, which I think is not enough to ensure the 
correct ZK nodes are created. The test superclass has some methods to create a 
collection which do use the collection API, so I could try fixing the tests 
(I'm just not that familiar with why those BasicDistributed*Test classes aren't 
using the collection API).




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11765) Ability to Facet on a Function

2017-12-15 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11765:

Description: 
This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket and functional sorting (currently it's only sortable by the 
bucket label or by volume)

Example usage:
{noformat}
{ facet : { dayOfWeek : { type : function, f : 
"chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
{noformat}

I did some refactoring in the facet parser, to hoist some common code for sort 
and pagination parsing.

  was:
This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket and functional sorting (currently it's only sortable by the 
bucket label or by volume)

Example usage:

{ facet : { dayOfWeek : { type : function, f : 
"chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }

I did some refactoring in the facet parser, to hoist some common code for sort 
and pagination parsing.


> Ability to Facet on a Function
> --
>
> Key: SOLR-11765
> URL: https://issues.apache.org/jira/browse/SOLR-11765
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module, JSON Request API
>Reporter: Tim Owen
> Attachments: SOLR-11765.patch
>
>
> This is an extension to the JSON facet functionality, to support faceting on 
> a function. I have extended the parsing of json.facet to allow a 4th facet 
> type (function) and you provide a function expression. You can also provide 
> sort, limit and mincount, as it behaves similarly to faceting on a field. 
> Subfacets work as normal - you can nest function facets anywhere you can use 
> other types.
> The output is in the same format as field facets, but with a bucket per 
> distinct value produced by the function. Hence the usage of this is most 
> appropriate for situations where your function only produces a relatively 
> small number of possible values. It's also recommended to have docValues on 
> any field used by the function.
> Our initial use-case for this is with a function that extracts a given part 
> from a date field's value e.g. day of week, or hour of day, where the 
> possible range of output values is very low.
> Still TODO: documentation, unit tests, and possible extensions to support a 
> missing bucket and functional sorting (currently it's only sortable by the 
> bucket label or by volume)
> Example usage:
> {noformat}
> { facet : { dayOfWeek : { type : function, f : 
> "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
> {noformat}
> I did some refactoring in the facet parser, to hoist some common code for 
> sort and pagination parsing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-11765) Ability to Facet on a Function

2017-12-15 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11765:

Description: 
This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket and functional sorting (currently it's only sortable by the 
bucket label or by volume)

Example usage:

{ facet : { dayOfWeek : { type : function, f : 
"chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }

I did some refactoring in the facet parser, to hoist some common code for sort 
and pagination parsing.

  was:
This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket and functional sorting (currently it's only sortable by the 
bucket label or by volume)


> Ability to Facet on a Function
> --
>
> Key: SOLR-11765
> URL: https://issues.apache.org/jira/browse/SOLR-11765
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module, JSON Request API
>Reporter: Tim Owen
> Attachments: SOLR-11765.patch
>
>
> This is an extension to the JSON facet functionality, to support faceting on 
> a function. I have extended the parsing of json.facet to allow a 4th facet 
> type (function) and you provide a function expression. You can also provide 
> sort, limit and mincount, as it behaves similarly to faceting on a field. 
> Subfacets work as normal - you can nest function facets anywhere you can use 
> other types.
> The output is in the same format as field facets, but with a bucket per 
> distinct value produced by the function. Hence the usage of this is most 
> appropriate for situations where your function only produces a relatively 
> small number of possible values. It's also recommended to have docValues on 
> any field used by the function.
> Our initial use-case for this is with a function that extracts a given part 
> from a date field's value e.g. day of week, or hour of day, where the 
> possible range of output values is very low.
> Still TODO: documentation, unit tests, and possible extensions to support a 
> missing bucket and functional sorting (currently it's only sortable by the 
> bucket label or by volume)
> Example usage:
> { facet : { dayOfWeek : { type : function, f : 
> "chronofield(my_date_field,DAY_OF_WEEK)", sort : "count desc" } } }
> I did some refactoring in the facet parser, to hoist some common code for 
> sort and pagination parsing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11765) Ability to Facet on a Function

2017-12-15 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-11765:

Attachment: SOLR-11765.patch

> Ability to Facet on a Function
> --
>
> Key: SOLR-11765
> URL: https://issues.apache.org/jira/browse/SOLR-11765
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module, JSON Request API
>Reporter: Tim Owen
> Attachments: SOLR-11765.patch
>
>
> This is an extension to the JSON facet functionality, to support faceting on 
> a function. I have extended the parsing of json.facet to allow a 4th facet 
> type (function) and you provide a function expression. You can also provide 
> sort, limit and mincount, as it behaves similarly to faceting on a field. 
> Subfacets work as normal - you can nest function facets anywhere you can use 
> other types.
> The output is in the same format as field facets, but with a bucket per 
> distinct value produced by the function. Hence the usage of this is most 
> appropriate for situations where your function only produces a relatively 
> small number of possible values. It's also recommended to have docValues on 
> any field used by the function.
> Our initial use-case for this is with a function that extracts a given part 
> from a date field's value e.g. day of week, or hour of day, where the 
> possible range of output values is very low.
> Still TODO: documentation, unit tests, and possible extensions to support a 
> missing bucket and functional sorting (currently it's only sortable by the 
> bucket label or by volume)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-11765) Ability to Facet on a Function

2017-12-15 Thread Tim Owen (JIRA)
Tim Owen created SOLR-11765:
---

 Summary: Ability to Facet on a Function
 Key: SOLR-11765
 URL: https://issues.apache.org/jira/browse/SOLR-11765
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Facet Module, JSON Request API
Reporter: Tim Owen


This is an extension to the JSON facet functionality, to support faceting on a 
function. I have extended the parsing of json.facet to allow a 4th facet type 
(function) and you provide a function expression. You can also provide sort, 
limit and mincount, as it behaves similarly to faceting on a field. Subfacets 
work as normal - you can nest function facets anywhere you can use other types.

The output is in the same format as field facets, but with a bucket per 
distinct value produced by the function. Hence the usage of this is most 
appropriate for situations where your function only produces a relatively small 
number of possible values. It's also recommended to have docValues on any field 
used by the function.

Our initial use-case for this is with a function that extracts a given part 
from a date field's value e.g. day of week, or hour of day, where the possible 
range of output values is very low.

Still TODO: documentation, unit tests, and possible extensions to support a 
missing bucket and functional sorting (currently it's only sortable by the 
bucket label or by volume)
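
To make the request shape concrete, here is an illustrative example combining 
the options above. The chronofield expression matches the example usage later 
added to this issue's description; the nested avg subfacet and the price field 
are just placeholders:

{noformat}
{ facet : { dayOfWeek : { type     : function,
                          f        : "chronofield(my_date_field,DAY_OF_WEEK)",
                          sort     : "count desc",
                          limit    : 7,
                          mincount : 1,
                          facet    : { avgPrice : "avg(price)" } } } }
{noformat}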



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-09-15 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168150#comment-16168150
 ] 

Tim Owen commented on SOLR-10826:
-

Hi - can this be backported to the 6.x branch? We're using it built locally on 
top of 6.6 in the meantime. I thought it might be included in 6.6.1 but didn't 
notice it.

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.4, 6.5.1, 6.6
>Reporter: Tim Owen
>Assignee: Varun Thacker
> Fix For: 7.0
>
> Attachments: SOLR-10826.patch, SOLR-10826.patch, SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-29 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-10826:

Attachment: SOLR-10826.patch

OK, I've expanded the test a bit: it now creates a second collection, an alias 
for it, and a combined alias spanning both. Then it tests that the various 
combinations of {{collection=...}} values work as expected. Again, these tests 
do fail without the code fix.

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.4, 6.5.1, 6.6
>Reporter: Tim Owen
>Assignee: Varun Thacker
> Attachments: SOLR-10826.patch, SOLR-10826.patch, SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-28 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066740#comment-16066740
 ] 

Tim Owen commented on SOLR-10826:
-

Updated patch with some extra assertions. Without the code fix, those extra 
lines fail the test as expected, but pass with the fix.

I had a look at the AliasIntegrationTest but it essentially does the same kind 
of thing the CloudSolrClientTest is doing.

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.4, 6.5.1, 6.6
>Reporter: Tim Owen
>Assignee: Varun Thacker
> Attachments: SOLR-10826.patch, SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-28 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-10826:

Attachment: SOLR-10826.patch

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.4, 6.5.1, 6.6
>Reporter: Tim Owen
>Assignee: Varun Thacker
> Attachments: SOLR-10826.patch, SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-23 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061084#comment-16061084
 ] 

Tim Owen commented on SOLR-10826:
-

Hi Varun, yes good point - I will add some more tests for this code next week.

Do you think it doesn't affect the master branch, since you've removed that 
from the Affects field? The code is still the same in master too.

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.4, 6.5.1, 6.6
>Reporter: Tim Owen
>Assignee: Varun Thacker
> Attachments: SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-07 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-10826:

Affects Version/s: 6.5.1

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.5.1, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-06 Thread Tim Owen (JIRA)
Tim Owen created SOLR-10826:
---

 Summary: CloudSolrClient using unsplit collection list when 
expanding aliases
 Key: SOLR-10826
 URL: https://issues.apache.org/jira/browse/SOLR-10826
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ
Affects Versions: master (7.0)
Reporter: Tim Owen
 Attachments: SOLR-10826.patch

Some recent refactoring seems to have introduced a bug in SolrJ's 
CloudSolrClient: when it's expanding a collection list and resolving aliases, 
it uses the wrong local variable for the alias lookup. This leads to an 
exception because the value is not an alias.

E.g. suppose you made a request with {{collection=x,y}} where either or both of 
{{x}} and {{y}} are not real collection names but valid aliases. This will 
fail, incorrectly, because the lookup uses {{x,y}} as a potential alias name.

Patch to fix this attached, which was tested locally and fixed the issue.
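
To illustrate the difference, here is a sketch of the intended behaviour only 
(not the actual CloudSolrClient code, and it ignores the fact that a real alias 
can itself map to a list of collections):

{noformat}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AliasResolutionSketch {
  // Resolve a comma-separated collection parameter against the alias map.
  static List<String> resolve(String collectionParam, Map<String, String> aliases) {
    // Buggy variant: aliases.get(collectionParam) looks up "x,y" as one alias name.
    // Correct variant: split first, then resolve each name individually.
    List<String> resolved = new ArrayList<>();
    for (String name : collectionParam.split(",")) {
      resolved.add(aliases.getOrDefault(name, name));
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> aliases = new HashMap<>();
    aliases.put("x", "collection1");
    aliases.put("y", "collection2");
    System.out.println(resolve("x,y", aliases)); // prints [collection1, collection2]
  }
}
{noformat}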



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10826) CloudSolrClient using unsplit collection list when expanding aliases

2017-06-06 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-10826:

Attachment: SOLR-10826.patch

> CloudSolrClient using unsplit collection list when expanding aliases
> 
>
> Key: SOLR-10826
> URL: https://issues.apache.org/jira/browse/SOLR-10826
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-10826.patch
>
>
> Some recent refactoring seems to have introduced a bug in SolrJ's 
> CloudSolrClient, when it's expanding a collection list and resolving aliases, 
> it's using the wrong local variable for the alias lookup. This leads to an 
> exception because the value is not an alias.
> E.g. suppose you made a request with {{collection=x,y}} where either or both 
> of {{x}} and {{y}} are not real collection names but valid aliases. This will 
> fail, incorrectly, because the lookup is using {{x,y}} as a potential alias 
> name lookup.
> Patch to fix this attached, which was tested locally and fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2017-03-15 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925821#comment-15925821
 ] 

Tim Owen commented on SOLR-7191:


Admittedly not thousands of collections, but another anecdote: each of our 
clusters is 12 hosts running 6 nodes each, with 165 collections of 16 shards 
each at 3x replication. So around 7900 cores spread over 72 nodes (roughly 100 
each).

To get stable restarts we throttle the recovery thread pool size - see the 
ticket I raised with our patch, SOLR-9936. Without that, the amount of recovery 
just kills the network and disks, and the cluster status never settles.

Also, we avoid restarting all nodes at once: we bring up a few at a time and 
wait for their recovery to finish before starting more. We need to automate 
this, e.g. using a Zookeeper lock pool so that nodes wait to start up.
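
For illustration, a rough sketch of that kind of throttle using Apache 
Curator's distributed semaphore. The connect string, ZK path and lease count 
are placeholders, and none of this is anything Solr provides out of the box:

{noformat}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2;
import org.apache.curator.framework.recipes.locks.Lease;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RestartThrottleSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework zk = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
    zk.start();

    // Allow at most 3 nodes to be restarting/recovering at any one time.
    InterProcessSemaphoreV2 semaphore =
        new InterProcessSemaphoreV2(zk, "/ops/solr-restart-throttle", 3);

    Lease lease = semaphore.acquire(); // blocks until a slot is free
    try {
      startSolrNodeAndWaitForRecovery(); // placeholder for the real restart logic
    } finally {
      semaphore.returnLease(lease);
      zk.close();
    }
  }

  private static void startSolrNodeAndWaitForRecovery() {
    // not shown: start the node and poll cluster state until its cores recover
  }
}
{noformat}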

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.0
>Reporter: Shawn Heisey
>Assignee: Noble Paul
>  Labels: performance, scalability
> Fix For: 6.3
>
> Attachments: lots-of-zkstatereader-updates-branch_5x.log, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3, 
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user setup, but I ran into many 
> problems myself even before I was able to get 4000 collections created on a 
> 5.0 example cloud setup.  Restarting Solr takes a very long time, and it is 
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance 
> and scalability.  It doesn't help that I'm running both Solr nodes on one 
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-30 Thread Tim Owen (JIRA)
 [ 
https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9936:
---
Attachment: SOLR-9936.patch

Just uploaded a replacement patch that builds against the master branch (the 
previous one was a patch against 6.3 and wouldn't merge to master because of 
all the changes to metrics).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)



[jira] [Commented] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-09 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811200#comment-15811200
 ] 

Tim Owen commented on SOLR-9936:


Thanks - given the comment with the updateExecutor code and Yonik's reply in 
ticket SOLR-8205 I was wary of changing this, but I couldn't see a scenario 
where it could deadlock. Would certainly appreciate some further input from 
people who've worked on the recovery code e.g. [~shalinmangar] in SOLR-7280.
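
For anyone skimming, a minimal sketch of the shape of the change being 
discussed, using plain java.util.concurrent rather than Solr's MDC-aware 
ExecutorUtil wrappers (the property name matches the solr.xml snippet quoted 
below):

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorShapesSketch {
  public static void main(String[] args) {
    // updateExecutor: effectively unbounded, it grows with demand
    ExecutorService updateExecutor = Executors.newCachedThreadPool();

    // recoveryExecutor with this patch: bounded, so only N replicas per node
    // write recovered index files to disk concurrently
    int recoveryThreads = Integer.getInteger("solr.recovery.threads", 4);
    ExecutorService recoveryExecutor = Executors.newFixedThreadPool(recoveryThreads);

    updateExecutor.shutdown();
    recoveryExecutor.shutdown();
  }
}
{noformat}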

> Allow configuration for recoveryExecutor thread pool size
> -
>
> Key: SOLR-9936
> URL: https://issues.apache.org/jira/browse/SOLR-9936
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.3
>Reporter: Tim Owen
> Attachments: SOLR-9936.patch
>
>
> There are two executor services in {{UpdateShardHandler}}, the 
> {{updateExecutor}} whose size is unbounded for reasons explained in the code 
> comments. There is also the {{recoveryExecutor}} which was added later, and 
> is the one that executes the {{RecoveryStrategy}} code to actually fetch 
> index files and store to disk, eventually calling an {{fsync}} thread to 
> ensure the data is written.
> We found that with a fast network such as 10GbE it's very easy to overload 
> the local disk storage when doing a restart of Solr instances after some 
> downtime, if they have many cores to load. Typically we have each physical 
> server containing 6 SSDs and 6 Solr instances, so each Solr has its home dir 
> on a dedicated SSD. With 100+ cores (shard replicas) on each instance, 
> startup can really hammer the SSD as it's writing in parallel from as many 
> cores as Solr is recovering. This made recovery time bad enough that replicas 
> were down for a long time, and even shards marked as down if none of its 
> replicas have recovered (usually when many machines have been restarted). The 
> very slow IO times (10s of seconds or worse) also made the JVM pause, which 
> caused disconnects from ZK and didn't help recovery either.
> This patch allowed us to throttle how much parallelism there would be writing 
> to a disk - in practice we're using a pool size of 4 threads, to prevent the 
> SSD getting overloaded, and that worked well enough to make recovery of all 
> cores in reasonable time.
> Due to the comment on the other thread pool size, I'd like some comments on 
> whether it's OK to do this for the {{recoveryExecutor}} though?
> It's configured in solr.xml with e.g.
> {noformat}
>   
> ${solr.recovery.threads:4}
>   
> {noformat}
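
For illustration, a fully spelled-out solr.xml fragment could look like the 
sketch below. The {{updateshardhandler}} section is the existing solr.xml home 
for UpdateShardHandler settings, while the {{recoveryExecutorThreads}} name is 
an assumption made for this sketch rather than a name quoted from the attached 
patch.

{noformat}
<solr>
  <updateshardhandler>
    <!-- Assumed setting name; pool size defaults to 4 via the system property -->
    <int name="recoveryExecutorThreads">${solr.recovery.threads:4}</int>
  </updateshardhandler>
</solr>
{noformat}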



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-06 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9936:
---
Description: 
There are two executor services in {{UpdateShardHandler}}, the 
{{updateExecutor}} whose size is unbounded for reasons explained in the code 
comments. There is also the {{recoveryExecutor}} which was added later, and is 
the one that executes the {{RecoveryStrategy}} code to actually fetch index 
files and store to disk, eventually calling an {{fsync}} thread to ensure the 
data is written.

We found that with a fast network such as 10GbE it's very easy to overload the 
local disk storage when doing a restart of Solr instances after some downtime, 
if they have many cores to load. Typically we have each physical server 
containing 6 SSDs and 6 Solr instances, so each Solr has its home dir on a 
dedicated SSD. With 100+ cores (shard replicas) on each instance, startup can 
really hammer the SSD as it's writing in parallel from as many cores as Solr is 
recovering. This made recovery time bad enough that replicas were down for a 
long time, and even shards marked as down if none of its replicas have 
recovered (usually when many machines have been restarted). The very slow IO 
times (10s of seconds or worse) also made the JVM pause, so that disconnects 
from ZK, which didn't help recovery either.

This patch allowed us to throttle how much parallelism there would be writing 
to a disk - in practice we're using a pool size of 4 threads, to prevent the 
SSD getting overloaded, and that worked well enough to make recovery of all 
cores in reasonable time.

Due to the comment on the other thread pool size, I'd like some comments on 
whether it's OK to do this for the {{recoveryExecutor}} though?

It's configured in solr.xml with e.g.

{noformat}
  
${solr.recovery.threads:4}
  
{noformat}


  was:
There are two executor services in {{UpdateShardHandler}}, the 
{{updateExecutor}} whose size is unbounded for reasons explained in the code 
comments. There is also the {{recoveryExecutor}} which was added later, and is 
the one that executes the {{RecoveryStrategy}} code to actually fetch index 
files and store to disk, eventually calling an {{fsync}} thread to ensure the 
data is written.

We found that with a fast network such as 10GbE it's very easy to overload the 
local disk storage when doing a restart of Solr instances after some downtime, 
if they have many cores to load. Typically we have each physical server 
containing 6 SSDs and 6 Solr instances, so each Solr has its home dir on a 
dedicated SSD. With 100+ cores (shard replicas) on each instance, startup can 
really hammer the SSD as it's writing in parallel from as many cores as Solr is 
recovering. This made recovery time bad enough that replicas were down for a 
long time, and even shards marked as down if none of its replicas have 
recovered (usually when many machines have been restarted).

This patch allowed us to throttle how much parallelism there would be writing 
to a disk - in practice we're using a pool size of 4 threads, to prevent the 
SSD getting overloaded, and that worked well enough to make recovery of all 
cores in reasonable time.

Due to the comment on the other thread pool size, I'd like some comments on 
whether it's OK to do this for the {{recoveryExecutor}} though?

It's configured in solr.xml with e.g.

{noformat}
  
${solr.recovery.threads:4}
  
{noformat}



> Allow configuration for recoveryExecutor thread pool size
> -
>
> Key: SOLR-9936
> URL: https://issues.apache.org/jira/browse/SOLR-9936
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.3
>Reporter: Tim Owen
> Attachments: SOLR-9936.patch
>
>
> There are two executor services in {{UpdateShardHandler}}, the 
> {{updateExecutor}} whose size is unbounded for reasons explained in the code 
> comments. There is also the {{recoveryExecutor}} which was added later, and 
> is the one that executes the {{RecoveryStrategy}} code to actually fetch 
> index files and store to disk, eventually calling an {{fsync}} thread to 
> ensure the data is written.
> We found that with a fast network such as 10GbE it's very easy to overload 
> the local disk storage when doing a restart of Solr instances after some 
> downtime, if they have many cores to load. Typically we have each physical 
> server containing 6 SSDs and 6 Solr instances, so each Solr has its home dir 
> on a dedicated SSD. With 100+ cores (shard replicas) on each instance, 
> startup can really hammer the SSD as it's writing in parallel from as many 
> cores as Solr is recovering. This made recovery time bad enough that replicas 
> were down for a long time, and 

[jira] [Updated] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-06 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9936:
---
Attachment: SOLR-9936.patch

> Allow configuration for recoveryExecutor thread pool size
> -
>
> Key: SOLR-9936
> URL: https://issues.apache.org/jira/browse/SOLR-9936
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.3
>Reporter: Tim Owen
> Attachments: SOLR-9936.patch
>
>
> There are two executor services in {{UpdateShardHandler}}, the 
> {{updateExecutor}} whose size is unbounded for reasons explained in the code 
> comments. There is also the {{recoveryExecutor}} which was added later, and 
> is the one that executes the {{RecoveryStrategy}} code to actually fetch 
> index files and store to disk, eventually calling an {{fsync}} thread to 
> ensure the data is written.
> We found that with a fast network such as 10GbE it's very easy to overload 
> the local disk storage when doing a restart of Solr instances after some 
> downtime, if they have many cores to load. Typically we have each physical 
> server containing 6 SSDs and 6 Solr instances, so each Solr has its home dir 
> on a dedicated SSD. With 100+ cores (shard replicas) on each instance, 
> startup can really hammer the SSD as it's writing in parallel from as many 
> cores as Solr is recovering. This made recovery time bad enough that replicas 
> were down for a long time, and even shards marked as down if none of its 
> replicas have recovered (usually when many machines have been restarted).
> This patch allowed us to throttle how much parallelism there would be writing 
> to a disk - in practice we're using a pool size of 4 threads, to prevent the 
> SSD getting overloaded, and that worked well enough to make recovery of all 
> cores in reasonable time.
> Due to the comment on the other thread pool size, I'd like some comments on 
> whether it's OK to do this for the {{recoveryExecutor}} though?
> It's configured in solr.xml with e.g.
> {noformat}
>   
> ${solr.recovery.threads:4}
>   
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-06 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9936:
--

 Summary: Allow configuration for recoveryExecutor thread pool size
 Key: SOLR-9936
 URL: https://issues.apache.org/jira/browse/SOLR-9936
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: replication (java)
Affects Versions: 6.3
Reporter: Tim Owen


There are two executor services in {{UpdateShardHandler}}, the 
{{updateExecutor}} whose size is unbounded for reasons explained in the code 
comments. There is also the {{recoveryExecutor}} which was added later, and is 
the one that executes the {{RecoveryStrategy}} code to actually fetch index 
files and store to disk, eventually calling an {{fsync}} thread to ensure the 
data is written.

We found that with a fast network such as 10GbE it's very easy to overload the 
local disk storage when doing a restart of Solr instances after some downtime, 
if they have many cores to load. Typically we have each physical server 
containing 6 SSDs and 6 Solr instances, so each Solr has its home dir on a 
dedicated SSD. With 100+ cores (shard replicas) on each instance, startup can 
really hammer the SSD as it's writing in parallel from as many cores as Solr is 
recovering. This made recovery time bad enough that replicas were down for a 
long time, and even shards marked as down if none of its replicas have 
recovered (usually when many machines have been restarted).

This patch allowed us to throttle how much parallelism there would be writing 
to a disk - in practice we're using a pool size of 4 threads, to prevent the 
SSD getting overloaded, and that worked well enough to make recovery of all 
cores in reasonable time.

Due to the comment on the other thread pool size, I'd like some comments on 
whether it's OK to do this for the {{recoveryExecutor}} though?

It's configured in solr.xml with e.g.

{noformat}
  
${solr.recovery.threads:4}
  
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-06 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804336#comment-15804336
 ] 

Tim Owen commented on SOLR-9918:


OK, I see what you mean. Let me explain our use-case, which should make clear 
why we developed this processor and when it might prove useful.

We have a Kafka queue of messages, which are a mixture of Create, Update and 
Delete operations; these are consumed and fed into two different storage 
systems - Solr and an RDBMS. We want the behaviour to be consistent, so that 
the two systems stay in sync. The way the database storage app works is that 
Create operations are implemented as effectively {{INSERT IF NOT EXISTS ...}}, 
and Update operations are the typical SQL {{UPDATE .. WHERE id = ..}} that 
quietly does nothing if there is no row for {{id}}. So we want the Solr storage 
to behave in the same way.

There can occasionally be duplicate messages that Create the same {{id}} due to 
the hundreds of instances of the app that adds messages to Kafka, and small 
race conditions that mean two or more of them will do some duplicate work. We 
chose to accept this situation and de-dupe downstream by having both storage 
apps behave as above.

Another scenario is that, since we have the Kafka queue as a buffer, if there 
are any problems downstream we can always stop the storage apps, restore last 
night's backup, rewind the Kafka consumer offset (slightly beyond the backup 
point) and then replay. In this situation we don't want a lot of index churn 
from the overlapping Create messages.

With updates, the apps which add Update messages have only best-effort 
knowledge of which document/row {{id}}s are relevant to the field/column being 
changed, so we quite commonly have optimistic Update messages for a document 
that doesn't in fact exist at that point. The database storage handles this 
quietly, so we wanted the same behaviour in Solr. Initially, what happened in 
Solr was that we'd get newly-created documents containing only the fields 
changed in the AtomicUpdate; we added a required field to avoid that happening, 
which works but is noisy, as we get a Solr exception each time (and batch 
updates then become messy because we have to split and retry).

I looked at {{DocBasedVersionConstraintsProcessor}}, but we don't have 
explicitly-managed versioning for our documents in Solr. Then I looked at 
{{SignatureUpdateProcessor}}, but that churns the index and overwrites 
documents, which we didn't want. I also considered {{TolerantUpdateProcessor}}, 
but that doesn't really solve the issue for inserts; it would just make some 
update batches less noisy.

I'd say this processor is useful in situations where your documents have no 
concept of multiple versions assigned by the app, and no kind of fuzziness 
about similar documents, i.e. each document has a strong identity, akin to a 
database unique key.

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 

[jira] [Updated] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2017-01-05 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9503:
---
Attachment: SOLR-9503.patch

I went through the tests and found that if I added another rule to the existing 
test for the overseer-role, it would fail as expected with the previous code. 
That test now passes with the fix, so I've updated my patch with that test 
change.

> NPE in Replica Placement Rules when using Overseer Role with other rules
> 
>
> Key: SOLR-9503
> URL: https://issues.apache.org/jira/browse/SOLR-9503
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Rules, SolrCloud
>Affects Versions: 6.2, master (7.0)
>Reporter: Tim Owen
>Assignee: Noble Paul
> Attachments: SOLR-9503.patch, SOLR-9503.patch
>
>
> The overseer role introduced in SOLR-9251 works well if there's only a single 
> Rule for replica placement e.g. {code}rule=role:!overseer{code} but when 
> combined with another rule, e.g. 
> {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result 
> in a NullPointerException (in Rule.tryAssignNodeToShard)
> This happens because the code builds up a nodeVsTags map, but it only has 
> entries for nodes that have values for *all* tags used among the rules. This 
> means not enough information is available to other rules when they are being 
> checked during replica assignment. In the example rules above, if we have a 
> cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
> have any entry in the nodeVsTags map because they only have the host tag 
> value and not the role tag value.
> Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
> keeping entries that fulfil the constraint of having values for all tags used 
> in the rules. Possibly this constraint was suitable when rules were 
> originally introduced, but the Role tag (used for Overseers) is unlikely to 
> be present for all nodes in the cluster, and similarly for sysprop tags which 
> may or may not be set for a node.
> My patch removes this constraint, so the nodeVsTags map contains everything 
> known about all nodes, even if they have no value for a given tag. This 
> allows the rule combination above to work, and doesn't appear to cause any 
> problems with the code paths that use the nodeVsTags map. They handle null 
> values quite well, and the tests pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-04 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798916#comment-15798916
 ] 

Tim Owen commented on SOLR-9918:


Fair points, Koji - I have updated the patch with a bit more documentation, and 
I've also added the example configuration to the Javadoc comment.

Probably the [Confluence 
page|https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors#UpdateRequestProcessors-UpdateRequestProcessorFactories]
 is the best place to put that kind of guidance on which processors to 
choose in different situations.

In the particular case of the SignatureUpdateProcessor, that class will cause 
the new document to overwrite/replace any existing document, not skip it, which 
is why I didn't use it for our use-case.

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   
> 
>  class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>   true
>   false 
> 
> 
> 
>   
> {noformat}
> and initParams defaults of
> {noformat}
>   skipexisting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-04 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9918:
---
Attachment: SOLR-9918.patch

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   
> 
>  class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>   true
>   false 
> 
> 
> 
>   
> {noformat}
> and initParams defaults of
> {noformat}
>   skipexisting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2017-01-04 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798416#comment-15798416
 ] 

Tim Owen commented on SOLR-9503:


Is anyone able to take a look at this fix - maybe [~noble.paul]? I hope the 
assumptions I've made in the diff are correct.

We've been using it in production for a few months, in our custom build of 
Solr. Would be nice to roll it in upstream.


> NPE in Replica Placement Rules when using Overseer Role with other rules
> 
>
> Key: SOLR-9503
> URL: https://issues.apache.org/jira/browse/SOLR-9503
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Rules, SolrCloud
>Affects Versions: 6.2, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9503.patch
>
>
> The overseer role introduced in SOLR-9251 works well if there's only a single 
> Rule for replica placement e.g. {code}rule=role:!overseer{code} but when 
> combined with another rule, e.g. 
> {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result 
> in a NullPointerException (in Rule.tryAssignNodeToShard)
> This happens because the code builds up a nodeVsTags map, but it only has 
> entries for nodes that have values for *all* tags used among the rules. This 
> means not enough information is available to other rules when they are being 
> checked during replica assignment. In the example rules above, if we have a 
> cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
> have any entry in the nodeVsTags map because they only have the host tag 
> value and not the role tag value.
> Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
> keeping entries that fulfil the constraint of having values for all tags used 
> in the rules. Possibly this constraint was suitable when rules were 
> originally introduced, but the Role tag (used for Overseers) is unlikely to 
> be present for all nodes in the cluster, and similarly for sysprop tags which 
> may or may not be set for a node.
> My patch removes this constraint, so the nodeVsTags map contains everything 
> known about all nodes, even if they have no value for a given tag. This 
> allows the rule combination above to work, and doesn't appear to cause any 
> problems with the code paths that use the nodeVsTags map. They handle null 
> values quite well, and the tests pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-03 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9918:
---
Attachment: SOLR-9918.patch

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   
> 
>  class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>   true
>   false 
> 
> 
> 
>   
> {noformat}
> and initParams defaults of
> {noformat}
>   skipexisting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-03 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9918:
--

 Summary: An UpdateRequestProcessor to skip duplicate inserts and 
ignore updates to missing docs
 Key: SOLR-9918
 URL: https://issues.apache.org/jira/browse/SOLR-9918
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: update
Reporter: Tim Owen


This is an UpdateRequestProcessor and Factory that we have been using in 
production, to handle 2 common cases that were awkward to achieve using the 
existing update pipeline and current processor classes:

* When inserting document(s), if some already exist then quietly skip the new 
document inserts - do not churn the index by replacing the existing documents 
and do not throw a noisy exception that breaks the batch of inserts. By analogy 
with SQL, {{insert if not exists}}. In our use-case, multiple application 
instances can (rarely) process the same input so it's easier for us to de-dupe 
these at Solr insert time than to funnel them into a global ordered queue first.
* When applying AtomicUpdate documents, if a document being updated does not 
exist, quietly do nothing - do not create a new partially-populated document 
and do not throw a noisy exception about missing required fields. By analogy 
with SQL, {{update where id = ..}}. Our use-case relies on this because we 
apply updates optimistically and have best-effort knowledge about what 
documents will exist, so it's easiest to skip the updates (in the same way a 
Database would).

I would have kept this in our own package hierarchy but it relies on some 
package-scoped methods, and seems like it could be useful to others if they 
choose to configure it. Some bits of the code were borrowed from 
{{DocBasedVersionConstraintsProcessorFactory}}.

Attached patch has unit tests to confirm the behaviour.

This class can be used by configuring solrconfig.xml like so..

{noformat}
  


  true
  false 



  
{noformat}

and initParams defaults of

{noformat}
  skipexisting
{noformat}
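
For illustration, a fully spelled-out version of that chain configuration could 
look like the following. The chain layout and the two boolean setting names 
({{skipInsertIfExists}}, {{skipUpdateIfMissing}}) are assumptions inferred from 
the description above, so check the attached patch for the exact names.

{noformat}
<updateRequestProcessorChain name="skipexisting">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
    <!-- Assumed parameter names -->
    <bool name="skipInsertIfExists">true</bool>
    <bool name="skipUpdateIfMissing">false</bool>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">skipexisting</str>
  </lst>
</initParams>
{noformat}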




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9915) PeerSync alreadyInSync check is not backwards compatible

2017-01-03 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9915:
---
Attachment: SOLR-9915.patch

> PeerSync alreadyInSync check is not backwards compatible
> 
>
> Key: SOLR-9915
> URL: https://issues.apache.org/jira/browse/SOLR-9915
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.3
>Reporter: Tim Owen
> Attachments: SOLR-9915.patch
>
>
> The fingerprint check added to PeerSync in SOLR-9446 works fine when all 
> servers are running 6.3 but this means it's hard to do a rolling upgrade from 
> e.g. 6.2.1 to 6.3 because the 6.3 server sends a request to a 6.2.1 server to 
> get a fingerprint and then gets a NPE because the older server doesn't return 
> the expected field in its response.
> This leads to the PeerSync completely failing, and results in a full index 
> replication from scratch, copying all index files over the network. We 
> noticed this happening when we tried to do a rolling upgrade on one of our 
> 6.2.1 clusters to 6.3. Unfortunately this amount of replication was hammering 
> our disks and network, so we had to do a full shutdown, upgrade all to 6.3 
> and restart, which was not ideal for a production cluster.
> The attached patch should behave more gracefully in this situation, as it 
> will typically return false for alreadyInSync() and then carry on doing the 
> normal re-sync based on versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9915) PeerSync alreadyInSync check is not backwards compatible

2017-01-03 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9915:
--

 Summary: PeerSync alreadyInSync check is not backwards compatible
 Key: SOLR-9915
 URL: https://issues.apache.org/jira/browse/SOLR-9915
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: replication (java)
Affects Versions: 6.3
Reporter: Tim Owen


The fingerprint check added to PeerSync in SOLR-9446 works fine when all 
servers are running 6.3 but this means it's hard to do a rolling upgrade from 
e.g. 6.2.1 to 6.3 because the 6.3 server sends a request to a 6.2.1 server to 
get a fingerprint and then gets a NPE because the older server doesn't return 
the expected field in its response.

This leads to the PeerSync completely failing, and results in a full index 
replication from scratch, copying all index files over the network. We noticed 
this happening when we tried to do a rolling upgrade on one of our 6.2.1 
clusters to 6.3. Unfortunately this amount of replication was hammering our 
disks and network, so we had to do a full shutdown, upgrade all to 6.3 and 
restart, which was not ideal for a production cluster.

The attached patch should behave more gracefully in this situation, as it will 
typically return false for alreadyInSync() and then carry on doing the normal 
re-sync based on versions.
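
The shape of that defensive behaviour is roughly the sketch below; the variable 
names and the response key are assumptions rather than quotes from the patch. 
If the older node's response carries no fingerprint, the shortcut is abandoned 
instead of dereferencing a null.

{code}
// Sketch only: a pre-6.3 node will not include a fingerprint in its response
Object fingerprint = response.get("fingerprint");
if (fingerprint == null) {
  return false;  // not "already in sync"; fall back to the version-based PeerSync
}
{code}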



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8793) Fix stale commit files' size computation in LukeRequestHandler

2016-11-28 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701538#comment-15701538
 ] 

Tim Owen commented on SOLR-8793:


We still get this using Solr 6.3.0 because it's logged at WARN level, which 
seems a bit alarmist to me. For indexes that are changing rapidly, it happens a 
lot. We're going to raise our logging threshold for that class to ERROR, 
because these messages are just filling up the logs and there's no action we 
can actually take to prevent them - they're expected to happen sometimes. 
Personally I would make this message INFO level.
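
For reference, raising the threshold for a single class in Solr 6.x is a 
one-line change in server/resources/log4j.properties; the sketch below assumes 
the warning is emitted by LukeRequestHandler, as the issue title suggests.

{noformat}
# Silence the stale-commit WARN messages from this class only
log4j.logger.org.apache.solr.handler.admin.LukeRequestHandler=ERROR
{noformat}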

> Fix stale commit files' size computation in LukeRequestHandler
> --
>
> Key: SOLR-8793
> URL: https://issues.apache.org/jira/browse/SOLR-8793
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.5
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 5.5.1, 6.0
>
> Attachments: SOLR-8793.patch
>
>
> SOLR-8587 added segments file information and its size to core admin status 
> API. However in case of stale commits, calling that API may result on 
> {{FileNotFoundException}} or {{NoSuchFileException}}, if the segments file no 
> longer exists due to a new commit. We should fix that by returning a proper 
> value for the file's length in this case, maybe -1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9490) BoolField always returning false for non-DV fields when javabin involved (via solrj, or intra node communication)

2016-11-07 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644382#comment-15644382
 ] 

Tim Owen commented on SOLR-9490:


Just to add to this: if anyone was using 6.2.0 and doing document updates, this 
bug also affected Atomic Updates, and it will have reset all boolean fields in 
a document to false whenever other fields of that document were updated, i.e. 
the actually-stored and indexed values were changed. We discovered this only 
recently, noticing that some documents had lost their original boolean values 
because we had been doing Atomic Updates during the period we were running 
6.2.0, which reset the values in the documents themselves. Even though we've 
now upgraded to 6.2.1, so values are displayed correctly again, the stored 
values of those documents remain changed.


> BoolField always returning false for non-DV fields when javabin involved (via 
> solrj, or intra node communication)
> -
>
> Key: SOLR-9490
> URL: https://issues.apache.org/jira/browse/SOLR-9490
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.2
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Critical
> Fix For: 6.2.1, 6.3, master (7.0)
>
> Attachments: SOLR-9490.patch, SOLR-9490.patch, Solr9490.java
>
>
> 2 diff users posted comments in SOLR-9187 indicating that changes introduced 
> in that issue have broken BoolFields that do *not* use DocValues...
> [~cjcowie]...
> {quote}
> Hi, I've just picked up 6.2.0. It seems that the change to toExternal() in 
> BoolField now means that booleans without DocValues return null, which then 
> turns into Boolean.FALSE in toObject() regardless of whether the value is 
> true or false.
> e.g. with this schema, facet counts are correct, the returned values are 
> wrong.
> {code}
>  required="false" multiValued="false"/>
> 
> {code}
> {code}
> "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"124",
> "f_EVE64":false,
> "_version_":1544828487600177152},
>   {
> "id":"123",
> "f_EVE64":false,
> "_version_":1544828492458229760}]
>   },
>   "facet_counts":{
> "facet_queries":{},
> "facet_fields":{
>   "f_EVE64":[
> "false",1,
> "true",1]},
> {code}
> Could toExternal() perhaps fallback to how it originally behaved? e.g.
> {code}
> if (f.binaryValue() == null) {
>   return indexedToReadable(f.stringValue());
> }
> {code}
> {quote}
> [~pavan_shetty]...
> {quote}
> I downloaded solr version 6.2.0 (6.2.0 
> 764d0f19151dbff6f5fcd9fc4b2682cf934590c5 - mike - 2016-08-20 05:41:37) and 
> installed my core.
> In my schema.xml i have an field like following :
>  multiValued="false"/>
> Now i am accessing this field using SolrJ (6.1.0). But i am always getting 
> false value for above field even though it contains true boolean value. This 
> is happening for all boolean fields.
> http://localhost:8983/solr...wt=javabin=2 HTTP/1.1
> It is working fine in other response writer.
> If i change the solr version to 6.1.0, with same SolrJ, it starts working. So 
> clearly this is a bug in version 6.2.0.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5750) Backup/Restore API for SolrCloud

2016-10-18 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585826#comment-15585826
 ] 

Tim Owen commented on SOLR-5750:


[~dsmiley] you mentioned in the mailing list back in March that you'd fixed the 
situation where restored collections are created using the old stateFormat=1 
but it still seems to be doing that ... did that fix not make it into this 
ticket before merging? We've been trying out the backup/restore and noticed 
it's putting the collection's state into the global clusterstate.json instead 
of where it should be.
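
For anyone in the same situation, a collection whose state has ended up in the 
global clusterstate.json can be migrated to the per-collection stateFormat=2 
afterwards via the Collections API; a rough sketch, with a placeholder 
collection name:

{noformat}
http://localhost:8983/solr/admin/collections?action=MIGRATESTATEFORMAT&collection=restored_collection
{noformat}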


> Backup/Restore API for SolrCloud
> 
>
> Key: SOLR-5750
> URL: https://issues.apache.org/jira/browse/SOLR-5750
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Varun Thacker
> Fix For: 6.1
>
> Attachments: SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, 
> SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch
>
>
> We should have an easy way to do backups and restores in SolrCloud. The 
> ReplicationHandler supports a backup command which can create snapshots of 
> the index but that is too little.
> The command should be able to backup:
> # Snapshots of all indexes or indexes from the leader or the shards
> # Config set
> # Cluster state
> # Cluster properties
> # Aliases
> # Overseer work queue?
> A restore should be able to completely restore the cloud i.e. no manual steps 
> required other than bringing nodes back up or setting up a new cloud cluster.
> SOLR-5340 will be a part of this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Updated] (SOLR-9505) Extra tests to confirm Atomic Update remove behaviour

2016-09-13 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9505:
---
Attachment: SOLR-9505.patch

> Extra tests to confirm Atomic Update remove behaviour
> -
>
> Key: SOLR-9505
> URL: https://issues.apache.org/jira/browse/SOLR-9505
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Tim Owen
>Priority: Minor
> Attachments: SOLR-9505.patch
>
>
> The behaviour of the Atomic Update {{remove}} operation in the code doesn't 
> match the description in the Confluence documentation, which has been 
> questioned already. From looking at the source code, and using curl to 
> confirm, the {{remove}} operation only removes the first occurrence of a 
> value from a multi-valued field, it does not remove all occurrences. The 
> {{removeregex}} operation does remove all, however.
> There are unit tests for Atomic Updates, but they didn't assert this 
> behaviour, so I've added some extra assertions to confirm that, and a couple 
> of extra tests including one that checks that {{removeregex}} does a Regex 
> match of the whole value, not just a find-anywhere operation.
> I think it's the documentation that needs clarifying - the code behaves as 
> expected (assuming {{remove}} was intended to work that way?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9505) Extra tests to confirm Atomic Update remove behaviour

2016-09-13 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9505:
--

 Summary: Extra tests to confirm Atomic Update remove behaviour
 Key: SOLR-9505
 URL: https://issues.apache.org/jira/browse/SOLR-9505
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (7.0)
Reporter: Tim Owen
Priority: Minor


The behaviour of the Atomic Update {{remove}} operation in the code doesn't 
match the description in the Confluence documentation, which has been 
questioned already. From looking at the source code, and using curl to confirm, 
the {{remove}} operation only removes the first occurrence of a value from a 
multi-valued field, it does not remove all occurrences. The {{removeregex}} 
operation does remove all, however.

There are unit tests for Atomic Updates, but they didn't assert this behaviour, 
so I've added some extra assertions to confirm that, and a couple of extra 
tests including one that checks that {{removeregex}} does a Regex match of the 
whole value, not just a find-anywhere operation.

I think it's the documentation that needs clarifying - the code behaves as 
expected (assuming {{remove}} was intended to work that way?)
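
For illustration, the behaviour difference shows up with an atomic update like 
the one below (field names and values are placeholders): {{remove}} takes one 
occurrence of the value out of {{tags}} even if it appears several times, while 
{{removeregex}} strips every value of {{labels}} whose whole value matches the 
pattern.

{noformat}
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycollection/update?commit=true' -d '
[{"id": "doc1",
  "tags":   {"remove": "red"},
  "labels": {"removeregex": "tmp.*"}}]'
{noformat}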



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2016-09-12 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484252#comment-15484252
 ] 

Tim Owen commented on SOLR-9503:


As an aside, I noticed that {{Rule.Operand.GREATER_THAN}} seems to be missing an 
override for {{public int compare(Object n1Val, Object n2Val)}} .. but compare 
only appears to be used when sorting the live nodes, so maybe it's not a big 
deal?

> NPE in Replica Placement Rules when using Overseer Role with other rules
> 
>
> Key: SOLR-9503
> URL: https://issues.apache.org/jira/browse/SOLR-9503
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Rules, SolrCloud
>Affects Versions: 6.2, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9503.patch
>
>
> The overseer role introduced in SOLR-9251 works well if there's only a single 
> Rule for replica placement e.g. {code}rule=role:!overseer{code} but when 
> combined with another rule, e.g. 
> {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result 
> in a NullPointerException (in Rule.tryAssignNodeToShard)
> This happens because the code builds up a nodeVsTags map, but it only has 
> entries for nodes that have values for *all* tags used among the rules. This 
> means not enough information is available to other rules when they are being 
> checked during replica assignment. In the example rules above, if we have a 
> cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
> have any entry in the nodeVsTags map because they only have the host tag 
> value and not the role tag value.
> Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
> keeping entries that fulfil the constraint of having values for all tags used 
> in the rules. Possibly this constraint was suitable when rules were 
> originally introduced, but the Role tag (used for Overseers) is unlikely to 
> be present for all nodes in the cluster, and similarly for sysprop tags which 
> may or may not be set for a node.
> My patch removes this constraint, so the nodeVsTags map contains everything 
> known about all nodes, even if they have no value for a given tag. This 
> allows the rule combination above to work, and doesn't appear to cause any 
> problems with the code paths that use the nodeVsTags map. They handle null 
> values quite well, and the tests pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2016-09-12 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9503:
---
Attachment: SOLR-9503.patch

> NPE in Replica Placement Rules when using Overseer Role with other rules
> 
>
> Key: SOLR-9503
> URL: https://issues.apache.org/jira/browse/SOLR-9503
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Rules, SolrCloud
>Affects Versions: 6.2, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9503.patch
>
>
> The overseer role introduced in SOLR-9251 works well if there's only a single 
> Rule for replica placement e.g. {code}rule=role:!overseer{code} but when 
> combined with another rule, e.g. 
> {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result 
> in a NullPointerException (in Rule.tryAssignNodeToShard)
> This happens because the code builds up a nodeVsTags map, but it only has 
> entries for nodes that have values for *all* tags used among the rules. This 
> means not enough information is available to other rules when they are being 
> checked during replica assignment. In the example rules above, if we have a 
> cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
> have any entry in the nodeVsTags map because they only have the host tag 
> value and not the role tag value.
> Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
> keeping entries that fulfil the constraint of having values for all tags used 
> in the rules. Possibly this constraint was suitable when rules were 
> originally introduced, but the Role tag (used for Overseers) is unlikely to 
> be present for all nodes in the cluster, and similarly for sysprop tags which 
> may or may not be set for a node.
> My patch removes this constraint, so the nodeVsTags map contains everything 
> known about all nodes, even if they have no value for a given tag. This 
> allows the rule combination above to work, and doesn't appear to cause any 
> problems with the code paths that use the nodeVsTags map. They handle null 
> values quite well, and the tests pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2016-09-12 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9503:
--

 Summary: NPE in Replica Placement Rules when using Overseer Role 
with other rules
 Key: SOLR-9503
 URL: https://issues.apache.org/jira/browse/SOLR-9503
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Rules, SolrCloud
Affects Versions: 6.2, master (7.0)
Reporter: Tim Owen


The overseer role introduced in SOLR-9251 works well if there's only a single 
Rule for replica placement e.g. {code}rule=role:!overseer{code} but when 
combined with another rule, e.g. 
{code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result in 
a NullPointerException (in Rule.tryAssignNodeToShard)

This happens because the code builds up a nodeVsTags map, but it only has 
entries for nodes that have values for *all* tags used among the rules. This 
means not enough information is available to other rules when they are being 
checked during replica assignment. In the example rules above, if we have a 
cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
have any entry in the nodeVsTags map because they only have the host tag value 
and not the role tag value.

Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
keeping entries that fulfil the constraint of having values for all tags used 
in the rules. Possibly this constraint was suitable when rules were originally 
introduced, but the Role tag (used for Overseers) is unlikely to be present for 
all nodes in the cluster, and similarly for sysprop tags which may or may not be 
set for a node.

My patch removes this constraint, so the nodeVsTags map contains everything 
known about all nodes, even if they have no value for a given tag. This allows 
the rule combination above to work, and doesn't appear to cause any problems 
with the code paths that use the nodeVsTags map. They handle null values quite 
well, and the tests pass.
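
For illustration, the rule combination described above would be passed as two 
separate rule parameters on a Collections API call, roughly like this 
(collection name and shard/replica counts are placeholders, and the URL is 
wrapped for readability):

{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection
    &numShards=2&replicationFactor=2
    &rule=role:!overseer
    &rule=host:*,shard:*,replica:<2
{noformat}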




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-09-01 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456179#comment-15456179
 ] 

Tim Owen commented on SOLR-9389:


Thanks for the advice, David. I'll take a look at the concurrency setting, and 
we'll need to test out using fewer shards to see how that compares for our 
use-case. Since we create new collections weekly, we always have the option to 
increase the shard count later if we do hit situations where large merges 
happen.

Although I'm a bit surprised that this model is considered 'truly massive' ... 
I'd have expected many large Solr installations to have thousands of shards 
across all their collections.

> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Hadoop Integration, hdfs
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Mark Miller
> Fix For: master (7.0), 6.3
>
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
> Our cluster during testing had 100+ collections with 100 shards each, spread 
> across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x 
> replication for the tlog files, this meant we hit the xceiver limit fairly 
> easily and had to use the attached patch to ensure tlogs were closed for 
> writes once finished.
> The patch introduces an extra lifecycle state for the tlog, so it can be 
> closed for writes and free up the HDFS resources, while still being available 
> for reading. I've tried to make it as unobtrusive as I could, but there's 
> probably a better way. I have not changed the behaviour of the local disk 
> tlog implementation, because it only consumes a file descriptor regardless of 
> read or write.
> nb We have decided not to use Solr-on-HDFS now, we're using local disk (for 
> various reasons). So I don't have a HDFS cluster to do further testing on 
> this, I'm just contributing the patch which worked for us.
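
As an aside, the 4096 ceiling mentioned above typically corresponds to the 
datanode's dfs.datanode.max.transfer.threads setting (formerly 
dfs.datanode.max.xcievers). Below is a toy, local-file Java sketch of the 
lifecycle idea the description outlines - close the log for writes once it is 
finished, releasing the write-side resources while keeping it readable. The 
class and method names are invented for the example; this is not the 
HdfsTransactionLog patch itself.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Toy illustration using local files, not HDFS.
public class WriteClosableLog implements Closeable {
    enum State { OPEN, WRITE_CLOSED, CLOSED }

    private final Path path;
    private OutputStream out;              // held only while OPEN
    private State state = State.OPEN;

    WriteClosableLog(Path path) throws IOException {
        this.path = path;
        this.out = Files.newOutputStream(path, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    synchronized void append(byte[] record) throws IOException {
        if (state != State.OPEN) {
            // also catches the "write to a finished log" bug mentioned above
            throw new IllegalStateException("log is closed for writes");
        }
        out.write(record);
        out.flush();
    }

    // Release the write-side resources (the analogue of the FSDataOutputStream
    // and its dataXceiver/packetresponder threads) but stay readable.
    synchronized void closeForWrites() throws IOException {
        if (state == State.OPEN) {
            out.close();
            out = null;
            state = State.WRITE_CLOSED;
        }
    }

    synchronized InputStream openReader() throws IOException {
        if (state == State.CLOSED) throw new IllegalStateException("log fully closed");
        return Files.newInputStream(path);
    }

    @Override
    public synchronized void close() throws IOException {
        closeForWrites();
        state = State.CLOSED;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("tlog", ".bin");
        try (WriteClosableLog log = new WriteClosableLog(p)) {
            log.append("doc1\n".getBytes(StandardCharsets.UTF_8));
            log.closeForWrites();              // finished: free write resources
            try (InputStream in = log.openReader()) {
                System.out.println("readable bytes: " + in.available());
            }
            try {
                log.append("doc2\n".getBytes(StandardCharsets.UTF_8));
            } catch (IllegalStateException expected) {
                System.out.println("write after finish rejected: " + expected.getMessage());
            }
        }
        Files.deleteIfExists(p);
    }
}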






[jira] [Updated] (SOLR-9381) Snitch for freedisk uses root path not Solr home

2016-09-01 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9381:
---
Attachment: SOLR-9381.patch

> Snitch for freedisk uses root path not Solr home
> 
>
> Key: SOLR-9381
> URL: https://issues.apache.org/jira/browse/SOLR-9381
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Noble Paul
> Attachments: SOLR-9381.patch, SOLR-9381.patch
>
>
> The path used for the freedisk snitch value is hardcoded to / whereas it 
> should be using Solr home. It's fairly common to use hardware for Solr with 
> multiple physical disks on different mount points, with multiple Solr 
> instances running on the box, each pointing its Solr home to a different 
> disk. In this case, the value reported for the freedisk snitch value is 
> wrong, because it's based on the root filesystem space.
> Patch changes this to use solr home from the CoreContainer.
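
A minimal Java sketch of the intent of the fix (not the actual snitch code): 
measure usable space on the filesystem that holds the Solr home directory 
rather than on /. The solr home path is taken from the command line here and 
is purely illustrative.

import java.io.IOException;
import java.nio.file.*;

public class FreeDiskCheck {
    static long freeGb(Path p) throws IOException {
        // Usable space on the filesystem backing this path.
        return Files.getFileStore(p).getUsableSpace() / (1024L * 1024 * 1024);
    }

    public static void main(String[] args) throws IOException {
        // In the scenario above this would be the per-disk Solr home mount;
        // it defaults to the working directory only so the example runs anywhere.
        Path solrHome = Paths.get(args.length > 0 ? args[0] : System.getProperty("user.dir"));

        System.out.println("freedisk at /         : " + freeGb(Paths.get("/")) + " GB");
        System.out.println("freedisk at solr home : " + freeGb(solrHome) + " GB");
        // On a box with several data disks these differ, which is exactly the
        // wrong-value symptom described above.
    }
}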






[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-08-31 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452531#comment-15452531
 ] 

Tim Owen commented on SOLR-9389:


We're using Solr 6.1 (on local disk now, as mentioned). The first production 
cluster we had hoped to get stable was 40 boxes, each running 5 or 6 Solr 
JVMs, with a dedicated ZK cluster on 3 other boxes and 100 shards per 
collection. That was problematic: we had a lot of ZooKeeper traffic during 
normal writes, but especially whenever one or more boxes were deliberately 
killed and many Solr instances restarted all at once, leading to a large 
overseer queue and shards in recovery for a long time.

Right now we're testing two scaled-down clusters, of 24 boxes and 12 boxes, 
with a correspondingly reduced number of shards, to see at what point things 
stay stable when we do destructive testing by killing machines and whole 
racks. 12 boxes is looking a lot more stable so far.

We'll have to consider running several of these smaller clusters instead of 
one large one - is that best practice? There was some discussion on SOLR-5872 
and SOLR-5475 about scaling the overseer with large numbers of collections and 
shards, although it's clearly a tricky problem.


> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Hadoop Integration, hdfs
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Mark Miller
> Fix For: master (7.0), 6.3
>
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
> Our cluster during testing had 100+ collections with 100 shards each, spread 
> across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x 
> replication for the tlog files, this meant we hit the xceiver limit fairly 
> easily and had to use the attached patch to ensure tlogs were closed for 
> writes once finished.
> The patch introduces an extra lifecycle state for the tlog, so it can be 
> closed for writes and free up the HDFS resources, while still being available 
> for reading. I've tried to make it as unobtrusive as I could, but there's 
> probably a better way. I have not changed the behaviour of the local disk 
> tlog implementation, because it only consumes a file descriptor regardless of 
> read or write.
> nb We have decided not to use Solr-on-HDFS now, we're using local disk (for 
> various reasons). So I don't have a HDFS cluster to do further testing on 
> this, I'm just contributing the patch which worked for us.






[jira] [Commented] (SOLR-9381) Snitch for freedisk uses root path not Solr home

2016-08-31 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452260#comment-15452260
 ] 

Tim Owen commented on SOLR-9381:


Thanks, I'll make that change soon and replace the patch.

> Snitch for freedisk uses root path not Solr home
> 
>
> Key: SOLR-9381
> URL: https://issues.apache.org/jira/browse/SOLR-9381
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Noble Paul
> Attachments: SOLR-9381.patch
>
>
> The path used for the freedisk snitch value is hardcoded to / whereas it 
> should be using Solr home. It's fairly common to use hardware for Solr with 
> multiple physical disks on different mount points, with multiple Solr 
> instances running on the box, each pointing its Solr home to a different 
> disk. In this case, the value reported for the freedisk snitch value is 
> wrong, because it's based on the root filesystem space.
> Patch changes this to use solr home from the CoreContainer.






[jira] [Commented] (SOLR-9381) Snitch for freedisk uses root path not Solr home

2016-08-31 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451652#comment-15451652
 ] 

Tim Owen commented on SOLR-9381:


This code was last changed by [~andyetitmoves] and [~noble.paul] - do you 
think this fix is appropriate in all cases?

> Snitch for freedisk uses root path not Solr home
> 
>
> Key: SOLR-9381
> URL: https://issues.apache.org/jira/browse/SOLR-9381
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9381.patch
>
>
> The path used for the freedisk snitch value is hardcoded to / whereas it 
> should be using Solr home. It's fairly common to use hardware for Solr with 
> multiple physical disks on different mount points, with multiple Solr 
> instances running on the box, each pointing its Solr home to a different 
> disk. In this case, the value reported for the freedisk snitch value is 
> wrong, because it's based on the root filesystem space.
> Patch changes this to use solr home from the CoreContainer.






[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-08-31 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451626#comment-15451626
 ] 

Tim Owen commented on SOLR-9389:


[~dsmiley] We're now running 6 Solr JVMs per box, as the machines in 
production have 6 SSDs installed, so it works out at around 200 Solr cores 
served by each Solr JVM. That seems to run fine, and we've run a staging 
environment for another Solr installation with hundreds of cores per JVM for 
several years. The reason for the many shards is that we do frequent updates 
and deletes, and we want to keep each Lucene index below a manageable size, 
e.g. 5GB, to avoid a potentially slow merge that would block writes for too 
long. With composite routing, our queries never touch all shards in a 
collection - just a few.
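
For context, here is a rough SolrJ sketch of composite-ID routing (the URL, 
collection name and id prefix are placeholders, not taken from this cluster): 
documents share a route-key prefix in their id, and queries pass _route_ so 
only the shard(s) owning that prefix are searched.

import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CompositeRoutingExample {
    public static void main(String[] args) throws IOException, SolrServerException {
        // Base URL and collection name are placeholders.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "tenantA!doc-123");   // route-key prefix + document id
            client.add(doc);
            client.commit();

            SolrQuery q = new SolrQuery("*:*");
            q.set("_route_", "tenantA!");            // only hit the shard(s) owning tenantA
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}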

The remaining problem is with SolrCloud and the Overseer/ZooKeeper, which 
become overloaded with traffic once there's any kind of failure, e.g. a 
machine dying or, worse, an entire rack losing power - this causes a flood of 
overseer queue events and all the nodes repeatedly downloading state.json. 
Happy to talk to anyone who's working on that problem!

> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Hadoop Integration, hdfs
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Mark Miller
> Fix For: master (7.0), 6.3
>
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
> Our cluster during testing had 100+ collections with 100 shards each, spread 
> across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x 
> replication for the tlog files, this meant we hit the xceiver limit fairly 
> easily and had to use the attached patch to ensure tlogs were closed for 
> writes once finished.
> The patch introduces an extra lifecycle state for the tlog, so it can be 
> closed for writes and free up the HDFS resources, while still being available 
> for reading. I've tried to make it as unobtrusive as I could, but there's 
> probably a better way. I have not changed the behaviour of the local disk 
> tlog implementation, because it only consumes a file descriptor regardless of 
> read or write.
> nb We have decided not to use Solr-on-HDFS now, we're using local disk (for 
> various reasons). So I don't have a HDFS cluster to do further testing on 
> this, I'm just contributing the patch which worked for us.






[jira] [Commented] (SOLR-9374) Speed up Jmx MBean retrieval for FieldCache

2016-08-30 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448437#comment-15448437
 ] 

Tim Owen commented on SOLR-9374:


No problem, thanks for merging!

> Speed up Jmx MBean retrieval for FieldCache
> ---
>
> Key: SOLR-9374
> URL: https://issues.apache.org/jira/browse/SOLR-9374
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: JMX, web gui
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: master (7.0), 6.3
>
> Attachments: SOLR-9374.patch
>
>
> The change made in SOLR-8892 allowed for Jmx requests for MBean info to skip 
> displaying the full contents of FieldCache entries, and just return the count.
> However, it still computes all the field cache entry info but throws it away 
> and uses only the number of entries. This can make the Jmx MBean retrieval 
> quite slow which is not ideal for regular polling for monitoring purposes. 
> We've typically found the Jmx call took over 1 minute to complete, and jstack 
> output showed that building the stats for this bean was the culprit.
> With this patch, the time is much reduced, usually less than 10 seconds. The 
> response contents are unchanged.
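
A generic Java sketch of the optimisation being described (not the actual Solr 
FieldCache bean code): expose only a cheap entry count to JMX pollers instead 
of materialising full per-entry details and then discarding them.

import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CacheCountExample {

    // Management interface: only a cheap count, no per-entry detail attribute.
    public interface CacheStatsMBean {
        int getEntryCount();
    }

    public static class CacheStats implements CacheStatsMBean {
        private final Map<String, Long> entries = new ConcurrentHashMap<>();
        public void put(String field, long sizeBytes) { entries.put(field, sizeBytes); }
        @Override public int getEntryCount() { return entries.size(); }   // cheap: no per-entry string building
    }

    public static void main(String[] args) throws Exception {
        CacheStats stats = new CacheStats();
        stats.put("price", 1_000_000L);
        stats.put("timestamp", 2_000_000L);

        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(stats, new ObjectName("example:type=CacheStats"));

        // A poller reading example:type=CacheStats / EntryCount now costs a map
        // size lookup rather than a walk over every cache entry's details.
        System.out.println("entries = " + stats.getEntryCount());
    }
}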






[jira] [Commented] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-08-30 Thread Tim Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448406#comment-15448406
 ] 

Tim Owen commented on SOLR-9389:


Great, thanks for reviewing and testing this Mark :)

> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Hadoop Integration, hdfs
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Assignee: Mark Miller
> Fix For: master (7.0), 6.3
>
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
> Our cluster during testing had 100+ collections with 100 shards each, spread 
> across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x 
> replication for the tlog files, this meant we hit the xceiver limit fairly 
> easily and had to use the attached patch to ensure tlogs were closed for 
> writes once finished.
> The patch introduces an extra lifecycle state for the tlog, so it can be 
> closed for writes and free up the HDFS resources, while still being available 
> for reading. I've tried to make it as unobtrusive as I could, but there's 
> probably a better way. I have not changed the behaviour of the local disk 
> tlog implementation, because it only consumes a file descriptor regardless of 
> read or write.
> nb We have decided not to use Solr-on-HDFS now, we're using local disk (for 
> various reasons). So I don't have a HDFS cluster to do further testing on 
> this, I'm just contributing the patch which worked for us.






[jira] [Updated] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-08-05 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9389:
---
Description: 
The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
for its whole lifetime, which consumes two threads on the HDFS data node server 
(dataXceiver and packetresponder) even once the Solr tlog has finished being 
written to.

This means for a cluster with many indexes on HDFS, the number of Xceivers can 
keep growing and eventually hit the limit of 4096 on the data nodes. It's 
especially likely for indexes that have low write rates, because Solr keeps 
enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
There's also the issue that attempting to write to a finished tlog would be a 
major bug, so closing it for writes helps catch that.

Our cluster during testing had 100+ collections with 100 shards each, spread 
across 8 boxes (each running 4 Solr nodes and 1 HDFS data node). With 3x 
replication for the tlog files, this meant we hit the Xceiver limit fairly 
easily and had to use the attached patch to ensure tlogs were closed for 
writes once finished.

The patch introduces an extra lifecycle state for the tlog, so it can be closed 
for writes and free up the HDFS resources, while still being available for 
reading. I've tried to make it as unobtrusive as I could, but there's probably 
a better way. I have not changed the behaviour of the local disk tlog 
implementation, because it only consumes a file descriptor regardless of read 
or write.

NB: we have decided not to use Solr-on-HDFS now; we're using local disk (for 
various reasons). So I don't have an HDFS cluster to do further testing on 
this; I'm just contributing the patch that worked for us.

  was:
The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
for its whole lifetime, which consumes two threads on the HDFS data node server 
(dataXceiver and packetresponder) even once the Solr tlog has finished being 
written to.

This means for a cluster with many indexes on HDFS, the number of Xceivers can 
keep growing and eventually hit the limit of 4096 on the data nodes. It's 
especially likely for indexes that have low write rates, because Solr keeps 
enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
There's also the issue that attempting to write to a finished tlog would be a 
major bug, so closing it for writes helps catch that.

Our cluster during testing had 100+ collections with 100 shards each, spread 
across 40 boxes (each running 4 Solr nodes and 1 HDFS data node). With 3x 
replication for the tlog files, this meant we hit the Xceiver limit fairly 
easily and had to use the attached patch to ensure tlogs were closed for 
writes once finished.

The patch introduces an extra lifecycle state for the tlog, so it can be closed 
for writes and free up the HDFS resources, while still being available for 
reading. I've tried to make it as unobtrusive as I could, but there's probably 
a better way. I have not changed the behaviour of the local disk tlog 
implementation, because it only consumes a file descriptor regardless of read 
or write.

NB: we have decided not to use Solr-on-HDFS now; we're using local disk (for 
various reasons). So I don't have an HDFS cluster to do further testing on 
this; I'm just contributing the patch that worked for us.


> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Hadoop Integration, hdfs
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
> Our cluster during testing had 100+ collections with 100 shards each, spread 
> across 8 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x 
> replication for the tlog files, this meant we hit the xceiver limit fairly 
> easily and had to use the attached patch to ensure tlogs were closed for 
> writes once finished.

[jira] [Updated] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-08-05 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9389:
---
Attachment: SOLR-9389.patch

> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Hadoop Integration, hdfs
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
> Our cluster during testing had 100+ collections with 100 shards each, spread 
> across 40 boxes (each running 4 solr nodes and 1 hdfs data node) and with 3x 
> replication for the tlog files, this meant we hit the xceiver limit fairly 
> easily and had to use the attached patch to ensure tlogs were closed for 
> writes once finished.
> The patch introduces an extra lifecycle state for the tlog, so it can be 
> closed for writes and free up the HDFS resources, while still being available 
> for reading. I've tried to make it as unobtrusive as I could, but there's 
> probably a better way. I have not changed the behaviour of the local disk 
> tlog implementation, because it only consumes a file descriptor regardless of 
> read or write.
> nb We have decided not to use Solr-on-HDFS now, we're using local disk (for 
> various reasons). So I don't have a HDFS cluster to do further testing on 
> this, I'm just contributing the patch which worked for us.






[jira] [Created] (SOLR-9389) HDFS Transaction logs stay open for writes which leaks Xceivers

2016-08-05 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9389:
--

 Summary: HDFS Transaction logs stay open for writes which leaks 
Xceivers
 Key: SOLR-9389
 URL: https://issues.apache.org/jira/browse/SOLR-9389
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Hadoop Integration, hdfs
Affects Versions: 6.1, master (7.0)
Reporter: Tim Owen


The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
for its whole lifetime, which consumes two threads on the HDFS data node server 
(dataXceiver and packetresponder) even once the Solr tlog has finished being 
written to.

This means for a cluster with many indexes on HDFS, the number of Xceivers can 
keep growing and eventually hit the limit of 4096 on the data nodes. It's 
especially likely for indexes that have low write rates, because Solr keeps 
enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
There's also the issue that attempting to write to a finished tlog would be a 
major bug, so closing it for writes helps catch that.

Our cluster during testing had 100+ collections with 100 shards each, spread 
across 40 boxes (each running 4 Solr nodes and 1 HDFS data node). With 3x 
replication for the tlog files, this meant we hit the Xceiver limit fairly 
easily and had to use the attached patch to ensure tlogs were closed for 
writes once finished.

The patch introduces an extra lifecycle state for the tlog, so it can be closed 
for writes and free up the HDFS resources, while still being available for 
reading. I've tried to make it as unobtrusive as I could, but there's probably 
a better way. I have not changed the behaviour of the local disk tlog 
implementation, because it only consumes a file descriptor regardless of read 
or write.

NB: we have decided not to use Solr-on-HDFS now; we're using local disk (for 
various reasons). So I don't have an HDFS cluster to do further testing on 
this; I'm just contributing the patch that worked for us.






[jira] [Updated] (SOLR-9381) Snitch for freedisk uses root path not Solr home

2016-08-04 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9381:
---
Attachment: SOLR-9381.patch

> Snitch for freedisk uses root path not Solr home
> 
>
> Key: SOLR-9381
> URL: https://issues.apache.org/jira/browse/SOLR-9381
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
> Attachments: SOLR-9381.patch
>
>
> The path used for the freedisk snitch value is hardcoded to / whereas it 
> should be using Solr home. It's fairly common to use hardware for Solr with 
> multiple physical disks on different mount points, with multiple Solr 
> instances running on the box, each pointing its Solr home to a different 
> disk. In this case, the value reported for the freedisk snitch value is 
> wrong, because it's based on the root filesystem space.
> Patch changes this to use solr home from the CoreContainer.






[jira] [Created] (SOLR-9381) Snitch for freedisk uses root path not Solr home

2016-08-04 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9381:
--

 Summary: Snitch for freedisk uses root path not Solr home
 Key: SOLR-9381
 URL: https://issues.apache.org/jira/browse/SOLR-9381
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 6.1, master (7.0)
Reporter: Tim Owen


The path used for the freedisk snitch value is hardcoded to /, whereas it 
should use Solr home. It's fairly common to run Solr on hardware with multiple 
physical disks on different mount points, with multiple Solr instances on the 
box, each pointing its Solr home at a different disk. In that case, the 
freedisk snitch value reported is wrong, because it's based on the root 
filesystem space.

The patch changes this to use Solr home from the CoreContainer.






[jira] [Updated] (SOLR-9374) Speed up Jmx MBean retrieval for FieldCache

2016-08-03 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9374:
---
Attachment: SOLR-9374.patch

> Speed up Jmx MBean retrieval for FieldCache
> ---
>
> Key: SOLR-9374
> URL: https://issues.apache.org/jira/browse/SOLR-9374
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: JMX, web gui
>Affects Versions: 6.1, master (7.0)
>Reporter: Tim Owen
>Priority: Minor
> Attachments: SOLR-9374.patch
>
>
> The change made in SOLR-8892 allowed for Jmx requests for MBean info to skip 
> displaying the full contents of FieldCache entries, and just return the count.
> However, it still computes all the field cache entry info but throws it away 
> and uses only the number of entries. This can make the Jmx MBean retrieval 
> quite slow which is not ideal for regular polling for monitoring purposes. 
> We've typically found the Jmx call took over 1 minute to complete, and jstack 
> output showed that building the stats for this bean was the culprit.
> With this patch, the time is much reduced, usually less than 10 seconds. The 
> response contents are unchanged.






[jira] [Created] (SOLR-9374) Speed up Jmx MBean retrieval for FieldCache

2016-08-03 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9374:
--

 Summary: Speed up Jmx MBean retrieval for FieldCache
 Key: SOLR-9374
 URL: https://issues.apache.org/jira/browse/SOLR-9374
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: JMX, web gui
Affects Versions: 6.1, master (7.0)
Reporter: Tim Owen
Priority: Minor


The change made in SOLR-8892 allowed JMX requests for MBean info to skip 
displaying the full contents of FieldCache entries and just return the count.

However, it still computes all the field cache entry info, then throws it away 
and uses only the number of entries. This can make the JMX MBean retrieval 
quite slow, which is not ideal for regular polling for monitoring purposes. 
We've typically found the JMX call took over 1 minute to complete, and jstack 
output showed that building the stats for this bean was the culprit.

With this patch, the time is much reduced, usually to less than 10 seconds. 
The response contents are unchanged.


