[jira] [Commented] (SOLR-11949) Create Time Routed Alias stress-test

2018-02-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376480#comment-16376480
 ] 

David Smiley commented on SOLR-11949:
-

Yeah I was going to mention that -- I agree "ERROR" is a poor choice.  
HttpClusterStateProvider is a bit new and I suspect isn't used widely yet.

> Create Time Routed Alias stress-test
> 
>
> Key: SOLR-11949
> URL: https://issues.apache.org/jira/browse/SOLR-11949
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: David Smiley
>Priority: Major
>
> It would be nice to have a scalability test / stress test of sorts for Time 
> Routed Aliases to help identify any problems that may exist.  At least at the 
> moment, I'm thinking of a test that would never get run automatically (by say 
> Jenkins or "ant test"), but I could change my mind.  We already have some TRA 
> tests of course but except for one of them, the tests are more about 
> functionality rather than proving out possible race conditions & other 
> scalability bugs.
> Something that creates one TRA up front then beats on it for awhile, then 
> shuts down
> * configurable # nodes, and TRA statistics.  Maybe 10-sec interval 
> collections, with deleting collections older than a minute.
> * May randomly update the interval part-way through
> * sends data in multiple threads.  
> * sends data to nodes randomly via HttpSolrClient or 
> ConcurrentUpdateSolrClient or CloudSolrClient randomly (test infra can do 
> this already except CUSC), or 
> * sends data in batches of configurable sizes.
> * at the end verifies that the collections only hold the documents they 
> should (one of my TRA tests has code that can be used here)
> Using this test, it'd be interesting to see what happens when a core for the 
> oldest collection is receiving documents while simultaneously it is getting 
> deleted (for being old).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11949) Create Time Routed Alias stress-test

2018-02-25 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376191#comment-16376191
 ] 

Gus Heck commented on SOLR-11949:
-

Did some research into the above stack trace, which seems to occur in the event 
that cluster state is requested for an alias. 
org.apache.solr.client.solrj.impl.HttpClusterStateProvider#getState seems to 
actually parse the exception message (yuck!!) to detect this... But it gets 
logged as an error on the server side which is somewhat confusing since an 
ERROR is usually something actually wrong that needs fixing...  but as such, 
for the purposes of this ticket the above stack trace does not represent a 
problem.

> Create Time Routed Alias stress-test
> 
>
> Key: SOLR-11949
> URL: https://issues.apache.org/jira/browse/SOLR-11949
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: David Smiley
>Priority: Major
>
> It would be nice to have a scalability test / stress test of sorts for Time 
> Routed Aliases to help identify any problems that may exist.  At least at the 
> moment, I'm thinking of a test that would never get run automatically (by say 
> Jenkins or "ant test"), but I could change my mind.  We already have some TRA 
> tests of course but except for one of them, the tests are more about 
> functionality rather than proving out possible race conditions & other 
> scalability bugs.
> Something that creates one TRA up front then beats on it for awhile, then 
> shuts down
> * configurable # nodes, and TRA statistics.  Maybe 10-sec interval 
> collections, with deleting collections older than a minute.
> * May randomly update the interval part-way through
> * sends data in multiple threads.  
> * sends data to nodes randomly via HttpSolrClient or 
> ConcurrentUpdateSolrClient or CloudSolrClient randomly (test infra can do 
> this already except CUSC), or 
> * sends data in batches of configurable sizes.
> * at the end verifies that the collections only hold the documents they 
> should (one of my TRA tests has code that can be used here)
> Using this test, it'd be interesting to see what happens when a core for the 
> oldest collection is receiving documents while simultaneously it is getting 
> deleted (for being old).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11949) Create Time Routed Alias stress-test

2018-02-18 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368612#comment-16368612
 ] 

Gus Heck commented on SOLR-11949:
-

Decided not to make it a patch yet, given how much there is to do, but the gist 
of it is that it sets up threads, each dedicated to sending documents that 
should fall in a single partition. These threads start before any partitions 
are valid (and thus adding fails), and then as time passes, the valid range of 
partitions changes and the idea is that threads succeed and then start failing 
again as they pass beyond the maximum time that the TRA supports. The actual 
testing of stuff is quite light right now and there's much to be done there, 
but I'm still trying to get it to work in both the miniSolrCloud case and while 
talking to an external cluster (so I can also manually explore the results 
after the fact with queries). At one point it looked like I had my threads 
under control and had some clean success runs, but something I did has let them 
leak again... or perhaps I got lucky.

> Create Time Routed Alias stress-test
> 
>
> Key: SOLR-11949
> URL: https://issues.apache.org/jira/browse/SOLR-11949
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: David Smiley
>Priority: Major
>
> It would be nice to have a scalability test / stress test of sorts for Time 
> Routed Aliases to help identify any problems that may exist.  At least at the 
> moment, I'm thinking of a test that would never get run automatically (by say 
> Jenkins or "ant test"), but I could change my mind.  We already have some TRA 
> tests of course but except for one of them, the tests are more about 
> functionality rather than proving out possible race conditions & other 
> scalability bugs.
> Something that creates one TRA up front then beats on it for awhile, then 
> shuts down
> * configurable # nodes, and TRA statistics.  Maybe 10-sec interval 
> collections, with deleting collections older than a minute.
> * May randomly update the interval part-way through
> * sends data in multiple threads.  
> * sends data to nodes randomly via HttpSolrClient or 
> ConcurrentUpdateSolrClient or CloudSolrClient randomly (test infra can do 
> this already except CUSC), or 
> * sends data in batches of configurable sizes.
> * at the end verifies that the collections only hold the documents they 
> should (one of my TRA tests has code that can be used here)
> Using this test, it'd be interesting to see what happens when a core for the 
> oldest collection is receiving documents while simultaneously it is getting 
> deleted (for being old).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11949) Create Time Routed Alias stress-test

2018-02-18 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368611#comment-16368611
 ] 

Gus Heck commented on SOLR-11949:
-

Here's a first version of this, still WIP: 
https://github.com/nsoft/lucene-solr/commit/c87e3dbf9bc1378ae4f4b0c08fced29b14915dd7

> Create Time Routed Alias stress-test
> 
>
> Key: SOLR-11949
> URL: https://issues.apache.org/jira/browse/SOLR-11949
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: David Smiley
>Priority: Major
>
> It would be nice to have a scalability test / stress test of sorts for Time 
> Routed Aliases to help identify any problems that may exist.  At least at the 
> moment, I'm thinking of a test that would never get run automatically (by say 
> Jenkins or "ant test"), but I could change my mind.  We already have some TRA 
> tests of course but except for one of them, the tests are more about 
> functionality rather than proving out possible race conditions & other 
> scalability bugs.
> Something that creates one TRA up front then beats on it for awhile, then 
> shuts down
> * configurable # nodes, and TRA statistics.  Maybe 10-sec interval 
> collections, with deleting collections older than a minute.
> * May randomly update the interval part-way through
> * sends data in multiple threads.  
> * sends data to nodes randomly via HttpSolrClient or 
> ConcurrentUpdateSolrClient or CloudSolrClient randomly (test infra can do 
> this already except CUSC), or 
> * sends data in batches of configurable sizes.
> * at the end verifies that the collections only hold the documents they 
> should (one of my TRA tests has code that can be used here)
> Using this test, it'd be interesting to see what happens when a core for the 
> oldest collection is receiving documents while simultaneously it is getting 
> deleted (for being old).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11949) Create Time Routed Alias stress-test

2018-02-18 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368568#comment-16368568
 ] 

Gus Heck commented on SOLR-11949:
-

I'm close to ready to put up a patch for an initial version of this (only 
achieves some of the above bullets at this point) but I think it may have found 
a possible bug in the ClusterStatus operation... here status has been requested 
for an alias named testForDoubleValidRange.
{code:java}
115974 ERROR (qtp1307921764-77) [n:127.0.0.1:33603_solr ] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Collection: 
testForDoubleValidRange not found
at 
org.apache.solr.handler.admin.ClusterStatus.getClusterStatus(ClusterStatus.java:102)
at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$23(CollectionsHandler.java:805)
at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:1102)
at 
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:244)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:232)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at 
org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:139)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:527)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:530)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
at java.lang.Thread.run(Thread.java:748)
{code}

> Create Time Routed Alias stress-test
> 
>
> Key: SOLR-11949
> URL: https://issues.apache.org/jira/browse/SOLR-11949
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: David Smiley
>Priority: Major
>
> It would be nice to have a scalability test / stress test of sorts for Time 
> Routed Aliases to help identify any problems that may exist.  At least at the 
> moment, I'm thinking of a test that would never get run automatically (by say 
> Jenkins or "ant test"), but I could change my mind.  We already have some TRA 
> tests of course but except for one of them, the tests are more about 
> functionality rather than proving out possible race conditions &