[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072155#comment-15072155 ]

Mark Miller commented on SOLR-6971:
---

bq. I get Jenkins failures in my inbox and haven't seen this in a really long time FWIW.

Yeah, but you said the same thing last Jan, and this happened all the time on Jenkins :) There is so much mail and much of it doesn't have the error front and center - really hard to stay on top of for me.

bq. Wait, where? I monitor the build fail messages sent to the dev list since I committed the patch and haven't seen any, is this happening to you locally?

So I did a gmail search to see if this has stopped and it seems something fixed it or worked around it around September 27.

On August 31st, [~shalinmangar] noticed it on the dev list and said:

bq. That's a strange one. Looks like something other than a proper string was stored in the ZK node.

It happened all year before that. I don't know why it went away in September, but I have not seen it locally in a long time and I have nothing in email with that fail after Sept 27.

> TestRebalanceLeaders fails too often.
> -
>
> Key: SOLR-6971
> URL: https://issues.apache.org/jira/browse/SOLR-6971
> Project: Solr
> Issue Type: Test
> Reporter: Mark Miller
> Assignee: Erick Erickson
> Priority: Minor
> Attachments: SOLR-6971-dumper.patch
>
>
> I see this fail too much - I've seen 3 different fail types so far.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071972#comment-15071972 ] Erick Erickson commented on SOLR-6971: -- I get Jenkins failures in my inbox and haven't seen this in a really long time FWIW.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071305#comment-15071305 ] Mark Miller commented on SOLR-6971: --- It showed up on Jenkins as much as for me locally when I saw it. It's been a while since I've paid attention to this fail though. We should search through the Jenkins email and see when it was around and whether it went away.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071300#comment-15071300 ] Erick Erickson commented on SOLR-6971: -- I just beasted this thing 2,000 times on 5x and don't see any fails. Trunk seems to have issues related to the upgrade of Jetty though, but that's not germane to this JIRA IMO. [~markrmil...@gmail.com] You seem to be the only one who sees this fail, any light to shed on this? If not, and particularly given the history here, I'm thinking of closing this as "can't reproduce"
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589926#comment-14589926 ] Mark Miller commented on SOLR-6971: --- Yeah, I've seen it within the last few weeks.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589896#comment-14589896 ] Erick Erickson commented on SOLR-6971: -- [~markrmil...@gmail.com] Are you seeing this lately? I haven't seen anything go by Jenkins with this error and I've not seen it happen for me locally.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316944#comment-14316944 ] Mark Miller commented on SOLR-6971: --- Thanks Erick - I'll try to get to this soon.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312731#comment-14312731 ] Erick Erickson commented on SOLR-6971: -- Well, one of the main _points_ of unit tests is to hit cases you didn't explicitly know to test in the first place ;)... Anyway, I have a long boring plane flight ahead of me, I'll see if I can hack up some kind of dump when this happens for testing only, then ask you to put it on locally to see if we can gather some kind of information about where this originates. If that goes well, a patch probably tomorrow.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312633#comment-14312633 ] Mark Miller commented on SOLR-6971: --- I'm seeing it elsewhere too I think. In any case, I'm not sure it's related to the test or what it tests - but it happens to hit this.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310789#comment-14310789 ]

Erick Erickson commented on SOLR-6971:
--

On the face of it, it's very strange. Here's the code leading up to that assert from DistributedQueue.java[120]

{noformat}
List<String> childNames = zookeeper.getChildren(dir, null, true);
stats.setQueueLength(childNames.size());
for (String childName : childNames) {
  if (childName != null) {
    try {
      byte[] data = zookeeper.getData(dir + "/" + childName, null, null, true);
      if (data != null) {
        ZkNodeProps message = ZkNodeProps.load(data); // nocommit, called by CollectionsHandler, 687. Trips assert in ByteUtils.
{noformat}

The assert itself from ByteUtils is this bit of code:

{noformat}
if (b < 0xc0) {
  assert b < 0x80;
  out[out_offset++] = (char)b;
{noformat}

where b is marching through the buffer passed in that has just been read from ZK.

So on the face of it, the data read from ZK looks bad, since the assert is being tripped by data read from ZK, not data passed in.

Seems like we need to dump the data in UTF8toUTF16, something like below. Is there precedent, i.e. some nifty buffer dumping already coded up somewhere I can use that would allow us to dump the UTF8 buffer in this case?

{noformat}
if (b < 0xc0) {
  if (b >= 0x80) {
    // dump lots of stuff here: current bytes decoded, the raw bytes,
    // offset of offending character and all that
  }
  assert b < 0x80;
  out[out_offset++] = (char)b;
{noformat}

I'm traveling through Monday and won't have a lot of time to pursue this before then.
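For what it's worth, the kind of dump asked about above could look roughly like the sketch below. This is only an illustration under stated assumptions: `Utf8Dump`, `firstBadLeadByte` and `hexDump` are hypothetical names, not existing Solr or Lucene helpers, and the scan only mirrors the lead-byte stepping quoted from ByteUtils (it does not validate continuation bytes). It finds the first byte in lead position that falls in 0x80..0xBF, which is the case the assert rejects, and prints the raw bytes with that byte marked.

```java
public class Utf8Dump {

    /**
     * Hypothetical helper: returns the offset of the first byte in lead
     * position whose value is in 0x80..0xBF (the case "assert b < 0x80"
     * rejects), or -1 if no such byte is found.
     */
    public static int firstBadLeadByte(byte[] utf8, int offset, int len) {
        final int limit = offset + len;
        while (offset < limit) {
            int b = utf8[offset] & 0xff;
            if (b < 0x80) {
                offset += 1;          // plain ASCII
            } else if (b < 0xc0) {
                return offset;        // continuation byte where a lead byte was expected
            } else if (b < 0xe0) {
                offset += 2;          // lead byte of a 2-byte sequence
            } else if (b < 0xf0) {
                offset += 3;          // lead byte of a 3-byte sequence
            } else {
                offset += 4;          // lead byte of a 4-byte sequence
            }
        }
        return -1;
    }

    /** Hypothetical helper: hex dump of the buffer with the byte at 'bad' in brackets. */
    public static String hexDump(byte[] data, int bad) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String hex = String.format("%02x", data[i] & 0xff);
            sb.append(i == bad ? "[" + hex + "]" : hex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: '{', a stray continuation byte, '}'.
        byte[] data = new byte[] {0x7b, (byte) 0x80, 0x7d};
        int bad = firstBadLeadByte(data, 0, data.length);
        System.out.println("offending offset: " + bad);
        System.out.println("raw bytes: " + hexDump(data, bad));
    }
}
```

Something along these lines could be called on the `byte[]` returned by `zookeeper.getData(...)` before `ZkNodeProps.load(data)`, for testing only, so the log shows the raw node content when the failure happens.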
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310358#comment-14310358 ]

Mark Miller commented on SOLR-6971:
---

So I still see this fail a fair amount, but lately it's been the same thing:

{noformat}
[junit4]> Error 500 {trace=java.lang.AssertionError
[junit4]>    at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:36)
[junit4]>    at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:64)
[junit4]>    at org.apache.solr.common.cloud.ZkStateReader.fromJSON(ZkStateReader.java:140)
[junit4]>    at org.apache.solr.common.cloud.ZkNodeProps.load(ZkNodeProps.java:92)
[junit4]>    at org.apache.solr.cloud.DistributedQueue.containsTaskWithRequestId(DistributedQueue.java:127)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.overseerCollectionQueueContains(CollectionsHandler.java:687)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:712)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:692)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.rejoinElection(CollectionsHandler.java:487)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.insurePreferredIsLeader(CollectionsHandler.java:402)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.handleBalanceLeaders(CollectionsHandler.java:309)
[junit4]>    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:275)
[junit4]>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
[junit4]>    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
[junit4]>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
{noformat}
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275425#comment-14275425 ] Mark Miller commented on SOLR-6971: --- Yup - both on my jenkins and dev machine. I'll post more data as I collect it.
[jira] [Commented] (SOLR-6971) TestRebalanceLeaders fails too often.
[ https://issues.apache.org/jira/browse/SOLR-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275408#comment-14275408 ] Erick Erickson commented on SOLR-6971: -- Wait, where? I monitor the build fail messages sent to the dev list since I committed the patch and haven't seen any, is this happening to you locally?