[jira] [Commented] (HBASE-6900) RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
[ https://issues.apache.org/jira/browse/HBASE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470046#comment-13470046 ]

ramkrishna.s.vasudevan commented on HBASE-6900:
---

@Lars
It's better if we get it into 0.94.2.
I checked the lastTop behaviour once again. Even if updateReaders() is called twice in a row, I believe lastTop will not be null. I verified this with a test case.
{code}
// All public synchronized API calls will call 'checkReseek' which will cause
// the scanner stack to reseek if this.heap==null && this.lastTop != null.
// But if two calls to updateReaders() happen without a 'next' or 'peek' then we
// will end up calling this.peek() which would cause a reseek in the middle of a updateReaders
// which is NOT what we want, not to mention could cause an NPE. So we early out here.
if (this.heap == null) return;
{code}
So I think the patch that you gave should be fine without additional checks. +1 on it.
Lars, if you can commit it, that would be great, because I cannot commit from the office and may not be able to do it today.

RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
---
Key: HBASE-6900
URL: https://issues.apache.org/jira/browse/HBASE-6900
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.2, 0.96.0
Attachments: 6900-test.txt, HBASE-6900_1.patch, HBASE-6900.patch

HBASE-5520 introduced reseek() on the RegionScanner.
When a scanner is created, we have the StoreScanner heap. If a flush or compaction then happens in parallel, all the StoreScannerObservers are cleared, so that the next call to next() recreates the scanner based on the latest store files.
The reseek() in StoreScanner expects the heap not to be null, because reseek() was always expected to be called from next():
{code}
public synchronized boolean reseek(KeyValue kv) throws IOException {
  //Heap cannot be null, because this is only called from next() which
  //guarantees that heap will never be null before this call.
  if (explicitColumnQuery && lazySeekEnabledGlobally) {
    return heap.requestSeek(kv, true, useRowColBloom);
  } else {
    return heap.reseek(kv);
  }
}
{code}
Now, when we call RegionScanner.reseek() directly from coprocessors (CPs), we tend to get an NPE. In our case it happened while a major compaction was running.
I will also attach a test case to show the problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
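For readers following the HBASE-6900 thread, here is a minimal sketch of the failure mode and the kind of guard discussed above. It is not the committed patch. SketchScanner, its integer keys, and the TreeSet standing in for the KeyValueHeap are all hypothetical; only the names updateReaders, checkReseek, lastTop, and reseek come from the thread.

```java
import java.util.TreeSet;

// Hypothetical stand-in for StoreScanner; NOT the HBase class.
class SketchScanner {
    private final TreeSet<Integer> store; // stands in for the current store files
    private TreeSet<Integer> heap;        // stands in for the KeyValueHeap; null after updateReaders()
    private Integer lastTop;              // key the scanner was positioned on when the heap was dropped

    SketchScanner(TreeSet<Integer> store) {
        this.store = store;
        this.heap = new TreeSet<>(store);
    }

    // Called on flush/compaction: drop the heap so it is rebuilt lazily.
    synchronized void updateReaders() {
        if (heap == null) return; // early out, as in the quoted comment
        lastTop = heap.isEmpty() ? null : heap.first();
        heap = null;
    }

    // Rebuild the heap from the current store, positioned at lastTop.
    private void checkReseek() {
        if (heap == null) {
            heap = (lastTop == null)
                    ? new TreeSet<>()
                    : new TreeSet<>(store.tailSet(lastTop, true));
            lastTop = null;
        }
    }

    // Guarded reseek: safe to call directly, not only from next().
    synchronized boolean reseek(int key) {
        checkReseek(); // without this line, heap may be null here
        heap = new TreeSet<>(heap.tailSet(key, true));
        return !heap.isEmpty();
    }

    synchronized Integer peek() {
        checkReseek();
        return heap.isEmpty() ? null : heap.first();
    }
}
```

In this sketch, calling updateReaders() twice and then reseek(4) on a store holding 1, 3, 5, 7 rebuilds the heap first and leaves the scanner positioned on 5, rather than dereferencing a null heap.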
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470050#comment-13470050 ]

Gregory Chanan commented on HBASE-6920:
---

Haven't been able to get a good run. Is this holding up the new RC for you? I should be able to tell by tomorrow.

On timeout connecting to master, client can get stuck and never make progress
---
Key: HBASE-6920
URL: https://issues.apache.org/jira/browse/HBASE-6920
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.2
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Critical
Fix For: 0.94.2
Attachments: HBASE-6920.patch, HBASE-6920-v2.patch

HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized.
The issue is here:
{code}
if (tryMaster.isMasterRunning()) {
  this.master = tryMaster;
  this.masterLock.notifyAll();
  break;
}
{code}
If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application.
But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect.
I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above.
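One way to picture the HBASE-6920 problem is a cached handle guarded by a one-shot "checked" flag. The sketch below avoids that flag and re-validates the handle on every call. It is an illustration under stated assumptions, not the attached patch: MasterCacheSketch, its nested Master interface, and the injected connector are hypothetical names, and re-validating costs an extra liveness call per lookup.

```java
import java.util.function.Supplier;

// Hypothetical client-side cache of a master handle; NOT HConnectionImplementation.
class MasterCacheSketch {
    interface Master {
        boolean isRunning() throws Exception;
    }

    private final Supplier<Master> connector; // makes a fresh connection attempt
    private Master master;                    // cached handle, null when unknown

    MasterCacheSketch(Supplier<Master> connector) {
        this.connector = connector;
    }

    synchronized Master getMaster() {
        // Re-validate the cached handle instead of trusting a one-shot flag.
        if (master != null && isRunningQuietly(master)) {
            return master;
        }
        master = null; // forget the stale handle, then try again
        Master candidate = connector.get();
        if (candidate != null && isRunningQuietly(candidate)) {
            master = candidate;
            return master;
        }
        throw new IllegalStateException("master not running");
    }

    // A timeout or any other failure is treated as "not running".
    private static boolean isRunningQuietly(Master m) {
        try {
            return m.isRunning();
        } catch (Exception e) {
            return false;
        }
    }
}
```

The point of the design is that a single timed-out liveness check never leaves the client in a state where it stops trying to connect.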
[jira] [Updated] (HBASE-6900) RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
[ https://issues.apache.org/jira/browse/HBASE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6900:
---
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to 0.94 and 0.96.
Thanks for your patience Ram.
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470053#comment-13470053 ]

Lars Hofhansl commented on HBASE-6920:
---

Yeah, it's the last (for now :) ) open jira for the next RC.
Thanks Gregory.
[jira] [Commented] (HBASE-6900) RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
[ https://issues.apache.org/jira/browse/HBASE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470102#comment-13470102 ]

Hudson commented on HBASE-6900:
---

Integrated in HBase-TRUNK #3429 (See [https://builds.apache.org/job/HBase-TRUNK/3429/])
HBASE-6900 RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek. (Revision 1394377)

Result = FAILURE
larsh :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-6900) RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
[ https://issues.apache.org/jira/browse/HBASE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470101#comment-13470101 ]

Hudson commented on HBASE-6900:
---

Integrated in HBase-0.94 #507 (See [https://builds.apache.org/job/HBase-0.94/507/])
HBASE-6900 RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek. (Revision 1394378)

Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Created] (HBASE-6956) Do not return back to HTablePool closed connections
Igor Yurinok created HBASE-6956:
---

Summary: Do not return back to HTablePool closed connections
Key: HBASE-6956
URL: https://issues.apache.org/jira/browse/HBASE-6956
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.90.6
Reporter: Igor Yurinok

Sometimes we see a lot of exceptions about closed connections:
{code}
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@553fd068 closed
org.apache.hadoop.hbase.client.ClosedConnectionException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@553fd068 closed
{code}
After investigating, we believe this occurs because a closed connection is returned to the HTablePool.
In our opinion, the best solution is to check in HTablePool.putTable whether the table is closed and, if it is, not add it to the queue but release that HTableInterface instead.
Unfortunately, there is currently no access to the HTable#closed field through HTableInterface.
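The proposal in HBASE-6956 amounts to a "check before returning to the pool" rule. A minimal sketch of that rule follows, assuming a hypothetical isClosed() accessor, which the report says HTableInterface does not expose. ClosableHandle, HandlePoolSketch, and FakeHandle are illustration-only names and not HBase classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pooled handle. The isClosed() accessor is an assumption:
// per the report, HTableInterface offers no such method in 0.90.6.
interface ClosableHandle {
    boolean isClosed();
    void release();
}

// Hypothetical pool; NOT HTablePool.
class HandlePoolSketch<T extends ClosableHandle> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final int maxSize;

    HandlePoolSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns true if the handle was pooled; closed handles and
    // overflow handles are released instead of being queued.
    synchronized boolean put(T handle) {
        if (handle.isClosed() || idle.size() >= maxSize) {
            handle.release();
            return false;
        }
        idle.push(handle);
        return true;
    }

    synchronized T poll() {
        return idle.poll();
    }

    synchronized int size() {
        return idle.size();
    }
}

// Minimal handle used only to exercise the sketch.
class FakeHandle implements ClosableHandle {
    boolean closed;
    boolean released;

    public boolean isClosed() { return closed; }
    public void release() { released = true; }
}
```

With this shape, a handle whose underlying connection has been closed is never handed out to a later caller.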
[jira] [Commented] (HBASE-6950) TestAcidGuarantees system test now flushes too aggressively
[ https://issues.apache.org/jira/browse/HBASE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470243#comment-13470243 ]

Hudson commented on HBASE-6950:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #210 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/210/])
HBASE-6950 TestAcidGuarantees system test now flushes too aggressively (Revision 1394335)

Result = FAILURE
gchanan :
Files :
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java

TestAcidGuarantees system test now flushes too aggressively
---
Key: HBASE-6950
URL: https://issues.apache.org/jira/browse/HBASE-6950
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.92.2, 0.94.2, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
Fix For: 0.96.0
Attachments: HBASE-6950.patch

HBASE-6552 caused the TestAcidGuarantees system test to flush more aggressively, because flushes are where ACID problems have occurred in the past.
After some more cluster testing, it seems that this is too aggressive; my clusters eventually can't keep up with the number of flushes/compactions and start getting SocketTimeoutExceptions. We could try to optimize the flushes/compactions, but since this workload would never occur in practice, I don't think it is worth the effort. Instead, let's flush only once a minute. This is arbitrary, but seems to work.
Here is my comment in the (upcoming) patch:
{code}
// Flushing has been a source of ACID violations previously (see HBASE-2856), so ideally,
// we would flush as often as possible. On a running cluster, this isn't practical:
// (1) we will cause a lot of load due to all the flushing and compacting
// (2) we cannot change the flushing/compacting related Configuration options to try to
// alleviate this
// (3) it is an unrealistic workload, since no one would actually flush that often.
// Therefore, let's flush every minute to have more flushes than usual, but not overload
// the running cluster.
{code}
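The "flush only once a minute" idea from HBASE-6950 can be sketched as a small throttle. The actual patch to TestAcidGuarantees may be structured quite differently. FlushThrottleSketch and its injected clock and flush action are hypothetical, and the clock is injected only so the behaviour can be checked without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical helper; NOT taken from the TestAcidGuarantees patch.
class FlushThrottleSketch {
    private final long intervalMs;
    private final LongSupplier clock; // millisecond clock, injected for testability
    private final Runnable flush;     // the flush action to rate-limit
    private long lastFlushMs;
    private boolean flushedOnce;

    FlushThrottleSketch(long intervalMs, LongSupplier clock, Runnable flush) {
        this.intervalMs = intervalMs;
        this.clock = clock;
        this.flush = flush;
    }

    // Runs the flush only if at least intervalMs has elapsed since the
    // previous flush; the very first call always flushes.
    boolean maybeFlush() {
        long now = clock.getAsLong();
        if (flushedOnce && now - lastFlushMs < intervalMs) {
            return false;
        }
        flush.run();
        lastFlushMs = now;
        flushedOnce = true;
        return true;
    }
}
```

A caller could construct it with an interval of 60000 ms and System::currentTimeMillis as the clock, then call maybeFlush() from the test loop as often as it likes.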
[jira] [Assigned] (HBASE-6804) [replication] lower the amount of logging to a more human-readable level
[ https://issues.apache.org/jira/browse/HBASE-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-6804:
---
Assignee: Jean-Daniel Cryans

[replication] lower the amount of logging to a more human-readable level
---
Key: HBASE-6804
URL: https://issues.apache.org/jira/browse/HBASE-6804
Project: HBase
Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
Fix For: 0.96.0
Attachments: HBASE-6804-0.94.patch, HBASE-6804-0.94-v2.patch

We need to stop logging every time replication decides to do something. It used to be extremely useful when the code base was younger, but now it should be possible to bring it down while keeping it relevant.
[jira] [Updated] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Odell updated HBASE-5582:
---
Attachment: HBASE-5582.patch

Changed LOG level and re-worded the message a bit, since we no longer have HServerinfo.

No HServerInfo found for should be a WARNING message
---
Key: HBASE-5582
URL: https://issues.apache.org/jira/browse/HBASE-5582
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.90.4
Reporter: Shrijeet Paliwal
Assignee: Kevin Odell
Priority: Trivial
Labels: newbie
Attachments: HBASE-5582.patch

The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO.
From irc chat
{noformat}
jdcryans JohnP789: can you grep for No HServerInfo found for in that log?
jdcryans wait I see it
jdcryans ok there's your problem
shrijeet_ Yes it is there
shrijeet_ jdcryans: it should be INFO, why?
jdcryans it shouldn't be INFO, it's so easy to miss
jdcryans it's not the first time we have to look super closely to figure this one out
shrijeet_ yes , I will file a jira
jdcryans in any case it's a mismatch in that machine's DNS config
shrijeet_ anyways JohnP789 is waiting :) go on
JohnP789 haha!
JohnP789 yes... ??? :-)
jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478
17:26 jdcryans but the RS calls itself localhost,53875,1328924863478
{noformat}
[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470475#comment-13470475 ]

Phabricator commented on HBASE-6597:
---

tedyu has commented on the revision [jira] [HBASE-6597] [89-fb] Incremental data block encoding.

INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38 Length of prefix is returned. Name this method getCommonPrefixLength ?
src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93 Do we need to consider memstoreTS ?
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 Check / assert that skipLastBytes is not negative ?
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 prevKey stores the previous key, right ?
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 Name this variable negativeDiffTimestamp ?
src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 This class can be private, right ?
src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 This class can be private, right ?

REVISION DETAIL
https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

Block Encoding Size Estimation
---
Key: HBASE-6597
URL: https://issues.apache.org/jira/browse/HBASE-6597
Project: HBase
Issue Type: Improvement
Components: io
Affects Versions: 0.89-fb
Reporter: Brian Nixon
Priority: Minor
Attachments: D5895.1.patch

Block boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.
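To make the HBASE-6597 discussion concrete, here is a minimal sketch of a common-prefix helper and the kind of per-key size estimate it enables. It is not the code under review. PrefixSketch and estimateEncodedKeySize are hypothetical names, and the estimate deliberately ignores the bytes an encoder needs to record the prefix length itself.

```java
// Hypothetical helper illustrating the review comment; NOT the code under review.
final class PrefixSketch {
    private PrefixSketch() {}

    // Number of leading bytes that a and b have in common.
    static int getCommonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Rough size of a key when only the suffix beyond the prefix shared
    // with the previous key is stored. Ignores prefix-length overhead.
    static int estimateEncodedKeySize(byte[] prevKey, byte[] key) {
        return key.length - getCommonPrefixLength(prevKey, key);
    }
}
```

Summing such estimates while writing is one way a writer could close a block based on its encoded rather than unencoded size, which is the consistency the issue description is after.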
[jira] [Updated] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Odell updated HBASE-5582:
---
Fix Version/s: 0.96.0
Status: Patch Available (was: Open)

Changed the LOG.info to LOG.warn, and expanded the error message a bit.
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470484#comment-13470484 ]

Ted Yu commented on HBASE-5582:
---

Line too long:
{code}
+LOG.warn(serverName.toString() + " is not online or isn't known to the master. The latter could be caused by a DNS misconfiguration.");
{code}
Wrap the second sentence.
Otherwise looks good.
[jira] [Updated] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Odell updated HBASE-5582:
---
Attachment: HBASE-5582.patch1
[jira] [Updated] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Odell updated HBASE-5582:
---
Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Odell updated HBASE-5582:
---
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470490#comment-13470490 ]

Kevin Odell commented on HBASE-5582:
---

Resubmitted the patch
[jira] [Updated] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang updated HBASE-6948:
---
Attachment: HBASE-6948.patch

Fix the split key to be correct byte[]

shell create table script cannot handle split key which is expressed in raw bytes
---
Key: HBASE-6948
URL: https://issues.apache.org/jira/browse/HBASE-6948
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.2
Reporter: Ted Yu
Attachments: HBASE-6948.patch
[jira] [Assigned] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates reassigned HBASE-6797: -- Assignee: Jesse Yates TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-6797: --- Status: Patch Available (was: Open) TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6889) Ignore source control files with apache-rat
[ https://issues.apache.org/jira/browse/HBASE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470504#comment-13470504 ] Jesse Yates commented on HBASE-6889: [~lhofhansl] wanna commit it? Ignore source control files with apache-rat --- Key: HBASE-6889 URL: https://issues.apache.org/jira/browse/HBASE-6889 Project: HBase Issue Type: Bug Reporter: Jesse Yates Assignee: Jesse Yates Attachments: hbase-6889-mvn-v0.patch Running 'mvn apache-rat:check' locally causes a failure because it finds the source control files, making it hard to check that you didn't include a file without a source header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470515#comment-13470515 ] Hadoop QA commented on HBASE-5582: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548006/HBASE-5582.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3013//console This message is automatically generated. No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? 
jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470516#comment-13470516 ] Chris Trezzo commented on HBASE-6476: - A few curious things (if anyone has any thoughts on these, that would be great):
1. Check out the test history: https://builds.apache.org/job/HBase-TRUNK/3392/testReport/junit/org.apache.hadoop.hbase.util/TestThreads/testSleepWithoutInterrupt/history/ The System.currentTimeMillis() change might not have caused the test failure (it has failed twice for the same reason since the patch was reverted). In addition, something odd is going on: the test duration for all the failed runs is 1~2ms, yet the error message for those runs states that the test timed out after 6000ms, which is the timeout set in the test tag.
2. The method sleepWithoutInterrupt is only called from test code, except during thrift server shutdown. Look at TBoundedThreadPoolServer.shutdownServer(). Does anyone know why we do the extra wait that calls sleepWithoutInterrupt? My thought is that we could remove the extra wait, since we are already calling awaitTermination on the executor service in a loop above. Either that, or we could replace the call with another call to awaitTermination and keep the second wait loop. This would limit sleepWithoutInterrupt calls to just test code.
3. Another tricky part with the EdgeManager is dealing with small tests that run in parallel within the same JVM. If test1 uses a non-default EnvironmentEdge, and test2 relies on the DefaultEdge and doesn't set it explicitly, test2 would unknowingly be using whatever EnvironmentEdge test1 set. This would create flapping tests that might be tricky to debug.
Thoughts? Thanks, Chris
Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Chris Trezzo Priority: Minor Fix For: 0.96.0 Attachments: 6476.txt, 6476-v2.txt, 6476-v2.txt, 6476v3.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
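The concern in point 3 of the comment above (one test's injected edge leaking into another test in the same JVM) can be illustrated with a minimal, self-contained sketch. The names TimeEdge, TimeEdgeManager and ManualEdge are hypothetical stand-ins invented for this example; they only mirror the idea of an EnvironmentEdge and its manager and are not the HBase classes themselves. The sketch assumes tests run one after another: the try/finally restores the default clock so a later test does not inherit the injected one, but it would not isolate tests that run truly concurrently against the same static delegate.

```java
import java.util.concurrent.atomic.AtomicLong;

public class EdgeSketch {
    /** Hypothetical stand-in for an EnvironmentEdge: a pluggable source of time. */
    interface TimeEdge {
        long currentTimeMillis();
    }

    /** Hypothetical manager that code would call instead of System.currentTimeMillis(). */
    static final class TimeEdgeManager {
        private static final TimeEdge DEFAULT = System::currentTimeMillis;
        private static volatile TimeEdge delegate = DEFAULT;

        static long currentTimeMillis() { return delegate.currentTimeMillis(); }
        static void inject(TimeEdge edge) { delegate = edge; }
        static void reset() { delegate = DEFAULT; }
        static boolean isDefault() { return delegate == DEFAULT; }
    }

    /** A manually advanced clock, so a test does not depend on wall-clock time. */
    static final class ManualEdge implements TimeEdge {
        private final AtomicLong now;
        ManualEdge(long start) { this.now = new AtomicLong(start); }
        @Override public long currentTimeMillis() { return now.get(); }
        void advance(long millis) { now.addAndGet(millis); }
    }

    /** Injects a manual clock, measures elapsed time, and always restores the default. */
    static long elapsedWithManualClock(long advanceBy) {
        ManualEdge edge = new ManualEdge(1_000L);
        TimeEdgeManager.inject(edge);
        try {
            long start = TimeEdgeManager.currentTimeMillis();
            edge.advance(advanceBy);
            return TimeEdgeManager.currentTimeMillis() - start;
        } finally {
            // Without this reset, the next test in the same JVM would see the manual clock.
            TimeEdgeManager.reset();
        }
    }

    public static void main(String[] args) {
        System.out.println(elapsedWithManualClock(250L)); // prints 250
        System.out.println(TimeEdgeManager.isDefault());  // prints true
    }
}
```

Resetting in a finally block (or in a test teardown hook) keeps the injected clock scoped to one test; whether that is enough for tests that share a JVM in parallel threads is exactly the open question raised in the comment.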
[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-6597: -- Assignee: Mikhail Bautin Block Encoding Size Estimation -- Key: HBASE-6597 URL: https://issues.apache.org/jira/browse/HBASE-6597 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.89-fb Reporter: Brian Nixon Assignee: Mikhail Bautin Priority: Minor Attachments: D5895.1.patch Block boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.
[jira] [Work started] (HBASE-6597) Block Encoding Size Estimation
[ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-6597 started by Mikhail Bautin. Block Encoding Size Estimation -- Key: HBASE-6597 URL: https://issues.apache.org/jira/browse/HBASE-6597 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.89-fb Reporter: Brian Nixon Assignee: Mikhail Bautin Priority: Minor Attachments: D5895.1.patch Block boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.
[jira] [Commented] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470554#comment-13470554 ] Hadoop QA commented on HBASE-6797: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547179/hbase-6797-v0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3015//console This message is automatically generated. TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-6948: -- Attachment: (was: HBASE-6948-trunk.patch) shell create table script cannot handle split key which is expressed in raw bytes - Key: HBASE-6948 URL: https://issues.apache.org/jira/browse/HBASE-6948 Project: HBase Issue Type: Bug Affects Versions: 0.92.2 Reporter: Ted Yu Attachments: HBASE-6948.patch
[jira] [Updated] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-6948: -- Attachment: HBASE-6948-trunk.patch patch for trunk attached. shell create table script cannot handle split key which is expressed in raw bytes - Key: HBASE-6948 URL: https://issues.apache.org/jira/browse/HBASE-6948 Project: HBase Issue Type: Bug Affects Versions: 0.92.2 Reporter: Ted Yu Attachments: HBASE-6948.patch
[jira] [Updated] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianying Chang updated HBASE-6948: -- Attachment: HBASE-6948-trunk.patch shell create table script cannot handle split key which is expressed in raw bytes - Key: HBASE-6948 URL: https://issues.apache.org/jira/browse/HBASE-6948 Project: HBase Issue Type: Bug Affects Versions: 0.92.2 Reporter: Ted Yu Attachments: HBASE-6948.patch, HBASE-6948-trunk.patch
[jira] [Updated] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6948: -- Hadoop Flags: Reviewed Status: Patch Available (was: Open) shell create table script cannot handle split key which is expressed in raw bytes - Key: HBASE-6948 URL: https://issues.apache.org/jira/browse/HBASE-6948 Project: HBase Issue Type: Bug Affects Versions: 0.92.2 Reporter: Ted Yu Attachments: HBASE-6948.patch, HBASE-6948-trunk.patch
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470568#comment-13470568 ] Hadoop QA commented on HBASE-5582: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548008/HBASE-5582.patch1 against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3014//console This message is automatically generated. No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? 
:-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6889) Ignore source control files with apache-rat
[ https://issues.apache.org/jira/browse/HBASE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6889: - Resolution: Fixed Fix Version/s: 0.96.0 0.94.2 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. Thanks for the patch, Jesse. Ignore source control files with apache-rat --- Key: HBASE-6889 URL: https://issues.apache.org/jira/browse/HBASE-6889 Project: HBase Issue Type: Bug Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.2, 0.96.0 Attachments: hbase-6889-mvn-v0.patch Running 'mvn apache-rat:check' locally causes a failure because it finds the source control files, making it hard to check that you didn't include a file without a source header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470600#comment-13470600 ] Jean-Daniel Cryans commented on HBASE-5582: --- +1, going to commit. No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-5582: -- Resolution: Fixed Fix Version/s: 0.94.2 0.92.3 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.92, 0.94, and trunk. Thanks for the patch Kevin! No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6724) Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92
[ https://issues.apache.org/jira/browse/HBASE-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-6724: - Assignee: Ted Yu Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 -- Key: HBASE-6724 URL: https://issues.apache.org/jira/browse/HBASE-6724 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.92.3 Attachments: 6165.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470673#comment-13470673 ] Hadoop QA commented on HBASE-6948: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548026/HBASE-6948-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3016//console This message is automatically generated. shell create table script cannot handle split key which is expressed in raw bytes - Key: HBASE-6948 URL: https://issues.apache.org/jira/browse/HBASE-6948 Project: HBase Issue Type: Bug Affects Versions: 0.92.2 Reporter: Ted Yu Attachments: HBASE-6948.patch, HBASE-6948-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6948) shell create table script cannot handle split key which is expressed in raw bytes
[ https://issues.apache.org/jira/browse/HBASE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470675#comment-13470675 ] Ted Yu commented on HBASE-6948: --- {code} Running org.apache.hadoop.hbase.client.TestShell Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 97.498 sec {code} Failed test is not related to this patch. shell create table script cannot handle split key which is expressed in raw bytes - Key: HBASE-6948 URL: https://issues.apache.org/jira/browse/HBASE-6948 Project: HBase Issue Type: Bug Affects Versions: 0.92.2 Reporter: Ted Yu Attachments: HBASE-6948.patch, HBASE-6948-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6239) [replication] ReplicationSink uses the ts of the first KV for the other KVs in the same row
[ https://issues.apache.org/jira/browse/HBASE-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6239: -- Fix Version/s: (was: 0.90.8) [replication] ReplicationSink uses the ts of the first KV for the other KVs in the same row --- Key: HBASE-6239 URL: https://issues.apache.org/jira/browse/HBASE-6239 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Critical Labels: corruption Fix For: 0.92.2 Attachments: HBASE-6239-0.92-v1.patch ReplicationSink assumes that all the KVs for the same row inside a WALEdit will have the same timestamp, which is not necessarily the case. This only affects 0.90 and 0.92, since HBASE-5203 fixes it in 0.94.
[jira] [Resolved] (HBASE-6239) [replication] ReplicationSink uses the ts of the first KV for the other KVs in the same row
[ https://issues.apache.org/jira/browse/HBASE-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-6239. --- Resolution: Fixed [replication] ReplicationSink uses the ts of the first KV for the other KVs in the same row --- Key: HBASE-6239 URL: https://issues.apache.org/jira/browse/HBASE-6239 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Critical Labels: corruption Fix For: 0.92.2 Attachments: HBASE-6239-0.92-v1.patch ReplicationSink assumes that all the KVs for the same row inside a WALEdit will have the same timestamp, which is not necessarily the case. This only affects 0.90 and 0.92, since HBASE-5203 fixes it in 0.94.
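To make the HBASE-6239 description concrete, here is a small self-contained sketch of the difference between grouping edits by row while keeping each cell's own timestamp and grouping them while reusing the first cell's timestamp for the whole row. The Cell class and both method names are hypothetical and simplified for illustration; this is not the ReplicationSink code or the committed patch, only a model of the assumption the issue describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerCellTimestampSketch {
    /** Hypothetical, simplified cell: row key, column, timestamp and value. */
    static final class Cell {
        final String row;
        final String column;
        final long ts;
        final String value;

        Cell(String row, String column, long ts, String value) {
            this.row = row;
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Groups cells by row and keeps every cell's own timestamp. */
    static Map<String, List<Cell>> groupByRow(List<Cell> edit) {
        Map<String, List<Cell>> byRow = new LinkedHashMap<>();
        for (Cell c : edit) {
            byRow.computeIfAbsent(c.row, r -> new ArrayList<>()).add(c);
        }
        return byRow;
    }

    /** For contrast: every cell in a row inherits the timestamp of the row's first cell. */
    static Map<String, List<Cell>> groupByRowFirstTimestamp(List<Cell> edit) {
        Map<String, List<Cell>> byRow = new LinkedHashMap<>();
        for (Cell c : edit) {
            List<Cell> cells = byRow.computeIfAbsent(c.row, r -> new ArrayList<>());
            long ts = cells.isEmpty() ? c.ts : cells.get(0).ts;
            cells.add(new Cell(c.row, c.column, ts, c.value));
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<Cell> edit = Arrays.asList(
            new Cell("row1", "cf:a", 10L, "x"),
            new Cell("row1", "cf:b", 20L, "y"));
        System.out.println(groupByRow(edit).get("row1").get(1).ts);               // prints 20
        System.out.println(groupByRowFirstTimestamp(edit).get("row1").get(1).ts); // prints 10
    }
}
```

In the second variant the cell written at timestamp 20 would be stored at timestamp 10 on the receiving side, which is the kind of silent divergence that the "corruption" label on the issue points at.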
[jira] [Updated] (HBASE-6330) TestImportExport has been failing against hadoop 0.23/2.0 profile [Part2]
[ https://issues.apache.org/jira/browse/HBASE-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6330: -- Fix Version/s: (was: 0.96.0) TestImportExport is no longer failing in HBase-TRUNK-on-Hadoop-2.0.0. See build #210. TestImportExport has been failing against hadoop 0.23/2.0 profile [Part2] - Key: HBASE-6330 URL: https://issues.apache.org/jira/browse/HBASE-6330 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Critical Labels: hadoop-2.0 Fix For: 0.94.3 Attachments: hbase-6330-94.patch, hbase-6330-trunk.patch, hbase-6330-v2.patch See HBASE-5876. I'm going to commit the v3 patches under this name since it has been two months (my bad) since the first half was committed and found to be incomplete.
[jira] [Commented] (HBASE-6824) Introduce ${hbase.local.dir} and save coprocessor jars there
[ https://issues.apache.org/jira/browse/HBASE-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470708#comment-13470708 ] Enis Soztutar commented on HBASE-6824: -- [~apurtell] Could you please take a look at this when you find time? Introduce ${hbase.local.dir} and save coprocessor jars there Key: HBASE-6824 URL: https://issues.apache.org/jira/browse/HBASE-6824 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-6824_v1-0.94.patch, hbase-6824_v1-trunk.patch, hbase-6824_v2-0.94.patch, hbase-6824_v2-trunk.patch We need to make the temp directory where coprocessor jars are saved configurable. For this we will add the hbase.local.dir configuration parameter. Windows tests are failing due to path problems with coprocessor jars: two HBase TestClassLoading unit tests failed due to a failure in loading the test file from HDFS: {code} testClassLoadingFromHDFS(org.apache.hadoop.hbase.coprocessor.TestClassLoading): Class TestCP1 was missing on a region testClassLoadingFromLibDirInJar(org.apache.hadoop.hbase.coprocessor.TestClassLoading): Class TestCP1 was missing on a region {code} The problem is that CoprocessorHost.load() copies the jar file locally and schedules the local file to be deleted on exit by calling FileSystem.deleteOnExit(). However, the file system used for that call is not the file system of the local file; it is the distributed file system, so on Windows the Path fails.
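The idea behind HBASE-6824 can be sketched with plain JDK calls: resolve a configurable local directory, copy the jar there, and register cleanup against the local file itself instead of a handle to a different file system. This is only an illustration, not the attached patch. The property name `hbase.local.dir` comes from the issue; the `jars` subdirectory, the fallback to `java.io.tmpdir`, and the class and method names are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

final class LocalJarCacheSketch {
    /** Resolves the local jar directory; the java.io.tmpdir fallback is an assumed default. */
    static Path localDir(Properties conf) {
        String base = conf.getProperty("hbase.local.dir", System.getProperty("java.io.tmpdir"));
        return Paths.get(base, "jars");
    }

    /** Copies a jar stream into the local directory and schedules the local copy for deletion. */
    static Path copyToLocal(InputStream remoteJar, String name, Properties conf) throws IOException {
        Path dir = Files.createDirectories(localDir(conf));
        Path local = dir.resolve(name);
        Files.copy(remoteJar, local, StandardCopyOption.REPLACE_EXISTING);
        // Cleanup is registered on the local file, so no remote file system is involved.
        local.toFile().deleteOnExit();
        return local;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("hbase.local.dir", Files.createTempDirectory("hbase-local").toString());
        byte[] fakeJar = {0x50, 0x4b}; // placeholder bytes, not a real jar
        Path local = copyToLocal(new ByteArrayInputStream(fakeJar), "TestCP1.jar", conf);
        System.out.println(local + " exists: " + Files.exists(local));
    }
}
```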
[jira] [Created] (HBASE-6957) TestRowCounter consistently fails against hadoop-2.0
Ted Yu created HBASE-6957: - Summary: TestRowCounter consistently fails against hadoop-2.0 Key: HBASE-6957 URL: https://issues.apache.org/jira/browse/HBASE-6957 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Fix For: 0.96.0 In https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/210/testReport/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterHiddenColumn/ , we can see: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.hbase.mapreduce.TestRowCounter.runRowCount(TestRowCounter.java:135) at org.apache.hadoop.hbase.mapreduce.TestRowCounter.testRowCounterHiddenColumn(TestRowCounter.java:118) ... 2012-10-05 11:24:17,355 WARN [ContainersLauncher #1] launcher.ContainerLaunch(246): Failed to launch container. java.lang.ArithmeticException: / by zero at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:355) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLogPathForWrite(LocalDirsHandlerService.java:268) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:126) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2012-10-05 11:24:17,356 WARN [DeletionService #1] nodemanager.DefaultContainerExecutor(276): delete returned false for path: 
[/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_0/usercache/jenkins/appcache/application_1349436189156_0003/container_1349436189156_0003_01_02] {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6957) TestRowCounter consistently fails against hadoop-2.0
[ https://issues.apache.org/jira/browse/HBASE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6957: -- Priority: Critical (was: Major) TestRowCounter consistently fails against hadoop-2.0 Key: HBASE-6957 URL: https://issues.apache.org/jira/browse/HBASE-6957 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Priority: Critical Fix For: 0.96.0
[jira] [Commented] (HBASE-6957) TestRowCounter consistently fails against hadoop-2.0
[ https://issues.apache.org/jira/browse/HBASE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470722#comment-13470722 ] Ted Yu commented on HBASE-6957: --- I ran the test on: {code} Linux s0 2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux {code} and it passed: {code} Running org.apache.hadoop.hbase.mapreduce.TestRowCounter Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 79.768 sec {code} TestRowCounter consistently fails against hadoop-2.0 Key: HBASE-6957 URL: https://issues.apache.org/jira/browse/HBASE-6957 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Priority: Critical Fix For: 0.96.0
[jira] [Updated] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-6797: --- Attachment: hbase-6797-v1.patch Updating patch onto trunk. TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch, hbase-6797-v1.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6940) Enable GC logging by default
[ https://issues.apache.org/jira/browse/HBASE-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470745#comment-13470745 ] Ted Yu commented on HBASE-6940: --- To do this, we need to specify the path to the GC log: {code} -Xloggc:/apache/hbase/logs/gc-hbase.log {code} Should a variable, such as HBASE_GC_LOG_PATH, be introduced so that users can specify the location? Enable GC logging by default Key: HBASE-6940 URL: https://issues.apache.org/jira/browse/HBASE-6940 Project: HBase Issue Type: Improvement Components: Admin Reporter: stack Priority: Critical Fix For: 0.96.0 I think we should enable GC logging by default. It's pretty frictionless apparently and could help in the case where folks are getting off the ground.
[jira] [Commented] (HBASE-6940) Enable GC logging by default
[ https://issues.apache.org/jira/browse/HBASE-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470750#comment-13470750 ] stack commented on HBASE-6940: -- We have a logging dir defined already. We'll reuse that. Enable GC logging by default Key: HBASE-6940 URL: https://issues.apache.org/jira/browse/HBASE-6940 Project: HBase Issue Type: Improvement Components: Admin Reporter: stack Priority: Critical Fix For: 0.96.0 I think we should enable GC logging by default. It's pretty frictionless apparently and could help in the case where folks are getting off the ground.
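As a rough illustration of reusing an existing log directory for the GC log, the sketch below derives the `-Xloggc` option from a directory instead of introducing a separate variable. It is not the attached 6940-trunk.txt. The `gc-hbase.log` file name is taken from the example path in this thread; the class and method names are hypothetical, and reading the directory from an `HBASE_LOG_DIR` environment variable is an assumption about which existing setting would be reused.

```java
final class GcLogOptionSketch {
    /** Builds the JVM GC-log option for a given log directory (trailing slash tolerated). */
    static String gcLogOption(String logDir) {
        String dir = logDir.endsWith("/") ? logDir.substring(0, logDir.length() - 1) : logDir;
        return "-Xloggc:" + dir + "/gc-hbase.log";
    }

    public static void main(String[] args) {
        // Assumption: the already-defined logging dir is exposed as HBASE_LOG_DIR.
        String logDir = System.getenv().getOrDefault("HBASE_LOG_DIR", "/apache/hbase/logs");
        System.out.println(gcLogOption(logDir));
    }
}
```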
[jira] [Commented] (HBASE-5995) Fix and reenable TestLogRolling.testLogRollOnPipelineRestart
[ https://issues.apache.org/jira/browse/HBASE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470752#comment-13470752 ] Ted Yu commented on HBASE-5995: --- When I ran the test against hadoop 2.0, it failed with: {code} testLogRollOnPipelineRestart(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling) Time elapsed: 0.243 sec ERROR! java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1150895311-10.249.196.101-1349476630606:blk_7782056094701760427_1026; getBlockSize()=1472; corrupt=false; offset=0; locs=[127.0.0.1:44729, 127.0.0.1:38785]} at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:232) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:177) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:119) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:112) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:966) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:212) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:75) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1768) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:63) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1688) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1709) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:56) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:176) at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:82) at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnPipelineRestart(TestLogRolling.java:501) {code} Fix and reenable TestLogRolling.testLogRollOnPipelineRestart Key: 
HBASE-5995 URL: https://issues.apache.org/jira/browse/HBASE-5995 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.0 Reporter: stack Priority: Blocker Fix For: 0.96.0 HBASE-5984 disabled this flaky test (see that issue for more). This issue is about getting it enabled again. Made a blocker on 0.96.0 so it gets attention.
[jira] [Updated] (HBASE-6940) Enable GC logging by default
[ https://issues.apache.org/jira/browse/HBASE-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6940: -- Attachment: 6940-trunk.txt How about this? Enable GC logging by default Key: HBASE-6940 URL: https://issues.apache.org/jira/browse/HBASE-6940 Project: HBase Issue Type: Improvement Components: Admin Reporter: stack Priority: Critical Fix For: 0.96.0 Attachments: 6940-trunk.txt I think we should enable GC logging by default. It's pretty frictionless apparently and could help in the case where folks are getting off the ground.
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470760#comment-13470760 ] Lars Hofhansl commented on HBASE-6920: -- ping On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
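The failure mode described in HBASE-6920 can be modelled in a few lines. The sketch below is not the HConnection code and not the attached patch; `MasterLookupSketch`, `ensureMaster`, `invalidate` and the `BooleanSupplier` probe (standing in for `tryMaster.isMasterRunning()`) are hypothetical. It shows one way to avoid a sticky "already checked" flag: the probe is retried on every call while disconnected, and a runtime exception from the probe is turned into a checked failure rather than escaping to the application.

```java
import java.util.function.BooleanSupplier;

final class MasterLookupSketch {
    /** Hypothetical checked exception, standing in for a "master not running" error. */
    static final class MasterNotRunning extends Exception {
        MasterNotRunning(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final BooleanSupplier probe; // stands in for tryMaster.isMasterRunning()
    private boolean connected;

    MasterLookupSketch(BooleanSupplier probe) {
        this.probe = probe;
    }

    /** Re-probes on every call while not connected; there is no sticky "checked" flag. */
    synchronized void ensureMaster() throws MasterNotRunning {
        if (connected) {
            return;
        }
        try {
            if (probe.getAsBoolean()) {
                connected = true;
                return;
            }
        } catch (RuntimeException e) {
            // A timeout surfacing as an unchecked exception becomes a normal failed attempt.
            throw new MasterNotRunning("probe failed", e);
        }
        throw new MasterNotRunning("master is not running", null);
    }

    /** Called when an established connection turns out to be dead, so the next call retries. */
    synchronized void invalidate() {
        connected = false;
    }

    synchronized boolean isConnected() {
        return connected;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        MasterLookupSketch conn = new MasterLookupSketch(() -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("simulated timeout");
            }
            return true;
        });
        try {
            conn.ensureMaster();
        } catch (MasterNotRunning e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        try {
            conn.ensureMaster();
        } catch (MasterNotRunning e) {
            System.out.println("second attempt failed: " + e.getMessage());
        }
        System.out.println("connected=" + conn.isConnected());
    }
}
```

The design choice here is that connection state is derived only from a successful probe, so an earlier failure can never block later attempts.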
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470761#comment-13470761 ] Hudson commented on HBASE-5582: --- Integrated in HBase-TRUNK #3431 (See [https://builds.apache.org/job/HBase-TRUNK/3431/]) HBASE-5582 No HServerInfo found for should be a WARNING message (Kevin Odell via JD) (Revision 1394768) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6889) Ignore source control files with apache-rat
[ https://issues.apache.org/jira/browse/HBASE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470762#comment-13470762 ] Hudson commented on HBASE-6889: --- Integrated in HBase-TRUNK #3431 (See [https://builds.apache.org/job/HBase-TRUNK/3431/]) HBASE-6889 Ignore source control files with apache-rat (Jesse Yates) (Revision 1394735) Result = FAILURE larsh : Files : * /hbase/trunk/pom.xml Ignore source control files with apache-rat --- Key: HBASE-6889 URL: https://issues.apache.org/jira/browse/HBASE-6889 Project: HBase Issue Type: Bug Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.2, 0.96.0 Attachments: hbase-6889-mvn-v0.patch Running 'mvn apache-rat:check' locally causes a failure because it finds the source control files, making it hard to check that you didn't include a file without a source header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470770#comment-13470770 ] Gregory Chanan commented on HBASE-6920: --- Sorry, looks good, going to check in to 0.94.2 soon. I need to investigate whether this is a problem with 0.96. On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above.
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470779#comment-13470779 ] Gregory Chanan commented on HBASE-6920: --- Thanks for the reviews, Ted and Lars. Committed to 0.94. Not closing until I investigate trunk. On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470784#comment-13470784 ] Lars Hofhansl commented on HBASE-6920: -- Thanks Gregory! ... I will need to resolve this temporarily to get correct release notes for 0.94.2, and will reopen it soon after (or should 0.96 receive a separate issue now?). On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above.
[jira] [Commented] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470792#comment-13470792 ] Jesse Yates commented on HBASE-6797: [~lhofhansl] if you can dig it, can you commit it? I ran the test locally (jdk 1.7) 20x, no failures. TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch, hbase-6797-v1.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470794#comment-13470794 ] Hadoop QA commented on HBASE-6797: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548056/hbase-6797-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3017//console This message is automatically generated. TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch, hbase-6797-v1.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470806#comment-13470806 ] Hudson commented on HBASE-5582: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #211 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/211/]) HBASE-5582 No HServerInfo found for should be a WARNING message (Kevin Odell via JD) (Revision 1394768) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6889) Ignore source control files with apache-rat
[ https://issues.apache.org/jira/browse/HBASE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470807#comment-13470807 ] Hudson commented on HBASE-6889: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #211 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/211/]) HBASE-6889 Ignore source control files with apache-rat (Jesse Yates) (Revision 1394735) Result = FAILURE larsh : Files : * /hbase/trunk/pom.xml Ignore source control files with apache-rat --- Key: HBASE-6889 URL: https://issues.apache.org/jira/browse/HBASE-6889 Project: HBase Issue Type: Bug Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.2, 0.96.0 Attachments: hbase-6889-mvn-v0.patch Running 'mvn apache-rat:check' locally causes a failure because it finds the source control files, making it hard to check that you didn't include a file without a source header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470818#comment-13470818 ] Hudson commented on HBASE-5582: --- Integrated in HBase-0.92 #595 (See [https://builds.apache.org/job/HBase-0.92/595/]) HBASE-5582 No HServerInfo found for should be a WARNING message (Kevin Odell via JD) (Revision 1394766) HBASE-5582 No HServerInfo found for should be a WARNING message (Revision 1394765) Result = FAILURE jdcryans : Files : * /hbase/branches/0.92/CHANGES.txt jdcryans : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-6330) TestImportExport has been failing against hadoop 0.23/2.0 profile [Part2]
[ https://issues.apache.org/jira/browse/HBASE-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6330: -- Fix Version/s: 0.96.0 TestImportExport failed in HBase-TRUNK-on-Hadoop-2.0.0 #211 TestImportExport has been failing against hadoop 0.23/2.0 profile [Part2] - Key: HBASE-6330 URL: https://issues.apache.org/jira/browse/HBASE-6330 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Critical Labels: hadoop-2.0 Fix For: 0.94.3, 0.96.0 Attachments: hbase-6330-94.patch, hbase-6330-trunk.patch, hbase-6330-v2.patch See HBASE-5876. I'm going to commit the v3 patches under this name since it has been two months (my bad) since the first half was committed and found to be incomplete.
[jira] [Commented] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470826#comment-13470826 ] Ted Yu commented on HBASE-6797: --- Integrated to trunk. Thanks for the patch, Jesse. Thanks for the review, Lars. TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch, hbase-6797-v1.patch In build #3334, I saw: {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470859#comment-13470859 ] Hudson commented on HBASE-6920: --- Integrated in HBase-0.94 #509 (See [https://builds.apache.org/job/HBase-0.94/509/]) HBASE-6920 On timeout connecting to master, client can get stuck and never make progress (Revision 1394857) Result = FAILURE gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestClientTimeouts.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/RandomTimeoutRpcEngine.java On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. 
But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
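The stuck state described above can be reproduced with a simplified model. This is a sketch under stated assumptions, not the HConnectionManager code: `MasterStub`, `MasterConnectorSketch`, `MasterNotRunningSketchException` and the `resetOnFailure` switch are hypothetical names, and only the `master` / `masterChecked` interplay is taken from the description.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.util.function.Supplier;

/** Hypothetical stand-in for the master RPC proxy; not an HBase type. */
interface MasterStub {
  boolean isMasterRunning();
}

/** Local stand-in for HBase's MasterNotRunningException. */
class MasterNotRunningSketchException extends RuntimeException {
  MasterNotRunningSketchException(String msg) {
    super(msg);
  }
}

/** Simplified model of a cached master connection guarded by a "checked" flag. */
class MasterConnectorSketch {
  private final Supplier<MasterStub> connect; // opens a fresh connection
  private final boolean resetOnFailure;       // true = allow a reconnect after a failed check
  private MasterStub master;
  private boolean masterChecked;

  MasterConnectorSketch(Supplier<MasterStub> connect, boolean resetOnFailure) {
    this.connect = connect;
    this.resetOnFailure = resetOnFailure;
  }

  synchronized MasterStub getMaster() {
    if (master != null) {
      try {
        if (master.isMasterRunning()) {
          return master;
        }
      } catch (UndeclaredThrowableException timeout) {
        // A timed-out check is treated like "not running".
      }
      master = null;                 // drop the cached connection
      if (resetOnFailure) {
        masterChecked = false;       // without this, no reconnect is ever attempted
      }
    }
    if (!masterChecked) {
      MasterStub candidate = connect.get();
      try {
        if (candidate.isMasterRunning()) {
          master = candidate;
        }
      } catch (UndeclaredThrowableException timeout) {
        // Leave master null; the caller sees the exception below.
      }
      masterChecked = true;
    }
    if (master == null) {
      throw new MasterNotRunningSketchException("master is not running");
    }
    return master;
  }
}
```

With `resetOnFailure` set to false, one timed-out check after a successful first call leaves `master` null and `masterChecked` true, so every later call throws without contacting the master. Clearing the flag when the cached connection is dropped is one way out of that state; it is shown here as an illustration and is not claimed to be what the attached patches do.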
[jira] [Created] (HBASE-6958) TestAssignmentManager fails in trunk
Ted Yu created HBASE-6958: - Summary: TestAssignmentManager fails in trunk Key: HBASE-6958 URL: https://issues.apache.org/jira/browse/HBASE-6958 Project: HBase Issue Type: Bug Reporter: Ted Yu From https://builds.apache.org/job/HBase-TRUNK/3432/testReport/junit/org.apache.hadoop.hbase.master/TestAssignmentManager/testBalanceOnMasterFailoverScenarioWithOpenedNode/ : {code} Stacktrace java.lang.Exception: test timed out after 5000 milliseconds at java.lang.System.arraycopy(Native Method) at java.lang.ThreadGroup.remove(ThreadGroup.java:969) at java.lang.ThreadGroup.threadTerminated(ThreadGroup.java:942) at java.lang.Thread.exit(Thread.java:732) ... 2012-10-06 00:46:12,521 DEBUG [MASTER_CLOSE_REGION-mockedAMExecutor-0] zookeeper.ZKUtil(1141): mockedServer-0x13a33892de7000e Retrieved 81 byte(s) of data from znode /hbase/unassigned/dc01abf9cd7fd0ea256af4df02811640 and set watcher; region=t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640., state=M_ZK_REGION_OFFLINE, servername=master,1,1, createTime=1349484372509, payload.length=0 2012-10-06 00:46:12,522 ERROR [MASTER_CLOSE_REGION-mockedAMExecutor-0] executor.EventHandler(205): Caught throwable while processing event RS_ZK_REGION_CLOSED java.lang.NullPointerException at org.apache.hadoop.hbase.master.TestAssignmentManager$MockedLoadBalancer.randomAssignment(TestAssignmentManager.java:773) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1709) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1666) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1435) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1155) at org.apache.hadoop.hbase.master.TestAssignmentManager$AssignmentManagerWithExtrasForTesting.assign(TestAssignmentManager.java:1035) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1130) at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1125) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:106) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 2012-10-06 00:46:12,522 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(670): Handling transition=M_ZK_REGION_OFFLINE, server=master,1,1, region=dc01abf9cd7fd0ea256af4df02811640, current state from region state map ={t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640. state=OFFLINE, ts=1349484372508, server=null} {code} Looks like NPE happened on this line: {code} this.gate.set(true); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
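If the NPE does come from `this.gate.set(true)`, then `gate` was still null when the mocked balancer was called. A minimal sketch of a null-tolerant mock follows. It assumes `gate` is an AtomicBoolean that the test installs after construction; `SketchMockedBalancer` and `setGate` are hypothetical names, and this is one defensive option, not necessarily the fix for this issue.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical mock in the spirit of MockedLoadBalancer; not the test's actual class. */
class SketchMockedBalancer {
  // May still be null if the event handler fires before the test installs the gate.
  private volatile AtomicBoolean gate;

  void setGate(AtomicBoolean gate) {
    this.gate = gate;
  }

  /** Signals the gate if one is installed, then returns the first server (no real randomness here). */
  String randomAssignment(List<String> servers) {
    AtomicBoolean g = this.gate; // read the field once so the null check and the call agree
    if (g != null) {
      g.set(true);
    }
    return servers.isEmpty() ? null : servers.get(0);
  }
}
```

The guard only hides the ordering problem: if the test relies on the gate being signalled, the gate has to be installed before anything can trigger an assignment.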
[jira] [Updated] (HBASE-6958) TestAssignmentManager fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6958: -- Fix Version/s: 0.96.0 TestAssignmentManager fails in trunk Key: HBASE-6958 URL: https://issues.apache.org/jira/browse/HBASE-6958 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.96.0 From https://builds.apache.org/job/HBase-TRUNK/3432/testReport/junit/org.apache.hadoop.hbase.master/TestAssignmentManager/testBalanceOnMasterFailoverScenarioWithOpenedNode/ : {code} Stacktrace java.lang.Exception: test timed out after 5000 milliseconds at java.lang.System.arraycopy(Native Method) at java.lang.ThreadGroup.remove(ThreadGroup.java:969) at java.lang.ThreadGroup.threadTerminated(ThreadGroup.java:942) at java.lang.Thread.exit(Thread.java:732) ... 2012-10-06 00:46:12,521 DEBUG [MASTER_CLOSE_REGION-mockedAMExecutor-0] zookeeper.ZKUtil(1141): mockedServer-0x13a33892de7000e Retrieved 81 byte(s) of data from znode /hbase/unassigned/dc01abf9cd7fd0ea256af4df02811640 and set watcher; region=t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640., state=M_ZK_REGION_OFFLINE, servername=master,1,1, createTime=1349484372509, payload.length=0 2012-10-06 00:46:12,522 ERROR [MASTER_CLOSE_REGION-mockedAMExecutor-0] executor.EventHandler(205): Caught throwable while processing event RS_ZK_REGION_CLOSED java.lang.NullPointerException at org.apache.hadoop.hbase.master.TestAssignmentManager$MockedLoadBalancer.randomAssignment(TestAssignmentManager.java:773) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1709) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1666) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1435) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1155) at 
org.apache.hadoop.hbase.master.TestAssignmentManager$AssignmentManagerWithExtrasForTesting.assign(TestAssignmentManager.java:1035) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1130) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1125) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:106) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 2012-10-06 00:46:12,522 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(670): Handling transition=M_ZK_REGION_OFFLINE, server=master,1,1, region=dc01abf9cd7fd0ea256af4df02811640, current state from region state map ={t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640. state=OFFLINE, ts=1349484372508, server=null} {code} Looks like NPE happened on this line: {code} this.gate.set(true); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5995) Fix and reenable TestLogRolling.testLogRollOnPipelineRestart
[ https://issues.apache.org/jira/browse/HBASE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-5995: -- Issue Type: Sub-task (was: Task) Parent: HBASE-6891 Fix and reenable TestLogRolling.testLogRollOnPipelineRestart Key: HBASE-5995 URL: https://issues.apache.org/jira/browse/HBASE-5995 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.96.0 Reporter: stack Priority: Blocker Fix For: 0.96.0 HBASE-5984 disabled this flakey test (See the issue for more). This issue is about getting it enabled again. Made a blocker on 0.96.0 so it gets attention. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5995) Fix and reenable TestLogRolling.testLogRollOnPipelineRestart
[ https://issues.apache.org/jira/browse/HBASE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470872#comment-13470872 ] Andrew Purtell commented on HBASE-5995: --- This fails consistently against Hadoop 2. Fix and reenable TestLogRolling.testLogRollOnPipelineRestart Key: HBASE-5995 URL: https://issues.apache.org/jira/browse/HBASE-5995 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.96.0 Reporter: stack Priority: Blocker Fix For: 0.96.0 HBASE-5984 disabled this flakey test (See the issue for more). This issue is about getting it enabled again. Made a blocker on 0.96.0 so it gets attention. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6958) TestAssignmentManager sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6958: -- Summary: TestAssignmentManager sometimes fails in trunk (was: TestAssignmentManager fails in trunk) TestAssignmentManager sometimes fails in trunk -- Key: HBASE-6958 URL: https://issues.apache.org/jira/browse/HBASE-6958 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.96.0 From https://builds.apache.org/job/HBase-TRUNK/3432/testReport/junit/org.apache.hadoop.hbase.master/TestAssignmentManager/testBalanceOnMasterFailoverScenarioWithOpenedNode/ : {code} Stacktrace java.lang.Exception: test timed out after 5000 milliseconds at java.lang.System.arraycopy(Native Method) at java.lang.ThreadGroup.remove(ThreadGroup.java:969) at java.lang.ThreadGroup.threadTerminated(ThreadGroup.java:942) at java.lang.Thread.exit(Thread.java:732) ... 2012-10-06 00:46:12,521 DEBUG [MASTER_CLOSE_REGION-mockedAMExecutor-0] zookeeper.ZKUtil(1141): mockedServer-0x13a33892de7000e Retrieved 81 byte(s) of data from znode /hbase/unassigned/dc01abf9cd7fd0ea256af4df02811640 and set watcher; region=t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640., state=M_ZK_REGION_OFFLINE, servername=master,1,1, createTime=1349484372509, payload.length=0 2012-10-06 00:46:12,522 ERROR [MASTER_CLOSE_REGION-mockedAMExecutor-0] executor.EventHandler(205): Caught throwable while processing event RS_ZK_REGION_CLOSED java.lang.NullPointerException at org.apache.hadoop.hbase.master.TestAssignmentManager$MockedLoadBalancer.randomAssignment(TestAssignmentManager.java:773) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1709) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1666) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1435) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1155) at 
org.apache.hadoop.hbase.master.TestAssignmentManager$AssignmentManagerWithExtrasForTesting.assign(TestAssignmentManager.java:1035) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1130) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1125) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:106) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 2012-10-06 00:46:12,522 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(670): Handling transition=M_ZK_REGION_OFFLINE, server=master,1,1, region=dc01abf9cd7fd0ea256af4df02811640, current state from region state map ={t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640. state=OFFLINE, ts=1349484372508, server=null} {code} Looks like NPE happened on this line: {code} this.gate.set(true); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6920: - Resolution: Fixed Status: Resolved (was: Patch Available) On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6958) TestAssignmentManager fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470874#comment-13470874 ] Ted Yu commented on HBASE-6958: --- Making timeout longer, I found that there was trouble scanning .META. Here is jstack: {code} RunAmJoinCluster prio=5 tid=0x7ffc32041000 nid=0x6d03 waiting on condition [0x000113fd6000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:194) at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:371) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:218) at org.apache.hadoop.hbase.client.ClientScanner.init(ClientScanner.java:127) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:668) at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:567) at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:181) at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:141) at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2163) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:320) at org.apache.hadoop.hbase.master.TestAssignmentManager$2.run(TestAssignmentManager.java:1087) {code} TestAssignmentManager fails in trunk Key: HBASE-6958 URL: https://issues.apache.org/jira/browse/HBASE-6958 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.96.0 From https://builds.apache.org/job/HBase-TRUNK/3432/testReport/junit/org.apache.hadoop.hbase.master/TestAssignmentManager/testBalanceOnMasterFailoverScenarioWithOpenedNode/ : {code} Stacktrace java.lang.Exception: test timed out after 5000 milliseconds at java.lang.System.arraycopy(Native Method) at java.lang.ThreadGroup.remove(ThreadGroup.java:969) at java.lang.ThreadGroup.threadTerminated(ThreadGroup.java:942) at 
java.lang.Thread.exit(Thread.java:732) ... 2012-10-06 00:46:12,521 DEBUG [MASTER_CLOSE_REGION-mockedAMExecutor-0] zookeeper.ZKUtil(1141): mockedServer-0x13a33892de7000e Retrieved 81 byte(s) of data from znode /hbase/unassigned/dc01abf9cd7fd0ea256af4df02811640 and set watcher; region=t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640., state=M_ZK_REGION_OFFLINE, servername=master,1,1, createTime=1349484372509, payload.length=0 2012-10-06 00:46:12,522 ERROR [MASTER_CLOSE_REGION-mockedAMExecutor-0] executor.EventHandler(205): Caught throwable while processing event RS_ZK_REGION_CLOSED java.lang.NullPointerException at org.apache.hadoop.hbase.master.TestAssignmentManager$MockedLoadBalancer.randomAssignment(TestAssignmentManager.java:773) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1709) at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1666) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1435) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1155) at org.apache.hadoop.hbase.master.TestAssignmentManager$AssignmentManagerWithExtrasForTesting.assign(TestAssignmentManager.java:1035) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1130) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1125) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:106) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 2012-10-06 00:46:12,522 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(670): Handling transition=M_ZK_REGION_OFFLINE, server=master,1,1, region=dc01abf9cd7fd0ea256af4df02811640, 
current state from region state map ={t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640. state=OFFLINE, ts=1349484372508, server=null} {code} Looks like NPE happened on this line: {code} this.gate.set(true); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6900) RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
[ https://issues.apache.org/jira/browse/HBASE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470875#comment-13470875 ]

Hudson commented on HBASE-6900:
---
Integrated in HBase-0.94-security #60 (See [https://builds.apache.org/job/HBase-0.94-security/60/])
HBASE-6900 RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek. (Revision 1394378)
Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
Key: HBASE-6900
URL: https://issues.apache.org/jira/browse/HBASE-6900
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.2, 0.96.0
Attachments: 6900-test.txt, HBASE-6900_1.patch, HBASE-6900.patch

HBASE-5520 introduced reseek() on the RegionScanner. When a scanner is created we have the StoreScanner heap. If a flush or compaction then happens in parallel, all the StoreScannerObservers are cleared, so that the next next() call recreates the scanner from the latest store files. The reseek() in StoreScanner expects the heap not to be null, because reseek was always called from next():
{code}
public synchronized boolean reseek(KeyValue kv) throws IOException {
  //Heap cannot be null, because this is only called from next() which
  //guarantees that heap will never be null before this call.
  if (explicitColumnQuery && lazySeekEnabledGlobally) {
    return heap.requestSeek(kv, true, useRowColBloom);
  } else {
    return heap.reseek(kv);
  }
}
{code}
Now when we call RegionScanner.reseek() directly from coprocessors (CPs) we tend to get an NPE. In our case it happened while a major compaction was going on. I will also attach a testcase to show the problem.
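The null-heap window can be illustrated with a toy scanner. This is a sketch under stated assumptions, not the StoreScanner implementation: keys are plain strings in a sorted set, `SketchStoreScanner` is a hypothetical class, and the idea of rebuilding the heap in a `checkReseek()` step before use is modelled on the code comment quoted earlier in this thread.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy stand-in for a store scanner; keys are plain strings, not KeyValues. */
class SketchStoreScanner {
  private NavigableSet<String> files; // stand-in for the current store files
  private NavigableSet<String> heap;  // null after updateReaders(), until rebuilt
  private String lastTop;             // key the scanner was positioned on before the reset

  SketchStoreScanner(NavigableSet<String> files) {
    this.files = files;
    this.heap = files; // positioned at the first key
  }

  /** Called when a flush or compaction swaps the underlying files. */
  synchronized void updateReaders(NavigableSet<String> newFiles) {
    if (heap == null) {
      return; // early out: a reset is already pending
    }
    lastTop = peek();
    files = newFiles;
    heap = null; // rebuilt lazily on the next API call
  }

  synchronized String peek() {
    checkReseek();
    return heap.isEmpty() ? null : heap.first();
  }

  synchronized boolean reseek(String key) {
    checkReseek(); // without this guard, heap may be null when called directly
    heap = files.tailSet(key, true);
    return !heap.isEmpty();
  }

  /** Rebuilds the heap from the current files if a reset is pending. */
  private void checkReseek() {
    if (heap == null) {
      heap = (lastTop == null) ? files : files.tailSet(lastTop, true);
      lastTop = null;
    }
  }
}
```

In this model a direct `reseek()` after `updateReaders()` works because the heap is rebuilt first; removing the `checkReseek()` call from `reseek()` would leave a stale null `heap` in place after the call, which is the window where the real code dereferenced it.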
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470876#comment-13470876 ] Hudson commented on HBASE-6920: --- Integrated in HBase-0.94-security #60 (See [https://builds.apache.org/job/HBase-0.94-security/60/]) HBASE-6920 On timeout connecting to master, client can get stuck and never make progress (Revision 1394857) Result = FAILURE gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestClientTimeouts.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/ipc/RandomTimeoutRpcEngine.java On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. 
But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect. I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5582) No HServerInfo found for should be a WARNING message
[ https://issues.apache.org/jira/browse/HBASE-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470877#comment-13470877 ] Hudson commented on HBASE-5582: --- Integrated in HBase-0.94-security #60 (See [https://builds.apache.org/job/HBase-0.94-security/60/]) HBASE-5582 No HServerInfo found for should be a WARNING message (Kevin Odell via JD) (Revision 1394767) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java No HServerInfo found for should be a WARNING message -- Key: HBASE-5582 URL: https://issues.apache.org/jira/browse/HBASE-5582 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Shrijeet Paliwal Assignee: Kevin Odell Priority: Trivial Labels: newbie Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: HBASE-5582.patch, HBASE-5582.patch1 The message from RegionServerTracker No HServerInfo found for... is easy to miss. It should not be INFO. From irc chat {noformat} jdcryans JohnP789: can you grep for No HServerInfo found for in that log? jdcryans wait I see it jdcryans ok there's your problem shrijeet_ Yes it is there shrijeet_ jdcryans: it should be INFO, why? jdcryans it shouldn't be INFO, it's so easy to miss jdcryans it's not the first time we have to look super closely to figure this one out shrijeet_ yes , I will file a jira jdcryans in any case it's a mismatch in that machine's DNS config shrijeet_ anyways JohnP789 is waiting :) go on JohnP789 haha! JohnP789 yes... ??? :-) jdcryans the master is expecting a RS called localhost.localdomain,53875,1328924863478 17:26 jdcryans but the RS calls itself localhost,53875,1328924863478 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6889) Ignore source control files with apache-rat
[ https://issues.apache.org/jira/browse/HBASE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470878#comment-13470878 ]

Hudson commented on HBASE-6889:
---

Integrated in HBase-0.94-security #60 (See [https://builds.apache.org/job/HBase-0.94-security/60/])
    HBASE-6889 Ignore source control files with apache-rat (Jesse Yates) (Revision 1394736)

Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/pom.xml

Ignore source control files with apache-rat
---

Key: HBASE-6889
URL: https://issues.apache.org/jira/browse/HBASE-6889
Project: HBase
Issue Type: Bug
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: 0.94.2, 0.96.0
Attachments: hbase-6889-mvn-v0.patch

Running 'mvn apache-rat:check' locally causes a failure because it finds the source control files, making it hard to check that you didn't include a file without a source header.
[jira] [Commented] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470883#comment-13470883 ]

Hudson commented on HBASE-6797:
---

Integrated in HBase-TRUNK #3433 (See [https://builds.apache.org/job/HBase-TRUNK/3433/])
    HBASE-6797 TestHFileCleaner#testHFileCleaning sometimes fails in trunk (Jesse) (Revision 1394875)

Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/TimeToLiveHFileCleaner.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestHFileCleaner.java

TestHFileCleaner#testHFileCleaning sometimes fails in trunk
---

Key: HBASE-6797
URL: https://issues.apache.org/jira/browse/HBASE-6797
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Jesse Yates
Attachments: hbase-6797-v0.patch, hbase-6797-v1.patch

In build #3334, I saw:
{code}
java.lang.AssertionError: expected:<1> but was:<0>
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.failNotEquals(Assert.java:647)
  at org.junit.Assert.assertEquals(Assert.java:128)
  at org.junit.Assert.assertEquals(Assert.java:472)
  at org.junit.Assert.assertEquals(Assert.java:456)
  at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88)
{code}
[jira] [Commented] (HBASE-6958) TestAssignmentManager sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470884#comment-13470884 ]

Ted Yu commented on HBASE-6958:
---

Here is my environment:

Darwin L0032 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64
java version 1.7.0_07
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

TestAssignmentManager sometimes fails in trunk
--

Key: HBASE-6958
URL: https://issues.apache.org/jira/browse/HBASE-6958
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Fix For: 0.96.0

From https://builds.apache.org/job/HBase-TRUNK/3432/testReport/junit/org.apache.hadoop.hbase.master/TestAssignmentManager/testBalanceOnMasterFailoverScenarioWithOpenedNode/ :
{code}
Stacktrace
java.lang.Exception: test timed out after 5000 milliseconds
  at java.lang.System.arraycopy(Native Method)
  at java.lang.ThreadGroup.remove(ThreadGroup.java:969)
  at java.lang.ThreadGroup.threadTerminated(ThreadGroup.java:942)
  at java.lang.Thread.exit(Thread.java:732)
...
2012-10-06 00:46:12,521 DEBUG [MASTER_CLOSE_REGION-mockedAMExecutor-0] zookeeper.ZKUtil(1141): mockedServer-0x13a33892de7000e Retrieved 81 byte(s) of data from znode /hbase/unassigned/dc01abf9cd7fd0ea256af4df02811640 and set watcher; region=t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640., state=M_ZK_REGION_OFFLINE, servername=master,1,1, createTime=1349484372509, payload.length=0
2012-10-06 00:46:12,522 ERROR [MASTER_CLOSE_REGION-mockedAMExecutor-0] executor.EventHandler(205): Caught throwable while processing event RS_ZK_REGION_CLOSED
java.lang.NullPointerException
  at org.apache.hadoop.hbase.master.TestAssignmentManager$MockedLoadBalancer.randomAssignment(TestAssignmentManager.java:773)
  at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1709)
  at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:1666)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1435)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1155)
  at org.apache.hadoop.hbase.master.TestAssignmentManager$AssignmentManagerWithExtrasForTesting.assign(TestAssignmentManager.java:1035)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1130)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1125)
  at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:106)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
2012-10-06 00:46:12,522 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(670): Handling transition=M_ZK_REGION_OFFLINE, server=master,1,1, region=dc01abf9cd7fd0ea256af4df02811640, current state from region state map ={t,,1349484359011.dc01abf9cd7fd0ea256af4df02811640. state=OFFLINE, ts=1349484372508, server=null}
{code}

Looks like NPE happened on this line:
{code}
this.gate.set(true);
{code}
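The comment only identifies `this.gate.set(true)` as the likely source of the NPE. One possible reading, which is an assumption and not confirmed in the thread, is that `gate` had not yet been assigned when the handler thread called `randomAssignment`. The sketch below uses hypothetical names (`MockedBalancerSketch`, `setGate`) to show that failure mode next to a null-guarded variant.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a mocked balancer with a test-injected gate; not the HBase test code.
class MockedBalancerSketch {
    // Assumed to be injected by the test before the balancer is used.
    private volatile AtomicBoolean gate;

    void setGate(AtomicBoolean gate) {
        this.gate = gate;
    }

    // Unguarded: throws NullPointerException if called before setGate().
    void randomAssignmentUnguarded() {
        this.gate.set(true);
    }

    // Guarded: signals the gate only if one was injected, and reports whether it did.
    boolean randomAssignmentGuarded() {
        AtomicBoolean current = this.gate; // read once to avoid racing with setGate()
        if (current == null) {
            return false;
        }
        current.set(true);
        return true;
    }

    public static void main(String[] args) {
        MockedBalancerSketch balancer = new MockedBalancerSketch();
        System.out.println("signalled before injection: " + balancer.randomAssignmentGuarded());
        balancer.setGate(new AtomicBoolean(false));
        System.out.println("signalled after injection: " + balancer.randomAssignmentGuarded());
    }
}
```

A guard of this kind would hide the ordering problem, not fix it; if the assumption holds, making sure the gate is set before any event handler can run is the more direct remedy.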
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470885#comment-13470885 ]

Lars Hofhansl commented on HBASE-6920:
--

Looks like this is breaking the security build.

On timeout connecting to master, client can get stuck and never make progress
-

Key: HBASE-6920
URL: https://issues.apache.org/jira/browse/HBASE-6920
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.2
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Critical
Fix For: 0.94.2
Attachments: HBASE-6920.patch, HBASE-6920-v2.patch

HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized.

The issue is here:
{code}
if (tryMaster.isMasterRunning()) {
  this.master = tryMaster;
  this.masterLock.notifyAll();
  break;
}
{code}

If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal, because it can be returned to the application. But if the first call to getMaster succeeds, it will set masterChecked = true, which makes us never try to reconnect; that is, we will set this.master = null and just throw MasterNotRunningExceptions, without even trying to connect.

I tried out a 94 client (actually a 92 client with some 94 patches) on a cluster with some network issues, and it would constantly get stuck as described above.
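The stuck state described in the issue can be modelled in a few lines. The sketch below is a simplified illustration, not the HConnection code or the attached patch: `MasterLookupSketch`, `MasterUnavailable`, `retryAfterFailure` and `connectAttempts` are hypothetical names, a `BooleanSupplier` that throws stands in for `isMasterRunning()` timing out, and the sketch throws an unchecked exception only to stay short.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a cached-master lookup; names do not match the HBase source.
class MasterLookupSketch {
    static class MasterUnavailable extends RuntimeException {
        MasterUnavailable() {
            super("master not running");
        }
    }

    private final boolean retryAfterFailure; // false models the stuck behaviour
    private String master;                   // null when there is no usable connection
    private boolean masterChecked;           // set on the first connection attempt
    int connectAttempts;                     // exposed so the behaviour can be observed

    MasterLookupSketch(boolean retryAfterFailure) {
        this.retryAfterFailure = retryAfterFailure;
    }

    // A probe that throws stands in for isMasterRunning() timing out.
    private static boolean probe(BooleanSupplier isMasterRunning) {
        try {
            return isMasterRunning.getAsBoolean();
        } catch (RuntimeException timeout) {
            return false;
        }
    }

    synchronized String getMaster(BooleanSupplier isMasterRunning) {
        if (master != null && !probe(isMasterRunning)) {
            master = null; // cached connection is no longer usable
        }
        // Without retryAfterFailure, masterChecked blocks every later attempt.
        if (master == null && (!masterChecked || retryAfterFailure)) {
            masterChecked = true;
            connectAttempts++;
            if (probe(isMasterRunning)) {
                master = "master-stub";
            }
        }
        if (master == null) {
            throw new MasterUnavailable();
        }
        return master;
    }
}
```

With `retryAfterFailure` set to false, one successful call followed by one timeout leaves `master` null and `masterChecked` true, so every later call fails without a new attempt. With it set to true, the next call against a healthy master reconnects.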
[jira] [Updated] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6920:
-

Attachment: 6920-addendum.txt

Addendum to fix the SecureRpcEngine

On timeout connecting to master, client can get stuck and never make progress
-

Key: HBASE-6920
URL: https://issues.apache.org/jira/browse/HBASE-6920
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.2
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Critical
Fix For: 0.94.2
Attachments: 6920-addendum.txt, HBASE-6920.patch, HBASE-6920-v2.patch
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470888#comment-13470888 ]

Lars Hofhansl commented on HBASE-6920:
--

Committed addendum to 0.94
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470892#comment-13470892 ]

Hudson commented on HBASE-6920:
---

Integrated in HBase-0.94-security #62 (See [https://builds.apache.org/job/HBase-0.94-security/62/])
    HBASE-6920 Addendum - fix SecureRpcEngine (Revision 1394908)

Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureRpcEngine.java
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470894#comment-13470894 ]

Hudson commented on HBASE-6920:
---

Integrated in HBase-0.94 #511 (See [https://builds.apache.org/job/HBase-0.94/511/])
    HBASE-6920 Addendum - fix SecureRpcEngine (Revision 1394908)

Result = SUCCESS
larsh :
Files :
* /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureRpcEngine.java