[jira] [Commented] (HADOOP-17905) Modify Text.ensureCapacity() to efficiently max out the backing array size
[ https://issues.apache.org/jira/browse/HADOOP-17905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413527#comment-17413527 ]

Peter Bacsko commented on HADOOP-17905:
---------------------------------------

[~elgoiri] PR is available for review. cc [~snemeth].

> Modify Text.ensureCapacity() to efficiently max out the backing array size
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-17905
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17905
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a continuation of HADOOP-17901.
> Right now we use a factor of 1.5x to grow the byte array when it is full. However, once the size reaches a certain point, the increment is only (current size + length). This can cause performance issues if the textual data we intend to store grows beyond that point.
> Instead, let's max out the array. Based on different sources, a safe choice seems to be Integer.MAX_VALUE - 8 (see ArrayList, AbstractCollection, Hashtable, etc.).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
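The growth policy proposed above can be sketched as follows. This is a minimal illustration, not the actual patch: the class and method names are made up for the example, and only the cap at Integer.MAX_VALUE - 8 (the safe maximum used by JDK collections such as ArrayList) is taken from the ticket.

```java
// Hypothetical sketch of the proposed growth policy: grow by 1.5x, but on
// overflow or once past the safe limit, jump straight to MAX_ARRAY_SIZE
// instead of creeping upward in small increments.
public class CapacitySketch {
    // Safe maximum used by ArrayList, Hashtable, etc. in the JDK.
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int newCapacity(int currentCapacity, int requiredMinimum) {
        if (currentCapacity >= requiredMinimum) {
            return currentCapacity;          // nothing to do
        }
        // 1.5x growth; may overflow to a negative value near Integer.MAX_VALUE
        int grown = currentCapacity + (currentCapacity >> 1);
        if (grown < requiredMinimum) {
            grown = requiredMinimum;
        }
        // On overflow or once past the safe limit, max out in one step.
        if (grown < 0 || grown > MAX_ARRAY_SIZE) {
            grown = MAX_ARRAY_SIZE;
        }
        return grown;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(30720, 28672));               // 30720: already big enough
        System.out.println(newCapacity(24576, 28672));               // 36864: normal 1.5x growth
        System.out.println(newCapacity(2_000_000_000, 2_100_000_000)); // 2147483639: maxed out
    }
}
```

The point of the last branch is that a single large request near the 2 GiB boundary lands on the maximum in one reallocation instead of several near-maximal ones.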
[jira] [Updated] (HADOOP-17905) Modify Text.ensureCapacity() to efficiently max out the backing array size
[ https://issues.apache.org/jira/browse/HADOOP-17905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated HADOOP-17905:
----------------------------------
    Description: 
This is a continuation of HADOOP-17901.

Right now we use a factor of 1.5x to increase the byte array if it's full. However, if the size reaches a certain point, the increment is only (current size + length). This can cause performance issues if the textual data which we intend to store is beyond this point.

Instead, let's max out the array to the maximum. Based on different sources, a safe choice seems to be Integer.MAX_VALUE - 8 (see ArrayList, AbstractCollection, HashTable, etc).

  was:
This is a continuation of HADOOP-17901.

Right now we use a factor of 1.5x to increase the byte array if it's full. However, if the size reaches a certain point, the increment is only (current size + length). This can cause performance issues if the textual data which we intend to store is beyond this point.

Instead, let's max out the array to the maximum. Based on different sources, this is usually determined to be Integer.MAX_VALUE - 8 (see ArrayList, AbstractCollection, HashTable, etc).
[jira] [Created] (HADOOP-17905) Modify Text.ensureCapacity() to efficiently max out the backing array size
Peter Bacsko created HADOOP-17905:
-------------------------------------

             Summary: Modify Text.ensureCapacity() to efficiently max out the backing array size
                 Key: HADOOP-17905
                 URL: https://issues.apache.org/jira/browse/HADOOP-17905
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Peter Bacsko
            Assignee: Peter Bacsko

This is a continuation of HADOOP-17901.

Right now we use a factor of 1.5x to increase the byte array if it's full. However, if the size reaches a certain point, the increment is only (current size + length). This can cause performance issues if the textual data which we intend to store is beyond this point.

Instead, let's max out the array to the maximum. Based on different sources, this is usually determined to be Integer.MAX_VALUE - 8 (see ArrayList, AbstractCollection, Hashtable, etc.).
[jira] [Commented] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951
[ https://issues.apache.org/jira/browse/HADOOP-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412483#comment-17412483 ]

Peter Bacsko commented on HADOOP-17901:
---------------------------------------

Thanks [~elgoiri]. I was also thinking about expanding the array all the way to the maximum size (which is often assumed to be Integer.MAX_VALUE - 8), but I think I'll do that in a different JIRA.

> Performance degradation in Text.append() after HADOOP-16951
> -----------------------------------------------------------
>
>                 Key: HADOOP-17901
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17901
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: HADOOP-17901-001.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We discovered a serious performance degradation in {{Text.append()}}.
> The problem is that the logic which is supposed to increase the size of the backing array does not work as intended. It's very difficult to spot, so I added extra logs to see what happens.
> Let's add 4096 bytes of textual data in a loop:
> {noformat}
> public static void main(String[] args) {
>   Text text = new Text();
>   String toAppend = RandomStringUtils.randomAscii(4096);
>   for (int i = 0; i < 100; i++) {
>     text.append(toAppend.getBytes(), 0, 4096);
>   }
> }
> {noformat}
> With some debug printouts, we can observe:
> {noformat}
> 2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(251)) - length: 24576, len: 4096, utf8ArraySize: 4096, bytes.length: 30720
> 2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(253)) - length + (length >> 1): 36864
> 2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(254)) - length + len: 28672
> 2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:ensureCapacity(287)) - >>> enhancing capacity from 30720 to 36864
> 2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(251)) - length: 28672, len: 4096, utf8ArraySize: 4096, bytes.length: 36864
> 2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(253)) - length + (length >> 1): 43008
> 2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(254)) - length + len: 32768
> 2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:ensureCapacity(287)) - >>> enhancing capacity from 36864 to 43008
> 2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(251)) - length: 32768, len: 4096, utf8ArraySize: 4096, bytes.length: 43008
> 2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(253)) - length + (length >> 1): 49152
> 2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(254)) - length + len: 36864
> 2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:ensureCapacity(287)) - >>> enhancing capacity from 43008 to 49152
> ...
> {noformat}
> After a certain number of {{append()}} calls, subsequent capacity increments are small. This is because the difference between two consecutive {{length + (length >> 1)}} values is always 6144 bytes. Since the size of the backing array lags behind the calculated value, each increment is also only 6144 bytes, which means that new arrays are constantly created.
> Suggested solution: don't calculate the capacity in advance based on length. Instead, pass the required minimum to {{ensureCapacity()}} and let the increment depend on the actual size of the byte array when the desired capacity is larger.
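The effect described above can be reproduced with a small self-contained simulation. This is a model of the two policies, not Hadoop's actual Text.java code: the length-based variant reallocates whenever the array is smaller than the inflated target derived from the logical length (matching the log, which shows a reallocation on every append), while the capacity-based variant grows from the actual array size only when the data truly does not fit.

```java
// Simulation of the two growth policies for 100 appends of 4096 bytes,
// counting how many times a new backing array would be created.
public class GrowthSimulation {

    // Buggy policy: target = max(length + length/2, length + len); reallocate
    // whenever the array is smaller than that target, even if the data fits.
    static int reallocationsLengthBased(int appends, int len) {
        int length = 0, capacity = 0, reallocations = 0;
        for (int i = 0; i < appends; i++) {
            int target = Math.max(length + (length >> 1), length + len);
            if (capacity < target) {
                capacity = target;
                reallocations++;
            }
            length += len;
        }
        return reallocations;
    }

    // Suggested policy: grow only when needed, and base the 1.5x growth on
    // the current capacity rather than the logical length.
    static int reallocationsCapacityBased(int appends, int len) {
        int length = 0, capacity = 0, reallocations = 0;
        for (int i = 0; i < appends; i++) {
            int needed = length + len;
            if (needed > capacity) {
                capacity = Math.max(capacity + (capacity >> 1), needed);
                reallocations++;
            }
            length = needed;
        }
        return reallocations;
    }

    public static void main(String[] args) {
        // Same workload as the snippet in the report: 100 appends of 4096 bytes.
        System.out.println("length-based:   " + reallocationsLengthBased(100, 4096));   // 100
        System.out.println("capacity-based: " + reallocationsCapacityBased(100, 4096)); // 12
    }
}
```

Under this model the length-based policy reallocates on every single append (100 times), while the capacity-based one reallocates 12 times over the same workload, which is the amortized behavior the suggested fix restores.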
[jira] [Commented] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951
[ https://issues.apache.org/jira/browse/HADOOP-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411883#comment-17411883 ]

Peter Bacsko commented on HADOOP-17901:
---------------------------------------

cc [~belugabehr] [~elgoiri] you guys worked on the related ticket, could you review this?
[jira] [Updated] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951
[ https://issues.apache.org/jira/browse/HADOOP-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated HADOOP-17901:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951
[ https://issues.apache.org/jira/browse/HADOOP-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated HADOOP-17901:
----------------------------------
    Attachment: HADOOP-17901-001.patch
[jira] [Created] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951
Peter Bacsko created HADOOP-17901:
-------------------------------------

             Summary: Performance degradation in Text.append() after HADOOP-16951
                 Key: HADOOP-17901
                 URL: https://issues.apache.org/jira/browse/HADOOP-17901
             Project: Hadoop Common
          Issue Type: Bug
          Components: common
            Reporter: Peter Bacsko
            Assignee: Peter Bacsko
[jira] [Commented] (HADOOP-17573) Fix compilation error of OBSFileSystem in trunk
[ https://issues.apache.org/jira/browse/HADOOP-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298800#comment-17298800 ]

Peter Bacsko commented on HADOOP-17573:
---------------------------------------

I've just seen this problem. The PR fixed the build locally. +1 from me.

> Fix compilation error of OBSFileSystem in trunk
> -----------------------------------------------
>
>                 Key: HADOOP-17573
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17573
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Masatake Iwasaki
>            Assignee: Masatake Iwasaki
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-huaweicloud: Compilation failure
> [ERROR] /home/centos/srcs/hadoop/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileSystem.java:[396,58] incompatible types: org.apache.hadoop.util.BlockingThreadPoolExecutorService cannot be converted to org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListeningExecutorService
> {noformat}
[jira] [Commented] (HADOOP-17324) Don't relocate org.bouncycastle in shaded client jars
[ https://issues.apache.org/jira/browse/HADOOP-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230071#comment-17230071 ]

Peter Bacsko commented on HADOOP-17324:
---------------------------------------

[~csun] I think the commit https://github.com/apache/hadoop/commit/2522bf2f9b0c720eab099fef27bd3d22460ad5d0 introduced a compilation problem:

{noformat}
[INFO] Apache Hadoop Client Aggregator .................... SKIPPED
[INFO] Apache Hadoop Client API ........................... SKIPPED
[INFO] Apache Hadoop Client Runtime ....................... SKIPPED
[INFO] Apache Hadoop Client Test Minicluster .............. SKIPPED
[INFO] Apache Hadoop Client Packaging Invariants .......... SKIPPED
[INFO] Apache Hadoop Client Packaging Invariants for Test . SKIPPED
[INFO] Apache Hadoop Client Packaging Integration Tests ... FAILURE [  2.050 s]
[INFO] Apache Hadoop Client Modules 3.4.0-SNAPSHOT ........ SUCCESS [  1.455 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  3.012 s (Wall Clock)
[INFO] Finished at: 2020-11-11T17:29:17+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project hadoop-client-integration-tests: Compilation failure: Compilation failure:
[ERROR] /home/bacskop/repos/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[47,37] package org.apache.hadoop.yarn.server does not exist
[ERROR] /home/bacskop/repos/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11] cannot find symbol
[ERROR]   symbol:   class MiniYARNCluster
[ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
[ERROR] /home/bacskop/repos/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[82,23] cannot find symbol
[ERROR]   symbol:   class MiniYARNCluster
[ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :hadoop-client-integration-tests
{noformat}

Could you please investigate this?

> Don't relocate org.bouncycastle in shaded client jars
> -----------------------------------------------------
>
>                 Key: HADOOP-17324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17324
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> When downstream apps depend on {{hadoop-client-api}}, {{hadoop-client-runtime}} and {{hadoop-client-minicluster}}, it seems the {{MiniYARNCluster}} could have an issue because {{org.apache.hadoop.shaded.org.bouncycastle.operator.OperatorCreationException}} is not in any of the above jars.
> {code}
> Error: Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.org.bouncycastle.operator.OperatorCreationException
> Error:   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> Error:   at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> Error:   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> Error:   at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> Error:   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:862)
> Error:   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> Error:   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1296)
> Error:   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
> Error:   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> Error:   at org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:353)
> Error:   at org.apache.hadoop.yarn.server.MiniYARNCluster.access$200(MiniYARNCluster.java:127)
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975157#comment-16975157 ]

Peter Bacsko commented on HADOOP-16683:
---------------------------------------

[~adam.antal] we don't have build results for branch-3.2. The trick is to upload a patch, wait until the build starts, then upload the next one.

> Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16683
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16683
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.3.0
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, HADOOP-16683.003.patch, HADOOP-16683.branch-3.1.001.patch, HADOOP-16683.branch-3.2.001.patch
>
> Follow-up patch on HADOOP-16580.
> We successfully disabled the retry in case of an AccessControlException, which has resolved some of the cases, but in other cases the AccessControlException is wrapped inside another IOException and you can only get the original exception by calling getCause().
> Let's add this extra case as well.
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968412#comment-16968412 ]

Peter Bacsko commented on HADOOP-16683:
---------------------------------------

+1 (non-binding)
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968247#comment-16968247 ]

Peter Bacsko commented on HADOOP-16683:
---------------------------------------

Just a question: do we know for sure that {{AccessControlException}} can only be wrapped inside an {{IOException}}? Perhaps we should guard ourselves more aggressively and examine the entire exception chain (I believe this is what I did with {{SaslException}}).
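The "examine the entire exception chain" idea from the comment above can be sketched like this. This is an illustration, not the Hadoop patch: the class name is made up, and the JDK's java.security.AccessControlException stands in for Hadoop's own AccessControlException type.

```java
import java.io.IOException;
import java.security.AccessControlException;

// Walk the getCause() chain and report whether any link is an
// AccessControlException, no matter how deeply it is wrapped.
public class CauseChainSketch {
    static boolean hasAccessControlException(Throwable t) {
        // A hop limit guards against pathological cyclic cause chains.
        for (int hops = 0; t != null && hops < 16; t = t.getCause(), hops++) {
            if (t instanceof AccessControlException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable doublyWrapped =
            new IOException("rpc failed",
                new IOException("handshake failed",
                    new AccessControlException("no kerberos credentials")));
        System.out.println(hasAccessControlException(doublyWrapped));          // true
        System.out.println(hasAccessControlException(new IOException("timeout"))); // false
    }
}
```

Checking the whole chain this way also covers the double-wrapping case the comment worries about, where the AccessControlException is not the immediate cause.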
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967666#comment-16967666 ] Peter Bacsko commented on HADOOP-16683: --- [~adam.antal] I think the return value of this function should be {{boolean}} instead of {{Throwable}}, because you don't really do much with the returned object: {{private static Throwable getWrappedAccessControlException(Exception e)}} > Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped > AccessControlException > -- > > Key: HADOOP-16683 > URL: https://issues.apache.org/jira/browse/HADOOP-16683 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16683.001.patch > > > Follow up patch on HADOOP-16580. > We successfully disabled the retry in case of an AccessControlException which > has resolved some of the cases, but in other cases AccessControlException is > wrapped inside another IOException and you can only get the original > exception by calling getCause(). > Let's add this extra case as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935841#comment-16935841 ] Peter Bacsko commented on HADOOP-16580: --- +1 (non-binding) > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16211) Update guava to 27.0-jre in hadoop-project branch-3.2
[ https://issues.apache.org/jira/browse/HADOOP-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863090#comment-16863090 ] Peter Bacsko commented on HADOOP-16211: --- On my machine, {{TestTimelineReaderWebServicesHBaseStorage}} is so bad that every single testcase fails. Looking at the Jenkins build result, the same thing happened. Wow, that's really bad. Created JIRA: YARN-9622 > Update guava to 27.0-jre in hadoop-project branch-3.2 > - > > Key: HADOOP-16211 > URL: https://issues.apache.org/jira/browse/HADOOP-16211 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-16211-branch-3.2.001.patch, > HADOOP-16211-branch-3.2.002.patch, HADOOP-16211-branch-3.2.003.patch, > HADOOP-16211-branch-3.2.004.patch, HADOOP-16211-branch-3.2.005.patch, > HADOOP-16211-branch-3.2.006.patch > > > com.google.guava:guava should be upgraded to 27.0-jre due to new CVE's found > CVE-2018-10237. > This is a sub-task for branch-3.2 from HADOOP-15960 to track issues on that > particular branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16213) Update guava to 27.0-jre in hadoop-project branch-3.1
[ https://issues.apache.org/jira/browse/HADOOP-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863060#comment-16863060 ] Peter Bacsko commented on HADOOP-16213: --- Created YARN-9621 to track testDistributedShellWithPlacementConstraint failure. > Update guava to 27.0-jre in hadoop-project branch-3.1 > - > > Key: HADOOP-16213 > URL: https://issues.apache.org/jira/browse/HADOOP-16213 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-16213-branch-3.1.001.patch, > HADOOP-16213-branch-3.1.002.patch, HADOOP-16213-branch-3.1.003.patch, > HADOOP-16213-branch-3.1.004.patch, HADOOP-16213-branch-3.1.005.patch, > HADOOP-16213-branch-3.1.006.patch > > > com.google.guava:guava should be upgraded to 27.0-jre due to new CVE's found > CVE-2018-10237. > This is a sub-task for branch-3.1 from HADOOP-15960 to track issues on that > particular branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16211) Update guava to 27.0-jre in hadoop-project branch-3.2
[ https://issues.apache.org/jira/browse/HADOOP-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863035#comment-16863035 ] Peter Bacsko commented on HADOOP-16211: --- https://issues.apache.org/jira/browse/YARN-8672 addressed the failed test in hadoop.yarn.server.nodemanager.containermanager.TestContainerManager, it just hasn't been backported to 3.2. > Update guava to 27.0-jre in hadoop-project branch-3.2 > - > > Key: HADOOP-16211 > URL: https://issues.apache.org/jira/browse/HADOOP-16211 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-16211-branch-3.2.001.patch, > HADOOP-16211-branch-3.2.002.patch, HADOOP-16211-branch-3.2.003.patch, > HADOOP-16211-branch-3.2.004.patch, HADOOP-16211-branch-3.2.005.patch, > HADOOP-16211-branch-3.2.006.patch > > > com.google.guava:guava should be upgraded to 27.0-jre due to new CVE's found > CVE-2018-10237. > This is a sub-task for branch-3.2 from HADOOP-15960 to track issues on that > particular branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-16238: -- Attachment: HADOOP-16238-005.patch > Add the possbility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch, HADOOP-16238-004.patch, HADOOP-16238-005.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable, see explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely it also causes problems in a test case > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is benefical. We > could enable this with a new property like {{ipc.server.reuseaddr}}. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833641#comment-16833641 ] Peter Bacsko commented on HADOOP-16238: --- I uploaded patch v5 where the default is "true". > Add the possbility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch, HADOOP-16238-004.patch, HADOOP-16238-005.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable, see explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely it also causes problems in a test case > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is benefical. We > could enable this with a new property like {{ipc.server.reuseaddr}}. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820794#comment-16820794 ] Peter Bacsko commented on HADOOP-16238: --- [~jojochuang] could you review this patch please? > Add the possbility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch, HADOOP-16238-004.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable, see explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely it also causes problems in a test case > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is benefical. We > could enable this with a new property like {{ipc.server.reuseaddr}}. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817970#comment-16817970 ] Peter Bacsko commented on HADOOP-16238: --- [~adam.antal] based on a quick analysis, this socket option is disabled on every major OS, so setting it to false by default doesn't break anything. Having said that, a 100% perfect solution is simply not touching it if not defined in the config. If we want to be extra-safe, it's a viable approach. > Add the possbility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch, HADOOP-16238-004.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable, see explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely it also causes problems in a test case > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is 
beneficial. We > could enable this with a new property like {{ipc.server.reuseaddr}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
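At the Java level, the option discussed in this issue is a single call on the server socket that must happen before {{bind()}}. A minimal stand-alone sketch (not the Hadoop IPC Server code itself; {{ipc.server.reuseaddr}} is the property name proposed here, not an existing key):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddrDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket();
        // Must be set before bind(): lets a restarted server rebind the
        // port while the previous socket lingers in TIME_WAIT, avoiding
        // "java.net.BindException: Address already in use".
        server.setReuseAddress(true);
        server.bind(new InetSocketAddress(0)); // ephemeral port for the demo
        System.out.println("reuse=" + server.getReuseAddress());
        server.close();
    }
}
```

This is exactly the scenario in the {{TestMiniMRClientCluster.testRestart}} stack trace above: the restarted MiniYARNCluster tries to rebind the same port before the kernel has released it.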
[jira] [Commented] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814468#comment-16814468 ] Peter Bacsko commented on HADOOP-16238: --- Thanks [~wilfreds], handled the newline stuff. > Add the possbility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch, HADOOP-16238-004.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable, see explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely it also causes problems in a test case > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is benefical. We > could enable this with a new property like {{ipc.server.reuseaddr}}. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-16238: -- Attachment: HADOOP-16238-004.patch > Add the possbility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch, HADOOP-16238-004.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable, see explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely it also causes problems in a test case > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is benefical. We > could enable this with a new property like {{ipc.server.reuseaddr}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16237) Fix new findbugs issues after update guava to 27.0-jre in hadoop-project trunk
[ https://issues.apache.org/jira/browse/HADOOP-16237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810775#comment-16810775 ] Peter Bacsko commented on HADOOP-16237: --- *CosmosDBDocumentStoreReader* / *CosmosDBDocumentStoreWriter* {noformat} if (client == null) { synchronized (this) { if (client == null) { LOG.info("Creating Cosmos DB Client..."); client = DocumentStoreUtils.createCosmosDBClient(conf); } } }{noformat} To me this looks like the standard DCL pattern, and with {{client}} being non-volatile, it's faulty. So either make it volatile, or consider making {{client}} non-static - is it expensive to create? Do we really need to cache it once it's created? *FlowRunDocument.aggregate* {noformat} LOG.error("Unknown TimelineMetricOperation."){noformat} I vote for WARN level. If it's really an error, then we probably should throw an exception, no? *FlowRunDocument.aggregateMetrics(Map)* I think FindBugs is right, this can be enhanced. We retrieve the {{keySet()}} from {{metricSubDocMap}}, then perform {{get()}} on it if {{metrics}} happens to contain the same key. Operating on the {{EntrySet}} is definitely better here (although I have no idea whether it really speeds things up). 
> Fix new findbugs issues after updating guava to 27.0-jre in hadoop-project trunk > -- > > Key: HADOOP-16237 > URL: https://issues.apache.org/jira/browse/HADOOP-16237 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: > branch-findbugs-hadoop-common-project_hadoop-kms-warnings.html, > branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html, > > branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html, > > branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-documentstore-warnings.html > > > There are a bunch of new findbugs issues in the build after committing the > guava update. > Mostly in yarn, but we have to check and handle those.
[jira] [Commented] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810746#comment-16810746 ] Peter Bacsko commented on HADOOP-16238: --- [~wilfreds] / [~ste...@apache.org] could you review this patch? > Add the possibility to set SO_REUSEADDR in IPC Server Listener > - > > Key: HADOOP-16238 > URL: https://issues.apache.org/jira/browse/HADOOP-16238 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: HADOOP-16238-001.patch, HADOOP-16238-002.patch, > HADOOP-16238-003.patch > > > Currently we can't enable SO_REUSEADDR in the IPC Server. In some > circumstances, this would be desirable; see the explanation here: > [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] > Rarely, it also causes problems in a test case, > {{TestMiniMRClientCluster.testRestart}}: > {noformat} > 2019-04-04 11:21:31,896 INFO [main] service.AbstractService > (AbstractService.java:noteFailure(273)) - Service > org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state > STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [test-host:35491] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) > at > org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) > at > org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} > > At least for testing, having this socket option enabled is beneficial. We > could enable this with a new property like {{ipc.server.reuseaddr}}.
[jira] [Updated] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-16238: -- Attachment: HADOOP-16238-003.patch
[jira] [Updated] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-16238: -- Attachment: HADOOP-16238-002.patch
[jira] [Updated] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-16238: -- Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
[ https://issues.apache.org/jira/browse/HADOOP-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-16238: -- Attachment: HADOOP-16238-001.patch
[jira] [Created] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
Peter Bacsko created HADOOP-16238: - Summary: Add the possibility to set SO_REUSEADDR in IPC Server Listener Key: HADOOP-16238 URL: https://issues.apache.org/jira/browse/HADOOP-16238 Project: Hadoop Common Issue Type: Improvement Reporter: Peter Bacsko Assignee: Peter Bacsko Currently we can't enable SO_REUSEADDR in the IPC Server. In some circumstances, this would be desirable; see the explanation here: [https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-] Rarely, it also causes problems in a test case, {{TestMiniMRClientCluster.testRestart}}: {noformat} 2019-04-04 11:21:31,896 INFO [main] service.AbstractService (AbstractService.java:noteFailure(273)) - Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [test-host:35491] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [test-host:35491] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138) at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355) at org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127) at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312) at org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73) at org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat} At least for testing, having this socket option enabled is beneficial. We could enable this with a new property like {{ipc.server.reuseaddr}}.
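The restart failure above is exactly the scenario SO_REUSEADDR addresses. A minimal sketch of enabling it at bind time (using the NIO channel API; the {{reuseAddr}} flag stands in for a value that would be read from the proposed {{ipc.server.reuseaddr}} property, and the actual Hadoop IPC Listener wiring is not shown):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hedged sketch, not the Hadoop patch itself: "reuseAddr" models a
// hypothetical ipc.server.reuseaddr configuration value.
public class ReuseAddrDemo {
    static ServerSocketChannel bind(InetSocketAddress addr, boolean reuseAddr)
            throws IOException {
        ServerSocketChannel acceptChannel = ServerSocketChannel.open();
        // SO_REUSEADDR must be set before bind(); it allows rebinding to a
        // port whose previous sockets still linger in TIME_WAIT, avoiding
        // "BindException: Address already in use" on quick restarts.
        acceptChannel.setOption(StandardSocketOptions.SO_REUSEADDR, reuseAddr);
        acceptChannel.bind(addr);
        return acceptChannel;
    }
}
```

Setting the option only before {{bind()}} matters; toggling it afterwards has no effect on an already-bound socket.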
[jira] [Commented] (HADOOP-15822) zstd compressor can fail with a small output buffer
[ https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643416#comment-16643416 ] Peter Bacsko commented on HADOOP-15822: --- [~jlowe] you were right, it's not related to zstandard. I reproduced this with other codecs + no compression. It's possibly an edge case. > zstd compressor can fail with a small output buffer > --- > > Key: HADOOP-15822 > URL: https://issues.apache.org/jira/browse/HADOOP-15822 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Major > Attachments: HADOOP-15822.001.patch, HADOOP-15822.002.patch > > > TestZStandardCompressorDecompressor fails a couple of tests on my machine > with the latest zstd library (1.3.5). Compression can fail to successfully > finalize the stream when a small output buffer is used resulting in a failed > to init error, and decompression with a direct buffer can fail with an > invalid src size error.
[jira] [Commented] (HADOOP-15822) zstd compressor can fail with a small output buffer
[ https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642106#comment-16642106 ] Peter Bacsko commented on HADOOP-15822: --- No, I still haven't had the time to check it out with other codecs. But tomorrow I'll perform a test with no compression/snappy/lz4/etc.
[jira] [Comment Edited] (HADOOP-15822) zstd compressor can fail with a small output buffer
[ https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642000#comment-16642000 ] Peter Bacsko edited comment on HADOOP-15822 at 10/8/18 3:28 PM: I reproduced the problem. This is what happens if the sort buffer is 2047MiB. {noformat} ... 2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output 2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 1267927860; bufend = 2082571562; bufvoid = 2146435072 2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 316981960(1267927840); kvend = 91355880(365423520); length = 225626081/134152192 2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) -1997752227 kvi 37170708(148682832) 2018-10-08 08:16:24,712 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 20 2018-10-08 08:16:24,712 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator -1997752227 kv 37170708(148682832) kvi 37170708(148682832) 2018-10-08 08:16:24,713 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2018-10-08 08:16:24,713 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator -1997752227 kv 37170708(148682832) kvi 37170708(148682832) 2018-10-08 08:16:24,727 INFO [main] org.apache.hadoop.mapred.Merger: Merging 21 sorted segments 2018-10-08 08:16:24,735 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,736 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,738 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,739 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,741 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,742 INFO [main] 
org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,743 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,744 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,745 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,746 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,748 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,749 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,750 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,752 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,753 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,754 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,755 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,756 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,757 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,769 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst] 2018-10-08 08:16:24,770 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 21 segments left of total size: 35310116 bytes 2018-10-08 08:16:30,104 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1469) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1365) at java.io.DataOutputStream.writeByte(DataOutputStream.java:153) at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273) at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253) at org.apache.hadoop.io.Text.write(Text.java:330) at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98) at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1163) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:727) at
[jira] [Commented] (HADOOP-15822) zstd compressor can fail with a small output buffer
[ https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642000#comment-16642000 ] Peter Bacsko commented on HADOOP-15822: --- I reproduced the problem. This is what happens if the sort buffer is 2047MiB.
{noformat}
...
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 1267927860; bufend = 2082571562; bufvoid = 2146435072
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 316981960(1267927840); kvend = 91355880(365423520); length = 225626081/134152192
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) -1997752227 kvi 37170708(148682832)
2018-10-08 08:16:24,712 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 20
2018-10-08 08:16:24,712 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator -1997752227 kv 37170708(148682832) kvi 37170708(148682832)
2018-10-08 08:16:24,713 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2018-10-08 08:16:24,713 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator -1997752227 kv 37170708(148682832) kvi 37170708(148682832)
2018-10-08 08:16:24,727 INFO [main] org.apache.hadoop.mapred.Merger: Merging 21 sorted segments
2018-10-08 08:16:24,735 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,736 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,738 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,739 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,741 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,742 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,743 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,744 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,745 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,746 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,748 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,749 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,750 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,752 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,753 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,754 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,755 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,756 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,757 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,769 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.zst]
2018-10-08 08:16:24,770 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 21 segments left of total size: 35310116 bytes
2018-10-08 08:16:30,104 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1469)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1365)
	at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
	at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
	at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
	at org.apache.hadoop.io.Text.write(Text.java:330)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1163)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:727)
	at
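The numbers in the log above are consistent with 32-bit overflow: bufvoid is 2146435072 bytes (a 2047 MiB sort buffer), and adding a record length to an int buffer position near that limit wraps negative, which matches the negative (EQUATOR) value. A minimal, hypothetical Java sketch of that failure mode (the variable names mirror the log; this is not the actual MapTask code, and the record length is made up for illustration):

```java
public class EquatorOverflow {
    public static void main(String[] args) {
        int bufvoid = 2146435072; // 2047 MiB sort buffer, as in the log above
        int bufend = 2082571562;  // write position taken from the log
        int len = 100_000_000;    // hypothetical record length

        // Plain int addition wraps past Integer.MAX_VALUE and goes negative,
        // the same failure mode as the negative equator in the log.
        int naive = bufend + len;
        System.out.println(naive); // negative due to overflow

        // Widening to long before adding avoids the wraparound.
        long safe = (long) bufend + len;
        System.out.println(safe);
    }
}
```

A negative position fed into System.arraycopy then surfaces as the ArrayIndexOutOfBoundsException seen in the stack trace.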
[jira] [Comment Edited] (HADOOP-15822) zstd compressor can fail with a small output buffer
[ https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640338#comment-16640338 ] Peter Bacsko edited comment on HADOOP-15822 at 10/5/18 8:58 PM:

[~jlowe] what a strange coincidence. I was also testing zstandard today and set {{mapreduce.task.io.sort.mb}} to 2047, which is the max I guess. Now the mapper was running on a 10GiB zstd compressed text file and then failed. The {{equator}} became a negative number and {{collect()}} threw {{ArrayIndexOutOfBoundsException}}. Mapper output compression was also enabled, probably that's what really matters here. It failed after like 40 minutes. I'm not sure whether it's zstd or not, because I haven't had the time to try it with other codecs, but it's something worth keeping in mind.

was (Author: pbacsko):
[~jlowe] what strange coincidence. I was also testing zstandard today and set {{mapreduce.task.io.sort.mb}} to 2047, which is the max I guess. Now the mapper was running on a 10GiB zstd compressed text file and then failed. The {{equator}} became a negative number and {{collect()}} threw {{ArrayIndexOutOfBoundsException}}. Mapper output compression was also enabled, probably that's what really matters here. It failed after like 40 minutes. I'm not sure whether it's zstd or not, because I haven't had the time to try it with other codecs, but it's something worth keeping in mind.

> zstd compressor can fail with a small output buffer
> ---
>
> Key: HADOOP-15822
> URL: https://issues.apache.org/jira/browse/HADOOP-15822
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.9.0, 3.0.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Major
> Attachments: HADOOP-15822.001.patch, HADOOP-15822.002.patch
>
> TestZStandardCompressorDecompressor fails a couple of tests on my machine with the latest zstd library (1.3.5). Compression can fail to successfully finalize the stream when a small output buffer is used resulting in a failed to init error, and decompression with a direct buffer can fail with an invalid src size error.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15822) zstd compressor can fail with a small output buffer
[ https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640338#comment-16640338 ] Peter Bacsko commented on HADOOP-15822: --- [~jlowe] what a strange coincidence. I was also testing zstandard today and set {{mapreduce.task.io.sort.mb}} to 2047, which is the max I guess. Now the mapper was running on a 10GiB zstd compressed text file and then failed. The {{equator}} became a negative number and {{collect()}} threw {{ArrayIndexOutOfBoundsException}}. Mapper output compression was also enabled, probably that's what really matters here. It failed after like 40 minutes. I'm not sure whether it's zstd or not, because I haven't had the time to try it with other codecs, but it's something worth keeping in mind.
[jira] [Updated] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
[ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-14982: -- Attachment: HADOOP-14982-003.patch

> Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
> ---
>
> Key: HADOOP-14982
> URL: https://issues.apache.org/jira/browse/HADOOP-14982
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Attachments: HADOOP-14892-001.patch, HADOOP-14892-002.patch, HADOOP-14982-003.patch
>
> If HA is configured for the Resource Manager in a secure environment, using the mapred client goes into a loop if the user is not authenticated with Kerberos.
> {noformat}
> [root@pb6sec-1 ~]# mapred job -list
> 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 1 failover attempts. Trying to failover after sleeping for 160ms.
> 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
> 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 failover attempts. Trying to failover after sleeping for 582ms.
> 17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 3 failover attempts. Trying to failover after sleeping for 977ms.
> 17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
> 17/10/25 06:37:45 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 failover attempts. Trying to failover after sleeping for 1667ms.
> 17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 5 failover attempts. Trying to failover after sleeping for 2776ms.
> 17/10/25 06:37:49
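The loop in the log above alternates between a SASL/GSS failure on the active RM and a plain connection refusal on the standby; only the former cannot be fixed by retrying, since the user simply has no Kerberos ticket. A small, hypothetical Java sketch of telling the two apart by walking the exception cause chain, which is the kind of check a fail-fast retry policy would need (illustrative only, not the Hadoop RetryPolicy API):

```java
import javax.security.sasl.SaslException;

public class AuthFailureCheck {
    /** Returns true if any cause in the chain is a SASL authentication failure. */
    static boolean isAuthFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SaslException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the log: an IOException wrapping "GSS initiate failed".
        Exception kerberos = new java.io.IOException("Failed on local exception",
                new SaslException("GSS initiate failed"));
        // Mirrors the standby RM case: a plain connection refusal.
        Exception refused = new java.net.ConnectException("Connection refused");

        System.out.println(isAuthFailure(kerberos)); // true  -> fail fast
        System.out.println(isAuthFailure(refused));  // false -> failover is reasonable
    }
}
```

With a check like this, the client could abort with a clear "please kinit" message on the first SaslException instead of cycling through rm36/rm25 with growing backoff.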
[jira] [Updated] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
[ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-14982: -- Attachment: HADOOP-14892-002.patch
[jira] [Commented] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
[ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1657#comment-1657 ] Peter Bacsko commented on HADOOP-14982: --- [~daryn] how do you get 1011 lines of output? I set the logging level to DEBUG and even in that case it's only 215 lines (in the case of Hadoop 3).
[jira] [Commented] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
[ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220762#comment-16220762 ] Peter Bacsko commented on HADOOP-14982: --- Thanks [~daryn], will modify the patch accordingly.
[jira] [Updated] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
[ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-14982: -- Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
[ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated HADOOP-14982: -- Attachment: HADOOP-14892-001.patch > Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're > used without authenticating with kerberos in HA env > --- > > Key: HADOOP-14982 > URL: https://issues.apache.org/jira/browse/HADOOP-14982 > Project: Hadoop Common > Issue Type: Bug > Components: common >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: HADOOP-14892-001.patch > > > If HA is configured for the Resource Manager in a secure environment, using > the mapred client goes into a loop if the user is not authenticated with > Kerberos. > {noformat} > [root@pb6sec-1 ~]# mapred job -list > 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm36 > 17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to > the server : javax.security.sasl.SaslException: GSS initiate failed [Caused > by GSSException: No valid credentials provided (Mechanism level: Failed to > find any Kerberos tgt)] > 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: > Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]; Host Details : local host is: > "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , > while invoking ApplicationClientProtocolPBClientImpl.getApplications over > rm36 after 1 failover attempts. Trying to failover after sleeping for 160ms. 
> 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm25 > 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From host_redacted/IP_redacted to > com.host.redacted:8032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 > failover attempts. Trying to failover after sleeping for 582ms. > 17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm36 > 17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to > the server : javax.security.sasl.SaslException: GSS initiate failed [Caused > by GSSException: No valid credentials provided (Mechanism level: Failed to > find any Kerberos tgt)] > 17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: > Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]; Host Details : local host is: > "host_redacted/IP_redacted"; destination host is: "com.host2.redacted":8032; , > while invoking ApplicationClientProtocolPBClientImpl.getApplications over > rm36 after 3 failover attempts. Trying to failover after sleeping for 977ms. > 17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm25 > 17/10/25 06:37:45 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From host_redacted/IP_redacted to > com.host.redacted:8032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 > failover attempts. 
Trying to failover after sleeping for 1667ms. > 17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm36 > 17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to > the server : javax.security.sasl.SaslException: GSS initiate failed [Caused > by GSSException: No valid credentials provided (Mechanism level: Failed to > find any Kerberos tgt)] > 17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: > Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]; Host Details : local host is: > "host_redacted/IP_redacted"; destination host is: "com.host2.redacted":8032; , > while invoking ApplicationClientProtocolPBClientImpl.getApplications over > rm36 after 5 failover attempts. Trying to failover after sleeping for 2776ms. > 17/10/25 06:37:49 INFO client.ConfiguredRMFailoverProxyProvider: Failing
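The loop in the log happens because the retry policy treats the SASL failure like a transient network error and keeps failing over between rm25 and rm36, even though retrying can never succeed without a Kerberos TGT. A minimal, self-contained Java sketch of the fail-fast behaviour the issue asks for (hypothetical helper names, not the actual Hadoop RetryPolicy/RetryInvocationHandler API): stop retrying as soon as a SaslException appears anywhere in the cause chain.

```java
import javax.security.sasl.SaslException;
import java.io.IOException;

public class FailFastRetrySketch {

    // Returns true if the failure should stop retries immediately.
    // A SASL/Kerberos failure anywhere in the cause chain is fatal:
    // no amount of failing over between RMs can fix a missing TGT.
    static boolean isFatal(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SaslException) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical stand-in for the RPC call in the log: it always fails
    // with an IOException wrapping a SaslException ("GSS initiate failed").
    static void getApplications() throws IOException {
        throw new IOException("Failed on local exception",
                new SaslException("GSS initiate failed"));
    }

    public static void main(String[] args) {
        int maxFailovers = 30;
        for (int attempt = 1; attempt <= maxFailovers; attempt++) {
            try {
                getApplications();
                return; // success
            } catch (IOException e) {
                if (isFatal(e)) {
                    // Surface the Kerberos problem to the user right away
                    // instead of cycling through failover attempts.
                    System.out.println("FAIL_FAST after attempt " + attempt);
                    return;
                }
                System.out.println("FAILOVER_AND_RETRY attempt " + attempt);
            }
        }
    }
}
```

With the check in place, the sketch gives up on the first attempt instead of producing the endless failover sequence shown above; in Hadoop itself the analogous test would belong in the retry policy's decision logic.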
[jira] [Created] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
Peter Bacsko created HADOOP-14982: - Summary: Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env Key: HADOOP-14982 URL: https://issues.apache.org/jira/browse/HADOOP-14982 Project: Hadoop Common Issue Type: Bug Components: common Reporter: Peter Bacsko Assignee: Peter Bacsko If HA is configured for the Resource Manager in a secure environment, the mapred client goes into a retry loop if the user is not authenticated with Kerberos. {noformat} [root@pb6sec-1 ~]# mapred job -list 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36 17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted":8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 1 failover attempts. Trying to failover after sleeping for 160ms. 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 failover attempts. Trying to failover after sleeping for 582ms. 
17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36 17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted":8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 3 failover attempts. Trying to failover after sleeping for 977ms. 17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25 17/10/25 06:37:45 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 failover attempts. Trying to failover after sleeping for 1667ms. 
17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36 17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted":8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 5 failover attempts. Trying to failover after sleeping for 2776ms. 17/10/25 06:37:49 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25 17/10/25 06:37:49 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 6 failover attempts. Trying to failover after sleeping for 1055ms.