[jira] [Resolved] (HADOOP-15360) Log some more helpful information when catch RuntimeException or Error in IPC.Server
[ https://issues.apache.org/jira/browse/HADOOP-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao resolved HADOOP-15360. -- Resolution: Not A Problem Closing the issue since it is not a problem. > Log some more helpful information when catch RuntimeException or Error in > IPC.Server > - > > Key: HADOOP-15360 > URL: https://issues.apache.org/jira/browse/HADOOP-15360 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: He Xiaoqiao >Priority: Major > > IPC.Server#logException does not print the exception stack trace when it > catches a RuntimeException or Error, for instance: > {code:java} > 2018-03-28 21:52:25,385 WARN org.apache.hadoop.ipc.Server: IPC Server handler > 17 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo > from *.*.*.*:59326 Call#46 Retry#0 java.lang.ArrayIndexOutOfBoundsException: 0 > {code} > This log message is not helpful for debugging. I think it is necessary to print > a more helpful message or the full stack trace when the exception is a > RuntimeException or Error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15360) Log some more helpful information when catch RuntimeException or Error in IPC.Server
He Xiaoqiao created HADOOP-15360: Summary: Log some more helpful information when catch RuntimeException or Error in IPC.Server Key: HADOOP-15360 URL: https://issues.apache.org/jira/browse/HADOOP-15360 Project: Hadoop Common Issue Type: Improvement Components: ipc Reporter: He Xiaoqiao
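As a sketch of the requested behavior (illustrative only, not the actual IPC.Server#logException code; the class and method names here are hypothetical): print the full stack trace when the caught throwable is a RuntimeException or an Error, and keep the one-line summary for ordinary checked exceptions.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch, not the real IPC.Server code: full stack trace for
// unexpected throwables, one-line summary otherwise.
public class LogExceptionSketch {
    static String format(Throwable t) {
        if (t instanceof RuntimeException || t instanceof Error) {
            // Unexpected throwable: capture the full stack trace for debugging.
            StringWriter sw = new StringWriter();
            t.printStackTrace(new PrintWriter(sw));
            return sw.toString();
        }
        // Expected (checked) exception: the one-line summary is enough.
        return t.toString();
    }

    public static void main(String[] args) {
        // Mirrors the log line quoted above: an AIOOBE now carries its stack.
        System.out.println(format(new ArrayIndexOutOfBoundsException("0")));
        System.out.println(format(new java.io.IOException("file not found")));
    }
}
```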
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424863#comment-16424863 ] genericqa commented on HADOOP-14999: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 18m 36s{color} | {color:red} Docker failed to build yetus/hadoop:dbd69cb. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-14999 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917477/HADOOP-14999-branch-2.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14430/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, > HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, > HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, > HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, > HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, > diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. 
To cite an extreme > example: a task may output 100GB or even more, and we would need to write > all 100GB to local disk and then upload it. This is inefficient and limited > by disk space. > This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and > depends on HADOOP-15039. > The attached {{asynchronous_file_uploading.pdf}} illustrates the difference > between the previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local > disk before we can upload it to OSS. This poses two problems: > - if the output file is too large, it will run out of local disk space. > - if the output file is too large, the task will wait a long time to upload > the result to OSS before finishing, wasting compute resources. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. small local files, and each block is packaged into an upload task. > These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads > the blocks in parallel, improving performance greatly. > 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any > task fails, the whole file upload fails, and we abort the current upload.
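The block-based upload in points 2 and 3 above can be sketched as follows. This is a simplified illustration using a plain ExecutorService in place of Hadoop's {{SemaphoredDelegatingExecutor}}; uploadBlock is a placeholder (a real client would call the OSS multipart API), and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Simplified sketch of the block-based asynchronous upload described above.
public class BlockUploadSketch {
    static final int MAX_RETRIES = 3;

    // Placeholder for uploading one block (one multipart "part") to OSS.
    static void uploadBlock(byte[] block) { /* network I/O would go here */ }

    // Point 3 above: retry each block up to 3 times before giving up.
    static void uploadWithRetry(byte[] block) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            try {
                uploadBlock(block);
                return;
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last; // all retries failed; the caller aborts the whole upload
    }

    // Point 2 above: submit one task per block and upload them in parallel.
    public static boolean upload(List<byte[]> blocks, ExecutorService pool) {
        List<Future<?>> parts = new ArrayList<>();
        for (byte[] block : blocks) {
            parts.add(pool.submit(() -> uploadWithRetry(block)));
        }
        for (Future<?> part : parts) {
            try {
                part.get();
            } catch (ExecutionException e) {
                return false; // one block failed after retries: abort the upload
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true; // all parts done; a real client would now complete the MPU
    }
}
```

The real {{SemaphoredDelegatingExecutor}} additionally bounds how many tasks may be queued at once, which is what prevents a producer from buffering an unbounded number of blocks.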
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: HADOOP-14999-branch-2.001.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: (was: HADOOP-14999-branch-2.001.patch)
[jira] [Commented] (HADOOP-7050) proxyuser host/group config properties don't work if user name as DOT in it
[ https://issues.apache.org/jira/browse/HADOOP-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424792#comment-16424792 ] Quanlong Huang commented on HADOOP-7050: [~k4kaliazz], I have a similar problem to yours. I failed to launch HiveServer2 on Linux since my username contains a dot (.) too. Finally, I disabled impersonation by setting hive.server2.enable.doAs to false in hive-site.xml. > proxyuser host/group config properties don't work if user name as DOT in it > --- > > Key: HADOOP-7050 > URL: https://issues.apache.org/jira/browse/HADOOP-7050 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Alejandro Abdelnur >Priority: Major > > If the user contains a DOT, "foo.bar", proxy user configuration fails to be > read properly and it does not kick in.
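To illustrate why a dot in the user name can break the lookup, here is a minimal sketch under the assumption that proxyuser keys follow the hadoop.proxyuser.&lt;user&gt;.hosts pattern and that a parser recovers the user by splitting the key on dots (a simplified stand-in, not Hadoop's actual configuration code).

```java
// Illustrative only: shows the ambiguity a dotted user name introduces when
// property keys are segmented on '.'.
public class ProxyUserKeySketch {
    static String keyFor(String user) {
        return "hadoop.proxyuser." + user + ".hosts";
    }

    // Naive extraction: assume the user is the single segment after "proxyuser".
    static String userFrom(String key) {
        String[] parts = key.split("\\.");
        return parts[2];
    }

    public static void main(String[] args) {
        System.out.println(userFrom(keyFor("alice")));    // round-trips to "alice"
        System.out.println(userFrom(keyFor("foo.bar")));  // "foo": user name is lost
    }
}
```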
[jira] [Commented] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424727#comment-16424727 ] Xiao Chen commented on HADOOP-15359: bq. JDK-7092821 Yes, I wasn't accurate, updated the description. That jira is the closest I can find. But if that's taken care of, (i.e. either the synchronized method isn't called any more, or the lock removed), then there is no deadlock in this case anyways. One of its linked jiras also mentioned " In order to alleviate this problem applications should cache the result of the Cipher.getInstance() call per thread and reinitialise (Cipher.init(...)) the cached copy instead of calling Cipher.getInstance() again. " but the caller here would be krb5... > IPC client hang in kerberized cluster due to JDK deadlock > - > > Key: HADOOP-15359 > URL: https://issues.apache.org/jira/browse/HADOOP-15359 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.6.0, 2.8.0, 3.0.0 >Reporter: Xiao Chen >Priority: Major > Attachments: 1.jstack, 2.jstack > > > In a recent internal testing, we have found a DFS client hang. 
Further > inspecting jstack shows the following: > {noformat} > "IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 > daemon prio=5 os_prio=0 tid=0x7f6bb306c000 nid=0x1c76e waiting for > monitor entry [0x7f6bc2bd6000] >java.lang.Thread.State: BLOCKED (on object monitor) > at java.security.Provider.getService(Provider.java:1035) > - waiting to lock <0x80277040> (a sun.security.provider.Sun) > at > sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444) > at > sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376) > at > sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486) > at javax.crypto.Cipher.getInstance(Cipher.java:513) > at > sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202) > at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484) > at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447) > at > sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413) > at > sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59) > at > sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231) > at > sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466) > at > sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374) > at > sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284) > at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209) > at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182) > at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053) > at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403) > at > com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77) > at > org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617) > at > org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583) 
> - locked <0x83444878> (a java.nio.HeapByteBuffer) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > - locked <0x834448c0> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006) > {noformat} > and at the end of jstack: > {noformat} > Found one Java-level deadlock: > = > "IPC Parameter Sending Thread #29": > waiting to lock monitor 0x17ff49f8 (object 0x80277040, a > sun.security.provider.Sun), > which is held by UNKNOWN_owner_addr=0x50607000 > Java stack information for the threads listed above: > === > "IPC Parameter Sending Thread #29": > at java.security.Provider.getService(Provider.java:1035) > - waiting to lock <0x80277040> (a sun.security.provider.Sun) > at > sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437) > at >
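The workaround quoted above from the linked JDK issue, caching the result of Cipher.getInstance() per thread and reusing it via Cipher.init(...), can be sketched as follows. This is hypothetical application-side code (the transformation string is illustrative); as noted, the contended caller in these stack traces is the JDK's internal krb5 code, which applications cannot patch this way.

```java
import javax.crypto.Cipher;
import java.security.GeneralSecurityException;

// Per-thread Cipher cache: the provider-scanning (and provider-locking)
// Cipher.getInstance() runs once per thread; later uses reinitialize the
// cached instance instead of re-entering the synchronized provider lookup.
public class CipherCacheSketch {
    private static final ThreadLocal<Cipher> CACHED =
        ThreadLocal.withInitial(() -> {
            try {
                // Illustrative transformation; pick whatever your code needs.
                return Cipher.getInstance("DESede/CBC/NoPadding");
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        });

    static Cipher cipher() {
        return CACHED.get(); // call Cipher.init(...) on this before each use
    }
}
```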
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Priority: Major (was: Critical)
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Description: updated to include the full jstack excerpts of the hang and the Java-level deadlock report.
[jira] [Commented] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424683#comment-16424683 ] Wei-Chiu Chuang commented on HADOOP-15359: -- JDK-7092821 mentioned it as a scalability bottleneck rather than a deadlock. Not sure how the JDK determines a deadlock, though. HADOOP-13836 (Securing Hadoop RPC using SSL) should help with this in the long run, since it would not depend on JDK SASL eventually. And it improves RPC performance as well.
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Attachment: 2.jstack 1.jstack > IPC client hang in kerberized cluster due to JDK deadlock > - > > Key: HADOOP-15359 > URL: https://issues.apache.org/jira/browse/HADOOP-15359 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.6.0, 2.8.0, 3.0.0 >Reporter: Xiao Chen >Priority: Critical > Attachments: 1.jstack, 2.jstack > > > In a recent internal testing, we have found a DFS client hang. Further > inspecting jstack shows the following: > {noformat} > "IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 > daemon prio=5 os_prio=0 tid=0x7f6bb306c000 nid=0x1c76e waiting for > monitor entry [0x7f6bc2bd6000] >java.lang.Thread.State: BLOCKED (on object monitor) > at java.security.Provider.getService(Provider.java:1035) > - waiting to lock <0x80277040> (a sun.security.provider.Sun) > at > sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444) > at > sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376) > at > sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486) > at javax.crypto.Cipher.getInstance(Cipher.java:513) > at > sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202) > at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484) > at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447) > at > sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413) > at > sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59) > at > sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231) > at > sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466) > at > sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374) > at > 
sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284) > at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209) > at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182) > at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053) > at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403) > at > com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77) > at > org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617) > at > org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583) > - locked <0x83444878> (a java.nio.HeapByteBuffer) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > - locked <0x834448c0> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006) > {noformat} > and at the end of jstack: > {noformat} > Found one Java-level deadlock: > = > "IPC Parameter Sending Thread #29": > waiting to lock monitor 0x17ff49f8 (object 0x80277040, a > sun.security.provider.Sun), > which is held by UNKNOWN_owner_addr=0x50607000 > Java stack information for the threads listed above: > === > "IPC Parameter Sending Thread #29": > at java.security.Provider.getService(Provider.java:1035) > - waiting to lock <0x80277040> (a sun.security.provider.Sun) > at > sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437) > at > sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376) > at > sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486) > at 
javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293) > - locked <0x834386b8> (a java.lang.Object) > at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121) > at > javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160) > at > sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187) > at
[jira] [Commented] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424643#comment-16424643 ] Xiao Chen commented on HADOOP-15359: Attached 2 sample jstacks.
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Affects Version/s: 2.6.0
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Description: In a recent internal testing, we have found a DFS client hang. Further inspecting jstack shows the following: {noformat} "IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 daemon prio=5 os_prio=0 tid=0x7f6bb306c000 nid=0x1c76e waiting for monitor entry [0x7f6bc2bd6000] java.lang.Thread.State: BLOCKED (on object monitor) at java.security.Provider.getService(Provider.java:1035) - waiting to lock <0x80277040> (a sun.security.provider.Sun) at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444) at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376) at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486) at javax.crypto.Cipher.getInstance(Cipher.java:513) at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202) at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484) at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447) at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413) at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59) at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231) at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466) at sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374) at sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284) at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209) at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182) at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053) at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403) at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77) at 
org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617) at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583) - locked <0x83444878> (a java.nio.HeapByteBuffer) at java.io.FilterInputStream.read(FilterInputStream.java:133) at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked <0x834448c0> (a java.io.BufferedInputStream) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006) {noformat} and at the end of jstack: {noformat} Found one Java-level deadlock: = "IPC Parameter Sending Thread #29": waiting to lock monitor 0x17ff49f8 (object 0x80277040, a sun.security.provider.Sun), which is held by UNKNOWN_owner_addr=0x50607000 Java stack information for the threads listed above: === "IPC Parameter Sending Thread #29": at java.security.Provider.getService(Provider.java:1035) - waiting to lock <0x80277040> (a sun.security.provider.Sun) at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437) at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376) at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486) at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293) - locked <0x834386b8> (a java.lang.Object) at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121) at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160) at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187) at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484) at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447) at 
sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413) at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59) at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231) at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466) at sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315) at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422) at
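The two stacks in the report are a classic lock-ordering inversion: the receiving IPC thread locks the sun.security.provider.Sun monitor and then needs the SecretKeyFactory-internal lock, while the sending thread takes the same two locks in the opposite order. Below is a minimal, self-contained sketch of that pattern; the lock names are illustrative stand-ins for the JDK objects in the jstack, not the real ones, and tryLock is used so the demo terminates and reports the would-be deadlock instead of hanging:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class Main {
    // Hypothetical stand-ins for the two monitors seen in the jstack:
    // the sun.security.provider.Sun object and the SecretKeyFactory lock.
    static final ReentrantLock providerLock = new ReentrantLock();
    static final ReentrantLock factoryLock = new ReentrantLock();

    static final CountDownLatch bothHoldFirst = new CountDownLatch(2);
    static final CountDownLatch bothTried = new CountDownLatch(2);
    static final boolean[] blocked = new boolean[2];

    // Each worker takes its "first" lock, waits until the other worker holds
    // its own first lock, then attempts the second lock -- the opposite order.
    static Thread worker(int id, ReentrantLock first, ReentrantLock second) {
        return new Thread(() -> {
            first.lock();
            try {
                bothHoldFirst.countDown();
                bothHoldFirst.await();
                // With a plain lock() here both threads would block forever,
                // exactly like the two stacks in the jstack.
                blocked[id] = !second.tryLock(100, TimeUnit.MILLISECONDS);
                if (!blocked[id]) {
                    second.unlock();
                }
                bothTried.countDown();
                bothTried.await(); // hold "first" until both attempts finish
            } catch (InterruptedException ignored) {
            } finally {
                first.unlock();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Thread receiver = worker(0, providerLock, factoryLock); // Provider -> factory
        Thread sender = worker(1, factoryLock, providerLock);   // factory -> Provider
        receiver.start(); sender.start();
        receiver.join(); sender.join();
        System.out.println(blocked[0] && blocked[1]); // true: inconsistent lock order
    }
}
```

The usual cure for this shape of bug is a single global lock order (or avoiding nested acquisition entirely), which is also why it had to be addressed in the JDK rather than in Hadoop.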
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Priority: Critical (was: Major)
[jira] [Created] (HADOOP-15359) IPC client could run into JDK deadlock
Xiao Chen created HADOOP-15359: -- Summary: IPC client could run into JDK deadlock Key: HADOOP-15359 URL: https://issues.apache.org/jira/browse/HADOOP-15359 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 3.0.0, 2.8.0 Reporter: Xiao Chen
[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
[ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15359: --- Summary: IPC client hang in kerberized cluster due to JDK deadlock (was: IPC client could run into JDK deadlock)
[jira] [Commented] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424565#comment-16424565 ] Jim Brennan commented on HADOOP-15357: -- [~lmccay], [~asuresh], I believe this patch is ready for review. cc: [~jlowe] > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HADOOP-15357.001.patch, HADOOP-15357.002.patch > > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
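For illustration, here is a self-contained stand-in for the two code paths the report contrasts: props.getProperty() returning the raw stored value versus a Configuration.get()-style lookup that expands ${var} references before returning. The get() and getPropsWithPrefix() helpers below are hypothetical simplifications over java.util.Properties, not Hadoop's actual implementation (which also handles deprecated keys and nested substitution):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Simplified Configuration.get(): expands ${var} against other properties.
    static String get(Properties props, String name) {
        String raw = props.getProperty(name);
        if (raw == null) {
            return null;
        }
        Matcher m = VAR.matcher(raw);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String sub = props.getProperty(m.group(1));
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(sub != null ? sub : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // The flag picks between the raw props.getProperty() value
    // (post-HADOOP-13556 behavior) and the substituting get() (pre-change).
    static Map<String, String> getPropsWithPrefix(Properties props,
                                                  String prefix,
                                                  boolean substitute) {
        Map<String, String> out = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                out.put(name.substring(prefix.length()),
                        substitute ? get(props, name) : props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("basedir", "/data");
        p.setProperty("mapred.env.LOG_DIR", "${basedir}/logs");
        // Raw value: the ${basedir} reference survives unexpanded.
        System.out.println(getPropsWithPrefix(p, "mapred.env.", false).get("LOG_DIR"));
        // Substituted value: ${basedir} is expanded as callers expected.
        System.out.println(getPropsWithPrefix(p, "mapred.env.", true).get("LOG_DIR"));
    }
}
```

The two printed lines show exactly the behavior change the report describes: callers that relied on expansion now receive the literal ${...} text.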
[jira] [Commented] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424553#comment-16424553 ] genericqa commented on HADOOP-15357: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} hadoop-common-project/hadoop-common: The patch generated 0 new + 243 unchanged - 1 fixed = 243 total (was 244) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 7s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HADOOP-15357 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917415/HADOOP-15357.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4777a3250d9f 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5a174f8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14429/testReport/ | | Max. process+thread count | 1512 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14429/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Configuration.getPropsWithPrefix no longer does variable
[jira] [Commented] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking
[ https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424445#comment-16424445 ] Xiao Chen commented on HADOOP-14987: The conflicts were trivial so I resolved it on the fly. Can you get the diff from git history? > Improve KMSClientProvider log around delegation token checking > -- > > Key: HADOOP-14987 > URL: https://issues.apache.org/jira/browse/HADOOP-14987 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.7.3 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.0.0, 2.10.0, 2.8.4, 2.9.2 > > Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, > HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch > > > KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to > build the key to look for KMS-DT from the UGI's token map. The token lookup > key here varies depending on the KMSClientProvider's configuration value for > hadoop.security.token.service.use_ip. In certain cases, the token obtained > with non-matching hadoop.security.token.service.use_ip setting will not be > recognized by KMSClientProvider. This ticket is opened to improve logs for > troubleshooting KMS delegation token related issues like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
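To make the mismatch described above concrete, here is a minimal, self-contained sketch of how a token-service key built from a hostname differs from one built from a resolved IP, which is the use_ip-dependent lookup failure the log improvements target. All names, the resolved address, and the key format here are illustrative, not Hadoop's actual SecurityUtil code:

```java
// Illustrative sketch: the token lookup key differs depending on whether it
// is built from the hostname or the resolved IP address, mirroring the effect
// of hadoop.security.token.service.use_ip described in HADOOP-14987.
public class UseIpKeySketch {
  static String buildTokenService(String host, int port, boolean useIp) {
    // Assume the hostname resolves to this address in the example.
    String addr = useIp ? "10.0.0.5" : host;
    return addr + ":" + port;
  }

  public static void main(String[] args) {
    String storedKey = buildTokenService("kms.example.com", 9600, false);
    String lookupKey = buildTokenService("kms.example.com", 9600, true);
    // A token stored under one key form is not found under the other.
    System.out.println(storedKey);                    // kms.example.com:9600
    System.out.println(lookupKey);                    // 10.0.0.5:9600
    System.out.println(storedKey.equals(lookupKey));  // false
  }
}
```

Logging both forms of the key at lookup time, as this jira proposes, makes this silent miss visible.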
[jira] [Commented] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking
[ https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424395#comment-16424395 ] Rushabh S Shah commented on HADOOP-14987: - [~xiaochen]:Do you mind attaching the latest patch that you committed for branch-2* ? > Improve KMSClientProvider log around delegation token checking > -- > > Key: HADOOP-14987 > URL: https://issues.apache.org/jira/browse/HADOOP-14987 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.7.3 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.0.0, 2.10.0, 2.8.4, 2.9.2 > > Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, > HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch > > > KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to > build the key to look for KMS-DT from the UGI's token map. The token lookup > key here varies depending on the KMSClientProvider's configuration value for > hadoop.security.token.service.use_ip. In certain cases, the token obtained > with non-matching hadoop.security.token.service.use_ip setting will not be > recognized by KMSClientProvider. This ticket is opened to improve logs for > troubleshooting KMS delegation token related issues like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking
[ https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424391#comment-16424391 ] Xiao Chen commented on HADOOP-14987: Due to conflicts from another jira, I cherry-picked this to branch-2, branch-2.9, branch-2.8. Compiled and ran the changed test locally before pushing. > Improve KMSClientProvider log around delegation token checking > -- > > Key: HADOOP-14987 > URL: https://issues.apache.org/jira/browse/HADOOP-14987 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.7.3 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.0.0, 2.10.0, 2.8.4, 2.9.2 > > Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, > HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch > > > KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to > build the key to look for KMS-DT from the UGI's token map. The token lookup > key here varies depending on the KMSClientProvider's configuration value for > hadoop.security.token.service.use_ip. In certain cases, the token obtained > with non-matching hadoop.security.token.service.use_ip setting will not be > recognized by KMSClientProvider. This ticket is opened to improve logs for > troubleshooting KMS delegation token related issues like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking
[ https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14987: --- Fix Version/s: 2.9.2 2.8.4 2.10.0 > Improve KMSClientProvider log around delegation token checking > -- > > Key: HADOOP-14987 > URL: https://issues.apache.org/jira/browse/HADOOP-14987 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.7.3 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.0.0, 2.10.0, 2.8.4, 2.9.2 > > Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, > HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch > > > KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to > build the key to look for KMS-DT from the UGI's token map. The token lookup > key here varies depending on the KMSClientProvider's configuration value for > hadoop.security.token.service.use_ip. In certain cases, the token obtained > with non-matching hadoop.security.token.service.use_ip setting will not be > recognized by KMSClientProvider. This ticket is opened to improve logs for > troubleshooting KMS delegation token related issues like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424387#comment-16424387 ] Jim Brennan commented on HADOOP-15357: -- Renamed local variable to fix the check-style issue and submitted a new patch. > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HADOOP-15357.001.patch, HADOOP-15357.002.patch > > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
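The behavior difference reported above can be sketched in a few lines. This is a stand-in, not Hadoop's actual Configuration code: a raw java.util.Properties lookup returns the literal value (the post-HADOOP-13556 behavior), while a getter that performs ${var} substitution expands it (the pre-HADOOP-13556 behavior via Configuration.get()):

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Configuration: contrast props.getProperty()
// (literal value) with a get() that expands ${variable_name} references.
public class SubstitutionSketch {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  static String getWithSubstitution(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      return null;
    }
    Matcher m = VAR.matcher(value);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String resolved = props.getProperty(m.group(1));
      // Leave unresolved variables as-is, expand the rest.
      m.appendReplacement(sb,
          Matcher.quoteReplacement(resolved == null ? m.group(0) : resolved));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("base.dir", "/tmp/hadoop");
    props.setProperty("prefix.log.dir", "${base.dir}/logs");
    // Raw lookup: the variable is not expanded.
    System.out.println(props.getProperty("prefix.log.dir"));          // ${base.dir}/logs
    // Substituting lookup: the variable is expanded.
    System.out.println(getWithSubstitution(props, "prefix.log.dir")); // /tmp/hadoop/logs
  }
}
```

Callers of getPropsWithPrefix() that expect expanded values, such as the MAPREDUCE-7069 fix mentioned above, see the first behavior after HADOOP-13556.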
[jira] [Updated] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HADOOP-15357: - Attachment: HADOOP-15357.002.patch > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HADOOP-15357.001.patch, HADOOP-15357.002.patch > > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424316#comment-16424316 ] genericqa commented on HADOOP-15357: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 51s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 1 new + 243 unchanged - 1 fixed = 244 total (was 244) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 49s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 25s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HADOOP-15357 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917380/HADOOP-15357.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 513dc5057496 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 93d47a0 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14428/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14428/testReport/ | | Max. process+thread count | 1717 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14428/console | | Powered by | Apache Yetus
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424241#comment-16424241 ] Xiao Chen commented on HADOOP-14445: Thanks for the prompt review Rushabh. bq. test failure I found it yesterday when looking at another jira, and it turns out to be HADOOP-15355. Committed that one last night, so next run should clear off. I'll correct the checkstyle after a final round of manual testing with real clusters. Will also provide branch-2 patches. > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. 
To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
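The lookup failure described in this issue can be reduced to a small sketch: the client keys delegation tokens by each KMS instance's host:port, so a token acquired from one instance of an HA group is simply not found when a request is routed to another instance. The class, host names, and map-based credential store below are illustrative, not Hadoop's actual Credentials/SecurityUtil code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of per-instance token keys defeating KMS HA: a token
// stored under one instance's service key misses on lookup for a sibling.
public class TokenKeySketch {
  static String buildTokenService(String host, int port) {
    return host + ":" + port;
  }

  public static void main(String[] args) {
    Map<String, String> creds = new HashMap<>();
    // Token acquired from the first KMS instance.
    creds.put(buildTokenService("kms1.example.com", 9600), "KMS-DT-1");
    // Lookup against a sibling instance of the same HA group misses,
    // even though either instance could verify the token's shared secret.
    System.out.println(creds.get(buildTokenService("kms2.example.com", 9600))); // null
  }
}
```

This is why the issue asks either for a shared token service key across the HA group or for the documentation to stop implying tokens are interchangeable.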
[jira] [Commented] (HADOOP-14758) S3GuardTool.prune to handle UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HADOOP-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424204#comment-16424204 ] Hudson commented on HADOOP-14758: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13919 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13919/]) HADOOP-14758. S3GuardTool.prune to handle UnsupportedOperationException. (stevel: rev 5a174f8ac6e5f170b427b30bf72ef33f90c20d91) * (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java > S3GuardTool.prune to handle UnsupportedOperationException > - > > Key: HADOOP-14758 > URL: https://issues.apache.org/jira/browse/HADOOP-14758 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steve Loughran >Assignee: Gabor Bota >Priority: Trivial > Fix For: 3.2.0 > > Attachments: HADOOP-14758.001.patch > > > {{MetadataStore.prune()}} may throw {{UnsupportedOperationException}} if not > supported. > {{S3GuardTool.prune}} should recognise this, catch it and treat it > differently from any other failure, e.g. inform and return 0 as its a no-op -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-13500) Concurrency issues when using Configuration iterator
[ https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar reassigned HADOOP-13500: --- Assignee: (was: Ajay Kumar) > Concurrency issues when using Configuration iterator > > > Key: HADOOP-13500 > URL: https://issues.apache.org/jira/browse/HADOOP-13500 > Project: Hadoop Common > Issue Type: Bug > Components: conf >Reporter: Jason Lowe >Priority: Major > > It is possible to encounter a ConcurrentModificationException while trying to > iterate a Configuration object. The iterator method tries to walk the > underlying Property object without proper synchronization, so another thread > simultaneously calling the set method can trigger it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
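The failure mode described above can be reproduced single-threaded, since HashMap-style fail-fast iterators throw on any structural modification mid-iteration, which is exactly what a concurrent set() does to a shared Configuration. A minimal sketch (plain java.util collections, not the actual Configuration internals), including the common snapshot-copy workaround:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Single-threaded stand-in for the race: a fail-fast iterator throws
// ConcurrentModificationException when the backing map is modified
// mid-iteration; iterating a snapshot copy avoids it.
public class IteratorRaceSketch {
  static boolean modifyWhileIterating() {
    Map<String, String> props = new HashMap<>();
    props.put("a", "1");
    props.put("b", "2");
    try {
      for (Map.Entry<String, String> e : props.entrySet()) {
        props.put("c", "3"); // structural modification invalidates the iterator
      }
    } catch (ConcurrentModificationException ex) {
      return true; // the same exception the reporter saw under concurrency
    }
    return false;
  }

  static boolean snapshotIterationIsSafe() {
    Map<String, String> props = new HashMap<>();
    props.put("a", "1");
    // Iterate a copy; modifications to the live map no longer interfere.
    for (Map.Entry<String, String> e : new HashMap<>(props).entrySet()) {
      props.put("b", "2");
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("CME observed: " + modifyWhileIterating());     // true
    System.out.println("snapshot safe: " + snapshotIterationIsSafe()); // true
  }
}
```

A real fix inside Configuration would need to synchronize or snapshot under the same lock the set method uses; the sketch only demonstrates the symptom and the general shape of the workaround.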
[jira] [Updated] (HADOOP-14758) S3GuardTool.prune to handle UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HADOOP-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14758: Resolution: Fixed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) +1, committed —thanks! > S3GuardTool.prune to handle UnsupportedOperationException > - > > Key: HADOOP-14758 > URL: https://issues.apache.org/jira/browse/HADOOP-14758 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steve Loughran >Assignee: Gabor Bota >Priority: Trivial > Fix For: 3.2.0 > > Attachments: HADOOP-14758.001.patch > > > {{MetadataStore.prune()}} may throw {{UnsupportedOperationException}} if not > supported. > {{S3GuardTool.prune}} should recognise this, catch it and treat it > differently from any other failure, e.g. inform and return 0 as its a no-op -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
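The fix pattern this issue asks for, catching UnsupportedOperationException from prune() and treating it as a successful no-op, can be sketched as follows. The interface and method names are illustrative stand-ins, not the actual S3GuardTool code:

```java
// Illustrative sketch of the prune fix: an unsupported prune() is reported
// and treated as a no-op (exit code 0) instead of a generic failure.
public class PruneSketch {
  interface MetadataStore {
    void prune(long olderThanMillis);
  }

  static int runPrune(MetadataStore store, long olderThanMillis) {
    try {
      store.prune(olderThanMillis);
    } catch (UnsupportedOperationException e) {
      // Inform the user and succeed: there is simply nothing to prune.
      System.out.println("Metadata store does not support pruning; nothing to do.");
      return 0;
    }
    return 0;
  }

  public static void main(String[] args) {
    MetadataStore unsupported = age -> {
      throw new UnsupportedOperationException();
    };
    System.out.println(runPrune(unsupported, 1000L)); // 0
  }
}
```

Any other exception type still propagates, so genuine failures keep their non-zero exit path.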
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424166#comment-16424166 ] Rushabh S Shah commented on HADOOP-14445: - Thanks [~xiaochen] for the latest patch. It looks good. bq. If KMS server is old, you'd get an old token. Thanks for catching that. I totally missed that. There is one test failure in the latest run. {noformat} org.apache.hadoop.conf.TestCommonConfigurationFields.testCompareXmlAgainstConfigurationClass Failing for the past 1 build (Since Failed#14425 ) Took 0.2 sec. Error Message core-default.xml has 2 properties missing in class org.apache.hadoop.fs.CommonConfigurationKeys class org.apache.hadoop.fs.CommonConfigurationKeysPublic class org.apache.hadoop.fs.local.LocalConfigKeys class org.apache.hadoop.fs.ftp.FtpConfigKeys class org.apache.hadoop.ha.SshFenceByTcpPort class org.apache.hadoop.security.LdapGroupsMapping class org.apache.hadoop.ha.ZKFailoverController class org.apache.hadoop.security.ssl.SSLFactory class org.apache.hadoop.security.CompositeGroupsMapping class org.apache.hadoop.io.erasurecode.CodecUtil class org.apache.hadoop.security.RuleBasedLdapGroupsMapping Entries: hadoop.security.key.default.bitlength hadoop.security.key.default.cipher expected:<0> but was:<2> {noformat} I can't think of a way that your latest patch can introduce this failure. The hadoop-common build is fairly stable compared to hadoop-hdfs. Can you please double-check whether your patch introduced this failure? If not, can you please find out which jira is responsible? Also there are a couple of checkstyle warnings in TestKMS.java regarding an unused import. If the test failure is not related, then you can make the checkstyle changes while committing. Also, can you upload the new patch after committing and resolving the jira. I know some people had concerns that it is difficult to correlate the commit with the last patch if they are not the same. 
+1 (non-binding) pending confirming test failure. Thanks a lot for the good work here. > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HADOOP-15357: - Status: Patch Available (was: Open) > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HADOOP-15357.001.patch > > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HADOOP-15357: - Attachment: HADOOP-15357.001.patch > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HADOOP-15357.001.patch > > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Reopened] (HADOOP-13500) Concurrency issues when using Configuration iterator
[ https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reopened HADOOP-13500: - This is not a duplicate of HADOOP-13556. That JIRA only changed the getPropsWithPrefix method which was not involved in the error reported by this JIRA or TEZ-3413. AFAICT iterating a shared configuration object is still unsafe. > Concurrency issues when using Configuration iterator > > > Key: HADOOP-13500 > URL: https://issues.apache.org/jira/browse/HADOOP-13500 > Project: Hadoop Common > Issue Type: Bug > Components: conf >Reporter: Jason Lowe >Assignee: Ajay Kumar >Priority: Major > > It is possible to encounter a ConcurrentModificationException while trying to > iterate a Configuration object. The iterator method tries to walk the > underlying Property object without proper synchronization, so another thread > simultaneously calling the set method can trigger it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned HADOOP-15357: Assignee: Jim Brennan > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution
[ https://issues.apache.org/jira/browse/HADOOP-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424056#comment-16424056 ] Jim Brennan commented on HADOOP-15357: -- [~lmccay], thanks for the prompt replies. I will be happy to put up a patch later today! > Configuration.getPropsWithPrefix no longer does variable substitution > - > > Key: HADOOP-15357 > URL: https://issues.apache.org/jira/browse/HADOOP-15357 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Priority: Major > > Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the > Configuration.get() method to get the value of the variables. After > [HADOOP-13556], it now uses props.getProperty(). > The difference is that Configuration.get() does deprecation handling and more > importantly variable substitution on the value. So if a property has a > variable specified with ${variable_name}, it will no longer be expanded when > retrieved via getPropsWithPrefix(). > Was this change in behavior intentional? I am using this function in the fix > for [MAPREDUCE-7069], but we do want variable expansion to happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15358) SFTPConnectionPool connections leakage
[ https://issues.apache.org/jira/browse/HADOOP-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Pryakhin updated HADOOP-15358: -- Description: Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of SFTPFileSystem are chained together resulting in establishing multiple connections to the SFTP server to accomplish one compound action, those methods are listed below: # mkdirs method the public mkdirs method acquires a new ChannelSftp from the pool [1] and then recursively creates directories, checking for the directory existence beforehand by calling the method exists[2] which delegates to the getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it ends up in returning the FilesStatus instance [4]. The resource leakage occurs in the method getWorkingDirectory which calls the getHomeDirectory method [5] which in turn establishes a new connection to the sftp server instead of using an already created connection. As the mkdirs method is recursive this results in creating a huge number of connections. # open method [6]. This method returns an instance of FSDataInputStream which consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp instance back to the pool but instead it closes it[7]. This leads to establishing another connection to an SFTP server when the next method is called on the FileSystem instance. 
[1] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658 [2] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321 [3] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202 [4] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290 [5] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640 [6] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504 [7] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123 was: Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of SFTPFileSystem are chained together resulting in establishing multiple connections to the SFTP server to accomplish one compound action, those methods are listed below: # mkdirs method the public mkdirs method acquires a new ChannelSftp [from the pool|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658]] and then recursively creates directories, checking for the directory existence beforehand by calling the method 
[exists|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321] ] which delegates to the getFileStatus(ChannelSftp channel, Path file) [method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202]] and so on until it ends up in returning the [FilesStatus instance|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290]]. The resource leakage occurs in the method getWorkingDirectory which calls the getHomeDirectory [method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640]] which in turn establishes a new connection to the sftp server instead of using an already created connection. As the mkdirs method is recursive this results in creating a huge number of connections. # open [method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504]]. This method returns an instance of FSDataInputStream which consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp
[jira] [Updated] (HADOOP-15358) SFTPConnectionPool connections leakage
[ https://issues.apache.org/jira/browse/HADOOP-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Pryakhin updated HADOOP-15358: -- Description: Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of SFTPFileSystem are chained together resulting in establishing multiple connections to the SFTP server to accomplish one compound action, those methods are listed below: # mkdirs method the public mkdirs method acquires a new ChannelSftp [from the pool|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658]] and then recursively creates directories, checking for the directory existence beforehand by calling the method [exists|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321] ] which delegates to the getFileStatus(ChannelSftp channel, Path file) [method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202]] and so on until it ends up in returning the [FilesStatus instance|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290]]. The resource leakage occurs in the method getWorkingDirectory which calls the getHomeDirectory [method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640]] which in turn establishes a new connection to the sftp server instead of using an already created connection. As the mkdirs method is recursive this results in creating a huge number of connections. 
# open [method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504]]. This method returns an instance of FSDataInputStream which consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp instance back to the pool but instead it [closes|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123]] it. This leads to establishing another connection to an SFTP server when the next method is called on the FileSystem instance. was: Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of SFTPFileSystem are chained together resulting in establishing multiple connections to the SFTP server to accomplish one compound action, those methods are listed below: # mkdirs method the public mkdirs method acquires a new ChannelSftp from the pool [1] and then recursively creates directories, checking for the directory existence beforehand by calling the method exists[2] which delegates to the getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it ends up in returning the FilesStatus instance [4]. The resource leakage occurs in the method getWorkingDirectory which calls the getHomeDirectory method [5] which in turn establishes a new connection to the sftp server instead of using an already created connection. As the mkdirs method is recursive this results in creating a huge number of connections. # open method [6] This method returns an instance of FSDataInputStream which consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp instance back to the pool but instead it closes it[7]. This leads to establishing another connection to an SFTP server when the next method is called on the FileSystem instance. 
[1] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658 [2] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321 [3] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202 [4] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290 [5] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640 [6]
[jira] [Created] (HADOOP-15358) SFTPConnectionPool connections leakage
Mikhail Pryakhin created HADOOP-15358: - Summary: SFTPConnectionPool connections leakage Key: HADOOP-15358 URL: https://issues.apache.org/jira/browse/HADOOP-15358 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Mikhail Pryakhin Methods of SFTPFileSystem operate on poolable ChannelSftp instances, and some methods of SFTPFileSystem are chained together, resulting in multiple connections to the SFTP server being established to accomplish one compound action. Those methods are listed below: # mkdirs method the public mkdirs method acquires a new ChannelSftp from the pool [1] and then recursively creates directories, checking for directory existence beforehand by calling the method exists [2], which delegates to the getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it ends up returning the FileStatus instance [4]. The resource leakage occurs in the method getWorkingDirectory, which calls the getHomeDirectory method [5], which in turn establishes a new connection to the SFTP server instead of using an already created connection. As the mkdirs method is recursive, this results in creating a huge number of connections. # open method [6]. This method returns an instance of FSDataInputStream which consumes an SFTPInputStream instance that doesn't return the acquired ChannelSftp instance back to the pool but instead closes it [7]. This leads to establishing another connection to the SFTP server when the next method is called on the FileSystem instance. 
[1] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658 [2] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321 [3] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202 [4] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290 [5] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640 [6] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504 [7] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
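One way to avoid the leak pattern described above is to acquire a pooled channel once per public FileSystem call, thread it through every internal helper, and return it to the pool exactly once. The sketch below is hypothetical: Channel, ChannelPool, and SftpSketch are stand-ins for ChannelSftp, SFTPConnectionPool, and SFTPFileSystem, not the real JSch/Hadoop API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for a pooled SFTP channel.
class Channel {
}

// Stand-in for the connection pool; counts how many real connections were opened.
class ChannelPool {
  private final Deque<Channel> idle = new ArrayDeque<>();
  int created = 0;

  Channel acquire() {
    Channel c = idle.poll();
    if (c == null) {
      c = new Channel(); // a new connection is opened only when the pool is empty
      created++;
    }
    return c;
  }

  void release(Channel c) {
    idle.push(c);
  }
}

public class SftpSketch {
  private final ChannelPool pool = new ChannelPool();

  // Public entry point: one acquire/release pair for the whole compound action.
  public void mkdirs(String path) {
    Channel c = pool.acquire();
    try {
      mkdirsRecursive(c, path);
    } finally {
      pool.release(c); // always returned to the pool, never closed mid-operation
    }
  }

  // Internal helpers take the channel as a parameter instead of re-acquiring one.
  private void mkdirsRecursive(Channel c, String path) {
    int slash = path.lastIndexOf('/');
    if (slash > 0) {
      mkdirsRecursive(c, path.substring(0, slash));
    }
    // exists()/getFileStatus() would likewise reuse 'c' here.
  }

  public int connectionsCreated() {
    return pool.created;
  }

  public static void main(String[] args) {
    SftpSketch fs = new SftpSketch();
    fs.mkdirs("/a/b/c/d");
    fs.mkdirs("/a/b/c/e");
    System.out.println(fs.connectionsCreated()); // 1: the single channel is reused across calls
  }
}
```

The key design choice is that only the public method touches the pool; with the leak described in this issue, each recursive step would instead have bumped the connection count.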
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423924#comment-16423924 ] genericqa commented on HADOOP-14999: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 2m 15s{color} | {color:red} Docker failed to build yetus/hadoop:dbd69cb. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-14999 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917350/HADOOP-14999-branch-2.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14427/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, > HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, > HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, > HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, > HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, > diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. 
To cite an extreme > example, if a task outputs 100GB or even more, we may need > to write this 100GB to local disk and then upload it. This is > inefficient and limited by disk space. > This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and > depends on HADOOP-15039. > The attached {{asynchronous_file_uploading.pdf}} illustrates the difference > between the previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local > disk before we can upload it to OSS. This poses two problems: > - if the output file is too large, it will run out of local disk space. > - if the output file is too large, the task will wait a long time to upload the result > to OSS before finishing, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. small local files, and each block is packaged into an upload > task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} uploads these blocks in parallel, which > improves performance greatly. > 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If one of > those tasks fails, the whole file upload fails, and we abort the > current upload. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423921#comment-16423921 ] Genmao Yu commented on HADOOP-14999: [~Sammi] attach HADOOP-14999-branch-2.001.patch for branch-2 > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, > HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, > HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, > HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, > HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, > diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. 
This poses two problems: > - if the output file is too large, it will run out of local disk space. > - if the output file is too large, the task will wait a long time to upload the result > to OSS before finishing, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. small local files, and each block is packaged into an upload > task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} uploads these blocks in parallel, which > improves performance greatly. > 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If one of > those tasks fails, the whole file upload fails, and we abort the > current upload. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
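The block-and-executor idea behind {{AliyunOSSBlockOutputStream}} can be sketched with a plain ExecutorService. This is a toy model, not the Hadoop patch: BlockUploadSketch is hypothetical, its part "uploads" merely report byte counts, and a fixed thread pool stands in for {{SemaphoredDelegatingExecutor}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of asynchronous multi-part upload: split, submit, then wait for all parts.
public class BlockUploadSketch {
  static List<Integer> uploadInBlocks(byte[] data, int blockSize, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Integer>> parts = new ArrayList<>();
    for (int off = 0; off < data.length; off += blockSize) {
      final int len = Math.min(blockSize, data.length - off);
      // each "part upload" runs in parallel; here it just reports the bytes it would send
      parts.add(pool.submit(() -> len));
    }
    List<Integer> sizes = new ArrayList<>();
    try {
      for (Future<Integer> f : parts) {
        sizes.add(f.get()); // completion barrier: every part must finish before the upload "completes"
      }
    } catch (Exception e) {
      throw new RuntimeException(e); // in the real patch, a failed part aborts the whole upload
    } finally {
      pool.shutdown();
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println(uploadInBlocks(new byte[25], 10, 4)); // [10, 10, 5]
  }
}
```

Because the writer hands each finished block to the pool and keeps going, producing output and uploading blocks overlap, which is the performance win the description claims.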
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: HADOOP-14999-branch-2.001.patch > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, > HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, > HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, > HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, > HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, > diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. 
This poses two problems: > - if the output file is too large, it will run out of local disk space. > - if the output file is too large, the task will wait a long time to upload the result > to OSS before finishing, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. small local files, and each block is packaged into an upload > task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} uploads these blocks in parallel, which > improves performance greatly. > 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If one of > those tasks fails, the whole file upload fails, and we abort the > current upload. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14651) Update okhttp version to 2.7.5
[ https://issues.apache.org/jira/browse/HADOOP-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423823#comment-16423823 ] genericqa commented on HADOOP-14651: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 2m 18s{color} | {color:red} Docker failed to build yetus/hadoop:dbd69cb. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-14651 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917344/HADOOP-14651-branch-2.0.004.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14426/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Update okhttp version to 2.7.5 > -- > > Key: HADOOP-14651 > URL: https://issues.apache.org/jira/browse/HADOOP-14651 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/adl >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 3.1.0 > > Attachments: HADOOP-14651-branch-2.0.004.patch, > HADOOP-14651-branch-2.0.004.patch, HADOOP-14651-branch-3.0.004.patch, > HADOOP-14651-branch-3.0.004.patch, HADOOP-14651.001.patch, > HADOOP-14651.002.patch, HADOOP-14651.003.patch, HADOOP-14651.004.patch > > > The current artifact is: > com.squareup.okhttp:okhttp:2.4.0 > That version could either be bumped to 2.7.5 (the latest of that line), or > use the latest artifact: > com.squareup.okhttp3:okhttp:3.8.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14651) Update okhttp version to 2.7.5
[ https://issues.apache.org/jira/browse/HADOOP-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14651: Status: Patch Available (was: Open) > Update okhttp version to 2.7.5 > -- > > Key: HADOOP-14651 > URL: https://issues.apache.org/jira/browse/HADOOP-14651 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/adl >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 3.1.0 > > Attachments: HADOOP-14651-branch-2.0.004.patch, > HADOOP-14651-branch-2.0.004.patch, HADOOP-14651-branch-3.0.004.patch, > HADOOP-14651-branch-3.0.004.patch, HADOOP-14651.001.patch, > HADOOP-14651.002.patch, HADOOP-14651.003.patch, HADOOP-14651.004.patch > > > The current artifact is: > com.squareup.okhttp:okhttp:2.4.0 > That version could either be bumped to 2.7.5 (the latest of that line), or > use the latest artifact: > com.squareup.okhttp3:okhttp:3.8.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14651) Update okhttp version to 2.7.5
[ https://issues.apache.org/jira/browse/HADOOP-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14651: Attachment: HADOOP-14651-branch-2.0.004.patch > Update okhttp version to 2.7.5 > -- > > Key: HADOOP-14651 > URL: https://issues.apache.org/jira/browse/HADOOP-14651 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/adl >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 3.1.0 > > Attachments: HADOOP-14651-branch-2.0.004.patch, > HADOOP-14651-branch-2.0.004.patch, HADOOP-14651-branch-3.0.004.patch, > HADOOP-14651-branch-3.0.004.patch, HADOOP-14651.001.patch, > HADOOP-14651.002.patch, HADOOP-14651.003.patch, HADOOP-14651.004.patch > > > The current artifact is: > com.squareup.okhttp:okhttp:2.4.0 > That version could either be bumped to 2.7.5 (the latest of that line), or > use the latest artifact: > com.squareup.okhttp3:okhttp:3.8.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14651) Update okhttp version to 2.7.5
[ https://issues.apache.org/jira/browse/HADOOP-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14651: Status: Open (was: Patch Available) > Update okhttp version to 2.7.5 > -- > > Key: HADOOP-14651 > URL: https://issues.apache.org/jira/browse/HADOOP-14651 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/adl >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 3.1.0 > > Attachments: HADOOP-14651-branch-2.0.004.patch, > HADOOP-14651-branch-3.0.004.patch, HADOOP-14651-branch-3.0.004.patch, > HADOOP-14651.001.patch, HADOOP-14651.002.patch, HADOOP-14651.003.patch, > HADOOP-14651.004.patch > > > The current artifact is: > com.squareup.okhttp:okhttp:2.4.0 > That version could either be bumped to 2.7.5 (the latest of that line), or > use the latest artifact: > com.squareup.okhttp3:okhttp:3.8.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15355) TestCommonConfigurationFields is broken by HADOOP-15312
[ https://issues.apache.org/jira/browse/HADOOP-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423574#comment-16423574 ] LiXin Ge commented on HADOOP-15355: --- Thanks [~xiaochen] for reviewing and committing this! > TestCommonConfigurationFields is broken by HADOOP-15312 > --- > > Key: HADOOP-15355 > URL: https://issues.apache.org/jira/browse/HADOOP-15355 > Project: Hadoop Common > Issue Type: Bug > Components: test >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: LiXin Ge >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.2, 3.1.1 > > Attachments: HADOOP-15355.001.patch, HADOOP-15355.002.patch > > > TestCommonConfigurationFields is failing after HADOOP-15312. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15312) Undocumented KeyProvider configuration keys
[ https://issues.apache.org/jira/browse/HADOOP-15312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423546#comment-16423546 ] Hudson commented on HADOOP-15312: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13916 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13916/]) HADOOP-15355. TestCommonConfigurationFields is broken by HADOOP-15312. (xiao: rev 1077392eaad303ddd82bcbe259a4045d8a028c20) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProvider.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java > Undocumented KeyProvider configuration keys > --- > > Key: HADOOP-15312 > URL: https://issues.apache.org/jira/browse/HADOOP-15312 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: LiXin Ge >Priority: Major > Fix For: 3.1.0, 2.10.0, 3.0.2 > > Attachments: HADOOP-15312.001.patch, HADOOP-15312.002.patch, > HADOOP-15312.003.patch > > > Via HADOOP-14445, I found two undocumented configuration keys: > hadoop.security.key.default.bitlength and hadoop.security.key.default.cipher -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15355) TestCommonConfigurationFields is broken by HADOOP-15312
[ https://issues.apache.org/jira/browse/HADOOP-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423545#comment-16423545 ] Hudson commented on HADOOP-15355: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13916 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13916/]) HADOOP-15355. TestCommonConfigurationFields is broken by HADOOP-15312. (xiao: rev 1077392eaad303ddd82bcbe259a4045d8a028c20) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProvider.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java > TestCommonConfigurationFields is broken by HADOOP-15312 > --- > > Key: HADOOP-15355 > URL: https://issues.apache.org/jira/browse/HADOOP-15355 > Project: Hadoop Common > Issue Type: Bug > Components: test >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: LiXin Ge >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.2, 3.1.1 > > Attachments: HADOOP-15355.001.patch, HADOOP-15355.002.patch > > > TestCommonConfigurationFields is failing after HADOOP-15312. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15317) Improve NetworkTopology chooseRandom's loop
[ https://issues.apache.org/jira/browse/HADOOP-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423526#comment-16423526 ]

Hudson commented on HADOOP-15317:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13915 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13915/])
HADOOP-15317. Improve NetworkTopology chooseRandom's loop. (xiao: rev 57374c4737ab0fccf52dae3cea911fc6bd90e1b7)
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java

> Improve NetworkTopology chooseRandom's loop
> -------------------------------------------
>
>          Key: HADOOP-15317
>          URL: https://issues.apache.org/jira/browse/HADOOP-15317
>      Project: Hadoop Common
>   Issue Type: Bug
>     Reporter: Xiao Chen
>     Assignee: Xiao Chen
>     Priority: Major
>      Fix For: 2.10.0, 2.8.4, 3.2.0, 3.0.2, 3.1.1, 2.9.2
>
>  Attachments: HADOOP-15317.01.patch, HADOOP-15317.02.patch, HADOOP-15317.03.patch, HADOOP-15317.04.patch, HADOOP-15317.05.patch, HADOOP-15317.06.patch, Screen Shot 2018-03-28 at 7.23.32 PM.png
>
>
> Recently we found a postmortem case where the ANN seemed to be in an infinite loop. From the logs it had just gone through a rolling restart, and DNs were getting registered.
> Later the NN became unresponsive, and the stacktrace showed it inside a do-while loop in {{NetworkTopology#chooseRandom}} - part of what was done in HDFS-10320.
> Going through the code and logs I am not able to come up with any theory for why this is happening (I thought about incorrect locking, or the Node object being modified outside of NetworkTopology, but both seem impossible), so we should eliminate this loop.
> stacktrace:
> {noformat}
> Stack:
> java.util.HashMap.hash(HashMap.java:338)
> java.util.HashMap.containsKey(HashMap.java:595)
> java.util.HashSet.contains(HashSet.java:203)
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:786)
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:732)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:757)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:692)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:666)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:573)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:461)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:368)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:243)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:115)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4AdditionalDatanode(BlockManager.java:1596)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:3599)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:717)
> {noformat}
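The failure mode described above is an unbounded retry loop: keep drawing random nodes until one passes the exclusion check, which can never terminate if every candidate has (for whatever reason) become excluded. As a rough sketch of the loop-elimination idea — plain Java, not Hadoop's actual NetworkTopology code, with all names invented for illustration — the selection can be bounded by first materializing the set of genuinely available candidates:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/**
 * Simplified illustration (not Hadoop's real code) of replacing an
 * unbounded "retry random picks" do-while with a selection that is
 * guaranteed to terminate.
 */
public class BoundedRandomChooser {

    /**
     * Returns a random candidate not in {@code excluded}, or null if none
     * is available. A naive
     *   do { pick = random(candidates); } while (excluded.contains(pick));
     * loop spins forever when every candidate is excluded; drawing from the
     * precomputed available list makes that case an explicit null return.
     */
    public static String chooseRandom(List<String> candidates,
                                      Set<String> excluded,
                                      Random rng) {
        List<String> available = new ArrayList<>();
        for (String c : candidates) {
            if (!excluded.contains(c)) {
                available.add(c);
            }
        }
        if (available.isEmpty()) {
            return null; // terminate instead of looping indefinitely
        }
        return available.get(rng.nextInt(available.size()));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3");
        // Everything excluded: the unbounded loop would hang here.
        System.out.println(chooseRandom(nodes, new HashSet<>(nodes), new Random()));
        // One node excluded: returns either dn1 or dn3.
        System.out.println(chooseRandom(nodes, Set.of("dn2"), new Random()));
    }
}
```

This trades a small allocation per call for a hard termination guarantee, which is the general direction one would expect a fix like HADOOP-15317 to take.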
[jira] [Updated] (HADOOP-15355) TestCommonConfigurationFields is broken by HADOOP-15312
[ https://issues.apache.org/jira/browse/HADOOP-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-15355:
-------------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
 Fix Version/s: 3.1.1
                3.0.2
                3.2.0
                2.10.0
        Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-3.[0-1], branch-2. Thanks [~GeLiXin] and [~shv]!

> TestCommonConfigurationFields is broken by HADOOP-15312
> --------------------------------------------------------
>
>              Key: HADOOP-15355
>              URL: https://issues.apache.org/jira/browse/HADOOP-15355
>          Project: Hadoop Common
>       Issue Type: Bug
>       Components: test
> Affects Versions: 2.10.0
>         Reporter: Konstantin Shvachko
>         Assignee: LiXin Ge
>         Priority: Major
>          Fix For: 2.10.0, 3.2.0, 3.0.2, 3.1.1
>
>  Attachments: HADOOP-15355.001.patch, HADOOP-15355.002.patch
>
>
> TestCommonConfigurationFields is failing after HADOOP-15312.
[jira] [Commented] (HADOOP-15355) TestCommonConfigurationFields is broken by HADOOP-15312
[ https://issues.apache.org/jira/browse/HADOOP-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423520#comment-16423520 ]

Xiao Chen commented on HADOOP-15355:
------------------------------------

+1, let's fix the broken test. Committing this.

> TestCommonConfigurationFields is broken by HADOOP-15312
> --------------------------------------------------------
>
>              Key: HADOOP-15355
>              URL: https://issues.apache.org/jira/browse/HADOOP-15355
>          Project: Hadoop Common
>       Issue Type: Bug
>       Components: test
> Affects Versions: 2.10.0
>         Reporter: Konstantin Shvachko
>         Assignee: LiXin Ge
>         Priority: Major
>  Attachments: HADOOP-15355.001.patch, HADOOP-15355.002.patch
>
>
> TestCommonConfigurationFields is failing after HADOOP-15312.
[jira] [Updated] (HADOOP-15317) Improve NetworkTopology chooseRandom's loop
[ https://issues.apache.org/jira/browse/HADOOP-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-15317:
-------------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
 Fix Version/s: 2.9.2
                3.1.1
                3.0.2
                3.2.0
                2.8.4
                2.10.0
        Status: Resolved  (was: Patch Available)

> Improve NetworkTopology chooseRandom's loop
> -------------------------------------------
>
>          Key: HADOOP-15317
>          URL: https://issues.apache.org/jira/browse/HADOOP-15317
>      Project: Hadoop Common
>   Issue Type: Bug
>     Reporter: Xiao Chen
>     Assignee: Xiao Chen
>     Priority: Major
>      Fix For: 2.10.0, 2.8.4, 3.2.0, 3.0.2, 3.1.1, 2.9.2
>
>  Attachments: HADOOP-15317.01.patch, HADOOP-15317.02.patch, HADOOP-15317.03.patch, HADOOP-15317.04.patch, HADOOP-15317.05.patch, HADOOP-15317.06.patch, Screen Shot 2018-03-28 at 7.23.32 PM.png
>
>
> Recently we found a postmortem case where the ANN seemed to be in an infinite loop. From the logs it had just gone through a rolling restart, and DNs were getting registered.
> Later the NN became unresponsive, and the stacktrace showed it inside a do-while loop in {{NetworkTopology#chooseRandom}} - part of what was done in HDFS-10320.
> Going through the code and logs I am not able to come up with any theory for why this is happening (I thought about incorrect locking, or the Node object being modified outside of NetworkTopology, but both seem impossible), so we should eliminate this loop.
[jira] [Commented] (HADOOP-15317) Improve NetworkTopology chooseRandom's loop
[ https://issues.apache.org/jira/browse/HADOOP-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423516#comment-16423516 ]

Xiao Chen commented on HADOOP-15317:
------------------------------------

Pushed this to all branches (trunk, branch-3.[0-1], branch-2, branch-2.[8-9]) to match HDFS-10320. Thanks Ajay and Eddy for the reviews!

> Improve NetworkTopology chooseRandom's loop
> -------------------------------------------
>
>          Key: HADOOP-15317
>          URL: https://issues.apache.org/jira/browse/HADOOP-15317
>      Project: Hadoop Common
>   Issue Type: Bug
>     Reporter: Xiao Chen
>     Assignee: Xiao Chen
>     Priority: Major
>      Fix For: 2.10.0, 2.8.4, 3.2.0, 3.0.2, 3.1.1, 2.9.2
>
>  Attachments: HADOOP-15317.01.patch, HADOOP-15317.02.patch, HADOOP-15317.03.patch, HADOOP-15317.04.patch, HADOOP-15317.05.patch, HADOOP-15317.06.patch, Screen Shot 2018-03-28 at 7.23.32 PM.png
>
>
> Recently we found a postmortem case where the ANN seemed to be in an infinite loop. From the logs it had just gone through a rolling restart, and DNs were getting registered.
> Later the NN became unresponsive, and the stacktrace showed it inside a do-while loop in {{NetworkTopology#chooseRandom}} - part of what was done in HDFS-10320.
> Going through the code and logs I am not able to come up with any theory for why this is happening (I thought about incorrect locking, or the Node object being modified outside of NetworkTopology, but both seem impossible), so we should eliminate this loop.