[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392540#comment-16392540 ]

genericqa commented on HADOOP-14999:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 19s | trunk passed |
| +1 | compile | 0m 20s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | shadedclient | 10m 38s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 31s | trunk passed |
| +1 | javadoc | 0m 16s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 17s | the patch passed |
| +1 | javac | 0m 17s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 54s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 36s | the patch passed |
| +1 | javadoc | 0m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 18s | hadoop-aliyun in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 45m 3s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913728/HADOOP-14999.009.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fd903c602e94 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14287/testReport/ |
| Max. process+thread count | 328 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14287/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
>
> Key:
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.06.patch

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS instances in an HA setup using LoadBalancingKMSClientProvider do not share delegation tokens (a client uses the KMS address/port as the key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation tokens too.
> Under HA, A KMS instance must verify the delegation token given by another KMS instance, by checking the shared secret used to sign the delegation token. To do this, all KMS instances must be able to retrieve the shared secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share delegation tokens.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
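To illustrate the lookup problem described in this issue, here is a minimal sketch that uses only JDK classes. `TokenLookupSketch`, `serviceKey`, the host names and the port are hypothetical stand-ins; the real code goes through `SecurityUtil.buildTokenService` and Hadoop's `Credentials`, which are not reproduced here. The point is only that a token stored under one instance's host:port key is not found when the client talks to a second instance.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a credentials store keyed by token service name. */
final class TokenLookupSketch {

  /** Simplified analogue of deriving a token service key from host and port. */
  static String serviceKey(URI kmsUri) {
    return kmsUri.getHost() + ":" + kmsUri.getPort();
  }

  /** Returns the token stored for the given KMS URI, or null if no key matches. */
  static String lookup(Map<String, String> creds, URI kmsUri) {
    return creds.get(serviceKey(kmsUri));
  }

  public static void main(String[] args) {
    Map<String, String> creds = new HashMap<>();
    // Assumed example addresses for two instances behind one logical KMS.
    URI kms1 = URI.create("https://kms1.example.com:9600/kms");
    URI kms2 = URI.create("https://kms2.example.com:9600/kms");

    // The token is stored under the key of the instance that issued it.
    creds.put(serviceKey(kms1), "token-issued-by-kms1");

    System.out.println("kms1 lookup: " + lookup(creds, kms1));
    // The second instance has a different key, so the lookup misses.
    System.out.println("kms2 lookup: " + lookup(creds, kms2));
  }
}
```

Under this keying scheme the second lookup misses even when both instances could verify the same token, which is the mismatch with the documentation that the issue points out.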
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-14445:
---
Attachment: (was: HADOOP-14445.06.patch)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS instances in an HA setup using LoadBalancingKMSClientProvider do not share delegation tokens (a client uses the KMS address/port as the key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation tokens too.
> Under HA, A KMS instance must verify the delegation token given by another KMS instance, by checking the shared secret used to sign the delegation token. To do this, all KMS instances must be able to retrieve the shared secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share delegation tokens.
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392510#comment-16392510 ]

Xiao Chen commented on HADOOP-14445:

Patch 6 to fix pre-commit errors.

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS instances in an HA setup using LoadBalancingKMSClientProvider do not share delegation tokens (a client uses the KMS address/port as the key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation tokens too.
> Under HA, A KMS instance must verify the delegation token given by another KMS instance, by checking the shared secret used to sign the delegation token. To do this, all KMS instances must be able to retrieve the shared secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share delegation tokens.
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.06.patch

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
> Reporter: Wei-Chiu Chuang
> Assignee: Xiao Chen
> Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS instances in an HA setup using LoadBalancingKMSClientProvider do not share delegation tokens (a client uses the KMS address/port as the key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation tokens too.
> Under HA, A KMS instance must verify the delegation token given by another KMS instance, by checking the shared secret used to sign the delegation token. To do this, all KMS instances must be able to retrieve the shared secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share delegation tokens.
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.009.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, HADOOP-14999.009.patch, asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
> - Improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading blocks happen asynchronously.
> - Avoid buffering an overly large result on local disk. To cite an extreme example, a task that outputs 100GB or more may need to write all of it to local disk and then upload it. This is inefficient and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two problems:
> - if the output file is too large, it will exhaust the local disk.
> - if the output file is too large, the task will wait a long time to upload the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. small local files, and each block is packaged into an upload task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel and greatly improves performance.
> 3. Each task retries up to 3 times to upload a block to Aliyun OSS. If one of those tasks fails, the whole file upload fails and the current upload is aborted.
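The block-upload flow described above can be sketched with plain JDK concurrency classes. This is not the patch's code: `BlockUploadSketch`, `PartUploader` and `uploadAll` are hypothetical names, a `Semaphore` plus a fixed thread pool stands in for `SemaphoredDelegatingExecutor`, blocks are cut from an in-memory array rather than written to local files, and "retry 3 times" is read here as up to three attempts in total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

final class BlockUploadSketch {
  /** Assumed reading of "retry 3 times": three attempts per block in total. */
  static final int MAX_ATTEMPTS = 3;

  /** Hypothetical upload callback; returns an identifier for the uploaded part. */
  interface PartUploader {
    String upload(int partNumber, byte[] block) throws Exception;
  }

  /** Cuts the data into blocks of at most blockSize bytes. */
  static List<byte[]> split(byte[] data, int blockSize) {
    List<byte[]> blocks = new ArrayList<>();
    for (int off = 0; off < data.length; off += blockSize) {
      blocks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + blockSize)));
    }
    return blocks;
  }

  /** Tries one block up to MAX_ATTEMPTS times, rethrowing the last failure. */
  static String uploadWithRetry(PartUploader uploader, int partNumber, byte[] block)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return uploader.upload(partNumber, block);
      } catch (Exception e) {
        last = e;
      }
    }
    throw last;
  }

  /** Uploads all blocks in parallel; any failed block fails the whole upload. */
  static List<String> uploadAll(byte[] data, int blockSize, int maxInFlight,
      PartUploader uploader) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
    Semaphore permits = new Semaphore(maxInFlight);
    List<Future<String>> pending = new ArrayList<>();
    try {
      int next = 1;
      for (byte[] block : split(data, blockSize)) {
        final int partNumber = next++;
        // Blocks the producer while too many uploads are already in flight.
        permits.acquire();
        pending.add(pool.submit(() -> {
          try {
            return uploadWithRetry(uploader, partNumber, block);
          } finally {
            permits.release();
          }
        }));
      }
      List<String> partIds = new ArrayList<>();
      for (Future<String> f : pending) {
        partIds.add(f.get()); // throws ExecutionException if the block failed
      }
      return partIds;
    } finally {
      // Stands in for aborting the multi-part upload on failure.
      pool.shutdownNow();
    }
  }
}
```

The semaphore is what keeps the producer from queueing an unbounded number of blocks, which is the property the description relies on to avoid filling the local disk.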
[jira] [Commented] (HADOOP-15234) NPE when initializing KMSWebApp
[ https://issues.apache.org/jira/browse/HADOOP-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392402#comment-16392402 ]

Xiao Chen commented on HADOOP-15234:

Thanks all for the comments.

bq. Sorry for asking you to make 2-3 revisions for such a simple patch.
While I appreciate the friendliness, IMO there is nothing to be sorry about. We're all trying to solve the problem in the best way. Even if the problem appears simple, as it turned out here the possible solutions are many, and I think it's common to iterate over a few patches - it's simply development. :)

bq. without unit tests
I'm okay here given this is a supportability improvement.

bq. Preconditions
Since here we're just doing a one-off at service startup time, which is a rare operation and not performance critical, I'd vote for readability. Null check and throw is fine by me too.

bq. should we throw in the implementation of
Maybe I misunderstood, please let me know if so. The factory [depends on|https://github.com/apache/hadoop/blob/113f401f41ee575cb303ceb647bc243108d93a04/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderFactory.java#L97] the null check to create the correct provider out of all the factories loaded by the service loader. So throwing in one of them would not work. It may be reasonable to throw instead of returning null in {{KeyProviderFactory#get}}, but that class is {{InterfaceAudience.Public}}.

I do have 1 comment on the patch: can we add {{providerString}} to the message being thrown, so the exception is more self-explanatory?

> NPE when initializing KMSWebApp
> ---
>
> Key: HADOOP-15234
> URL: https://issues.apache.org/jira/browse/HADOOP-15234
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Reporter: Xiao Chen
> Assignee: fang zhenyi
> Priority: Major
> Attachments: HADOOP-15234.001.patch, HADOOP-15234.002.patch
>
>
> During KMS startup, if the {{keyProvider}} is null, it will NPE inside KeyProviderExtension.
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.crypto.key.KeyProviderExtension.<init>(KeyProviderExtension.java:43)
>     at org.apache.hadoop.crypto.key.CachingKeyProvider.<init>(CachingKeyProvider.java:93)
>     at org.apache.hadoop.crypto.key.kms.server.KMSWebApp.contextInitialized(KMSWebApp.java:170)
> {noformat}
> We're investigating the exact scenario that could lead to this, but the NPE and log around it can be improved.
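The point made in the comment, that individual factories return null for URIs they do not handle and only the caller should fail, can be sketched as follows. This is an assumption-laden illustration in plain Java: `ProviderLookupSketch`, `firstMatch`, `requireProvider` and the exception wording are hypothetical and are not the text of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ProviderLookupSketch {

  /**
   * Asks each factory in turn. A factory returns null when it does not
   * handle the provider string, so a null here is not an error by itself.
   */
  static <P> P firstMatch(List<Function<String, P>> factories, String providerString) {
    for (Function<String, P> factory : factories) {
      P provider = factory.apply(providerString);
      if (provider != null) {
        return provider;
      }
    }
    return null;
  }

  /** Caller-side check: fail early with the provider string in the message. */
  static <P> P requireProvider(List<Function<String, P>> factories, String providerString) {
    P provider = firstMatch(factories, providerString);
    if (provider == null) {
      // Hypothetical wording; the intent is only to name the offending value.
      throw new IllegalStateException(
          "No key provider could be created for '" + providerString + "'");
    }
    return provider;
  }

  public static void main(String[] args) {
    List<Function<String, String>> factories = new ArrayList<>();
    factories.add(s -> s.startsWith("jceks://") ? "jceks-provider" : null);
    factories.add(s -> s.startsWith("kms://") ? "kms-provider" : null);

    System.out.println(requireProvider(factories, "kms://http@host/kms"));
    try {
      requireProvider(factories, "unknown://x");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Throwing at the call site keeps the factories' null-means-not-mine contract intact while replacing a bare NPE deep in a constructor with a message that names the misconfigured value.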
[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392299#comment-16392299 ]

Aaron Fabbri commented on HADOOP-15209:
---
Noticed you just mentioned cancelling the patch, so never mind my last "is it ready" comment.

My first feedback is about CopyCommitter#deleteMissing(). The goal seems to be to reduce no-op deletes, but you have 3 retries with 1 second sleeps on failed deletes. Ideally we'd only do that for S3, or add a config flag (default false) to enable retries there. Really we should be able to query the FS for capabilities and retry only for eventually consistent stores.

Just ping me when you think this is ready to commit and I'll re-review.

> DistCp to eliminate needless deletion of files under already-deleted directories
>
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, HADOOP-15209-006.patch
>
>
> DistCp issues a delete(file) request even if the file is underneath an already deleted directory. This generates needless load on filesystems/object stores and, if the store throttles deletes, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not overload the heap of the process.
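The "history of deleted directories" idea from the issue description can be sketched as below. `DeletedDirTrackerSketch` and its methods are hypothetical and do not reflect the attached patches; in particular this sketch keeps every deleted directory in an unbounded `HashSet`, so it does not address the heap concern raised in the description (a bounded cache would be one possible refinement).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeletedDirTrackerSketch {
  private final Set<String> deletedDirs = new HashSet<>();

  /** Records that a recursive delete was issued for this directory. */
  void recordDeletedDir(String dir) {
    deletedDirs.add(dir);
  }

  /** True if any ancestor directory of the path has already been deleted. */
  boolean isUnderDeletedDir(String path) {
    int slash = path.lastIndexOf('/');
    while (slash > 0) {
      path = path.substring(0, slash);
      if (deletedDirs.contains(path)) {
        return true;
      }
      slash = path.lastIndexOf('/');
    }
    return false;
  }

  /**
   * Returns the paths for which a delete would actually be issued.
   * Entries under an already-deleted directory are skipped.
   */
  static List<String> planDeletes(List<String> paths, Set<String> directories) {
    DeletedDirTrackerSketch tracker = new DeletedDirTrackerSketch();
    List<String> issued = new ArrayList<>();
    for (String path : paths) {
      if (tracker.isUnderDeletedDir(path)) {
        continue; // parent delete already removed this entry
      }
      issued.add(path);
      if (directories.contains(path)) {
        tracker.recordDeletedDir(path);
      }
    }
    return issued;
  }
}
```

The saving depends on parents appearing before their children in the listing; a child seen first is simply deleted as before, so the result stays correct either way.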
[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures
[ https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392206#comment-16392206 ]

Arpit Agarwal commented on HADOOP-12897:

Pre-commit only runs unit tests for the changed module to save time.

> KerberosAuthenticator.authenticate to include URL on IO failures
>
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Ajay Kumar
> Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} handler, which can't be called here as it's not in the {{hadoop-auth}} module.
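As an illustration of what "include URL on IO failures" means in practice, here is a minimal sketch in plain Java. `UrlAwareFailureSketch`, `IoAction` and `runWithUrlContext` are hypothetical names, and wrapping in a plain `IOException` is an assumption made for brevity; the committed change may preserve the original exception type or use different wording.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.URL;

final class UrlAwareFailureSketch {

  /** Hypothetical callback for an operation that talks to the endpoint. */
  interface IoAction {
    void run() throws IOException;
  }

  /**
   * Runs the action and, on IO failure, rethrows with the target URL in
   * the message while keeping the original exception as the cause.
   */
  static void runWithUrlContext(URL url, IoAction action) throws IOException {
    try {
      action.run();
    } catch (IOException e) {
      throw new IOException(
          "Error while authenticating with endpoint: " + url, e);
    }
  }

  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:56112/kms/v1/keys");
    try {
      runWithUrlContext(url, () -> {
        throw new ConnectException("Connection refused");
      });
    } catch (IOException e) {
      System.out.println(e.getMessage());
      System.out.println("cause: " + e.getCause());
    }
  }
}
```

Keeping the original exception as the cause means the low-level stack trace is still available, while the top-level message now says which endpoint was being contacted.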
[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures
[ https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392205#comment-16392205 ]

Ajay Kumar commented on HADOOP-12897:
-
[~xiaochen], [~arpitagarwal] Sorry I missed this earlier. Not sure why pre-commit didn't catch it. Thanks for fixing it.

> KerberosAuthenticator.authenticate to include URL on IO failures
>
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Ajay Kumar
> Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} handler, which can't be called here as it's not in the {{hadoop-auth}} module.
[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392198#comment-16392198 ]

Íñigo Goiri commented on HADOOP-13144:
--
I think I'm too used to the HDFS runs... Two full +1s from Yetus in a day!

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ipc
> Reporter: Jason Kace
> Assignee: Íñigo Goiri
> Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch, HADOOP-13144.002.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique to the connection's remote address, ticket and protocol. Each ConnectionId is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single thread for each user/ticket + address. If a single user makes repeated calls (1k-100k/sec) to the same destination, the IPC client becomes a bottleneck.
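One way to picture "multiple connections per user" is to add an index to the cache key, so that the same address/user/protocol triple maps to a small pool of connections instead of exactly one. The sketch below is only an illustration under that assumption: `MultiConnectionSketch`, its `Key` class and the string "connections" are hypothetical, and the attached patches may take a different approach.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class MultiConnectionSketch {

  /** Cache key: the usual identity fields plus a connection index. */
  static final class Key {
    final String address;
    final String user;
    final String protocol;
    final int index;

    Key(String address, String user, String protocol, int index) {
      this.address = address;
      this.user = user;
      this.protocol = protocol;
      this.index = index;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return index == k.index && address.equals(k.address)
          && user.equals(k.user) && protocol.equals(k.protocol);
    }

    @Override
    public int hashCode() {
      return Objects.hash(address, user, protocol, index);
    }
  }

  private final int connectionsPerUser;
  private final AtomicInteger nextIndex = new AtomicInteger();
  // A String stands in for a real connection object in this sketch.
  private final ConcurrentMap<Key, String> connections = new ConcurrentHashMap<>();

  MultiConnectionSketch(int connectionsPerUser) {
    this.connectionsPerUser = connectionsPerUser;
  }

  /** Picks a connection round-robin, creating it on first use. */
  String connectionFor(String address, String user, String protocol) {
    int index = Math.floorMod(nextIndex.getAndIncrement(), connectionsPerUser);
    Key key = new Key(address, user, protocol, index);
    return connections.computeIfAbsent(
        key, k -> "conn-" + k.user + "@" + k.address + "#" + k.index);
  }

  int openConnections() {
    return connections.size();
  }
}
```

With `connectionsPerUser` set to 1 this degenerates to the single-connection behaviour described in the issue; larger values spread one user's calls across several connection threads.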
[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures
[ https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392186#comment-16392186 ]

Arpit Agarwal commented on HADOOP-12897:

Thanks for committing the fix [~xiaochen].

> KerberosAuthenticator.authenticate to include URL on IO failures
>
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Ajay Kumar
> Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} handler, which can't be called here as it's not in the {{hadoop-auth}} module.
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392185#comment-16392185 ]

Arpit Agarwal commented on HADOOP-15280:

Thanks for taking care of this [~bharatviswa] and [~xiaochen].

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Ray Chiang
> Assignee: Bharat Viswanadham
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://localhost:56206/kms/v1/keys?doAs=foo1
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].
[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392178#comment-16392178 ]

genericqa commented on HADOOP-13144:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 2s | trunk passed |
| +1 | compile | 15m 36s | trunk passed |
| +1 | checkstyle | 0m 53s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 12m 35s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 35s | trunk passed |
| +1 | javadoc | 1m 0s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 3s | the patch passed |
| +1 | compile | 13m 2s | the patch passed |
| +1 | javac | 13m 2s | the patch passed |
| +1 | checkstyle | 0m 59s | the patch passed |
| +1 | mvnsite | 1m 16s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 35s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 39s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 88m 57s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-13144 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913669/HADOOP-13144.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 7825417f3dac 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14286/testReport/ |
| Max. process+thread count | 1355 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14286/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Enhancing IPC client throughput via multiple connections per user
> -
>
>
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392125#comment-16392125 ] Akira Ajisaka commented on HADOOP-15293: +1, LGTM > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch, HADOOP-15293.2.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HADOOP-13144: - Attachment: HADOOP-13144.002.patch > Enhancing IPC client throughput via multiple connections per user > - > > Key: HADOOP-13144 > URL: https://issues.apache.org/jira/browse/HADOOP-13144 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Jason Kace >Assignee: Íñigo Goiri >Priority: Minor > Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch, > HADOOP-13144.002.patch > > > The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single > connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique > to the connection's remote address, ticket and protocol. Each ConnectionId > is 1:1 mapped to a connection thread by the client via a map cache. > The result is to serialize all IPC read/write activity through a single > thread for a each user/ticket + address. If a single user makes repeated > calls (1k-100k/sec) to the same destination, the IPC client becomes a > bottleneck. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
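To illustrate the bottleneck described above, here is a minimal sketch of the general idea of keeping several cached connections per key instead of one. It is not taken from any of the attached patches: the class names `PooledConnectionKey` and `ConnectionPoolSketch`, the `slot` field, and the use of plain strings as stand-ins for connection threads are all assumptions made for illustration.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the client's connection key, extended with a slot index. */
final class PooledConnectionKey {
    final String address;
    final String ticket;
    final String protocol;
    final int slot; // hypothetical: which of the N connections for this user/address

    PooledConnectionKey(String address, String ticket, String protocol, int slot) {
        this.address = address;
        this.ticket = ticket;
        this.protocol = protocol;
        this.slot = slot;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PooledConnectionKey)) {
            return false;
        }
        PooledConnectionKey other = (PooledConnectionKey) o;
        return slot == other.slot
            && address.equals(other.address)
            && ticket.equals(other.ticket)
            && protocol.equals(other.protocol);
    }

    @Override
    public int hashCode() {
        return Objects.hash(address, ticket, protocol, slot);
    }
}

/** Spreads calls over a fixed number of cached connections per (address, ticket, protocol). */
final class ConnectionPoolSketch {
    private final int connectionsPerKey;
    // One shared counter for all keys: simple round-robin, not strictly per key.
    private final AtomicInteger nextSlot = new AtomicInteger();
    // Values are plain strings here; a real client would cache connection threads.
    private final ConcurrentHashMap<PooledConnectionKey, String> cache =
        new ConcurrentHashMap<>();

    ConnectionPoolSketch(int connectionsPerKey) {
        if (connectionsPerKey < 1) {
            throw new IllegalArgumentException("connectionsPerKey must be >= 1");
        }
        this.connectionsPerKey = connectionsPerKey;
    }

    /** Returns the cached connection for the next slot, creating it on first use. */
    String connectionFor(String address, String ticket, String protocol) {
        int slot = Math.floorMod(nextSlot.getAndIncrement(), connectionsPerKey);
        PooledConnectionKey key = new PooledConnectionKey(address, ticket, protocol, slot);
        return cache.computeIfAbsent(key, k -> k.ticket + "@" + k.address + "#" + k.slot);
    }

    int cachedConnections() {
        return cache.size();
    }

    public static void main(String[] args) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch(3);
        for (int i = 0; i < 6; i++) {
            System.out.println(pool.connectionFor("nn1:8020", "alice", "ClientProtocol"));
        }
        System.out.println("cached connections: " + pool.cachedConnections());
    }
}
```

With `connectionsPerKey` set to 1 this degenerates to the 1:1 mapping the description complains about; larger values let one user's calls to the same destination proceed over several connections.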
[jira] [Commented] (HADOOP-15297) Make S3A etag => checksum feature optional
[ https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392043#comment-16392043 ] Devaraj Das commented on HADOOP-15297: -- Seems fine except for a minor issue: there is an empty test() method that you should remove. > Make S3A etag => checksum feature optional > -- > > Key: HADOOP-15297 > URL: https://issues.apache.org/jira/browse/HADOOP-15297 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15297-001.patchh > > > HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with > checksums. > Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to > s3a. > Rather than revert, I want to make it an option, off by default. Once we are > happy with distcp in future, we can turn it on. > Why an option? Because it lines up for a successor to distcp which saves src > and dest checksums to a file and can then verify whether or not files have > really changed. Currently distcp relies on the dest checksum algorithm being the > same as the src for incremental updates, but if either of the stores doesn't > serve checksums, it silently downgrades to not checking.
[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392038#comment-16392038 ] Aaron Fabbri commented on HADOOP-15209: --- I will try to review / test this today. > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCP issues a delete(file) request even if is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
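The idea in the description, a bounded history of deleted directories that is consulted before each delete, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class name `DeletedDirHistory`, the string-based paths, and the choice of an LRU `LinkedHashMap` to cap heap use are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded history of directories that have already been deleted. */
final class DeletedDirHistory {
    private final Map<String, Boolean> deletedDirs;

    DeletedDirHistory(final int maxEntries) {
        // Access-ordered map that evicts the least recently used entry,
        // so the history can never grow beyond maxEntries.
        this.deletedDirs = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Records that a directory (and therefore everything under it) was deleted. */
    void recordDeletedDir(String dir) {
        deletedDirs.put(dir, Boolean.TRUE);
    }

    /** Returns true if some ancestor of the path is known to be deleted. */
    boolean isUnderDeletedDir(String path) {
        int slash = path.lastIndexOf('/');
        while (slash > 0) {
            String parent = path.substring(0, slash);
            if (deletedDirs.get(parent) != null) {
                return true;
            }
            slash = parent.lastIndexOf('/');
        }
        return false;
    }

    public static void main(String[] args) {
        DeletedDirHistory history = new DeletedDirHistory(1000);
        history.recordDeletedDir("/data/old");
        // A caller would skip the delete request when this prints true.
        System.out.println(history.isUnderDeletedDir("/data/old/part-0000"));
        System.out.println(history.isUnderDeletedDir("/data/new/part-0000"));
    }
}
```

An evicted entry only costs a redundant delete request, so the bound trades a little extra store load for a fixed memory footprint.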
[jira] [Comment Edited] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392038#comment-16392038 ] Aaron Fabbri edited comment on HADOOP-15209 at 3/8/18 10:42 PM: I will try to review / test this today. You feel like this is ready to commit? was (Author: fabbri): I will try to review / test this today. > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCP issues a delete(file) request even if is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15299) Bump Hadoop's Jackson 2 dependency 2.9.x
[ https://issues.apache.org/jira/browse/HADOOP-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HADOOP-15299: --- Description: There are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) mean Hadoop is vulnerable to the attack - I don't know that it is, but fixes were released for Jackson 2.8.x and 2.9.x but not 2.7.x (which we're on). We shouldn't be on an unmaintained line, regardless. HBase is already on 2.9.x, we have a shaded client now, the API changes are relatively minor and so far in my testing I haven't seen any problems. I think many of our usual reasons to hesitate upgrading this dependency don't apply. (was: There are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) mean Hadoop is vulnerable to the attack - I don't know that it is, but fixes were released for 2.8.x and 2.9.x but not 2.7.x (which we're on). We shouldn't be on an unmaintained line, regardless. HBase is already on 2.9.x, we have a shaded client now, the API changes are relatively minor and so far in my testing I haven't seen any problems. I think many of our usual reasons to hesitate upgrading this dependency don't apply.) > Bump Hadoop's Jackson 2 dependency 2.9.x > > > Key: HADOOP-15299 > URL: https://issues.apache.org/jira/browse/HADOOP-15299 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0, 3.2.0 >Reporter: Sean Mackrory >Assignee: Sean Mackrory >Priority: Major > > There are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) > mean Hadoop is vulnerable to the attack - I don't know that it is, but fixes > were released for Jackson 2.8.x and 2.9.x but not 2.7.x (which we're on). We > shouldn't be on an unmaintained line, regardless. HBase is already on 2.9.x, > we have a shaded client now, the API changes are relatively minor and so far > in my testing I haven't seen any problems. I think many of our usual reasons > to hesitate upgrading this dependency don't apply. 
[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391984#comment-16391984 ] Wei Yan commented on HADOOP-13144: -- +1 from my side. RPC.java is not an actively updated part of the code. Maybe [~cnauroth] or [~steve_l] can help take a look? > Enhancing IPC client throughput via multiple connections per user > - > > Key: HADOOP-13144 > URL: https://issues.apache.org/jira/browse/HADOOP-13144 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Jason Kace >Assignee: Íñigo Goiri >Priority: Minor > Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch > > > The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single > connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique > to the connection's remote address, ticket and protocol. Each ConnectionId > is 1:1 mapped to a connection thread by the client via a map cache. > The result is to serialize all IPC read/write activity through a single > thread for a each user/ticket + address. If a single user makes repeated > calls (1k-100k/sec) to the same destination, the IPC client becomes a > bottleneck.
[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391878#comment-16391878 ] genericqa commented on HADOOP-13144:
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 52s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 7 new + 368 unchanged - 0 fixed = 375 total (was 368) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 47s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 25s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-13144 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913638/HADOOP-13144.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a18365f71f7e 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14285/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14285/testReport/ |
| Max. process+thread count | 1589 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14285/console |
| Powered by | Apache Yetus
[jira] [Commented] (HADOOP-15277) remove .FluentPropertyBeanIntrospector from CLI operation log output
[ https://issues.apache.org/jira/browse/HADOOP-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391870#comment-16391870 ] Devaraj Das commented on HADOOP-15277: -- +1 > remove .FluentPropertyBeanIntrospector from CLI operation log output > > > Key: HADOOP-15277 > URL: https://issues.apache.org/jira/browse/HADOOP-15277 > Project: Hadoop Common > Issue Type: Sub-task > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15277-001.patch > > > When hadoop metrics is started, a message about bean introspection appears. > {code} > 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when > creating PropertyDescriptor for public final void > org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! > Ignoring this property. > {code} > When using wasb or s3a, this message appears in the client logs, because > they both start metrics. > I propose to raise the log level to ERROR for that class in log4j.properties.
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391856#comment-16391856 ] Virajith Jalaparti commented on HADOOP-15292: - [~ste...@apache.org], Thanks for reviewing and committing it. [~elgoiri] and [~chris.douglas], thanks for the reviews. > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always
[ https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391853#comment-16391853 ] Steve Loughran commented on HADOOP-15300: - wasb updates every time. As does adl
{code:java}
File System Counters
  FILE: Number of bytes read=1640418
  FILE: Number of bytes written=1636188
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  WASB: Number of bytes read=0
  WASB: Number of bytes written=915753
  WASB: Number of read operations=0
  WASB: Number of large read operations=0
  WASB: Number of write operations=0
Map-Reduce Framework
  Map input records=96
  Map output records=0
  Input split bytes=308
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=16
  Total committed heap usage (bytes)=408944640
File Input Format Counters
  Bytes Read=34752
File Output Format Counters
  Bytes Written=16
DistCp Counters
  Bandwidth in Btyes=12212
  Bytes Copied=461862
  Bytes Expected=461862
  Files Copied=69
  DIR_COPY=27
118.98 real 13.33 user 1.83 sys
{code}
Updated
{code:java}
2018-03-08 15:21:44,045 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
  FILE: Number of bytes read=1635633
  FILE: Number of bytes written=1630856
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  WASB: Number of bytes read=0
  WASB: Number of bytes written=910462
  WASB: Number of read operations=0
  WASB: Number of large read operations=0
  WASB: Number of write operations=0
Map-Reduce Framework
  Map input records=96
  Map output records=0
  Input split bytes=306
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=18
  Total committed heap usage (bytes)=457179136
File Input Format Counters
  Bytes Read=35264
File Output Format Counters
  Bytes Written=16
DistCp Counters
  Bandwidth in Btyes=10566
  Bytes Copied=461862
  Bytes Expected=461862
  Files Copied=69
  DIR_COPY=27
129.40 real 14.55 user 2.08 sys
{code}
> distcp -update to WASB and ADL copies up all the files, always > -- > > Key: HADOOP-15300 > URL: https://issues.apache.org/jira/browse/HADOOP-15300 > Project: Hadoop Common > Issue Type: Bug > Components: fs/adl, fs/azure >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Major > > If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the > source files are copied up every time. In contrast, if you use hdfs:// or > s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums > for a diff, but s3a is just returning file length and relying on distcp logic > being "if either src or dest doesn't do checksums, only compare file len" > somehow that's not kicking in. Tested for file: and hdfs sources, wasb and > adl dests
[jira] [Commented] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always
[ https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391851#comment-16391851 ] Steve Loughran commented on HADOOP-15300: - distcp to s3a
{code}
2018-03-08 15:09:17,385 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1658)) - Job job_local1068976850_0001 completed successfully
2018-03-08 15:09:17,394 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
  FILE: Number of bytes read=1622306
  FILE: Number of bytes written=1634552
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  S3A: Number of bytes read=0
  S3A: Number of bytes written=897647
  S3A: Number of read operations=1688
  S3A: Number of large read operations=0
  S3A: Number of write operations=902
Map-Reduce Framework
  Map input records=96
  Map output records=0
  Input split bytes=306
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=63
  Total committed heap usage (bytes)=752877568
File Input Format Counters
  Bytes Read=34752
File Output Format Counters
  Bytes Written=16
DistCp Counters
  Bandwidth in Btyes=32392
  Bytes Copied=461862
  Bytes Expected=461862
  Files Copied=69
  DIR_COPY=27
{code}
second
{code}
2018-03-08 15:10:07,937 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1658)) - Job job_local864019435_0001 completed successfully
2018-03-08 15:10:07,944 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1665)) - Counters: 24
File System Counters
  FILE: Number of bytes read=724653
  FILE: Number of bytes written=1651348
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  S3A: Number of bytes read=0
  S3A: Number of bytes written=0
  S3A: Number of read operations=389
  S3A: Number of large read operations=0
  S3A: Number of write operations=0
Map-Reduce Framework
  Map input records=96
  Map output records=69
  Input split bytes=304
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=6
  Total committed heap usage (bytes)=529530880
File Input Format Counters
  Bytes Read=34752
File Output Format Counters
  Bytes Written=11169
DistCp Counters
  Bandwidth in Btyes=0
  Bytes Skipped=461862
  DIR_COPY=27
  Files Skipped=69
{code}
> distcp -update to WASB and ADL copies up all the files, always > -- > > Key: HADOOP-15300 > URL: https://issues.apache.org/jira/browse/HADOOP-15300 > Project: Hadoop Common > Issue Type: Bug > Components: fs/adl, fs/azure >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Major > > If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the > source files are copied up every time. In contrast, if you use hdfs:// or > s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums > for a diff, but s3a is just returning file length and relying on distcp logic > being "if either src or dest doesn't do checksums, only compare file len" > somehow that's not kicking in. Tested for file: and hdfs sources, wasb and > adl dests
[jira] [Created] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always
Steve Loughran created HADOOP-15300: --- Summary: distcp -update to WASB and ADL copies up all the files, always Key: HADOOP-15300 URL: https://issues.apache.org/jira/browse/HADOOP-15300 Project: Hadoop Common Issue Type: Bug Components: fs/adl, fs/azure Affects Versions: 3.1.0 Reporter: Steve Loughran If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the source files are copied up every time. In contrast, if you use hdfs:// or s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums for a diff, but s3a is just returning file length and relying on distcp logic being "if either src or dest doesn't do checksums, only compare file len" somehow that's not kicking in. Tested for file: and hdfs sources, wasb and adl dests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
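The skip rule quoted in the description ("if either src or dest doesn't do checksums, only compare file len") can be written down as a small decision function. The sketch below is an assumption-laden illustration, not DistCp's actual code: `UpdateSkipSketch`, `FileInfo`, and the use of a null string to mean "store serves no checksum" are hypothetical, and real checksum comparison involves more than string equality.

```java
/** Hypothetical sketch of the "-update" skip decision described in the issue. */
final class UpdateSkipSketch {

    /** Hypothetical file summary; a null checksum means the store serves none. */
    static final class FileInfo {
        final long length;
        final String checksum;

        FileInfo(long length, String checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    /** Returns true when the copy can be skipped because dest already matches src. */
    static boolean canSkip(FileInfo src, FileInfo dest) {
        if (dest == null) {
            return false; // nothing at the destination yet
        }
        if (src.length != dest.length) {
            return false; // lengths differ, so the file must be copied
        }
        if (src.checksum == null || dest.checksum == null) {
            return true; // either side lacks checksums: length-only comparison
        }
        return src.checksum.equals(dest.checksum);
    }

    public static void main(String[] args) {
        FileInfo hdfsFile = new FileInfo(1024L, "crc-abc");
        FileInfo storeFileNoChecksum = new FileInfo(1024L, null);
        System.out.println(canSkip(hdfsFile, storeFileNoChecksum));
    }
}
```

Under this rule a second `-update` run against a store without checksums should skip every unchanged file, which is the behavior reported for s3a and missing for wasb and adl.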
[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391846#comment-16391846 ] Steve Loughran commented on HADOOP-15209: - Yes, as Jenkins never runs any of the store tests: we can't give it the credentials. The policy for submitting a patch to a specific store is: declare the endpoint you ran against; no declaration == no review. We're strict even with ourselves. Changes to the layers indirectly used by the stores (hadoop common, distcp) aren't so well managed; if people know they are going to interfere with a store, then they should test. Cancelling the current patch, as I've decided that the retry logic is over-convoluted. I'm just going to ignore delete(path) returning false, as all the connectors use it to mean "no file there, so no operation attempted", except for FTP. > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCP issues a delete(file) request even if is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. 
[jira] [Created] (HADOOP-15299) Bump Hadoop's Jackson 2 dependency 2.9.x
Sean Mackrory created HADOOP-15299: -- Summary: Bump Hadoop's Jackson 2 dependency 2.9.x Key: HADOOP-15299 URL: https://issues.apache.org/jira/browse/HADOOP-15299 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.1.0, 3.2.0 Reporter: Sean Mackrory Assignee: Sean Mackrory There are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) mean Hadoop is vulnerable to the attack - I don't know that it is, but fixes were released for 2.8.x and 2.9.x but not 2.7.x (which we're on). We shouldn't be on an unmaintained line, regardless. HBase is already on 2.9.x, we have a shaded client now, the API changes are relatively minor and so far in my testing I haven't seen any problems. I think many of our usual reasons to hesitate upgrading this dependency don't apply. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15209: Status: Open (was: Patch Available) > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCP issues a delete(file) request even if is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small
[ https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391796#comment-16391796 ] Aaron Fabbri commented on HADOOP-15206: --- Ah, yes, I didn't notice the read() call. Thank you, makes sense now. > BZip2 drops and duplicates records when input split size is small > - > > Key: HADOOP-15206 > URL: https://issues.apache.org/jira/browse/HADOOP-15206 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.3, 3.0.0 >Reporter: Aki Tanaka >Assignee: Aki Tanaka >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 > > Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, > HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, > HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, > HADOOP-15206.008.patch
> BZip2 can drop and duplicate records when the input split size is small. I confirmed that this issue happens when the input split size is between 1 byte and 4 bytes.
> I am seeing the following two problem behaviors.
> 1. Dropped record: BZip2 skips the first record in the input file when the input split size is small.
> Set the split size to 3 and tested loading 100 records (0, 1, 2..99):
> {code:java}
> 2018-02-01 10:52:33,502 INFO [Thread-17] mapred.TestTextInputFormat (TestTextInputFormat.java:verifyPartitions(317)) - splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3 count=99
> {code}
> The input format read only 99 records, not 100.
> 2. Duplicated record: two input splits contain the same BZip2 records when the input split size is small.
> Set the split size to 1 and tested loading 100 records (0, 1, 2..99):
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1 count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 at position 8
> {code}
> I experienced this error when executing a Spark (SparkSQL) job under the following conditions:
> * The input files are small (around 1KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391727#comment-16391727 ] Hudson commented on HADOOP-15280: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13798 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13798/]) HADOOP-15280. TestKMS.testWebHDFSProxyUserKerb and (xiao: rev a906a226458a0b4c4b2df61d9bcf375a1d194925) * (edit) hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 3.2.0 > > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391711#comment-16391711 ] Xiao Chen commented on HADOOP-14445: Failed tests in TestKMS are not related. TestLBKMSCP test case should be removed, now that we do not have the URI format configuration - will do that in next rev. > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. 
[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391669#comment-16391669 ] Íñigo Goiri commented on HADOOP-13144: -- Thanks [~ywskycn] for trying this out; I posted [^HADOOP-13144.001.patch] with the fixes for compilation. I submitted the patch so Yetus should cover the next ones. In general, I think this is touching a pretty sensitive part of the Hadoop code but I think the modifications are pretty minimal. At the same time, as [~ywskycn] pointed out, it helps dramatically with the performance of the Routers for HDFS. We would open a separate JIRA for the Router connection creation if this goes in. Anybody available for a review? > Enhancing IPC client throughput via multiple connections per user > - > > Key: HADOOP-13144 > URL: https://issues.apache.org/jira/browse/HADOOP-13144 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Jason Kace >Assignee: Íñigo Goiri >Priority: Minor > Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch > > > The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single > connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique > to the connection's remote address, ticket and protocol. Each {{ConnectionId}} > is 1:1 mapped to a connection thread by the client via a map cache. > The result is to serialize all IPC read/write activity through a single > thread for each user/ticket + address. If a single user makes repeated > calls (1k-100k/sec) to the same destination, the IPC client becomes a > bottleneck.
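The 1:1 mapping described in the HADOOP-13144 summary, and one possible way to relax it, can be sketched as follows. This is only an illustration under stated assumptions: it is not the code in `org.apache.hadoop.ipc.Client` or in the attached patches, and the class `IpcConnectionCacheSketch`, the `slot` field, and the round-robin choice are all hypothetical. With one slot per id, every call for the same address, ticket and protocol shares one connection; with several slots, the same caller is spread over several connections.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not the org.apache.hadoop.ipc.Client implementation. */
final class IpcConnectionCacheSketch {

  /** Stand-in for ConnectionId: address, ticket, protocol, plus an assumed slot index. */
  static final class Key {
    final String address;
    final String ticket;
    final String protocol;
    final int slot;

    Key(String address, String ticket, String protocol, int slot) {
      this.address = address;
      this.ticket = ticket;
      this.protocol = protocol;
      this.slot = slot;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return slot == k.slot && address.equals(k.address)
          && ticket.equals(k.ticket) && protocol.equals(k.protocol);
    }

    @Override
    public int hashCode() {
      return Objects.hash(address, ticket, protocol, slot);
    }
  }

  /** Stand-in for a connection and the single thread that serves it. */
  static final class Connection {
    final Key key;

    Connection(Key key) {
      this.key = key;
    }
  }

  private final ConcurrentMap<Key, Connection> cache = new ConcurrentHashMap<>();
  private final AtomicLong calls = new AtomicLong();
  private final int slotsPerId;

  IpcConnectionCacheSketch(int slotsPerId) {
    if (slotsPerId < 1) {
      throw new IllegalArgumentException("slotsPerId must be >= 1");
    }
    this.slotsPerId = slotsPerId;
  }

  /** Pick a slot round-robin, then reuse or create the connection cached for that key. */
  Connection getConnection(String address, String ticket, String protocol) {
    int slot = (int) (calls.getAndIncrement() % slotsPerId);
    return cache.computeIfAbsent(new Key(address, ticket, protocol, slot), Connection::new);
  }

  public static void main(String[] args) {
    IpcConnectionCacheSketch single = new IpcConnectionCacheSketch(1);
    // With one slot, repeated calls for the same id share a single connection.
    System.out.println(single.getConnection("nn:8020", "alice", "ClientProtocol")
        == single.getConnection("nn:8020", "alice", "ClientProtocol"));
  }
}
```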
[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15280: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~rchiang] for filing the Jira and [~bharatviswa] for the fix. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 3.2.0 > > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HADOOP-13144: - Attachment: HADOOP-13144.001.patch > Enhancing IPC client throughput via multiple connections per user > - > > Key: HADOOP-13144 > URL: https://issues.apache.org/jira/browse/HADOOP-13144 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Jason Kace >Assignee: Íñigo Goiri >Priority: Minor > Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch > > > The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single > connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique > to the connection's remote address, ticket and protocol. Each {{ConnectionId}} > is 1:1 mapped to a connection thread by the client via a map cache. > The result is to serialize all IPC read/write activity through a single > thread for each user/ticket + address. If a single user makes repeated > calls (1k-100k/sec) to the same destination, the IPC client becomes a > bottleneck.
[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HADOOP-13144: - Assignee: Íñigo Goiri Status: Patch Available (was: Open) > Enhancing IPC client throughput via multiple connections per user > - > > Key: HADOOP-13144 > URL: https://issues.apache.org/jira/browse/HADOOP-13144 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Jason Kace >Assignee: Íñigo Goiri >Priority: Minor > Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch > > > The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single > connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique > to the connection's remote address, ticket and protocol. Each {{ConnectionId}} > is 1:1 mapped to a connection thread by the client via a map cache. > The result is to serialize all IPC read/write activity through a single > thread for each user/ticket + address. If a single user makes repeated > calls (1k-100k/sec) to the same destination, the IPC client becomes a > bottleneck.
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391663#comment-16391663 ] Xiao Chen commented on HADOOP-15280: I was more thinking of walking into the cause of the exception and check cause there in the util. But don't feel strongly. Let's fix trunk tests for now. +1 > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391647#comment-16391647 ] Íñigo Goiri commented on HADOOP-15209: -- The ADL-related tests seem to be ignored ([report|https://builds.apache.org/job/PreCommit-HADOOP-Build/14263/testReport/org.apache.hadoop.fs.adl.live/TestAdlContractDistCpLive/]). Is that expected? > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCp issues a delete(file) request even if the file is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process.
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391646#comment-16391646 ] genericqa commented on HADOOP-15293: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 52s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}105m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15293 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913624/HADOOP-15293.2.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 04f186aeffef 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7ef4d94 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14283/testReport/ | | Max. process+thread count | 1357 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14283/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL:
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391645#comment-16391645 ] genericqa commented on HADOOP-15280: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hadoop-common-project/hadoop-kms: The patch generated 0 new + 97 unchanged - 1 fixed = 97 total (was 98) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 59s{color} | {color:green} hadoop-kms in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15280 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913627/HADOOP-15280.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 360098f02dee 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7ef4d94 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14284/testReport/ | | Max. process+thread count | 319 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-kms U: hadoop-common-project/hadoop-kms | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14284/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple
[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small
[ https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391568#comment-16391568 ] Aki Tanaka commented on HADOOP-15206: - As I understand it, skipBytes-- will basically never be executed. bufferedIn.read() is executed only when skipBytes is 0; this usually means that the file position is at the end of the split. However, InputStream.skip says "Skips over and discards {{n}} bytes of data from the input stream. The {{skip}} method may, for a variety of reasons, end up skipping over some smaller number of bytes, possibly {{0}}. The actual number of bytes skipped is returned." ([https://docs.oracle.com/javase/7/docs/api/java/io/FilterInputStream.html]). So I thought InputStream.skip() might return 0 even if the position is not at the end of the split. Please let me know if my understanding is wrong. Thank you. > BZip2 drops and duplicates records when input split size is small > - > > Key: HADOOP-15206 > URL: https://issues.apache.org/jira/browse/HADOOP-15206 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.3, 3.0.0 >Reporter: Aki Tanaka >Assignee: Aki Tanaka >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 > > Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, > HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, > HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, > HADOOP-15206.008.patch > > > BZip2 can drop and duplicate records when the input split size is small. I > confirmed that this issue happens when the input split size is between 1 byte > and 4 bytes. > I am seeing the following 2 problem behaviors. > > 1. Drop record: > BZip2 skips the first record in the input file when the input split size is > small > > Set the split size to 3 and tested loading 100 records (0, 1, 2..99) > {code:java} > 2018-02-01 10:52:33,502 INFO [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(317)) - > splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3 > count=99{code} > > The input format read only 99 records instead of 100 > > 2. Duplicate record: > 2 input splits have the same BZip2 records when the input split size is small > > Set the split size to 1 and tested loading 100 records (0, 1, 2..99) > > {code:java} > 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file > /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1 > count=99 > 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 > at position 8 > {code} > > I experienced this error when I executed a Spark (SparkSQL) job under the > following conditions: > * The input files are small (around 1KB) > * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
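The skip()/read() fallback discussed in this HADOOP-15206 thread can be sketched in isolation. The snippet below is a minimal illustration, not the code from the attached patches; the class and method names (`SkipWithReadFallback`, `skipFully`) are made up for the example. It relies only on the documented `java.io.InputStream` contract: `skip()` may return 0 without being at end of stream, while `read()` returns -1 only at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper illustrating the skip-then-read fallback. */
final class SkipWithReadFallback {

  /**
   * Tries to skip n bytes. Returns the number of bytes actually skipped,
   * which is smaller than n only if the stream ended first.
   */
  static long skipFully(InputStream in, long n) throws IOException {
    long remaining = n;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped > 0) {
        remaining -= skipped;
      } else if (in.read() >= 0) {
        // skip() made no progress, but one byte was consumed by read():
        // count it as skipped.
        remaining--;
      } else {
        // read() returned -1, so this really is the end of the stream.
        break;
      }
    }
    return n - remaining;
  }

  public static void main(String[] args) throws IOException {
    // A stream whose skip() never makes progress, to exercise the fallback.
    InputStream stubborn = new ByteArrayInputStream(new byte[10]) {
      @Override
      public synchronized long skip(long n) {
        return 0;
      }
    };
    System.out.println(skipFully(stubborn, 4));
  }
}
```

The decrement after a successful `read()` is the accounting step the thread is about: without it, a stream whose `skip()` keeps returning 0 would be advanced by `read()` while the loop still believed the full count remained.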
[jira] [Updated] (HADOOP-15277) remove .FluentPropertyBeanIntrospector from CLI operation log output
[ https://issues.apache.org/jira/browse/HADOOP-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15277: Description: When hadoop metrics is started, a message about bean introspection appears. {code} 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property. {code} When using wasb or s3a, this message appears in the client logs, because they both start metrics. I propose to raise the log level to ERROR for that class in log4j.properties was: when using the default logs, I get told off by beanutils {code} 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property. {code} This is a distraction. I propose to raise the log level to ERROR for that class in log4j.properties > remove .FluentPropertyBeanIntrospector from CLI operation log output > > > Key: HADOOP-15277 > URL: https://issues.apache.org/jira/browse/HADOOP-15277 > Project: Hadoop Common > Issue Type: Sub-task > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15277-001.patch > > > When hadoop metrics is started, a message about bean introspection appears. > {code} > 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when > creating PropertyDescriptor for public final void > org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! > Ignoring this property. > {code} > When using wasb or s3a, this message appears in the client logs, because > they both start metrics. > I propose to raise the log level to ERROR for that class in log4j.properties
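The change proposed in HADOOP-15277 could look roughly like the fragment below. It is a sketch rather than the contents of the attached patch: it assumes the log4j 1.x properties syntax and assumes that the abbreviated logger name in the message (`beanutils.FluentPropertyBeanIntrospector`) corresponds to the class `org.apache.commons.beanutils.FluentPropertyBeanIntrospector`.

```properties
# Assumed logger name: hide the INFO-level introspection message by only
# letting ERROR and above through for this one class.
log4j.logger.org.apache.commons.beanutils.FluentPropertyBeanIntrospector=ERROR
```

Scoping the setting to the single class leaves the log levels of every other logger unchanged.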
[jira] [Commented] (HADOOP-13622) `-atomic` should not be supported while using `distcp` command in object file system
[ https://issues.apache.org/jira/browse/HADOOP-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391525#comment-16391525 ] Steve Loughran commented on HADOOP-13622: - Having been playing with distcp, I consider HADOOP-15281 a priority item, as it means that every upload forces a rename of data, even without the -atomic option. > `-atomic` should not be supported while using `distcp` command in object file > system > > > Key: HADOOP-13622 > URL: https://issues.apache.org/jira/browse/HADOOP-13622 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.7.3 >Reporter: Yuanbo Liu >Assignee: Yuanbo Liu >Priority: Minor > > After discussing with [~ste...@apache.org] in HADOOP-13593, I get the point > that none of the object stores support atomic renames. So I filed a new JIRA > and am ready to provide a patch to disable `distcp -atomic` in object file > systems.
[jira] [Updated] (HADOOP-15297) Make S3A etag => checksum feature optional
[ https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15297: Summary: Make S3A etag => checksum feature optional (was: Make s3a etag -> checksum publishing option) > Make S3A etag => checksum feature optional > -- > > Key: HADOOP-15297 > URL: https://issues.apache.org/jira/browse/HADOOP-15297 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15297-001.patchh > > > HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with > checksums. > Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to > s3a. > Rather than revert I want to make it an option, off by default. Once we are > happy with distcp in future, we can turn it on. > Why an option? Because it lines up for a successor to distcp which saves src > and dest checksums to a file and can then verify whether or not files have > really changed. Currently distcp relies on dest checksum algorithm being the > same as the src for incremental updates, but if either of the stores don't > serve checksums, silently downgrades to not checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small
[ https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391522#comment-16391522 ] Jason Lowe commented on HADOOP-15206: - skipBytes is decremented because of the read() call. The skip() call is not guaranteed to be able to skip, and the workaround in that case is to try to read(). If the read() is successful then we were able to skip one more byte and need to account for that in the total number of bytes trying to be skipped. > BZip2 drops and duplicates records when input split size is small > - > > Key: HADOOP-15206 > URL: https://issues.apache.org/jira/browse/HADOOP-15206 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.3, 3.0.0 >Reporter: Aki Tanaka >Assignee: Aki Tanaka >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 > > Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, > HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, > HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, > HADOOP-15206.008.patch > > > BZip2 can drop and duplicate record when input split file is small. I > confirmed that this issue happens when the input split size is between 1byte > and 4bytes. > I am seeing the following 2 problem behaviors. > > 1. Drop record: > BZip2 skips the first record in the input file when the input split size is > small > > Set the split size to 3 and tested to load 100 records (0, 1, 2..99) > {code:java} > 2018-02-01 10:52:33,502 INFO [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(317)) - > splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3 > count=99{code} > > The input format read only 99 records but not 100 records > > 2. 
Duplicate Record: > 2 input splits has same BZip2 records when the input split size is small > > Set the split size to 1 and tested to load 100 records (0, 1, 2..99) > > {code:java} > 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file > /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1 > count=99 > 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 > at position 8 > {code} > > I experienced this error when I execute Spark (SparkSQL) job under the > following conditions: > * The file size of the input files are small (around 1KB) > * Hadoop cluster has many slave nodes (able to launch many executor tasks) > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
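Jason Lowe's comment on HADOOP-15206 above explains why skipBytes is decremented after a read(): skip() may make no progress, and a successful single-byte read() then counts as one more skipped byte. The following is a minimal sketch of that pattern only. The class and method names are hypothetical and this is not the actual BZip2 codec code from the patch.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper illustrating the skip()/read() fallback; not Hadoop code. */
final class SkipWithReadFallback {

  /**
   * Tries to skip toSkip bytes from the stream.
   * Returns the number of bytes actually skipped, which is smaller
   * than toSkip only if the end of the stream was reached.
   */
  static long skipFully(InputStream in, long toSkip) throws IOException {
    long remaining = toSkip;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped > 0) {
        remaining -= skipped;
      } else if (in.read() >= 0) {
        // skip() made no progress, but read() consumed one byte,
        // so that byte counts towards the total being skipped.
        remaining--;
      } else {
        break; // end of stream: nothing left to skip
      }
    }
    return toSkip - remaining;
  }
}
```

The point of the sketch is the accounting: every byte consumed, whether through skip() or through the read() fallback, has to be subtracted from the outstanding count, otherwise the stream position drifts.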
[jira] [Comment Edited] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391497#comment-16391497 ] Bharat Viswanadham edited comment on HADOOP-15280 at 3/8/18 4:46 PM: - [~xiaochen] As we are wrapping the exception with a new message, checking the original cause exception will get original message. I have not added any utility methods to GenericTestUtils, let me know if you want to do in a different approach, other than proposed in the patch. Attached v02 patch. was (Author: bharatviswa): As we are wrapping the exception with a new message, checking the original cause exception will get original message. Attached v02 patch. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. 
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391497#comment-16391497 ] Bharat Viswanadham commented on HADOOP-15280: - As we are wrapping the exception with a new message, checking the original cause exception will get original message. Attached v02 to patch. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391497#comment-16391497 ] Bharat Viswanadham edited comment on HADOOP-15280 at 3/8/18 4:44 PM: - As we are wrapping the exception with a new message, checking the original cause exception will get original message. Attached v02 patch. was (Author: bharatviswa): As we are wrapping the exception with a new message, checking the original cause exception will get original message. Attached v02 to patch. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15280: Attachment: HADOOP-15280.01.patch > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391493#comment-16391493 ] Steve Loughran commented on HADOOP-15209: - FWIW, managed to fail distcp if you run it against an S3 store w/ simulated inconsistency turned on (no s3guard); operation saw duplicate entries in the directory listing at the destination. {code} 2018-03-08 16:33:17,517 [Thread-131] WARN mapred.LocalJobRunner (LocalJobRunner.java:run(590)) - job_local148600535_0001 org.apache.hadoop.tools.CopyListing$DuplicateFileException: File s3a://hwdev-steve-frankfurt-new/SLOW/hadoop-auth/src and s3a://hwdev-steve-frankfurt-new/SLOW/hadoop-auth/src would cause duplicates. Aborting at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:175) at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93) at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:89) at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) at org.apache.hadoop.tools.mapred.CopyCommitter.listTargetFiles(CopyCommitter.java:575) at org.apache.hadoop.tools.mapred.CopyCommitter.deleteMissing(CopyCommitter.java:402) at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:117) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:567) 2018-03-08 16:33:18,469 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1660)) - Job job_local148600535_0001 failed with state FAILED due to: NA 2018-03-08 16:33:18,478 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1665)) - Counters: 25 File System Counters FILE: Number of bytes read=1621092 FILE: Number of bytes written=1632776 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 S3A: Number of bytes read=0 S3A: Number of bytes written=895927 S3A: Number of read operations=1673 S3A: Number of large read operations=0 S3A: Number of write 
operations=904 Map-Reduce Framework Map input records=96 {code} I'm not going to fix that here > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCP issues a delete(file) request even if is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
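The HADOOP-15209 description above proposes keeping a history of deleted directories so that delete(file) is not issued for entries under a directory that has already been removed, without overloading the heap. One way to picture this is the sketch below. It assumes the target listing is processed in sorted order, so that children directly follow their parent, which lets it remember only the most recently deleted directory. The class name, the use of plain strings for paths and the sorted-order assumption are all illustrative and are not taken from the attached patches.

```java
/** Hypothetical tracker for the idea in HADOOP-15209; not the code from the patch. */
final class DeletedDirTrackerSketch {

  // Most recently deleted directory, stored with a trailing '/'.
  // Holding a single entry keeps memory use constant (assumes sorted input).
  private String lastDeletedDir;

  /**
   * Returns true if a delete request must be issued for this path,
   * false if it lies under a directory that was already deleted.
   */
  boolean shouldDelete(String path, boolean isDirectory) {
    if (lastDeletedDir != null && path.startsWith(lastDeletedDir)) {
      return false; // parent is gone, so this entry is gone too
    }
    if (isDirectory) {
      lastDeletedDir = path.endsWith("/") ? path : path + "/";
    }
    return true;
  }
}
```

The trailing slash matters: without it, a sibling such as /ab would wrongly be treated as a child of /a. If the listing is not sorted, a single remembered entry is not enough and a bounded cache of deleted directories would be needed instead.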
[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391491#comment-16391491 ] Wei Yan commented on HADOOP-13144: -- Thanks for the patch [~elgoiri]. I tried it yesterday and it worked well. The Router RPC throughput has improved considerably, and RPC handlers are not blocked on the connection itself. BTW, the patch also needs to add an implementation of the new function to the classes ProtobufRpcEngine and TestRPC.StoppedRpcEngine. > Enhancing IPC client throughput via multiple connections per user > - > > Key: HADOOP-13144 > URL: https://issues.apache.org/jira/browse/HADOOP-13144 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Jason Kace >Priority: Minor > Attachments: HADOOP-13144.000.patch > > > The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single > connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique > to the connection's remote address, ticket and protocol. Each ConnectionId > is 1:1 mapped to a connection thread by the client via a map cache. > The result is to serialize all IPC read/write activity through a single > thread for each user/ticket + address. If a single user makes repeated > calls (1k-100k/sec) to the same destination, the IPC client becomes a > bottleneck.
[jira] [Updated] (HADOOP-13023) Distcp with -update feature on first time raw data not working
[ https://issues.apache.org/jira/browse/HADOOP-13023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13023: Component/s: tools/distcp > Distcp with -update feature on first time raw data not working > -- > > Key: HADOOP-13023 > URL: https://issues.apache.org/jira/browse/HADOOP-13023 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.6.0 >Reporter: Mavin Martin >Priority: Major > > When attempting to do a distcp with the -update feature toggled on encrypted > data, the distcp shows as successful. Reading the encrypted file on the > target_path does not work since the keyName does not exist. > Please see my example to reproduce the issue. > {code} > [root@xxx bin]# hdfs crypto -listZones > /tmp/a/tedDEF00013 > [root@xxx bin]# hdfs dfs -ls -R /tmp > drwxr-xr-x - xxx xxx 0 2016-04-14 00:22 /tmp/a > drwxr-xr-x - xxx xxx 0 2016-04-14 00:00 /tmp/a/ted > -rw-r--r-- 3 xxx xxx 33 2016-04-14 00:00 /tmp/a/ted/test.txt > [root@xxx bin]# hadoop distcp -update /.reserved/raw/tmp/a/ted > /.reserved/raw/tmp/a-with-update/ted > [root@xxx bin]# hdfs crypto -listZones > /tmp/a/tedDEF00013 > [root@xxx bin]# hadoop distcp /.reserved/raw/tmp/a/ted > /.reserved/raw/tmp/a-no-update/ted > [root@xxx bin]# hdfs crypto -listZones > /tmp/a/tedDEF00013 > /tmp/a-no-update/ted DEF00013 > {code} > The crypto zone for 'a-with-update' should have been created since this is a > new destination. You can verify this by looking at 'a-no-update'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391474#comment-16391474 ] Takanobu Asanuma commented on HADOOP-15293: --- Thanks for the review, [~ste...@apache.org]. Actually, I was wondering that way or the 1st patch's approach. Uploaded a new patch. Surely this is simpler. > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch, HADOOP-15293.2.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HADOOP-15293: -- Attachment: HADOOP-15293.2.patch > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch, HADOOP-15293.2.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15298) provide non-zero default for the Azure rename & delete thread pool sizes
Steve Loughran created HADOOP-15298: --- Summary: provide non-zero default for the Azure rename & delete thread pool sizes Key: HADOOP-15298 URL: https://issues.apache.org/jira/browse/HADOOP-15298 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure, tools/distcp Affects Versions: 3.0.0 Reporter: Steve Loughran If you provide non-zero values for the rename & delete threads, distcp gets faster -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15158) AliyunOSS: Supports role based credential in URL
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391230#comment-16391230 ] Steve Loughran commented on HADOOP-15158: - Is this adding the idea of putting user:secret into the URI? If so, I'm going to have to -1 it on security grounds. If you look at HADOOP-3733 you can see the effort I had to put in to try and keep secrets embedded in s3n/s3a URLs out of logs, and even then I failed. If you put confidential secrets in URLs, they get into Paths, which get into error messages and stack traces, and so into bug reports. I know this, I've seen it. It's why I'm getting close to cutting the user:secret feature from S3A entirely, except if users explicitly enable it with an option whose name makes clear you shouldn't be doing it: "fs.s3a.dangerous.secrets.in.uris". S3A does per-bucket settings on URIs & lets you hide secrets in URLs; ADL has just added this (HADOOP-13972). I believe this is the better way to do it, as it also lets you tune any other option on a container-by-container basis. > AliyunOSS: Supports role based credential in URL > > > Key: HADOOP-15158 > URL: https://issues.apache.org/jira/browse/HADOOP-15158 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, > HADOOP-15158.003.patch, HADOOP-15158.004.patch, HADOOP-15158.005.patch > > > Currently, AliyunCredentialsProvider supports credentials supplied by > configuration (core-site.xml). Sometimes an admin wants to create different > temporary credentials (key/secret/token) for different roles so that one role > cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an > XXXCredentialsProvider, so that we can get the user info (role) from the URI.
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391227#comment-16391227 ] Steve Loughran commented on HADOOP-15293: - Not that pretty. Why not look for the string "recognized SSL message"? Then you'll have a partial match on both. > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat}
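The suggestion in the HADOOP-15293 comment above relies on the fact that both message texts quoted in the test failure, 'Unrecognized SSL message' and 'Unsupported or unrecognized SSL message', contain the same lower-case fragment. A minimal sketch of such a check is shown here. The class, constant and method names are hypothetical, and the real test may well use an existing assertion helper rather than anything like this.

```java
/** Hypothetical check for the partial-match idea; not the code from the patch. */
final class SslMessageMatch {

  // Fragment shared by "Unrecognized SSL message" and
  // "Unsupported or unrecognized SSL message".
  static final String COMMON_FRAGMENT = "recognized SSL message";

  /** Returns true if the exception message contains the shared fragment. */
  static boolean matches(Throwable t) {
    String message = t.getMessage();
    return message != null && message.contains(COMMON_FRAGMENT);
  }
}
```

Dropping the leading "Un"/"un" avoids a case-sensitive mismatch between the two variants while still being specific enough to identify the failure.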
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391154#comment-16391154 ] Rushabh S Shah commented on HADOOP-15292: - [~ste...@apache.org]: This seems like a performance improvement to distcp tool. Should we backport to branch-2.8 also ? > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391140#comment-16391140 ] Hudson commented on HADOOP-15273: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13797 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13797/]) HADOOP-15273.distcp can't handle remote stores with different checksum (stevel: rev 7ef4d942dd96232b0743a40ed25f77065254f94d) * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java * (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Fix For: 3.1.0 > > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. 
*and you can't disable it then*
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391120#comment-16391120 ] Hudson commented on HADOOP-15292: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13796 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13796/]) HADOOP-15292. Distcp's use of pread is slowing it down. Contributed by (stevel: rev 3bd6b1fd85c44354c777ef4fda6415231505b2a4) * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java * (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) committed to branch-3.1+; reran copy mapper test first. Test-wise, this shows we need some more realistic store distcp tests, specifically: need to set HDFS <--> store rather than just local <--> store. And also: intra-store, inter-store. Which will make it a fairly complex piece of work. > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Fix For: 3.1.0 > > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15292: Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) +1 committed. Ran the S3A distcp test first. thanks for finding & fixing this Virajith! > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
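For readers unfamiliar with the distinction drawn in HADOOP-15292 above, the following sketch contrasts a positioned read (every call passes its own absolute offset) with a single seek followed by sequential reads. It is only an analogy: it uses java.nio's FileChannel on a local file as a stand-in, not the HDFS client or the actual RetriableFileCopyCommand code, and the class and method names are hypothetical. On a local file the two patterns cost about the same; the overheads quoted in the issue (BlockReader, readBlock(), BlockSender) are specific to HDFS.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; this is not the DistCp implementation.
final class SeekThenReadSketch {

  // Positioned-read pattern: each call carries its own absolute offset.
  static byte[] copyWithPositionedReads(Path src, long offset, int bufSize) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (FileChannel ch = FileChannel.open(src, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(bufSize);
      long pos = offset;
      int n;
      while ((n = ch.read(buf, pos)) > 0) {
        out.write(buf.array(), 0, n);
        pos += n;
        buf.clear();
      }
    }
    return out.toByteArray();
  }

  // Seek-once pattern: position the channel once, then read sequentially.
  static byte[] copyWithSeekThenRead(Path src, long offset, int bufSize) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (FileChannel ch = FileChannel.open(src, StandardOpenOption.READ)) {
      ch.position(offset);
      ByteBuffer buf = ByteBuffer.allocate(bufSize);
      int n;
      while ((n = ch.read(buf)) > 0) {
        out.write(buf.array(), 0, n);
        buf.clear();
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("seek-sketch", ".bin");
    try {
      byte[] data = new byte[100];
      for (int i = 0; i < data.length; i++) {
        data[i] = (byte) i;
      }
      Files.write(tmp, data);
      byte[] a = copyWithPositionedReads(tmp, 10, 16);
      byte[] b = copyWithSeekThenRead(tmp, 10, 16);
      System.out.println(a.length + " bytes copied, identical = " + Arrays.equals(a, b));
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Both methods return the same bytes; the difference is only in how the offset is communicated, which is the part the issue identifies as expensive when the source is HDFS.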
[jira] [Assigned] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned HADOOP-15292: --- Assignee: Virajith Jalaparti > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390999#comment-16390999 ]

genericqa commented on HADOOP-14445:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 10s{color} | {color:orange} hadoop-common-project: The patch generated 12 new + 288 unchanged - 7 fixed = 300 total (was 295) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 13s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 33s{color} | {color:red} hadoop-kms in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 39s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 11s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider |
| | hadoop.crypto.key.kms.server.TestKMS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-14445 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913556/HADOOP-14445.05.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux e8eba6ba1c08 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HADOOP-15296) Fix a wrong link for RBF in the top page
[ https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390984#comment-16390984 ] Takanobu Asanuma commented on HADOOP-15296: --- Thanks for reviewing and committing it, [~linyiqun]! > Fix a wrong link for RBF in the top page > > > Key: HADOOP-15296 > URL: https://issues.apache.org/jira/browse/HADOOP-15296 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Fix For: 3.1.0, 3.2.0, 3.0.2 > > Attachments: HADOOP-15296.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390932#comment-16390932 ]

genericqa commented on HADOOP-14999:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HADOOP-14999 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913562/diff-between-patch7-and-patch8.txt |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14282/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
> AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, > HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, > asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. This will poses two problems: > - if the output file is too large, it will run out of the local disk. > - if the output file is too large, task will wait long time to upload result > to OSS before finish, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. 
some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this > will improve performance greatly. > 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of > those tasks failed, the whole file uploading will failed, and we will abort > current uploading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
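The block-based scheme in the HADOOP-14999 description above (cut the output into blocks, upload them in parallel through a bounded executor, retry each block, fail the whole upload if any block fails) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: PartUploader is a hypothetical stand-in for the OSS part-upload call, blocks are in-memory byte arrays rather than small local files, a plain fixed thread pool guarded by a Semaphore approximates {{SemaphoredDelegatingExecutor}}, and "retry 3 times" is read here as three attempts in total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch; names and structure are assumptions, not the HADOOP-14999 code.
final class BlockUploadSketch {
  static final int MAX_ATTEMPTS = 3; // assumed meaning of "retry 3 times"

  // Stand-in for whatever call uploads one part to the object store.
  interface PartUploader {
    void uploadPart(int partNumber, byte[] data) throws Exception;
  }

  // Splits output into blocks, uploads them in parallel, returns the part count.
  static int uploadInBlocks(byte[] output, int blockSize, int parallelism,
                            PartUploader uploader) throws Exception {
    if (blockSize <= 0 || parallelism <= 0) {
      throw new IllegalArgumentException("blockSize and parallelism must be positive");
    }
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    Semaphore permits = new Semaphore(parallelism);
    List<Future<?>> parts = new ArrayList<>();
    try {
      int partNumber = 1;
      for (int off = 0; off < output.length; off += blockSize, partNumber++) {
        byte[] block = Arrays.copyOfRange(output, off, Math.min(output.length, off + blockSize));
        int n = partNumber;
        permits.acquire(); // the writer blocks here while too many blocks are in flight
        parts.add(pool.submit(() -> {
          try {
            uploadWithRetry(uploader, n, block);
            return null;
          } finally {
            permits.release();
          }
        }));
      }
      for (Future<?> part : parts) {
        part.get(); // throws ExecutionException if this part failed on every attempt
      }
      return parts.size();
    } finally {
      // On failure a real client would also abort the multipart upload here.
      pool.shutdownNow();
    }
  }

  // Tries one part up to MAX_ATTEMPTS times, then rethrows the last failure.
  static void uploadWithRetry(PartUploader uploader, int partNumber, byte[] block) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        uploader.uploadPart(partNumber, block);
        return;
      } catch (Exception e) {
        last = e;
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int count = uploadInBlocks(new byte[10], 4, 2,
        (n, data) -> System.out.println("part " + n + ": " + data.length + " bytes"));
    System.out.println("uploaded " + count + " parts");
  }
}
```

Acquiring the permit before submitting is what keeps the producer from buffering an unbounded number of blocks, which is the disk-space concern the description raises about the previous {{AliyunOSSOutputStream}}.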
[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390910#comment-16390910 ]

Genmao Yu edited comment on HADOOP-14999 at 3/8/18 8:42 AM:

Thanks for [~Sammi] 's review.
1. comment-1: remove unused config and refine some config
2. comment-2: fixed
3. comment-3: Sorry, any problems? All style check passed.
{code:java}
Preconditions.checkArgument(v >= min, String.format("Value of %s: %d is below the minimum value %d", key, v, min));
{code}
4. comment-4: update unit test
5. comment-5: IMHO, It is too large to test 5GB in integration test. And {{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
6. "But they are not cleaned when exception happens during the write() process.": all temp files are {{deleteOnExit}}, but I also add the resource clean logic in {{try-finally}}

performance test: test file upload
|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|

was (Author: unclegen):
Thanks for [~Sammi] 's review.
1. comment-1: remove unused config and refine some config
2. comment-2: fixed
3. comment-3: Sorry, any problems?
{code}
Preconditions.checkArgument(v >= min, String.format("Value of %s: %d is below the minimum value %d", key, v, min));
{code}
4. comment-4: update unit test
5. comment-5: IMHO, It is too large to test 5GB in integration test. And {{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
6. "But they are not cleaned when exception happens during the write() process.": all temp files are {{deleteOnExit}}, but I also add the resource clean logic in {{try-finally}}

performance test: test file upload
|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch,
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch,
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch,
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
> - improve the performance of uploading file to OSS server. Firstly, this
> mechanism splits result to multiple small blocks and upload them in parallel.
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme
> example, there is a task which will output 100GB or even larger, we may need
> to output this 100GB to local disk and then upload it. Sometimes, it is
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference
> between previous {{AliyunOSSOutputStream}} and
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local
> disk before we can upload it to OSS.
This will poses two problems: > - if the output file is too large, it will run out of the local disk. > - if the output file is too large, task will wait long time to upload result > to OSS before finish, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this > will improve performance greatly. > 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of > those tasks failed, the whole file uploading will failed, and we will abort > current uploading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15296) Fix a wrong link for RBF in the top page
[ https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390917#comment-16390917 ] Hudson commented on HADOOP-15296: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13794 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13794/]) HADOOP-15296. Fix a wrong link for RBF in the top page. Contributed by (yqlin: rev 4cc9a6d9bb34329d6de30706d5432c7cb675bb88) * (edit) hadoop-project/src/site/markdown/index.md.vm > Fix a wrong link for RBF in the top page > > > Key: HADOOP-15296 > URL: https://issues.apache.org/jira/browse/HADOOP-15296 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Fix For: 3.1.0, 3.2.0, 3.0.2 > > Attachments: HADOOP-15296.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390912#comment-16390912 ]

Genmao Yu commented on HADOOP-14999:

All the tests passed against "oss-cn-shanghai.aliyuncs.com".

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch,
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch,
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch,
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
> - improve the performance of uploading file to OSS server. Firstly, this
> mechanism splits result to multiple small blocks and upload them in parallel.
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme
> example, there is a task which will output 100GB or even larger, we may need
> to output this 100GB to local disk and then upload it. Sometimes, it is
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference
> between previous {{AliyunOSSOutputStream}} and
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload result > to OSS before finish, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this > will improve performance greatly. > 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of > those tasks failed, the whole file uploading will failed, and we will abort > current uploading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390911#comment-16390911 ] Genmao Yu commented on HADOOP-14999: diff-between-patch7-and-patch8.txt shows the detailed changes > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, > HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, > asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. This will poses two problems: > - if the output file is too large, it will run out of the local disk. 
> - if the output file is too large, task will wait long time to upload result > to OSS before finish, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this > will improve performance greatly. > 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of > those tasks failed, the whole file uploading will failed, and we will abort > current uploading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: diff-between-patch7-and-patch8.txt > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, > HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, > asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. This will poses two problems: > - if the output file is too large, it will run out of the local disk. 
> - if the output file is too large, task will wait long time to upload result > to OSS before finish, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this > will improve performance greatly. > 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of > those tasks failed, the whole file uploading will failed, and we will abort > current uploading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390910#comment-16390910 ]

Genmao Yu commented on HADOOP-14999:

Thanks for [~Sammi] 's review.
1. comment-1: remove unused config and refine some config
2. comment-2: fixed
3. comment-3: Sorry, any problems?
{code}
Preconditions.checkArgument(v >= min, String.format("Value of %s: %d is below the minimum value %d", key, v, min));
{code}
4. comment-4: update unit test
5. comment-5: IMHO, It is too large to test 5GB in integration test. And {{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
6. "But they are not cleaned when exception happens during the write() process.": all temp files are {{deleteOnExit}}, but I also add the resource clean logic in {{try-finally}}

performance test: test file upload
|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch,
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch,
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch,
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
> - improve the performance of uploading file to OSS server. Firstly, this
> mechanism splits result to multiple small blocks and upload them in parallel.
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk.
To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. This will poses two problems: > - if the output file is too large, it will run out of the local disk. > - if the output file is too large, task will wait long time to upload result > to OSS before finish, wasting much compute resource. > 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this > will improve performance greatly. > 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of > those tasks failed, the whole file uploading will failed, and we will abort > current uploading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
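As a side note on the precondition quoted in item 3 of the comment above: that call builds its message with String.format on every invocation, including when the check passes. A dependency-free sketch of the same minimum-value check that formats the message only on failure might look like the following. The class name, the method name requireAtLeast and the option key are hypothetical and used purely for illustration; like Guava's checkArgument, the sketch signals a violation with IllegalArgumentException.

```java
// Hypothetical sketch of a minimum-value check; not code from the patch.
final class MinValueCheckSketch {

  // Returns v unchanged when v >= min; otherwise throws with the quoted message format.
  static int requireAtLeast(String key, int v, int min) {
    if (v < min) {
      // The message is only formatted on the failure path.
      throw new IllegalArgumentException(
          String.format("Value of %s: %d is below the minimum value %d", key, v, min));
    }
    return v;
  }

  public static void main(String[] args) {
    // "example.option" is a made-up key for demonstration.
    System.out.println(requireAtLeast("example.option", 8, 1));
    try {
      requireAtLeast("example.option", 0, 1);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```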
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
    Attachment: HADOOP-14999.008.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, asynchronous_file_uploading.pdf
>
> This mechanism is designed for uploading files in parallel and asynchronously:
> - Improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks happen asynchronously.
> - Avoid buffering an overly large result on local disk. To cite an extreme example, if a task outputs 100GB or more, we may need to write this 100GB to local disk and then upload it. This can be inefficient and is limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two problems:
> - if the output file is too large, it will exhaust the local disk space.
> - if the output file is too large, the task waits a long time to upload the result to OSS before it can finish, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. small local files, and each block is packaged into an upload task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel and greatly improves performance.
> 3. Each task retries uploading its block to Aliyun OSS up to 3 times. If any of those tasks fails, the whole file upload fails and the current upload is aborted.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
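The block-based flow described in the HADOOP-14999 message above (cut the output into blocks, submit one upload task per block to an executor, retry each task, fail the whole upload if any task fails) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `BlockUploadSketch`, `PartUploader`, `split`, `uploadWithRetry` and `uploadAll` are hypothetical names, a plain fixed thread pool stands in for {{SemaphoredDelegatingExecutor}}, blocks are held in memory rather than in small local files, and "retry 3 times" is assumed to mean at most three attempts per block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; none of these names come from the HADOOP-14999 patch.
class BlockUploadSketch {

  /** Hypothetical stand-in for whatever call uploads one part to the object store. */
  interface PartUploader {
    void uploadPart(int partNumber, byte[] block) throws Exception;
  }

  // Assumption: "retry 3 times" is read as at most three attempts per block.
  static final int MAX_ATTEMPTS = 3;

  /** Cut the output into fixed-size blocks; the last block may be shorter. */
  static List<byte[]> split(byte[] data, int blockSize) {
    List<byte[]> blocks = new ArrayList<>();
    for (int off = 0; off < data.length; off += blockSize) {
      blocks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + blockSize)));
    }
    return blocks;
  }

  /** Upload one block, rethrowing the last failure once all attempts are used up. */
  static void uploadWithRetry(PartUploader uploader, int partNumber, byte[] block)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        uploader.uploadPart(partNumber, block);
        return;
      } catch (Exception e) {
        last = e;
      }
    }
    throw last;
  }

  /**
   * Submit one upload task per block and wait for all of them.
   * Returns true if every block was uploaded; false means the caller
   * should abort the multi-part upload.
   */
  static boolean uploadAll(byte[] data, int blockSize, int parallelism,
      PartUploader uploader) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<?>> pending = new ArrayList<>();
      List<byte[]> blocks = split(data, blockSize);
      for (int i = 0; i < blocks.size(); i++) {
        final int partNumber = i + 1; // part numbers start at 1 in this sketch
        final byte[] block = blocks.get(i);
        pending.add(pool.submit(() -> {
          uploadWithRetry(uploader, partNumber, block);
          return null;
        }));
      }
      boolean allSucceeded = true;
      for (Future<?> f : pending) {
        try {
          f.get();
        } catch (ExecutionException e) {
          allSucceeded = false; // one failed block fails the whole upload
        }
      }
      return allSucceeded;
    } finally {
      pool.shutdown();
    }
  }
}
```

A caller would pass its own `PartUploader` and abort the upload when `uploadAll` returns false. A real output stream would also submit blocks while the task is still producing output, which this sketch leaves out for brevity.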
[jira] [Updated] (HADOOP-15296) Fix a wrong link for RBF in the top page
[ https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HADOOP-15296:
---
    Affects Version/s: 3.0.0
    Target Version/s: 3.1.0, 3.2.0, 3.0.2
    Fix Version/s: 3.0.2, 3.2.0, 3.1.0

Committed to trunk, branch-3.1 and branch-3.0. Thanks [~tasanuma0829] for the contribution!

> Fix a wrong link for RBF in the top page
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation
> Affects Versions: 3.0.0
> Reporter: Takanobu Asanuma
> Assignee: Takanobu Asanuma
> Priority: Minor
> Fix For: 3.1.0, 3.2.0, 3.0.2
> Attachments: HADOOP-15296.1.patch
[jira] [Updated] (HADOOP-15296) Fix a wrong link for RBF in the top page
[ https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HADOOP-15296:
---
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

> Fix a wrong link for RBF in the top page
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation
> Affects Versions: 3.0.0
> Reporter: Takanobu Asanuma
> Assignee: Takanobu Asanuma
> Priority: Minor
> Fix For: 3.1.0, 3.2.0, 3.0.2
> Attachments: HADOOP-15296.1.patch