[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small
[ https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353510#comment-16353510 ] Aki Tanaka commented on HADOOP-15206: - [~jlowe] Thank you for your insights. I have created a patch based on your comment. As far as I tested, all the unit tests passed and I confirmed that the issue I was seeing was solved. I would greatly appreciate it if someone could take a look. Alternative proposals are also very welcome. Regarding the duplicated record scenario, the record was read twice when BZip2Codec starts reading at position 0 (the BZip2 header) and position 4 (the first BZip2 block marker).
test.bz2:0+1 -> read 100 records
test.bz2:3+4 -> read 99 records
2018-02-05 20:49:51,598 ERROR [Thread-3] mapred.TestTextInputFormat2 (TestTextInputFormat2.java:verifyPartitions(324)) - splits[0]=file:/Users/tanakah/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:0+1 count=100
2018-02-05 20:49:51,605 ERROR [Thread-3] mapred.TestTextInputFormat2 (TestTextInputFormat2.java:verifyPartitions(326)) - splits[1]=file:/Users/tanakah/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:1+1 count=0
2018-02-05 20:49:51,608 ERROR [Thread-3] mapred.TestTextInputFormat2 (TestTextInputFormat2.java:verifyPartitions(326)) - splits[2]=file:/Users/tanakah/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:2+1 count=0
2018-02-05 20:49:51,614 ERROR [Thread-3] mapred.TestTextInputFormat2 (TestTextInputFormat2.java:verifyPartitions(313)) - read 1
2018-02-05 20:49:51,617 WARN [Thread-3] mapred.TestTextInputFormat2 (TestTextInputFormat2.java:verifyPartitions(315)) - conflict with 1 in split 3 at position 7
> BZip2 drops and duplicates records when input 
split size is small > - > > Key: HADOOP-15206 > URL: https://issues.apache.org/jira/browse/HADOOP-15206 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 2.8.3, 3.0.0 > Reporter: Aki Tanaka > Priority: Major > Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch
>
> BZip2 can drop and duplicate records when the input split size is small. I confirmed that this issue happens when the input split size is between 1 byte and 4 bytes.
> I am seeing the following two problem behaviors.
>
> 1. Dropped record:
> BZip2 skips the first record in the input file when the input split size is small.
> I set the split size to 3 and tested loading 100 records (0, 1, 2, ..., 99):
> {code:java}
> 2018-02-01 10:52:33,502 INFO [Thread-17] mapred.TestTextInputFormat (TestTextInputFormat.java:verifyPartitions(317)) - splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3 count=99{code}
> The input format read only 99 of the 100 records.
>
> 2. Duplicated record:
> Two input splits contain the same BZip2 records when the input split size is small.
> I set the split size to 1 and tested loading 100 records (0, 1, 2, ..., 99):
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1 count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 at position 8
> {code}
> I experienced this error when executing a Spark (SparkSQL) job under the following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
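[Editor's note] The overlap described in the comment above (position 0 is the "BZh" header, position 4 the first compressed-block marker) can be illustrated outside Hadoop. The Python sketch below is an illustration of the failure mode, not the BZip2Codec code: when the split that starts at byte 0 and the tiny split that contains the first block marker both claim the first block, its records are read twice.

```python
# Illustration (not Hadoop code) of why tiny splits can double-read the
# first BZip2 block. A .bz2 stream starts with a 4-byte "BZh<digit>"
# header, so the first compressed-block magic begins at offset 4.
def blocks_read(split_start, split_len, markers):
    """Which compressed blocks does the reader for this split consume?
    Plain rule: a block belongs to the split whose byte range contains
    its marker. The trouble: the reader of the split at offset 0 always
    decompresses from the stream header, so it also consumes the first
    block even when the marker (offset 4) lies outside its range."""
    end = split_start + split_len
    owned = [m for m in markers if split_start <= m < end]
    if split_start == 0 and markers and markers[0] not in owned:
        owned.insert(0, markers[0])  # head-of-file special case
    return owned

markers = [4]  # one block, marker immediately after the header
# Split test.bz2:0+1 starts at the header, so it reads block 0 ...
assert blocks_read(0, 1, markers) == [4]
# ... but split test.bz2:3+4 also contains offset 4, so it reads it too.
assert blocks_read(3, 4, markers) == [4]
# With 1-byte splits over the whole file, the single block is read twice:
assert sum(len(blocks_read(s, 1, markers)) for s in range(8)) == 2
```

With a larger split size the split starting at offset 0 contains the marker itself, no other split does, and each block has exactly one owner, which is why the bug only shows up for split sizes of a few bytes.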
[jira] [Updated] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small
[ https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aki Tanaka updated HADOOP-15206: Attachment: HADOOP-15206.001.patch > BZip2 drops and duplicates records when input split size is small > - > > Key: HADOOP-15206 > URL: https://issues.apache.org/jira/browse/HADOOP-15206 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 2.8.3, 3.0.0 > Reporter: Aki Tanaka > Priority: Major > Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353264#comment-16353264 ] genericqa commented on HADOOP-10571:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 22s | trunk passed |
| +1 | compile | 16m 1s | trunk passed |
| +1 | checkstyle | 2m 40s | trunk passed |
| +1 | mvnsite | 4m 30s | trunk passed |
| +1 | shadedclient | 18m 34s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 12s | trunk passed |
| +1 | javadoc | 3m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 25s | the patch passed |
| +1 | compile | 15m 43s | the patch passed |
| +1 | javac | 15m 43s | root generated 0 new + 1234 unchanged - 3 fixed = 1234 total (was 1237) |
| -0 | checkstyle | 3m 31s | root: The patch generated 3 new + 769 unchanged - 35 fixed = 772 total (was 804) |
| +1 | mvnsite | 4m 52s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 35s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 27s | the patch passed |
| +1 | javadoc | 2m 59s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 20s | hadoop-common in the patch failed. |
| -1 | unit | 93m 45s | hadoop-hdfs in the patch failed. |
| +1 | unit | 2m 15s | hadoop-hdfs-nfs in the patch passed. |
| +1 | unit | 15m 18s | hadoop-gridmix in the patch passed. |
| +1 | unit | 0m 26s | hadoop-openstack in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 235m 33s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyPreserveFlag |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-10571 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909303/HADOOP-10571.05.patch |
[jira] [Commented] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353214#comment-16353214 ] genericqa commented on HADOOP-15007:

(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 10m 40s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 10s | trunk passed |
| +1 | compile | 12m 50s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 1m 3s | trunk passed |
| +1 | shadedclient | 11m 30s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 39s | trunk passed |
| +1 | javadoc | 0m 57s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 49s | the patch passed |
| +1 | compile | 12m 54s | the patch passed |
| +1 | javac | 12m 54s | the patch passed |
| -0 | checkstyle | 0m 37s | hadoop-common-project/hadoop-common: The patch generated 4 new + 241 unchanged - 0 fixed = 245 total (was 241) |
| +1 | mvnsite | 0m 54s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 8m 48s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 31s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 1s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 90m 18s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15007 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909280/HADOOP-15007.000.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2945e22c0e09 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60656bc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14075/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14075/testReport/ |
| Max. process+thread count | 1405 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14075/console |
| Powered by | Apache Yetus |
[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15007: Status: Patch Available (was: Open) > Stabilize and document Configuration element > -- > > Key: HADOOP-15007 > URL: https://issues.apache.org/jira/browse/HADOOP-15007 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Ajay Kumar >Priority: Blocker > Attachments: HADOOP-15007.000.patch > > > HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a > value. > We need to make sure that this feature is backwards compatible & usable in > production. That's docs, testing, marshalling etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15211) Distcp update not preserving root directory permissions
PRASHANT GOLASH created HADOOP-15211: Summary: Distcp update not preserving root directory permissions Key: HADOOP-15211 URL: https://issues.apache.org/jira/browse/HADOOP-15211 Project: Hadoop Common Issue Type: Bug Components: tools/distcp Affects Versions: 2.6.0 Reporter: PRASHANT GOLASH "hadoop distcp -pugpb -update " does not preserve permissions for the root destination directory, although it does preserve permissions for child directories (/child1 etc.). Using hadoop-distcp version 2.6.0-cdh5.7.2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15134) ADL problems parsing JSON responses to include error details
[ https://issues.apache.org/jira/browse/HADOOP-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353086#comment-16353086 ] Steve Loughran commented on HADOOP-15134: - FWIW, I suspect the underlying cause is the endpoint returning text/html because a proxy got in the way, and the JSON parser isn't checking the content-type first > ADL problems parsing JSON responses to include error details > > > Key: HADOOP-15134 > URL: https://issues.apache.org/jira/browse/HADOOP-15134 > Project: Hadoop Common > Issue Type: Bug > Components: fs/adl > Affects Versions: 3.0.0 > Reporter: Steve Loughran > Priority: Major > Labels: supportability > > Currently any failure of ADL's response JSON parsing results in error text like "Unexpected error happened reading response stream or parsing JSon from rename()". This is not useful. Fix by including the exception text and logging the exception with its details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
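[Editor's note] A fix along the lines suggested above (check the content type before parsing, and include the exception text plus the offending payload in the error) can be sketched as follows. This is illustrative only: `parse_adl_response` and its error format are hypothetical names, not the ADL SDK's API.

```python
import json

def parse_adl_response(content_type, body, operation):
    """Parse a JSON error response, surfacing the real failure instead of
    a generic 'Unexpected error ... parsing JSon' message. Hypothetical
    helper for illustration, not the actual ADL SDK API."""
    # A proxy in the way often returns text/html; check before parsing.
    if "json" not in content_type:
        raise ValueError(
            f"{operation}: expected JSON but got content type "
            f"{content_type!r}; first bytes of payload: {body[:60]!r}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        # Include the parser's own message and a payload snippet.
        raise ValueError(
            f"{operation}: malformed JSON ({e}); "
            f"first bytes of payload: {body[:60]!r}") from e

# A proxy error page now produces an actionable message:
try:
    parse_adl_response("text/html", "<html>407 Proxy Auth</html>", "rename()")
except ValueError as err:
    assert "text/html" in str(err) and "rename()" in str(err)
```

The key point is that both failure paths name the operation, the content type, and the first bytes of the payload, which is exactly the diagnostic detail the current generic message drops.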
[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes
[ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353064#comment-16353064 ] genericqa commented on HADOOP-15204:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 44s | trunk passed |
| +1 | compile | 13m 32s | trunk passed |
| +1 | checkstyle | 0m 39s | trunk passed |
| +1 | mvnsite | 1m 4s | trunk passed |
| +1 | shadedclient | 11m 49s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 26s | trunk passed |
| +1 | javadoc | 0m 54s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 12m 41s | the patch passed |
| +1 | javac | 12m 41s | the patch passed |
| -0 | checkstyle | 0m 40s | hadoop-common-project/hadoop-common: The patch generated 1 new + 241 unchanged - 0 fixed = 242 total (was 241) |
| +1 | mvnsite | 1m 3s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 8s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 37s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 44s | hadoop-common in the patch failed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 83m 27s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestRaceWhenRelogin |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15204 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909288/HADOOP-15204.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 6d31ca2ae4b2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33e6cdb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14073/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/14073/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14073/testReport/ |
| Max. process+thread count | 1396 (vs. ulimit of
[jira] [Updated] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15209: Status: Open (was: Patch Available) > PoC: DistCp to eliminate needless deletion of files under deleted directories > - > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp > Affects Versions: 2.9.0 > Reporter: Steve Loughran > Priority: Major > Attachments: HADOOP-15209-001.patch > > > DistCp issues a delete(file) request even if it is underneath an already-deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles deletes, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
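[Editor's note] One low-heap way to keep the "history of deleted directories" described above: if deletion candidates arrive in lexicographic order (as a sorted listing does), remembering only the most recently deleted directory prefix is enough to skip everything underneath it. A sketch under that sorted-input and recursive-delete assumption; none of these names come from the attached patch:

```python
def prune_deletes(paths):
    """Given deletion candidates in sorted order, return only the paths
    whose parent has not already been deleted. Remembering just the last
    deleted prefix keeps memory use constant, regardless of tree size."""
    kept = []
    last_deleted_prefix = None
    for p in sorted(paths):
        if last_deleted_prefix and p.startswith(last_deleted_prefix):
            continue  # parent already gone; no delete call needed
        kept.append(p)
        last_deleted_prefix = p.rstrip("/") + "/"
    return kept

deletes = ["/a/", "/a/f1", "/a/sub/f2", "/b/f3"]
# Deleting /a/ recursively removes everything below it, so only two
# delete calls are actually needed:
assert prune_deletes(deletes) == ["/a/", "/b/f3"]
```

This addresses the heap concern raised in the description: the structure holds a single path at a time instead of the full set of deleted directories.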
[jira] [Assigned] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned HADOOP-15209: --- Assignee: Steve Loughran > PoC: DistCp to eliminate needless deletion of files under deleted directories > - > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp > Affects Versions: 2.9.0 > Reporter: Steve Loughran > Assignee: Steve Loughran > Priority: Major > Attachments: HADOOP-15209-001.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15210) Handle FNFE from S3Guard.getMetadataStore() in S3A initialize()
Steve Loughran created HADOOP-15210: --- Summary: Handle FNFE from S3Guard.getMetadataStore() in S3A initialize() Key: HADOOP-15210 URL: https://issues.apache.org/jira/browse/HADOOP-15210 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.0.0 Reporter: Steve Loughran {{S3Guard.getMetadataStore()}} throws FileNotFoundExceptions up; as the comments say, "rely on callers to catch and treat specially". The S3A filesystem doesn't do that; instead it will just fail FileSystem.initialize(). The FNFE is generated by DynamoDBMetadataStore. Are we happy with this? Downgrading has some appeal: if you don't have the table, it will keep going. But failures could be a sign of bad config, so maybe silent recovery is bad. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
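[Editor's note] The two options weighed above (fail fast vs. downgrade and keep going) could be made an explicit, configurable policy. The sketch below illustrates that policy choice only; it is not the S3AFileSystem code, and the `fail_on_missing_table` flag is a made-up name.

```python
import logging

log = logging.getLogger("s3a-sketch")

def init_metadata_store(factory, fail_on_missing_table=True):
    """Policy sketch for a missing S3Guard table: either propagate the
    FileNotFoundError (bad config fails fast at initialize()) or log a
    warning and fall back to 'no metadata store'. Hypothetical names."""
    try:
        return factory()
    except FileNotFoundError as e:
        if fail_on_missing_table:
            raise  # surfaces in FileSystem.initialize(), as today
        log.warning("Metadata store unavailable, continuing without it: %s", e)
        return None  # caller treats None as "S3Guard disabled"

def missing_table():
    # Stand-in for DynamoDBMetadataStore raising when the table is absent.
    raise FileNotFoundError("DynamoDB table not found")

# Downgrade mode keeps going; fail-fast mode propagates the FNFE.
assert init_metadata_store(missing_table, fail_on_missing_table=False) is None
try:
    init_metadata_store(missing_table)
    raised = False
except FileNotFoundError:
    raised = True
assert raised
```

Making the behavior a flag sidesteps the "silent recovery hides bad config" objection: fail-fast can stay the default while deployments that prefer availability opt into the downgrade.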
[jira] [Commented] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353044#comment-16353044 ] Andras Bokor commented on HADOOP-10571: --- {quote} h4. LocalFileSystem L142 we could move to making {{p}} another arg {quote} LocalFilesystem still uses commons logging and a comment says {quote}This log is widely used in the org.apache.hadoop.fs code and tests, so must be considered something to only be changed with care.{quote} Other comments are addressed. > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HADOOP-10571 > URL: https://issues.apache.org/jira/browse/HADOOP-10571 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Andras Bokor >Priority: Major > Attachments: HADOOP-10571.01.patch, HADOOP-10571.01.patch, > HADOOP-10571.02.patch, HADOOP-10571.03.patch, HADOOP-10571.04.patch, > HADOOP-10571.05.patch > > > When logging an exception, we often convert the exception to string or call > {{.getMessage}}. Instead we can use the log method overloads which take > {{Throwable}} as a parameter. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
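[Editor's note] The pattern this issue fixes is the same in any logging framework: pass the exception object to the logger so the stack trace survives, instead of stringifying it or calling getMessage(). A Python analogue of the two styles (Java's equivalent is `LOG.error(msg, e)` versus `LOG.error(msg + e.getMessage())`):

```python
import io
import logging

# Capture log output in a buffer so the difference is visible.
logger = logging.getLogger("demo")
logger.propagate = False
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

def risky():
    raise IOError("disk full")

try:
    risky()
except IOError as e:
    # Anti-pattern: only the message survives; the stack trace is lost.
    logger.error("operation failed: %s", e)
    # Preferred: pass the exception itself (the Throwable overload).
    logger.error("operation failed", exc_info=e)

out = buf.getvalue()
assert "disk full" in out
assert "Traceback" in out  # only the exc_info form prints the stack trace
```

The second call is what the patch moves callers toward: same message, but the logger renders the full traceback, which is usually the only way to locate the failure afterwards.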
[jira] [Updated] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated HADOOP-10571: -- Attachment: HADOOP-10571.05.patch > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HADOOP-10571 > URL: https://issues.apache.org/jira/browse/HADOOP-10571 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 2.4.0 > Reporter: Arpit Agarwal > Assignee: Andras Bokor > Priority: Major > Attachments: HADOOP-10571.01.patch, HADOOP-10571.01.patch, HADOOP-10571.02.patch, HADOOP-10571.03.patch, HADOOP-10571.04.patch, HADOOP-10571.05.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable
[ https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353020#comment-16353020 ] Steve Loughran commented on HADOOP-14468: - One thing related to this is whether we should have a TTL on tombstone markers. Even in non-auth mode, when we reconcile the listings, files recorded as deleted are omitted. If someone can create that file via another client, will it ever be seen in listings? > S3Guard: make short-circuit getFileStatus() configurable > > > Key: HADOOP-14468 > URL: https://issues.apache.org/jira/browse/HADOOP-14468 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a > result from the MetadataStore (e.g. dynamodb) first. > I would like to add a new parameter > {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps > the current behavior. When false, S3AFileSystem will check both S3 and the > MetadataStore. > I'm not sure yet if we want to have this behavior the same for all callers of > getFileStatus(), or if we only want to check both S3 and MetadataStore for > some internal callers such as open(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
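[Editor's note] A TTL on tombstones, as floated in the comment above, would bound how long a deletion recorded in the metadata store can hide a file recreated by another client. A sketch of the reconciliation rule, with hypothetical names (this is not the S3Guard implementation):

```python
import time

def reconcile(s3_listing, tombstones, ttl_seconds, now=None):
    """Merge an S3 listing with metadata-store tombstones: a path is
    hidden only while its tombstone is younger than the TTL. Sketch of
    the idea under discussion, not S3Guard code."""
    now = time.time() if now is None else now
    visible = []
    for path in s3_listing:
        deleted_at = tombstones.get(path)
        if deleted_at is not None and now - deleted_at < ttl_seconds:
            continue  # recently deleted: trust the tombstone
        visible.append(path)  # no tombstone, or tombstone expired
    return visible

listing = ["s3a://b/x", "s3a://b/y"]
tombs = {"s3a://b/x": 1000.0}  # x deleted at t=1000
# Within the TTL the deleted file stays hidden ...
assert reconcile(listing, tombs, ttl_seconds=60, now=1030.0) == ["s3a://b/y"]
# ... after the TTL it reappears, so a recreate by another client is
# eventually visible even if the tombstone is never cleaned up.
assert reconcile(listing, tombs, ttl_seconds=60, now=1100.0) == listing
```

Without the TTL check (equivalent to an infinite TTL), the question in the comment has an unpleasant answer: a file recreated out-of-band would stay invisible in listings forever.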
[jira] [Updated] (HADOOP-14918) remove the Local Dynamo DB test option
[ https://issues.apache.org/jira/browse/HADOOP-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14918: Status: Open (was: Patch Available) > remove the Local Dynamo DB test option > -- > > Key: HADOOP-14918 > URL: https://issues.apache.org/jira/browse/HADOOP-14918 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0, 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-14918-001.patch, HADOOP-14918-002.patch > > > I'm going to propose cutting out the localdynamo test option for s3guard > * the local DDB JAR is unmaintained/lags the SDK we work with...eventually > there'll be differences in API. > * as the local dynamo DB is unshaded, it complicates classpath setup for the > build. Remove it and there's no need to worry about versions of anything > other than the shaded AWS > * it complicates test runs. Now we need to test for both localdynamo *and* > real dynamo > * but we can't ignore real dynamo, because that's the one which matters > While the local option promises to reduce test costs, really, it's just > adding complexity. If you are testing with s3guard, you need to have a real > table to test against. And with the exception of those people testing s3a > against non-AWS, consistent endpoints, everyone should be testing with > S3Guard. > -Straightforward to remove.- -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes
[ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352956#comment-16352956 ] Anu Engineer commented on HADOOP-15204: --- [~ste...@apache.org], [~chris.douglas] Thanks for the comments. Patch v3 addresses all the comments. Details below: bq. IDE shuffled imports; please revert Thanks for catching this, Fixed. bq. parseFromString() can just use Precondition.checkArgument for validation Fixed. bq. validation/parse errors to include value at error, and, ideally, config option too. Compare a stack trace saying "Value not in expected format", with one saying "value of option 'buffer.size' not in expected format "54exa" Fixed. bq. sanitizedValue.toLowerCase() should specify locale for case conversion, same everywhere else used. Fixed. bq. What if a caller doesn't want to provide a string default value of the new getters, but just a number? That would let me return something like -1 to mean "no value set", which I can't do with the current API. There is an API that takes a default float argument, and a default string argument with the storage unit. bq. getStorageSize(String name, String defaultValue, + StorageUnit targetUnit) -- Does this come up often? We define the standard defaults as "5 GB", etc., so yes it is a convenient function. bq. I'd lean toward MB instead of MEGABYTES, and similar. Fixed. I agree, thanks for this suggestion, that does improve code readability. bq. Please, no. This is the silliest dependency we have on Guava. Fixed. I still use it in Configuration, since it is already in the file as an import. 
> Add Configuration API for parsing storage sizes > --- > > Key: HADOOP-15204 > URL: https://issues.apache.org/jira/browse/HADOOP-15204 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch, > HADOOP-15204.003.patch > > > Hadoop has a lot of configurations that specify memory and disk size. This > JIRA proposes to add an API like {{Configuration.getStorageSize}} which will > allow users > to specify units like KB, MB, GB etc. This JIRA is inspired by > HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great > improvement for the ozone code base; this JIRA hopes to do the same thing for > configs that deal with disk and memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
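As a rough illustration of the behaviour under review (explicit locale for case conversion, errors that name both the config key and the offending value, unit suffixes such as "5 GB"), here is a hypothetical sketch. The class and method names are invented for illustration; this is not the API in the attached patches:

```java
import java.util.Locale;

public class StorageSizeDemo {
    // Binary multipliers for the unit suffixes discussed in the review.
    private static long multiplier(String unit) {
        switch (unit) {
            case "b":  return 1L;
            case "kb": return 1L << 10;
            case "mb": return 1L << 20;
            case "gb": return 1L << 30;
            case "tb": return 1L << 40;
            default:
                throw new IllegalArgumentException("Unknown storage unit: " + unit);
        }
    }

    /** Parse strings such as "5 GB" or "512mb" into a byte count. */
    static long parseStorageSize(String key, String value) {
        // Explicit locale for case conversion, as requested in the review.
        String s = value.trim().toLowerCase(Locale.ENGLISH);
        int i = 0;
        while (i < s.length()
                && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.')) {
            i++;
        }
        if (i == 0) {
            // Error message names both the option and the bad value,
            // mirroring the "value of option 'buffer.size'..." review comment.
            throw new IllegalArgumentException(
                "Value of option '" + key + "' not in expected format: \"" + value + "\"");
        }
        double number = Double.parseDouble(s.substring(0, i));
        String unit = s.substring(i).trim();
        return (long) (number * multiplier(unit.isEmpty() ? "b" : unit));
    }

    public static void main(String[] args) {
        System.out.println(parseStorageSize("buffer.size", "5 GB")); // prints 5368709120
    }
}
```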
[jira] [Updated] (HADOOP-15204) Add Configuration API for parsing storage sizes
[ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HADOOP-15204: -- Attachment: HADOOP-15204.003.patch > Add Configuration API for parsing storage sizes > --- > > Key: HADOOP-15204 > URL: https://issues.apache.org/jira/browse/HADOOP-15204 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch, > HADOOP-15204.003.patch > > > Hadoop has a lot of configurations that specify memory and disk size. This > JIRA proposes to add an API like {{Configuration.getStorageSize}} which will > allow users > to specify units like KB, MB, GB etc. This JIRA is inspired by > HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great > improvement for the ozone code base; this JIRA hopes to do the same thing for > configs that deal with disk and memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352934#comment-16352934 ] Ajay Kumar commented on HADOOP-15007: - [~ste...@apache.org],[~anu],[~elek] thanks for the valuable feedback. Adding first pass of patch with following changes: * Replaced Enums with a common interface with Strings * Logging moved to trace level. This should handle [~elek]'s case as well (i.e. "I wouldn't like to see any warnings during my cluster startup as there are no problems with my cluster even if the tags are misspelled. The warning should be displayed during the build time for the developers/reviewers.") * Added test case for invalid tags * Updated javadocs in Configuration for tag functionality. > Stabilize and document Configuration element > -- > > Key: HADOOP-15007 > URL: https://issues.apache.org/jira/browse/HADOOP-15007 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Ajay Kumar >Priority: Blocker > Attachments: HADOOP-15007.000.patch > > > HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a > value. > We need to make sure that this feature is backwards compatible & usable in > production. That's docs, testing, marshalling etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15007: Attachment: HADOOP-15007.000.patch > Stabilize and document Configuration element > -- > > Key: HADOOP-15007 > URL: https://issues.apache.org/jira/browse/HADOOP-15007 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Ajay Kumar >Priority: Blocker > Attachments: HADOOP-15007.000.patch > > > HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a > value. > We need to make sure that this feature is backwards compatible & usable in > production. That's docs, testing, marshalling etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15007: Attachment: (was: HADOOP-15007.000.patch) > Stabilize and document Configuration element > -- > > Key: HADOOP-15007 > URL: https://issues.apache.org/jira/browse/HADOOP-15007 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Ajay Kumar >Priority: Blocker > Attachments: HADOOP-15007.000.patch > > > HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a > value. > We need to make sure that this feature is backwards compatible & usable in > production. That's docs, testing, marshalling etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15007: Attachment: HADOOP-15007.000.patch > Stabilize and document Configuration element > -- > > Key: HADOOP-15007 > URL: https://issues.apache.org/jira/browse/HADOOP-15007 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Ajay Kumar >Priority: Blocker > Attachments: HADOOP-15007.000.patch > > > HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a > value. > We need to make sure that this feature is backwards compatible & usable in > production. That's docs, testing, marshalling etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352906#comment-16352906 ] genericqa commented on HADOOP-15209: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 47s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 24s{color} | {color:orange} root: The patch generated 2 new + 57 unchanged - 0 fixed = 59 total (was 57) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 10s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 16s{color} | {color:red} hadoop-distcp in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 30s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.tools.mapred.TestDeletedDirTracker | | | hadoop.tools.contract.TestLocalContractDistCp | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15209 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909258/HADOOP-15209-001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ad790c896377 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4e9a59c | | maven | version: Apache Maven 3.3.9 | | Default Java |
[jira] [Commented] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP
[ https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352889#comment-16352889 ] Sanjay Radia commented on HADOOP-15191: --- Steve, can you please explain how this will be used? For example will distcp call the fs-object to see if it has a bulk delete and then call that fs's bulk deletes? Alternatively we could add a bulk delete operation to the FileSystem and FileContext API and have distcp simply call fs.bulkDelete(...); the fs implementation will either call the bulk delete operation or call individual delete. The second approach has the advantage that distcp's code is simpler. > Add Private/Unstable BulkDelete operations to supporting object stores for > DistCP > - > > Key: HADOOP-15191 > URL: https://issues.apache.org/jira/browse/HADOOP-15191 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, > HADOOP-15191-003.patch, HADOOP-15191-004.patch > > > Large scale DistCP with the -delete option doesn't finish in a viable time > because of the final CopyCommitter doing a 1 by 1 delete of all missing > files. This isn't randomized (the list is sorted), and it's throttled by AWS. > If bulk deletion of files was exposed as an API, distCP would do 1/1000 of > the REST calls, so not get throttled. > Proposed: add an initially private/unstable interface for stores, > {{BulkDelete}} which declares a page size and offers a > {{bulkDelete(List)}} operation for the bulk deletion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
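The two wirings Sanjay contrasts can be sketched in a few lines. This is a hedged illustration with invented names, using java.nio.file.Path as a stand-in for Hadoop's Path: a {{BulkDelete}} capability declaring a page size, plus a single paged delete entry point so the caller (e.g. DistCp) stays simple:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative capability interface: a store advertises a page size and
// deletes up to that many paths per round trip (one REST call per page
// instead of one per file).
interface BulkDelete {
    int pageSize();
    void bulkDelete(List<Path> paths);
}

// Toy "store" backed by a set; a real object store would issue a bulk
// DELETE request per page and hence avoid per-file throttling.
class ToyStore implements BulkDelete {
    final Set<Path> files = new HashSet<>();

    public int pageSize() { return 1000; }

    public void bulkDelete(List<Path> paths) {
        paths.forEach(files::remove);
    }

    // Second approach from the comment: one entry point that pages through
    // bulkDelete internally; a store without bulk support would instead
    // fall back to individual delete() calls here.
    void deleteAll(List<Path> paths) {
        int page = pageSize();
        for (int i = 0; i < paths.size(); i += page) {
            bulkDelete(paths.subList(i, Math.min(i + page, paths.size())));
        }
    }

    public static void main(String[] args) {
        ToyStore s = new ToyStore();
        Path a = Paths.get("/data/a");
        Path b = Paths.get("/data/b");
        s.files.addAll(Arrays.asList(a, b));
        s.deleteAll(Arrays.asList(a, b));
        System.out.println(s.files.size()); // prints 0
    }
}
```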
[jira] [Commented] (HADOOP-15203) Support composite trusted channel resolver that supports both whitelist and blacklist
[ https://issues.apache.org/jira/browse/HADOOP-15203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352857#comment-16352857 ] genericqa commented on HADOOP-15203: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 2s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 40s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 58s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}171m 7s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 | | | hadoop.hdfs.server.namenode.ha.TestHASafeMode | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15203 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909245/HADOOP-15203.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3a6bbe3c29f1 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4e9a59c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14071/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit |
[jira] [Updated] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HADOOP-11867: - Labels: performance (was: ) > FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Major > Labels: performance > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API so it can be as efficient as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
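The base implementation described above can be sketched as a plain loop of positioned reads, one per (offset, buffer) pair, filling buffer.remaining() bytes each. This is an illustrative stand-in using FileChannel rather than FSDataInputStream, not the proposed Hadoop API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class VectoredReadDemo {
    // Naive base implementation: a sequence of positioned reads. A smarter
    // FS could reorder/coalesce the (offset, length) pairs before issuing I/O.
    static void readFully(FileChannel ch, long[] offsets, ByteBuffer[] chunks)
            throws IOException {
        for (int i = 0; i < offsets.length; i++) {
            long pos = offsets[i];
            ByteBuffer buf = chunks[i];
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // positioned read; no shared cursor
                if (n < 0) {
                    throw new IOException("EOF at offset " + pos);
                }
                pos += n;
            }
            buf.flip(); // ready for the caller to consume
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("vec", ".bin");
        Files.write(tmp, "0123456789".getBytes());
        try (FileChannel ch = FileChannel.open(tmp)) {
            ByteBuffer[] chunks = { ByteBuffer.allocate(2), ByteBuffer.allocate(2) };
            readFully(ch, new long[] {0, 5}, chunks);
            // chunks[0] now holds "01", chunks[1] holds "56"
        }
        Files.delete(tmp);
    }
}
```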
[jira] [Updated] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HADOOP-11867: - Component/s: hdfs-client > FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Major > Labels: performance > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API so it can be as efficient as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HADOOP-11867: - Affects Version/s: (was: 2.8.0) 3.0.0 > FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Major > Labels: performance > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API so it can be as efficient as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15209: Status: Patch Available (was: Open) > PoC: DistCp to eliminate needless deletion of files under deleted directories > - > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch > > > DistCP issues a delete(file) request even if it is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
Steve Loughran created HADOOP-15209: --- Summary: PoC: DistCp to eliminate needless deletion of files under deleted directories Key: HADOOP-15209 URL: https://issues.apache.org/jira/browse/HADOOP-15209 Project: Hadoop Common Issue Type: Improvement Components: tools/distcp Affects Versions: 2.9.0 Reporter: Steve Loughran DistCP issues a delete(file) request even if it is underneath an already deleted directory. This generates needless load on filesystems/object stores, and, if the store throttles delete, can dramatically slow down the delete operation. If the distcp delete operation can build a history of deleted directories, then it will know when it does not need to issue those deletes. Care is needed here to make sure that whatever structure is created does not overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352741#comment-16352741 ] Steve Loughran commented on HADOOP-15209: - HADOOP-15209 patch 001 this builds up a set of paths which have been deleted. bug: it's broken, as shown by test failures. I know the problem, but HADOOP-15208 shows my plan: move this out of distcp and experiment elsewhere. > PoC: DistCp to eliminate needless deletion of files under deleted directories > - > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch > > > DistCP issues a delete(file) request even if it is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
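The "set of paths which have been deleted" idea can be sketched as a small tracker that records deleted directories and suppresses delete() calls for anything underneath them. This is an invented illustration of the approach, not the code in the 001 patch:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch of the tracker idea: remember which directories have been deleted,
// so needless delete() calls for their descendants can be skipped.
class DeletedDirTracker {
    private final List<Path> deletedDirs = new ArrayList<>();

    /** Record a directory delete; false if an ancestor was already deleted. */
    boolean shouldDeleteDir(Path dir) {
        if (isUnderDeletedDir(dir)) {
            return false;
        }
        deletedDirs.add(dir);
        return true;
    }

    /** A file delete is needless if a recorded directory is an ancestor. */
    boolean shouldDeleteFile(Path file) {
        return !isUnderDeletedDir(file);
    }

    private boolean isUnderDeletedDir(Path p) {
        for (Path d : deletedDirs) {
            if (p.startsWith(d) && !p.equals(d)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DeletedDirTracker t = new DeletedDirTracker();
        t.shouldDeleteDir(Paths.get("/a"));
        System.out.println(t.shouldDeleteFile(Paths.get("/a/b.txt"))); // prints false
    }
}
```

A real implementation would need to bound the memory this structure uses (and use something cheaper than a linear scan), per the heap caveat in the issue description.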
[jira] [Updated] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15209: Attachment: HADOOP-15209-001.patch > PoC: DistCp to eliminate needless deletion of files under deleted directories > - > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch > > > DistCP issues a delete(file) request even if it is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15208) DistCp to offer option to save src/dest filesets as alternative to delete()
[ https://issues.apache.org/jira/browse/HADOOP-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352729#comment-16352729 ] Steve Loughran commented on HADOOP-15208:
-
See HADOOP-15191 as one scale strategy.

> DistCp to offer option to save src/dest filesets as alternative to delete()
> ---
>
> Key: HADOOP-15208
> URL: https://issues.apache.org/jira/browse/HADOOP-15208
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools/distcp
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> There are opportunities to improve distcp delete performance and scalability with object stores, but you need to test with production datasets to determine if the optimizations work, don't run out of memory, etc.
> By adding the option to save the sequence files of source, dest listings, people (myself included) can experiment with different strategies before trying to commit one which doesn't scale.
[jira] [Created] (HADOOP-15208) DistCp to offer option to save src/dest filesets as alternative to delete()
Steve Loughran created HADOOP-15208: --- Summary: DistCp to offer option to save src/dest filesets as alternative to delete() Key: HADOOP-15208 URL: https://issues.apache.org/jira/browse/HADOOP-15208 Project: Hadoop Common Issue Type: New Feature Components: tools/distcp Affects Versions: 2.9.0 Reporter: Steve Loughran Assignee: Steve Loughran
[jira] [Updated] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP
[ https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15191: Status: Open (was: Patch Available)

> Add Private/Unstable BulkDelete operations to supporting object stores for DistCP
> -
>
> Key: HADOOP-15191
> URL: https://issues.apache.org/jira/browse/HADOOP-15191
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, tools/distcp
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, HADOOP-15191-003.patch, HADOOP-15191-004.patch
>
> Large-scale DistCP with the -delete option doesn't finish in a viable time because of the final CopyCommitter doing a one-by-one delete of all missing files. This isn't randomized (the list is sorted), and it's throttled by AWS.
> If bulk deletion of files was exposed as an API, DistCP would do 1/1000 of the REST calls, and so would not get throttled.
> Proposed: add an initially private/unstable interface for stores, {{BulkDelete}}, which declares a page size and offers a {{bulkDelete(List)}} operation for the bulk deletion.
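The proposal (a store declares a page size and accepts up to one page of keys per call) can be sketched as below. This is an illustrative, self-contained stand-in: the interface and method names are assumptions modelled on the issue text, not the actual Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the proposed Private/Unstable bulk-delete idea:
// one REST call per page of keys instead of one call per key.
interface BulkDelete {
    int getPageSize();                  // max keys accepted per call
    void bulkDelete(List<String> keys); // delete up to one page of keys
}

// Toy store that just counts calls; a real one would issue a request here.
class CountingStore implements BulkDelete {
    int calls = 0;
    public int getPageSize() { return 1000; }
    public void bulkDelete(List<String> keys) {
        if (keys.size() > getPageSize()) {
            throw new IllegalArgumentException("page too large: " + keys.size());
        }
        calls++;
    }
}

public class BulkDeleteDemo {
    /** Split the key list into pages and delete page by page. */
    static void deleteAll(BulkDelete store, List<String> keys) {
        int page = store.getPageSize();
        for (int i = 0; i < keys.size(); i += page) {
            store.bulkDelete(keys.subList(i, Math.min(i + page, keys.size())));
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) keys.add("key-" + i);
        CountingStore store = new CountingStore();
        deleteAll(store, keys);
        System.out.println(store.calls); // 3 paged calls instead of 2500 single deletes
    }
}
```

With a page size of 1000, 2500 keys become 3 calls rather than 2500, which is the "1/1000 of the REST calls" effect the issue describes.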
[jira] [Commented] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352722#comment-16352722 ] Steve Loughran commented on HADOOP-10571:
-
It's way past time to get this in, so let's aim for this week. I've reviewed it all by counting {} and arg numbers, checking that e.getMessage() is only called when I'm happy that this value isn't null (i.e. if it's a subclass of IOE), and that no info is being lost compared to before.

As this goes near HDFS, can you create a JIRA here and mark it as part of this one? That'll let the HDFS dev team know what's coming.

h4. LocalFileSystem
L142: we could move to making {{p}} another arg

h4. DNS
L314: the log pattern doesn't include the localhost string
L432: review

h4. DataNode
L2435: that's complex enough that it should retain the debug-enabled guard
L2435: replace use of %d with {}
L2724: use e.toString()
L3360: possibly better as {{LOG.debug("{}", sb)}}; avoids calling sb.toString() when not needed

h4. DataXceiver
L717: retain the isDebug guard to avoid calling Arrays.asList()

h4. EditLogBackupInputStream
revert. Ensures text of the underlying error isn't lost on rethrow

h4. StandbyCheckpointer

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HADOOP-10571
> URL: https://issues.apache.org/jira/browse/HADOOP-10571
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Agarwal
> Assignee: Andras Bokor
> Priority: Major
> Attachments: HADOOP-10571.01.patch, HADOOP-10571.01.patch, HADOOP-10571.02.patch, HADOOP-10571.03.patch, HADOOP-10571.04.patch
>
> When logging an exception, we often convert the exception to string or call {{.getMessage}}. Instead we can use the log method overloads which take {{Throwable}} as a parameter.
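The review checks above ("counting {} and arg numbers", and passing the Throwable itself rather than e.getMessage(), which may be null) can be made concrete with a small stand-in for the SLF4J-style {} substitution. This helper is purely a demonstration written for this sketch, not the real logging API; the before/after patterns in main() show the shape of the change the patch makes.

```java
// Self-contained stand-in for SLF4J-style "{}" message formatting.
// It exists only to demonstrate the two review checks: the number of {}
// placeholders should match the argument count, and the exception should
// be passed as its own argument so the stack trace is preserved.
public class LogFormatDemo {
    /** Substitute each "{}" in the pattern with the next argument. */
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int arg = 0;
        int i = 0;
        while (i < pattern.length()) {
            if (i + 1 < pattern.length()
                    && pattern.charAt(i) == '{' && pattern.charAt(i + 1) == '}') {
                // Leave the placeholder visible if there are too few args:
                // that is the mismatch a reviewer counts for.
                sb.append(arg < args.length ? args[arg++] : "{}");
                i += 2;
            } else {
                sb.append(pattern.charAt(i++));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Before: LOG.warn("Failed to read " + path + ": " + e.getMessage());
        //         (loses the stack trace; NPE-prone if getMessage() is null)
        // After:  LOG.warn("Failed to read {}", path, e);
        //         (logger keeps the trailing Throwable and its stack trace)
        System.out.println(format("Failed to read {}: retry {}", "/tmp/x", 3));
    }
}
```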
[jira] [Updated] (HADOOP-15203) Support composite trusted channel resolver that supports both whitelist and blacklist
[ https://issues.apache.org/jira/browse/HADOOP-15203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15203: Attachment: HADOOP-15203.001.patch

> Support composite trusted channel resolver that supports both whitelist and blacklist
> -
>
> Key: HADOOP-15203
> URL: https://issues.apache.org/jira/browse/HADOOP-15203
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Ajay Kumar
> Assignee: Ajay Kumar
> Priority: Major
> Labels: security
> Attachments: HADOOP-15203.000.patch, HADOOP-15203.001.patch
>
> support composite trusted channel resolver that supports both whitelist and blacklist
[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)
[ https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352392#comment-16352392 ] Zsolt Venczel commented on HADOOP-6852:
---
In the attached patch I have added the previously available test files back, re-enabled the test cases, and added a proposed fix: within BZip2Codec I've changed the default read mode from CONTINUOUS to BYBLOCK at BZip2Codec.createInputStream(InputStream in, Decompressor decompressor). BYBLOCK read mode handles concatenated bzip2 correctly, and this way it is also consistent with the decompressor creation logic in mapred/LineRecordReader and input/LineRecordReader. As a result of the change, the concatenated-bzip2 issue is fixed.

> apparent bug in concatenated-bzip2 support (decoding)
> -
>
> Key: HADOOP-6852
> URL: https://issues.apache.org/jira/browse/HADOOP-6852
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 0.22.0
> Environment: Linux x86_64 running 32-bit Hadoop, JDK 1.6.0_15
> Reporter: Greg Roelofs
> Assignee: Zsolt Venczel
> Priority: Major
> Attachments: HADOOP-6852.01.patch
>
> The following simplified code (manually picked out of testMoreBzip2() in https://issues.apache.org/jira/secure/attachment/12448272/HADOOP-6835.v4.trunk-hadoop-mapreduce.patch) triggers a "java.io.IOException: bad block header" in org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:527):
> {noformat}
> JobConf jobConf = new JobConf(defaultConf);
> CompressionCodec bzip2 = new BZip2Codec();
> ReflectionUtils.setConf(bzip2, jobConf);
> localFs.delete(workDir, true);
> // copy multiple-member test file to HDFS
> String fn2 = "testCompressThenConcat.txt" + bzip2.getDefaultExtension();
> Path fnLocal2 = new Path(System.getProperty("test.concat.data","/tmp"),fn2);
> Path fnHDFS2 = new Path(workDir, fn2);
> localFs.copyFromLocalFile(fnLocal2, fnHDFS2);
> FileInputFormat.setInputPaths(jobConf, workDir);
> final FileInputStream in2 = new FileInputStream(fnLocal2.toString());
> CompressionInputStream cin2 = bzip2.createInputStream(in2);
> LineReader in = new LineReader(cin2);
> Text out = new Text();
> int numBytes, totalBytes=0, lineNum=0;
> while ((numBytes = in.readLine(out)) > 0) {
>   ++lineNum;
>   totalBytes += numBytes;
> }
> in.close();
> {noformat}
> The specified file is also included in the H-6835 patch linked above, and some additional debug output is included in the commented-out test loop above. (Only in the linked, "v4" version of the patch, however--I'm about to remove the debug stuff for checkin.)
> It's possible I've done something completely boneheaded here, but the file, at least, checks out in a subsequent set of subtests and with stock bzip2 itself. Only the code above is problematic; it reads through the first concatenated chunk (17 lines of text) just fine but chokes on the header of the second one. Altogether, the test file contains 84 lines of text and 4 concatenated bzip2 files.
> (It's possible this is a mapreduce issue rather than common, but note that the identical gzip test works fine. Possibly it's related to the stream-vs-decompressor dichotomy, though; intentionally not supported?)
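The shape of the proposed one-line fix (defaulting the codec's read mode to BYBLOCK, the mode that handles concatenated bzip2 members and matches what the LineRecordReaders already request) can be sketched with simplified stand-ins. These are not the real org.apache.hadoop.io.compress classes; the enum and method are illustrative only.

```java
// Simplified stand-in illustrating the HADOOP-6852 fix: the default read
// mode becomes BYBLOCK instead of CONTINUOUS. Not the actual Hadoop code.
public class BZip2ReadModeSketch {
    enum READ_MODE {
        CONTINUOUS, // treats the input as one stream; breaks on concatenated members
        BYBLOCK     // resynchronizes on bzip2 block markers; handles concatenation
    }

    /** Read mode used when a caller supplies no explicit mode. */
    static READ_MODE defaultReadMode() {
        // Was CONTINUOUS; BYBLOCK keeps the codec consistent with the
        // mapred/input LineRecordReader decompressor creation logic.
        return READ_MODE.BYBLOCK;
    }

    public static void main(String[] args) {
        System.out.println(defaultReadMode()); // prints "BYBLOCK"
    }
}
```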
[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)
[ https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352394#comment-16352394 ] genericqa commented on HADOOP-6852:
---
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 3m 54s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} |
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-6852 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909222/HADOOP-6852.01.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14070/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
[jira] [Assigned] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)
[ https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel reassigned HADOOP-6852: - Assignee: Zsolt Venczel
[jira] [Updated] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)
[ https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HADOOP-6852: -- Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)
[ https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HADOOP-6852: -- Attachment: HADOOP-6852.01.patch
[jira] [Comment Edited] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352138#comment-16352138 ] Elek, Marton edited comment on HADOOP-15007 at 2/5/18 8:31 AM:
---
+1 for separating action item 3. +0 for action items 1 and 2. (This is not my personal preference, but I don't think it's a big deal; let's do it either way.)

My preference is:
# Don't use tags in the code at all (only return existing tags as a List to the ui)
# Use enums to represent tags
# Use string constants to represent tags

I agree with Anu that 2 is better than 3, and I agree with Steve that 1 is better than 2.

Just one additional note about a potential use case. I can imagine that some users would like to introduce custom tags for configuration values (e.g. MY_UPGRADE_TEST). I think it's useful; they can mark specific configuration keys with a specific tag. With the "log at the first time" method the user will get a warning/error for every custom tag. Maybe it's also not a big deal, but I think it's a real use case, and in that case we don't need logging at all.

In fact I wouldn't like to see any tag-related log. Tags are maintained by the developer (except in the previous use case). I wouldn't like to see any warnings during my cluster startup, as there are no problems with my cluster even if the tags are misspelled. The warning should be displayed at build time for the developers/reviewers.

But again, I can live with the existing solution; I just tried to propose a simplification.

> Stabilize and document Configuration element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
> Issue Type: Improvement
> Components: conf
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Ajay Kumar
> Priority: Blocker
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a value.
> We need to make sure that this feature is backwards compatible & usable in production. That's docs, testing, marshalling etc.
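Option 2 from the list above (enums to represent tags) can be sketched as follows. Misspelled tags become compile errors, which gives the build-time check asked for here, but it also exposes the trade-off raised in the thread: a user-defined tag like MY_UPGRADE_TEST cannot be expressed without falling back to strings. All names below are illustrative assumptions, not Hadoop's actual configuration API.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch of representing configuration tags as an enum.
public class ConfigTagsSketch {
    enum ConfigTag { PERFORMANCE, SECURITY, REQUIRED, DEBUG }

    /** Tags attached to one configuration property (hypothetical lookup). */
    static Set<ConfigTag> tagsOf(String key) {
        if (key.equals("dfs.replication")) {
            // EnumSet is compact and typo-proof: ConfigTag.REQIURED would
            // not compile, unlike a misspelled string constant.
            return EnumSet.of(ConfigTag.REQUIRED, ConfigTag.PERFORMANCE);
        }
        return EnumSet.noneOf(ConfigTag.class);
    }

    public static void main(String[] args) {
        System.out.println(tagsOf("dfs.replication")); // the tags of a known key
        // A custom tag such as MY_UPGRADE_TEST has no enum constant, so this
        // design cannot represent it; that is the string-tags argument.
    }
}
```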
[jira] [Commented] (HADOOP-15007) Stabilize and document Configuration element
[ https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352138#comment-16352138 ] Elek, Marton commented on HADOOP-15007: ---