[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-02-05 Thread Aki Tanaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353510#comment-16353510
 ] 

Aki Tanaka commented on HADOOP-15206:
-

[~jlowe]

Thank you for your insights. I have created a patch based on your comment.

In my testing, all the unit tests passed, and I confirmed that the issue I 
was seeing is resolved.

 

I would greatly appreciate it if someone could take a look. Alternative 
proposals are also very welcome.

 

 

Regarding the duplicated record scenario, the record was read twice because 
BZip2Codec started reading at both position 0 (the BZip2 header) and position 
4 (the first BZip2 block marker).
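For context, those positions come straight from the BZip2 stream layout. A 
minimal Python sketch (the Hadoop codec itself is Java; Python's stdlib bz2 
produces the same container format):

```python
import bz2

# Sample input like the test's: 100 records, one number per line.
data = b"".join(b"%d\n" % i for i in range(100))
stream = bz2.compress(data)

# Bytes 0-3: stream header "BZh" plus a block-size digit ("9" at the
# default compression level). Byte 4 onward: the 6-byte compressed-block
# magic 0x314159265359 -- the "first BZip2 marker" referenced above.
assert stream[:3] == b"BZh"
assert stream[4:10] == bytes.fromhex("314159265359")
```

So a reader that starts at offset 0 and a reader that resynchronizes inside 
bytes 1-4 both land on the marker at offset 4 and decode the same first 
block, which matches the duplicated counts above.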

test.bz2:0+1 -> read 100 records

test.bz2:3+4 -> read 99 records

 

2018-02-05 20:49:51,598 ERROR [Thread-3] mapred.TestTextInputFormat2 
(TestTextInputFormat2.java:verifyPartitions(324)) - 
splits[0]=file:/Users/tanakah/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:0+1
 count=100
2018-02-05 20:49:51,605 ERROR [Thread-3] mapred.TestTextInputFormat2 
(TestTextInputFormat2.java:verifyPartitions(326)) - 
splits[1]=file:/Users/tanakah/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:1+1
 count=0
2018-02-05 20:49:51,608 ERROR [Thread-3] mapred.TestTextInputFormat2 
(TestTextInputFormat2.java:verifyPartitions(326)) - 
splits[2]=file:/Users/tanakah/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:2+1
 count=0

2018-02-05 20:49:51,614 ERROR [Thread-3] mapred.TestTextInputFormat2 
(TestTextInputFormat2.java:verifyPartitions(313)) - read 1
2018-02-05 20:49:51,617 WARN  [Thread-3] mapred.TestTextInputFormat2 
(TestTextInputFormat2.java:verifyPartitions(315)) - conflict with 1 in split 3 
at position 7

 

 

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Priority: Major
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following two problem behaviors.
>  
> 1. Dropped record:
> BZip2 skips the first record in the input file when the input split size is 
> small.
>  
> I set the split size to 3 and tested loading 100 records (0, 1, 2, ..., 99):
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records instead of 100.
>  
> 2. Duplicated record:
> Two input splits contain the same BZip2 records when the input split size is 
> small.
>  
> I set the split size to 1 and tested loading 100 records (0, 1, 2, ..., 99):
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when I executed a Spark (Spark SQL) job under the 
> following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  
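The invariant these counts violate is that every record belongs to exactly 
one split. An illustrative Python model of the usual text-split convention 
(not Hadoop's actual LineRecordReader, just the same idea: a reader that 
starts mid-record skips to the next record, and reads one record past its 
split end):

```python
def read_lines_for_split(data: bytes, start: int, length: int):
    """Lines owned by the split [start, start+length): a reader starting
    mid-stream discards its first (possibly partial) line, and reads one
    line past its split end; together the splits yield every line once."""
    end = start + length
    pos = start
    if start > 0:
        # Discard the first line: the previous split's reader owns it,
        # because it reads one extra line past its own end.
        nl = data.find(b"\n", pos)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos < len(data) and pos <= end:
        nl = data.find(b"\n", pos)
        nl = len(data) if nl == -1 else nl
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines

data = b"".join(b"%d\n" % i for i in range(100))
for split_size in (1, 3):
    splits = [(off, split_size) for off in range(0, len(data), split_size)]
    records = [line for (s, n) in splits
               for line in read_lines_for_split(data, s, n)]
    # Every record exactly once, regardless of split size.
    assert sorted(int(r) for r in records) == list(range(100))
```

With a correct reader, arbitrarily small splits still recover all 100 
records exactly once; the bug is that the BZip2 position bookkeeping breaks 
this handoff for splits of 1-4 bytes.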



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-02-05 Thread Aki Tanaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aki Tanaka updated HADOOP-15206:

Attachment: HADOOP-15206.001.patch

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Priority: Major
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch
>
>






[jira] [Commented] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353264#comment-16353264
 ] 

genericqa commented on HADOOP-10571:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
43s{color} | {color:green} root generated 0 new + 1234 unchanged - 3 fixed = 
1234 total (was 1237) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 31s{color} | {color:orange} root: The patch generated 3 new + 769 unchanged 
- 35 fixed = 772 total (was 804) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 20s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 45s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
15s{color} | {color:green} hadoop-hdfs-nfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 
18s{color} | {color:green} hadoop-gridmix in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-openstack in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}235m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyPreserveFlag |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-10571 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909303/HADOOP-10571.05.patch 
|

[jira] [Commented] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353214#comment-16353214
 ] 

genericqa commented on HADOOP-15007:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 4 new + 241 unchanged - 0 fixed = 245 total (was 241) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 48s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m  
1s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15007 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909280/HADOOP-15007.000.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2945e22c0e09 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60656bc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14075/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14075/testReport/ |
| Max. process+thread count | 1405 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14075/console |
| Powered by | Apache Yetus 

[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15007:

Status: Patch Available  (was: Open)

> Stabilize and document Configuration  element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Blocker
> Attachments: HADOOP-15007.000.patch
>
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a 
>  value.
> We need to make sure that this feature is backwards compatible & usable in 
> production. That's docs, testing, marshalling etc.






[jira] [Created] (HADOOP-15211) Distcp update not preserving root directory permissions

2018-02-05 Thread PRASHANT GOLASH (JIRA)
PRASHANT GOLASH created HADOOP-15211:


 Summary: Distcp update not preserving root directory permissions
 Key: HADOOP-15211
 URL: https://issues.apache.org/jira/browse/HADOOP-15211
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools/distcp
Affects Versions: 2.6.0
Reporter: PRASHANT GOLASH


"hadoop distcp -pugpb -update  " does not preserve permissions for the root 
directory, although it does preserve permissions for child directories 
(/child1 etc.).

Using hadoop-distcp, version 2.6.0-cdh5.7.2.






[jira] [Commented] (HADOOP-15134) ADL problems parsing JSON responses to include error details

2018-02-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353086#comment-16353086
 ] 

Steve Loughran commented on HADOOP-15134:
-

FWIW, suspect underlying cause is endpoint returning text/html because a proxy 
got in the way, and JSON parser isn't checking content-type first
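A hedged sketch of that check in Python (the names and signature here are 
hypothetical, not the ADL client's actual API): verify the Content-Type 
before parsing, and surface the parser's message plus a body snippet on 
failure.

```python
import json

def parse_error_response(status, content_type, body):
    """Hypothetical sketch: fail with useful detail instead of a generic
    'Unexpected error ... parsing JSON' message."""
    # A proxy error page is typically text/html; don't even try to parse it.
    if "application/json" not in (content_type or ""):
        raise IOError(f"Non-JSON error response (HTTP {status}, "
                      f"Content-Type {content_type!r}): {body[:200]!r}")
    try:
        return json.loads(body)
    except ValueError as e:
        # Include the parser's own message and a snippet of the body.
        raise IOError(f"Malformed JSON in error response (HTTP {status}): "
                      f"{e}; body starts with {body[:200]!r}") from e
```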

> ADL problems parsing JSON responses to include error details
> 
>
> Key: HADOOP-15134
> URL: https://issues.apache.org/jira/browse/HADOOP-15134
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>  Labels: supportability
>
> Currently any failure of ADL's response JSON parsing results in error text 
> like "Unexpected error happened reading response stream or parsing JSon 
> from rename()". This is not useful. Fix by including the exception text and 
> logging the exception and diagnostic info.






[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353064#comment-16353064
 ] 

genericqa commented on HADOOP-15204:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 1 new + 241 unchanged - 0 fixed = 242 total (was 241) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 44s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 83m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestRaceWhenRelogin |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15204 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909288/HADOOP-15204.003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6d31ca2ae4b2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33e6cdb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14073/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14073/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14073/testReport/ |
| Max. process+thread count | 1396 (vs. ulimit of 

[jira] [Updated] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15209:

Status: Open  (was: Patch Available)

> PoC: DistCp to eliminate needless deletion of files under deleted directories
> -
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch
>
>
> DistCp issues a delete(file) request even if the file is underneath an 
> already deleted directory. This generates needless load on 
> filesystems/object stores and, if the store throttles deletes, can 
> dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.
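As a rough Python sketch of the idea (hypothetical names; DistCp itself is 
Java): if the delete phase visits paths in listing order, with a directory 
appearing before its contents, remembering only the most recently deleted 
directory is enough to skip deletes beneath it without unbounded heap use.

```python
class DeleteTracker:
    """Skip delete() calls for paths under an already-deleted directory.

    Assumes deletes arrive in sorted listing order (directory before its
    children), so a single remembered prefix suffices -- addressing the
    heap concern raised in the issue."""

    def __init__(self):
        self._last_deleted_dir = None

    def should_delete(self, path, is_dir):
        prefix = self._last_deleted_dir
        if prefix is not None and path.startswith(prefix):
            return False  # an ancestor was already deleted
        if is_dir:
            # Deleting this directory implicitly deletes everything under it.
            self._last_deleted_dir = path.rstrip("/") + "/"
        return True
```

Usage: for the ordered sequence /a (dir), /a/b, /a/c/d, /b/x, only /a and 
/b/x would trigger delete requests.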






[jira] [Assigned] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-15209:
---

Assignee: Steve Loughran

> PoC: DistCp to eliminate needless deletion of files under deleted directories
> -
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch
>
>






[jira] [Created] (HADOOP-15210) Handle FNFE from S3Guard.getMetadataStore() in S3A initialize()

2018-02-05 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15210:
---

 Summary: Handle FNFE from S3Guard.getMetadataStore() in S3A 
initialize()
 Key: HADOOP-15210
 URL: https://issues.apache.org/jira/browse/HADOOP-15210
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.0.0
Reporter: Steve Loughran


{{S3Guard.getMetadataStore()}} throws FileNotFoundExceptions up; as the 
comments say, "rely on callers to catch and treat specially".

S3A FileSystem doesn't do that; instead it just fails 
FileSystem.initialize(). The FNFE is generated by DynamoDBMetadataStore.

Are we happy with this?

Downgrading has some appeal: if you don't have the table, the filesystem 
keeps going. But failures could be a sign of bad config, so maybe silent 
recovery is bad.
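The two options can be sketched as follows (Python for illustration; S3A is 
Java, and the flag name here is hypothetical, not a real Hadoop option):

```python
def init_metadata_store(create_store, fail_on_missing=True):
    """Sketch of the choice discussed above: fail fast on a missing
    metadata store, or downgrade and continue without one."""
    try:
        return create_store()
    except FileNotFoundError as err:
        if fail_on_missing:
            # Fail fast: a missing DynamoDB table may mean bad config.
            raise
        # Downgrade: warn and continue without a metadata store.
        print(f"warning: metadata store unavailable, continuing without it: {err}")
        return None
```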






[jira] [Commented] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-05 Thread Andras Bokor (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353044#comment-16353044
 ] 

Andras Bokor commented on HADOOP-10571:
---

{quote}
h4. LocalFileSystem

L142 we could move to making {{p}} another arg
{quote}
LocalFilesystem still uses commons logging and a comment says
{quote}This log is widely used in the org.apache.hadoop.fs code and tests, so 
must be considered something to only be changed with care.{quote}

Other comments are addressed.

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HADOOP-10571
> URL: https://issues.apache.org/jira/browse/HADOOP-10571
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.01.patch, HADOOP-10571.01.patch, 
> HADOOP-10571.02.patch, HADOOP-10571.03.patch, HADOOP-10571.04.patch, 
> HADOOP-10571.05.patch
>
>
> When logging an exception, we often convert the exception to string or call 
> {{.getMessage}}. Instead we can use the log method overloads which take 
> {{Throwable}} as a parameter.






[jira] [Updated] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-05 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated HADOOP-10571:
--
Attachment: HADOOP-10571.05.patch

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HADOOP-10571
> URL: https://issues.apache.org/jira/browse/HADOOP-10571
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.01.patch, HADOOP-10571.01.patch, 
> HADOOP-10571.02.patch, HADOOP-10571.03.patch, HADOOP-10571.04.patch, 
> HADOOP-10571.05.patch
>
>






[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable

2018-02-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353020#comment-16353020
 ] 

Steve Loughran commented on HADOOP-14468:
-

One thing related to this is whether we should have a TTL on tombstone markers.

Even in non-auth mode, when we reconcile the listings, files recorded as 
deleted are omitted. If someone then creates that file via another client, will 
it ever be seen in listings?

> S3Guard: make short-circuit getFileStatus() configurable
> 
>
> Key: HADOOP-14468
> URL: https://issues.apache.org/jira/browse/HADOOP-14468
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
>
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a 
> result from the MetadataStore (e.g. dynamodb) first.
> I would like to add a new parameter 
> {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps 
> the current behavior.  When false, S3AFileSystem will check both S3 and the 
> MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of 
> getFileStatus(), or if we only want to check both S3 and MetadataStore for 
> some internal callers such as open().
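As a sketch only, the proposed switch might look like this in core-site.xml; the property name is quoted from the description above, and the default shown is an assumption that preserves current behaviour:

```xml
<property>
  <!-- Proposed in this issue: when true (assumed default, matching today's
       behaviour), getFileStatus() returns the MetadataStore result without
       consulting S3; when false, both S3 and the MetadataStore are checked. -->
  <name>fs.s3a.metadatastore.getfilestatus.authoritative</name>
  <value>true</value>
</property>
```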






[jira] [Updated] (HADOOP-14918) remove the Local Dynamo DB test option

2018-02-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14918:

Status: Open  (was: Patch Available)

> remove the Local Dynamo DB test option
> --
>
> Key: HADOOP-14918
> URL: https://issues.apache.org/jira/browse/HADOOP-14918
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0, 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14918-001.patch, HADOOP-14918-002.patch
>
>
> I'm going to propose cutting out the localdynamo test option for s3guard:
> * the local DDB JAR is unmaintained and lags the SDK we work with; eventually 
> there'll be differences in the API.
> * the local dynamo DB is unshaded, so it complicates classpath setup for the 
> build. Remove it and there's no need to worry about versions of anything 
> other than the shaded AWS SDK.
> * it complicates test runs: now we need to test against both localdynamo *and* 
> real dynamo,
> * but we can't ignore real dynamo, because that's the one which matters.
> While the local option promises to reduce test costs, really, it's just 
> adding complexity. If you are testing with s3guard, you need a real table to 
> test against. And with the exception of those people testing s3a against 
> non-AWS, consistent endpoints, everyone should be testing with S3Guard.
> -Straightforward to remove.-






[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-05 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352956#comment-16352956
 ] 

Anu Engineer commented on HADOOP-15204:
---

[~ste...@apache.org], [~chris.douglas] Thanks for the comments. Patch v3 
addresses all the comments.
Details below:
bq. IDE shuffled imports; please revert
Thanks for catching this; fixed.
bq. parseFromString() can just use Precondition.checkArgument for validation
Fixed.
bq. validation/parse errors to include value at error, and, ideally, config 
option too. Compare a stack trace saying "Value not in expected format", with 
one saying "value of option 'buffer.size' not in expected format "54exa"
Fixed.
bq. sanitizedValue.toLowerCase() should specify locale for case conversion, same 
everywhere else it is used.
Fixed.
bq. What if a caller doesn't want to provide a string default value of the new 
getters, but just a number? That would let me return something like -1 to mean 
"no value set", which I can't do with the current API.
There is an API that takes a default float argument, and a default string 
argument with the storage unit.
bq. getStorageSize(String name, String defaultValue,
+ StorageUnit targetUnit) -- Does this come up often? 
We define the standard defaults as "5 GB", etc., so yes it is a convenient 
function.
bq. I'd lean toward MB instead of MEGABYTES, and similar.
Fixed. I agree, thanks for this suggestion, that does improve code readability.
bq. Please, no. This is the silliest dependency we have on Guava.
Fixed. I still use it in Configuration, since it is already in the file as an 
import.
 

> Add Configuration API for parsing storage sizes
> ---
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch, 
> HADOOP-15204.003.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users to specify units like KB, MB, GB etc. This JIRA is inspired by 
> HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great 
> improvement for the Ozone code base; this JIRA hopes to do the same thing for 
> configs that deal with disk and memory usage.
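A rough, hypothetical sketch of the kind of parsing such an API implies; {{parseStorageSize}} and the binary-unit suffixes here are illustrative assumptions, not the patch's actual code:

```java
import java.util.Locale;

public class StorageSizeDemo {
    // Parse strings such as "5 GB" or "512mb" into a byte count.
    // Suffixes are matched case-insensitively using an explicit locale,
    // and parse errors report the offending value, as the review asked.
    static long parseStorageSize(String value) {
        String s = value.trim().toLowerCase(Locale.ENGLISH);
        long multiplier = 1L;
        String digits = s;
        String[][] units = {
            {"tb", String.valueOf(1L << 40)},
            {"gb", String.valueOf(1L << 30)},
            {"mb", String.valueOf(1L << 20)},
            {"kb", String.valueOf(1L << 10)},
            {"b", "1"},
        };
        for (String[] u : units) {
            if (s.endsWith(u[0])) {
                multiplier = Long.parseLong(u[1]);
                digits = s.substring(0, s.length() - u[0].length()).trim();
                break;
            }
        }
        if (digits.isEmpty()) {
            throw new IllegalArgumentException(
                "No numeric value in storage size '" + value + "'");
        }
        try {
            return (long) (Double.parseDouble(digits) * multiplier);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Value '" + value + "' is not in the expected format", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStorageSize("5 GB"));   // 5368709120
        System.out.println(parseStorageSize("512mb"));  // 536870912
    }
}
```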






[jira] [Updated] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HADOOP-15204:
--
Attachment: HADOOP-15204.003.patch

> Add Configuration API for parsing storage sizes
> ---
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch, 
> HADOOP-15204.003.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users to specify units like KB, MB, GB etc. This JIRA is inspired by 
> HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great 
> improvement for the Ozone code base; this JIRA hopes to do the same thing for 
> configs that deal with disk and memory usage.






[jira] [Commented] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352934#comment-16352934
 ] 

Ajay Kumar commented on HADOOP-15007:
-

[~ste...@apache.org],[~anu],[~elek] thanks for the valuable feedback. Adding a 
first pass of the patch with the following changes:
* Replaced Enums with a common interface with Strings
* Logging moved to trace level. This should handle [~elek]'s case as well (i.e. 
"I wouldn't like to see any warnings during my cluster startup as there are no 
problems with my cluster even if the tags are misspelled. The warning should be 
displayed during build time for the developers/reviewers.")
* Added a test case for invalid tags
* Updated the javadocs in Configuration for the tag functionality.

> Stabilize and document Configuration  element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Blocker
> Attachments: HADOOP-15007.000.patch
>
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a 
>  value.
> We need to make sure that this feature is backwards compatible & usable in 
> production. That's docs, testing, marshalling etc.






[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15007:

Attachment: HADOOP-15007.000.patch

> Stabilize and document Configuration  element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Blocker
> Attachments: HADOOP-15007.000.patch
>
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a 
>  value.
> We need to make sure that this feature is backwards compatible & usable in 
> production. That's docs, testing, marshalling etc.






[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15007:

Attachment: (was: HADOOP-15007.000.patch)

> Stabilize and document Configuration  element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Blocker
> Attachments: HADOOP-15007.000.patch
>
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a 
>  value.
> We need to make sure that this feature is backwards compatible & usable in 
> production. That's docs, testing, marshalling etc.






[jira] [Updated] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15007:

Attachment: HADOOP-15007.000.patch

> Stabilize and document Configuration  element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Blocker
> Attachments: HADOOP-15007.000.patch
>
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a 
>  value.
> We need to make sure that this feature is backwards compatible & usable in 
> production. That's docs, testing, marshalling etc.






[jira] [Commented] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352906#comment-16352906
 ] 

genericqa commented on HADOOP-15209:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 
47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 24s{color} | {color:orange} root: The patch generated 2 new + 57 unchanged - 
0 fixed = 59 total (was 57) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
10s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 16s{color} 
| {color:red} hadoop-distcp in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.mapred.TestDeletedDirTracker |
|   | hadoop.tools.contract.TestLocalContractDistCp |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15209 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909258/HADOOP-15209-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ad790c896377 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e9a59c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 

[jira] [Commented] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP

2018-02-05 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352889#comment-16352889
 ] 

Sanjay Radia commented on HADOOP-15191:
---

Steve, can you please explain how this will be used?

For example, will distcp call the fs object to see if it has a bulk delete and 
then call that fs's bulk delete? Alternatively we could add a bulk delete 
operation to the FileSystem and FileContext APIs and have distcp simply call 
fs.bulkDelete(...); the fs implementation will either call the bulk delete 
operation or fall back to individual deletes. The second approach has the 
advantage that distcp's code is simpler.
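The second approach can be sketched as follows; {{BaseFs}} and its methods are illustrative stand-ins, not the real Hadoop FileSystem API:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BulkDeleteDemo {
    // Minimal stand-in for a filesystem base class.
    static class BaseFs {
        final Set<String> files = new HashSet<>();

        boolean delete(String path) { return files.remove(path); }

        // Default implementation: iterate and delete one by one.
        // An object-store subclass would override this with a single
        // paged bulk-delete REST call, invisibly to the caller (distcp).
        int bulkDelete(List<String> paths) {
            int deleted = 0;
            for (String p : paths) {
                if (delete(p)) deleted++;
            }
            return deleted;
        }
    }

    public static void main(String[] args) {
        BaseFs fs = new BaseFs();
        fs.files.add("/a");
        fs.files.add("/b");
        // Caller code stays the same whether or not the FS overrides it.
        System.out.println(fs.bulkDelete(List.of("/a", "/b", "/missing"))); // 2
    }
}
```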

> Add Private/Unstable BulkDelete operations to supporting object stores for 
> DistCP
> -
>
> Key: HADOOP-15191
> URL: https://issues.apache.org/jira/browse/HADOOP-15191
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, 
> HADOOP-15191-003.patch, HADOOP-15191-004.patch
>
>
> Large scale DistCP with the -delete option doesn't finish in a viable time 
> because of the final CopyCommitter doing a 1 by 1 delete of all missing 
> files. This isn't randomized (the list is sorted), and it's throttled by AWS.
> If bulk deletion of files was exposed as an API, distCP would do 1/1000 of 
> the REST calls, so not get throttled.
> Proposed: add an initially private/unstable interface for stores, 
> {{BulkDelete}} which declares a page size and offers a 
> {{bulkDelete(List)}} operation for the bulk deletion.






[jira] [Commented] (HADOOP-15203) Support composite trusted channel resolver that supports both whitelist and blacklist

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352857#comment-16352857
 ] 

genericqa commented on HADOOP-15203:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m  
2s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 40s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
58s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 |
|   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15203 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909245/HADOOP-15203.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3a6bbe3c29f1 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e9a59c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14071/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Updated] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-02-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HADOOP-11867:
-
Labels: performance  (was: )

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API, so it 
> can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.
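A self-contained sketch of that base implementation, using {{FileChannel}} positional reads in place of a real FSDataInputStream; the method name mirrors the proposal and is not an existing Hadoop API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class VectoredReadDemo {
    // Base implementation: one positional read per (offset, chunk) pair,
    // filling chunk.remaining() bytes for each. A smarter FS could reorder
    // or coalesce these reads without changing the caller's contract.
    static void readFully(FileChannel ch, long[] offsets, ByteBuffer[] chunks)
            throws IOException {
        for (int i = 0; i < offsets.length; i++) {
            long pos = offsets[i];
            ByteBuffer buf = chunks[i];
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) throw new java.io.EOFException("EOF at " + pos);
                pos += n;
            }
            buf.flip();
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("vec", ".txt");
        Files.write(p, "hello world".getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer[] chunks = {ByteBuffer.allocate(5), ByteBuffer.allocate(5)};
            readFully(ch, new long[] {0, 6}, chunks);
            System.out.println(new String(chunks[0].array(), StandardCharsets.UTF_8)
                + " / " + new String(chunks[1].array(), StandardCharsets.UTF_8));
            // hello / world
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```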






[jira] [Updated] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-02-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HADOOP-11867:
-
Component/s: hdfs-client

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API, so it 
> can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.






[jira] [Updated] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-02-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HADOOP-11867:
-
Affects Version/s: 3.0.0  (was: 2.8.0)

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API, so it 
> can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.






[jira] [Updated] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15209:

Status: Patch Available  (was: Open)

> PoC: DistCp to eliminate needless deletion of files under deleted directories
> -
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch
>
>
> DistCP issues a delete(file) request even if it is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.






[jira] [Created] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15209:
---

 Summary: PoC: DistCp to eliminate needless deletion of files under 
deleted directories
 Key: HADOOP-15209
 URL: https://issues.apache.org/jira/browse/HADOOP-15209
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools/distcp
Affects Versions: 2.9.0
Reporter: Steve Loughran


DistCP issues a delete(file) request even if it is underneath an already deleted 
directory. This generates needless load on filesystems/object stores, and, if 
the store throttles delete, can dramatically slow down the delete operation.

If the distcp delete operation can build a history of deleted directories, then 
it will know when it does not need to issue those deletes.

Care is needed here to make sure that whatever structure is created does not 
overload the heap of the process.
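The idea above can be sketched in a few lines. This is a minimal, illustrative sketch only (the class and method names are hypothetical, not the attached patch): remember each directory whose delete was actually issued, and skip deletes for any path underneath one of them. Bounding the memory this structure uses is exactly the heap concern raised above; a flat list, as here, is the naive choice.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the idea in this issue: track already-deleted
 * directories so delete() calls for paths underneath them can be skipped.
 * Paths are '/'-separated strings; parents are assumed to be visited
 * before children, as in a sorted DistCp listing.
 */
public class DeletedDirTracker {
    // Directories for which a delete has already been issued.
    private final List<String> deletedDirs = new ArrayList<>();

    /** Returns true if path is, or is under, a directory already deleted. */
    public boolean isCovered(String path) {
        for (String dir : deletedDirs) {
            if (path.equals(dir) || path.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns true if a real delete should be issued for this directory,
     * recording it so that later children are skipped.
     */
    public boolean shouldDelete(String dir) {
        if (isCovered(dir)) {
            return false;
        }
        deletedDirs.add(dir);
        return true;
    }
}
```

With a sorted listing, deleting {{/a/b}} means every subsequent {{/a/b/...}} entry costs no store request at all.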






[jira] [Commented] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352741#comment-16352741
 ] 

Steve Loughran commented on HADOOP-15209:
-

HADOOP-15209 patch 001

 

This builds up a set of paths which have been deleted.

 

Bug: it's broken, as shown by test failures. I know the problem, but 
HADOOP-15208 shows my plan: move this out of distcp and experiment elsewhere.

 




[jira] [Updated] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories

2018-02-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15209:

Attachment: HADOOP-15209-001.patch




[jira] [Commented] (HADOOP-15208) DistCp to offer option to save src/dest filesets as alternative to delete()

2018-02-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352729#comment-16352729
 ] 

Steve Loughran commented on HADOOP-15208:
-

See HADOOP-15191 as one scale strategy

> DistCp to offer option to save src/dest filesets as alternative to delete()
> ---
>
> Key: HADOOP-15208
> URL: https://issues.apache.org/jira/browse/HADOOP-15208
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> There are opportunities to improve distcp delete performance and scalability 
> with object stores, but you need to test with production datasets to 
> determine if the optimizations work, don't run out of memory, etc.
> By adding the option to save the sequence files of the source and destination 
> listings, people (myself included) can experiment with different strategies 
> before trying to commit one which doesn't scale.






[jira] [Created] (HADOOP-15208) DistCp to offer option to save src/dest filesets as alternative to delete()

2018-02-05 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15208:
---

 Summary: DistCp to offer option to save src/dest filesets as 
alternative to delete()
 Key: HADOOP-15208
 URL: https://issues.apache.org/jira/browse/HADOOP-15208
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools/distcp
Affects Versions: 2.9.0
Reporter: Steve Loughran
Assignee: Steve Loughran


There are opportunities to improve distcp delete performance and scalability 
with object stores, but you need to test with production datasets to determine 
if the optimizations work, don't run out of memory, etc.

By adding the option to save the sequence files of the source and destination 
listings, people (myself included) can experiment with different strategies 
before trying to commit one which doesn't scale.
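As a hedged sketch of the proposal (names are hypothetical; DistCp itself would write Hadoop SequenceFiles, and plain text is used here only to keep the sketch self-contained), saving the two listings side by side is enough to let delete strategies be replayed offline against production-sized data:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical sketch: persist the source and destination listings instead
 * of acting on them, so different delete strategies can be experimented
 * with offline. Real DistCp listings are SequenceFiles; plain text here
 * is purely for illustration.
 */
public class ListingSaver {
    public static void save(List<String> srcListing, List<String> dstListing,
                            Path outDir) throws IOException {
        Files.createDirectories(outDir);
        // One path per line; a replay tool can diff these to derive
        // the set of files a -delete pass would remove.
        Files.write(outDir.resolve("src-listing.txt"), srcListing);
        Files.write(outDir.resolve("dst-listing.txt"), dstListing);
    }
}
```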






[jira] [Updated] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP

2018-02-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15191:

Status: Open  (was: Patch Available)

> Add Private/Unstable BulkDelete operations to supporting object stores for 
> DistCP
> -
>
> Key: HADOOP-15191
> URL: https://issues.apache.org/jira/browse/HADOOP-15191
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, 
> HADOOP-15191-003.patch, HADOOP-15191-004.patch
>
>
> Large scale DistCP with the -delete option doesn't finish in a viable time 
> because of the final CopyCommitter doing a one-by-one delete of all missing 
> files. This isn't randomized (the list is sorted), and it's throttled by AWS.
> If bulk deletion of files were exposed as an API, DistCp would issue roughly 
> 1/1000th of the REST calls, and so not get throttled.
> Proposed: add an initially private/unstable interface for stores, 
> {{BulkDelete}} which declares a page size and offers a 
> {{bulkDelete(List)}} operation for the bulk deletion.
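A sketch of what such an interface and a paging driver could look like (the names follow the description above, but this is an illustration under those assumptions, not the attached patch):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the proposed private/unstable bulk-delete interface. */
interface BulkDelete {
    /** Max paths accepted per bulkDelete() call (e.g. 1000 for S3). */
    int getBulkDeletePageSize();

    /** Delete all listed paths in one store request;
     *  the list must not exceed the page size. */
    void bulkDelete(List<String> paths) throws IOException;
}

/** Splits a long path list into page-sized batches, turning N single
 *  deletes into roughly N/pageSize REST calls. */
class BulkDeleter {
    static int delete(BulkDelete store, List<String> paths) throws IOException {
        int calls = 0;
        int page = store.getBulkDeletePageSize();
        for (int i = 0; i < paths.size(); i += page) {
            int end = Math.min(paths.size(), i + page);
            store.bulkDelete(new ArrayList<>(paths.subList(i, end)));
            calls++;
        }
        return calls;
    }
}
```

With a page size of 1000, deleting 2500 paths costs three store requests instead of 2500.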






[jira] [Commented] (HADOOP-10571) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352722#comment-16352722
 ] 

Steve Loughran commented on HADOOP-10571:
-

It's way past time to get this in, so let's aim for this week. I've reviewed it 
all by counting {} and arg numbers, checking that e.getMessage() is only 
called when I'm happy that this value isn't null (i.e. if it's a subclass of 
IOE), and that no info is being lost compared to before.

As this goes near HDFS, can you create a JIRA here and mark it as part of this 
one? That'll let the hdfs dev team know what's coming.

h4. LocalFileSystem

L142 we could move to making {{p}} another arg

h4. DNS


L314. The log pattern doesn't include the localhost string

L432: review

h4. DataNode

L2435: that's complex enough that it should retain the debug-enabled guard.
L2435: replace the use of %d with {}.

L2724: use e.toString().

L3360: possibly better as {{LOG.debug("{}", sb)}}; avoids calling 
sb.toString() when not needed.

h4. DataXceiver

L717 retain isDebug guard to avoid calling Arrays.asList()

h4. EditLogBackupInputStream

Revert. Ensures the text of the underlying error isn't lost on rethrow.

h4.  StandbyCheckpointer


> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HADOOP-10571
> URL: https://issues.apache.org/jira/browse/HADOOP-10571
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.01.patch, HADOOP-10571.01.patch, 
> HADOOP-10571.02.patch, HADOOP-10571.03.patch, HADOOP-10571.04.patch
>
>
> When logging an exception, we often convert the exception to string or call 
> {{.getMessage}}. Instead we can use the log method overloads which take 
> {{Throwable}} as a parameter.
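The pattern under review can be illustrated with JDK logging (Hadoop itself uses commons-logging/slf4j, but the overload shape is the same): only the (message, Throwable) overload hands the exception object to the handler, so the stack trace survives and a null getMessage() cannot corrupt the output. The class below is a hypothetical demo, not Hadoop code.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Demo: the (message, Throwable) overload vs. string concatenation. */
public class LogOverloadDemo {
    static final Logger LOG = Logger.getLogger("demo");

    /** Logs both ways and reports whether the handler actually received
     *  the exception object from the overload form. */
    static boolean throwableReachesHandler() {
        final LogRecord[] seen = new LogRecord[1];
        Handler capture = new Handler() {
            @Override public void publish(LogRecord r) { seen[0] = r; }
            @Override public void flush() {}
            @Override public void close() {}
        };
        LOG.addHandler(capture);
        LOG.setUseParentHandlers(false);
        Exception e = new IllegalStateException("disk failed");
        // Anti-pattern: the stack trace is flattened away, and for some
        // exception types getMessage() can be null.
        LOG.severe("Write failed: " + e.getMessage());
        boolean lostThrowable = seen[0].getThrown() == null;
        // Preferred: pass the Throwable separately; formatters print it.
        LOG.log(Level.SEVERE, "Write failed", e);
        boolean keptThrowable = seen[0].getThrown() == e;
        LOG.removeHandler(capture);
        return lostThrowable && keptThrowable;
    }
}
```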






[jira] [Updated] (HADOOP-15203) Support composite trusted channel resolver that supports both whitelist and blacklist

2018-02-05 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15203:

Attachment: HADOOP-15203.001.patch

> Support composite trusted channel resolver that supports both whitelist and 
> blacklist
> -
>
> Key: HADOOP-15203
> URL: https://issues.apache.org/jira/browse/HADOOP-15203
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: security
> Attachments: HADOOP-15203.000.patch, HADOOP-15203.001.patch
>
>
> support composite trusted channel resolver that supports both whitelist and 
> blacklist






[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread Zsolt Venczel (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352392#comment-16352392
 ] 

Zsolt Venczel commented on HADOOP-6852:
---

In the attached patch I have added the previously available test files back, 
re-enabled the test cases, and added a proposed fix:

Within BZip2Codec I've changed the default read mode from CONTINUOUS to BYBLOCK 
in BZip2Codec.createInputStream(InputStream in, Decompressor decompressor), as 
BYBLOCK read mode handles concatenated bzip2 correctly; this also makes it 
consistent with the decompressor creation logic in mapred/LineRecordReader and 
input/LineRecordReader. As a result of the change, the concatenated-bzip2 issue 
is fixed.
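As a toy illustration of why the read mode matters (this is not bzip2 or the patch, just a made-up format where each member is 'H' + payload + 'E'): a reader that stops at the first end-of-member marker silently drops every later concatenated member, while a concatenation-aware reader checks for another header and keeps decoding.

```java
/** Toy model of CONTINUOUS vs. concatenation-aware decoding. */
public class ConcatDemo {
    /** CONTINUOUS-like: decode a single member, then stop. */
    static String readFirstMember(String data) {
        int end = data.indexOf('E');
        return data.substring(1, end);
    }

    /** Concatenation-aware: after each member, look for the header
     *  of the next one and continue decoding until input runs out. */
    static String readAllMembers(String data) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < data.length() && data.charAt(pos) == 'H') {
            int end = data.indexOf('E', pos);
            out.append(data, pos + 1, end);
            pos = end + 1; // next member header, if any
        }
        return out.toString();
    }
}
```

On the two-member input "HfooEHbarE" the first reader yields only the first member's payload; the second recovers both, which is the behaviour the concatenated-bzip2 tests expect.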

> apparent bug in concatenated-bzip2 support (decoding)
> -
>
> Key: HADOOP-6852
> URL: https://issues.apache.org/jira/browse/HADOOP-6852
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Affects Versions: 0.22.0
> Environment: Linux x86_64 running 32-bit Hadoop, JDK 1.6.0_15
>Reporter: Greg Roelofs
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HADOOP-6852.01.patch
>
>
> The following simplified code (manually picked out of testMoreBzip2() in 
> https://issues.apache.org/jira/secure/attachment/12448272/HADOOP-6835.v4.trunk-hadoop-mapreduce.patch)
>  triggers a "java.io.IOException: bad block header" in 
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock( 
> CBZip2InputStream.java:527):
> {noformat}
> JobConf jobConf = new JobConf(defaultConf);
> CompressionCodec bzip2 = new BZip2Codec();
> ReflectionUtils.setConf(bzip2, jobConf);
> localFs.delete(workDir, true);
> // copy multiple-member test file to HDFS
> String fn2 = "testCompressThenConcat.txt" + bzip2.getDefaultExtension();
> Path fnLocal2 = new 
> Path(System.getProperty("test.concat.data","/tmp"),fn2);
> Path fnHDFS2  = new Path(workDir, fn2);
> localFs.copyFromLocalFile(fnLocal2, fnHDFS2);
> FileInputFormat.setInputPaths(jobConf, workDir);
> final FileInputStream in2 = new FileInputStream(fnLocal2.toString());
> CompressionInputStream cin2 = bzip2.createInputStream(in2);
> LineReader in = new LineReader(cin2);
> Text out = new Text();
> int numBytes, totalBytes=0, lineNum=0;
> while ((numBytes = in.readLine(out)) > 0) {
>   ++lineNum;
>   totalBytes += numBytes;
> }
> in.close();
> {noformat}
> The specified file is also included in the H-6835 patch linked above, and 
> some additional debug output is included in the commented-out test loop 
> above.  (Only in the linked, "v4" version of the patch, however--I'm about to 
> remove the debug stuff for checkin.)
> It's possible I've done something completely boneheaded here, but the file, 
> at least, checks out in a subsequent set of subtests and with stock bzip2 
> itself.  Only the code above is problematic; it reads through the first 
> concatenated chunk (17 lines of text) just fine but chokes on the header of 
> the second one.  Altogether, the test file contains 84 lines of text and 4 
> concatenated bzip2 files.
> (It's possible this is a mapreduce issue rather than common, but note that 
> the identical gzip test works fine.  Possibly it's related to the 
> stream-vs-decompressor dichotomy, though; intentionally not supported?)






[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352394#comment-16352394
 ] 

genericqa commented on HADOOP-6852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  3m 
54s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-6852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909222/HADOOP-6852.01.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14070/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Assigned] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread Zsolt Venczel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel reassigned HADOOP-6852:
-

Assignee: Zsolt Venczel




[jira] [Updated] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread Zsolt Venczel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HADOOP-6852:
--
Status: Patch Available  (was: Open)




[jira] [Updated] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread Zsolt Venczel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HADOOP-6852:
--
Attachment: HADOOP-6852.01.patch




[jira] [Comment Edited] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352138#comment-16352138
 ] 

Elek, Marton edited comment on HADOOP-15007 at 2/5/18 8:31 AM:
---

+1 for separating action item 3.

+0 for action items 1 and 2. (This is not my personal preference, but I don't 
think it's a big deal; let's do it either way.) My preference is:
 # Don't use tags in the code at all (only return existing tags as a 
List to the UI)
 # Use enums to represent tags
 # Use string constants to represent tags

I agree with Anu that 2 is better than 3, and I agree with Steve that 1 is 
better than 2.

 

Just one additional note about a potential use case. I can imagine that some 
users would like to introduce custom tags for configuration values (e.g. 
MY_UPGRADE_TEST). I think it's useful: they can mark specific configuration 
keys with a specific tag. With the "log at the first time" method the user will 
get a warning/error for every custom tag. Maybe it's also not a big deal, but I 
think it's a real use case, and in that case we don't need logging at all.

In fact I wouldn't like to see any tag-related log. Tags are maintained by the 
developer (except in the previous use case). I wouldn't like to see any 
warnings during my cluster startup, as there are no problems with my cluster 
even if the tags are misspelled. The warning should be displayed at build time 
for the developers/reviewers.

But again, I can live with the existing solution; I just tried to propose a 
simplification.

 


was (Author: elek):
+1 for separating action item 3.

+0 for action items 1 and 2. (This is not my personal preference, but I don't 
think it's a big deal; let's do it either way.) My preference is:
 # Don't use tags in the code at all
 # Use enums to represent tags
 # Use string constants to represent tags

I agree with Anu that 2 is better than 3, and I agree with Steve that 1 is 
better than 2.

 

Just one additional note about a potential use case. I can imagine that some 
users would like to introduce custom tags for configuration values (e.g. 
MY_UPGRADE_TEST). I think it's useful: they can mark specific configuration 
keys with a specific tag. With the "log at the first time" method the user will 
get a warning/error for every custom tag. Maybe it's also not a big deal, but I 
think it's a real use case, and in that case we don't need logging at all.

In fact I wouldn't like to see any tag-related log. Tags are maintained by the 
developer (except in the previous use case). I wouldn't like to see any 
warnings during my cluster startup, as there are no problems with my cluster 
even if the tags are misspelled. The warning should be displayed at build time 
for the developers/reviewers.

But again, I can live with the existing solution; I just tried to propose a 
simplification.

 

> Stabilize and document Configuration  element
> --
>
> Key: HADOOP-15007
> URL: https://issues.apache.org/jira/browse/HADOOP-15007
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Blocker
>
> HDFS-12350 (moved to HADOOP-15005). Adds the ability to tag properties with a 
>  value.
> We need to make sure that this feature is backwards compatible & usable in 
> production. That's docs, testing, marshalling etc.






[jira] [Commented] (HADOOP-15007) Stabilize and document Configuration element

2018-02-05 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352138#comment-16352138
 ] 

Elek, Marton commented on HADOOP-15007:
---

+1 for separating action item 3.

+0 for action items 1 and 2. (This is not my personal preference, but I don't 
think it's a big deal; let's do it either way.) My preference is:
 # Don't use tags in the code at all
 # Use enums to represent tags
 # Use string constants to represent tags

I agree with Anu that 2 is better than 3, and I agree with Steve that 1 is 
better than 2.

 

Just one additional note about a potential use case. I can imagine that some 
users would like to introduce custom tags for configuration values (e.g. 
MY_UPGRADE_TEST). I think it's useful: they can mark specific configuration 
keys with a specific tag. With the "log at the first time" method the user will 
get a warning/error for every custom tag. Maybe it's also not a big deal, but I 
think it's a real use case, and in that case we don't need logging at all.

In fact I wouldn't like to see any tag-related log. Tags are maintained by the 
developer (except in the previous use case). I wouldn't like to see any 
warnings during my cluster startup, as there are no problems with my cluster 
even if the tags are misspelled. The warning should be displayed at build time 
for the developers/reviewers.

But again, I can live with the existing solution; I just tried to propose a 
simplification.

 
