[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-05-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460170#comment-16460170
 ] 

Jason Lowe commented on MAPREDUCE-7086:
---

My apologies for the delay.  +1 lgtm, committing this.


> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-05-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460085#comment-16460085
 ] 

Sergey Shelukhin commented on MAPREDUCE-7086:
-

[~jlowe] do you have any other feedback on the updated patch? thnx

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456193#comment-16456193
 ] 

Steve Loughran commented on MAPREDUCE-7086:
---

Sergey: sorry, I'm not skilled enough in this module to be a safe reviewer.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454768#comment-16454768
 ] 

Sergey Shelukhin commented on MAPREDUCE-7086:
-

Ping? :)

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451419#comment-16451419
 ] 

Hadoop QA commented on MAPREDUCE-7086:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: 
The patch generated 10 new + 171 unchanged - 0 fixed = 181 total (was 171) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
47s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:b78c94f |
| JIRA Issue | MAPREDUCE-7086 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12920540/MAPREDUCE-7086.01.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b96198550e58 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9d6befb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7399/artifact/out/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7399/testReport/ |
| Max. process+thread count | 457 (vs. ulimit of 1) |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 

[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451367#comment-16451367
 ] 

Sergey Shelukhin commented on MAPREDUCE-7086:
-

Added the handling to the other FIF and tests; the handling is similar to the 
first one though (in getSplits rather than listStatus) because listStatus can 
be overridden.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450120#comment-16450120
 ] 

Jason Lowe commented on MAPREDUCE-7086:
---

Thanks for updating the patch!

Would it be clearer if "non-recursive" or something similar was in the property 
name?  Otherwise it gets confusing when recursive=true and ignore.subdirs=true 
as well.

Does mapreduce.lib.input.FileInputFormat need to be updated as well?  It looks 
like that take a slightly different tactic of allowing directories in 
non-recursive mode but generating degenerate splits for them.  ignore.subdirs 
arguably should make it not generate any splits for directory entries, even 
degenerate ones.

I agree with Steve that it would be good to add a unit test to verify the 
behavior of the new property.  It would also be nice to cleanup the checkstyle 
warnings which are akin to the whitespace warnings.


> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449035#comment-16449035
 ] 

Sergey Shelukhin commented on MAPREDUCE-7086:
-

Will fix whitespace on next review iteration; or you can fix it on commit ;)

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449014#comment-16449014
 ] 

Hadoop QA commented on MAPREDUCE-7086:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: 
The patch generated 2 new + 87 unchanged - 0 fixed = 89 total (was 87) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
50s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | MAPREDUCE-7086 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12920348/MAPREDUCE-7086.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d1c5591ee489 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 42e82f0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7396/artifact/out/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
 |
| whitespace | 

[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448933#comment-16448933
 ] 

Sergey Shelukhin commented on MAPREDUCE-7086:
-

Added the config, off by default; addressed other feedback.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448926#comment-16448926
 ] 

Wangda Tan commented on MAPREDUCE-7086:
---

[~sershe], oh that's my bad, moving the JIRA somehow cleaned the assignee, 
apologize for that. Jason has assigned this to you.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448823#comment-16448823
 ] 

Sergey Shelukhin commented on MAPREDUCE-7086:
-

Can someone assign this back to me? I will update the patch in due course

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448815#comment-16448815
 ] 

Hadoop QA commented on MAPREDUCE-7086:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
56s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | MAPREDUCE-7086 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12920094/HADOOP-15403.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a98beaed3f38 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f411de6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7395/testReport/ |
| Max. process+thread count | 441 (vs. ulimit of 1) |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7395/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT 

[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448750#comment-16448750
 ] 

Wangda Tan commented on MAPREDUCE-7086:
---

I just moved this project to MAPREDUCE.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org