[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330994#comment-16330994 ]

Xuefu Zhang commented on HIVE-17257:

Thanks for the update, [~csun]. I also verified the patch, and it fixes the problem for both MR and Spark. I will commit it shortly.

> Hive should merge empty files
> -----------------------------
>
>                 Key: HIVE-17257
>                 URL: https://issues.apache.org/jira/browse/HIVE-17257
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HIVE-17257.0.patch, HIVE-17257.1.patch, HIVE-17257.2.patch, HIVE-17257.3.patch
>
>
> Currently, if the file-merging option is turned on and the destination directory contains a large number of empty files, Hive will not trigger a merge task:
> {code}
>   private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) {
>     AverageSize averageSize = getAverageSize(inpFs, dirPath);
>     if (averageSize.getTotalSize() <= 0) {
>       return -1;
>     }
>     if (averageSize.getNumFiles() <= 1) {
>       return -1;
>     }
>     if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) {
>       return averageSize.getTotalSize();
>     }
>     return -1;
>   }
> {code}
> This logic doesn't seem right, as it would be better to combine these empty files into one.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
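For illustration, the guard quoted in the issue description could be relaxed along the following lines. This is only a minimal sketch, not the code in any of the attached patches: `MergeSizeSketch` and the simplified `AverageSize` holder are hypothetical stand-ins, the file-system lookup is replaced by plain arguments, and it assumes that a negative total size is purely an error marker and that the caller treats a return value of 0 as "merge" rather than "skip".

```java
public class MergeSizeSketch {

    /** Hypothetical, simplified stand-in for the AverageSize holder used in the quoted method. */
    static final class AverageSize {
        private final long totalSize;
        private final int numFiles;

        AverageSize(long totalSize, int numFiles) {
            this.totalSize = totalSize;
            this.numFiles = numFiles;
        }

        long getTotalSize() { return totalSize; }
        int getNumFiles() { return numFiles; }
    }

    /**
     * Returns the total size to merge, or -1 if no merge should run.
     * Unlike the quoted method, a total size of exactly 0 is not rejected,
     * so a directory holding many empty files still qualifies.
     */
    static long getMergeSize(AverageSize averageSize, long avgSize) {
        // Assumption: a negative total marks a failed size lookup; zero is a valid total.
        if (averageSize.getTotalSize() < 0) {
            return -1;
        }
        // A single file (or none) leaves nothing to combine.
        if (averageSize.getNumFiles() <= 1) {
            return -1;
        }
        // Merge only when the average file size is below the configured threshold.
        if (averageSize.getTotalSize() / averageSize.getNumFiles() < avgSize) {
            return averageSize.getTotalSize();
        }
        return -1;
    }

    public static void main(String[] args) {
        long avgSize = 16L * 1000 * 1000; // illustrative threshold, not a Hive default

        // 1000 empty files: qualifies for a merge with merge size 0.
        System.out.println(getMergeSize(new AverageSize(0, 1000), avgSize));
        // One empty file: nothing to merge.
        System.out.println(getMergeSize(new AverageSize(0, 1), avgSize));
        // Two large files whose average is above the threshold: no merge.
        System.out.println(getMergeSize(new AverageSize(64L * 1000 * 1000, 2), avgSize));
    }
}
```

Whether returning 0 is enough depends on how the caller interprets the result; any caller that tests for a strictly positive merge size would need the matching change.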
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330983#comment-16330983 ]

Chao Sun commented on HIVE-17257:
---------------------------------

In the latest test run, most of the test failures are not new. The exceptions are the following three:
{code:java}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_15]
{code}
I tested these locally and couldn't reproduce the failures: the output is the same with and without my patch (and llap_smb generates a different q.out file even without the patch).
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330454#comment-16330454 ]

Hive QA commented on HIVE-17257:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12906544/HIVE-17257.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11613 tests executed
*Failed tests:*
{noformat}
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=149)
	[intersect_all.q,unionDistinct_1.q,orc_ppd_schema_evol_3a.q,table_nonprintable.q,tez_union_dynamic_partition.q,tez_union_dynamic_partition_2.q,temp_table_external.q,global_limit.q,llap_udf.q,schemeAuthority.q,cte_2.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,parallel_colstats.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_15] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=160)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=121)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=254)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=232)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=232)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8676/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8676/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8676/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12906544 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330408#comment-16330408 ]

Hive QA commented on HIVE-17257:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 80e6f7b |
| Default Java | 1.8.0_111 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8676/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330393#comment-16330393 ]

Hive QA commented on HIVE-17257:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12906544/HIVE-17257.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11628 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=160)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=121)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=254)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=232)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=232)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8675/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8675/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8675/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12906544 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330349#comment-16330349 ]

Hive QA commented on HIVE-17257:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 8s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 80e6f7b |
| Default Java | 1.8.0_111 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8675/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329074#comment-16329074 ]

Chao Sun commented on HIVE-17257:
---------------------------------

{quote}+1 for the patch. However, I'm not sure if those test failures are related.
{quote}
The last test result has been removed, so I'm not sure either. I'm waiting for Jenkins to trigger a run on the latest patch.
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329018#comment-16329018 ]

Xuefu Zhang commented on HIVE-17257:

+1 for the patch. However, I'm not sure if those test failures are related.
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323901#comment-16323901 ]

Hive QA commented on HIVE-17257:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905762/HIVE-17257.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11567 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query39] (batchId=247)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8590/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8590/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8590/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905762 - PreCommit-HIVE-Build

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323859#comment-16323859 ]

Hive QA commented on HIVE-17257:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / fd4e222 |
| Default Java | 1.8.0_111 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8590/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119381#comment-16119381 ] Hive QA commented on HIVE-17257: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880932/HIVE-17257.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10999 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_4] (batchId=6) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] (batchId=75) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6313/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6313/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-6313/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12880932 - PreCommit-HIVE-Build > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch, HIVE-17257.1.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117565#comment-16117565 ] Chao Sun commented on HIVE-17257: - [~kellyzly]: empty files may be generated when the result set is empty and multiple mappers/reducers each have a file sink. Example: {code} set hive.execution.engine=spark; set hive.auto.convert.join=false; set mapreduce.job.reduces=1000; create table dummy (a string); insert overwrite directory '/tmp/test' select src.key from src join dummy on src.key = dummy.a; {code} The above will generate 1000 empty files in /tmp/test. [~xuefuz]: I need to revise the patch. There's an issue where HoS won't launch a task for the final merge job because the input data is empty. > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
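To make the behaviour discussed in this comment concrete, here is a minimal standalone sketch of the merge-size decision. It is not taken from any of the attached patches. The class name MergeSizeSketch, both method names, and the 16,000,000-byte threshold are hypothetical, and the file statistics are passed in directly instead of being read from a FileSystem. The first method mirrors the logic quoted in the issue description; the second shows one possible revision that only skips the merge when the total size is negative, under the assumption that a negative total signals that statistics could not be collected.

```java
public class MergeSizeSketch {

    /** Mirrors the decision logic quoted in the issue description. */
    static long originalMergeSize(long totalSize, int numFiles, long avgSizeThreshold) {
        if (totalSize <= 0) {
            return -1; // a directory holding only empty files is skipped here
        }
        if (numFiles <= 1) {
            return -1; // nothing to combine
        }
        if (totalSize / numFiles < avgSizeThreshold) {
            return totalSize; // average file is small: merge
        }
        return -1;
    }

    /**
     * Hypothetical revision (an assumption, not the committed fix):
     * a total of 0 is allowed through, so many empty files can be merged.
     */
    static long revisedMergeSize(long totalSize, int numFiles, long avgSizeThreshold) {
        if (totalSize < 0) {
            return -1; // assumed to mean "statistics unavailable"
        }
        if (numFiles <= 1) {
            return -1;
        }
        if (totalSize / numFiles < avgSizeThreshold) {
            return totalSize; // may be 0 when every file is empty
        }
        return -1;
    }

    public static void main(String[] args) {
        long threshold = 16L * 1000 * 1000; // hypothetical average-size threshold
        // 1000 empty files, as in the /tmp/test example above.
        System.out.println(originalMergeSize(0, 1000, threshold)); // prints -1
        System.out.println(revisedMergeSize(0, 1000, threshold));  // prints 0
    }
}
```

With the revised check the caller receives 0 as the merge size, so whatever consumes that value has to treat 0 as "merge" and not as "no input". That is consistent with the problem noted above, where the final merge job is not launched when the input data is empty.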
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116933#comment-16116933 ] Xuefu Zhang commented on HIVE-17257: Patch looks simple and good to me. Is it possible to have a test case for this? > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116172#comment-16116172 ] liyunzhang_intel commented on HIVE-17257: - [~csun]: I have run into empty files before when using Parquet files. The reason is that Hive reads the Parquet metadata to construct a ParquetInputSplit in ParquetRecordReaderBase#getSplit. [ParquetRecordReaderBase#getSplit|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L106] sometimes returns null, which results in an empty file. > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116144#comment-16116144 ] liyunzhang_intel commented on HIVE-17257: - Why are there empty files? Is the raw data empty, or are the empty files generated after loading? > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116004#comment-16116004 ] Hive QA commented on HIVE-17257: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880576/HIVE-17257.0.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10990 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=239) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=239) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=158) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6274/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6274/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6274/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12880576 - PreCommit-HIVE-Build > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17257) Hive should merge empty files
[ https://issues.apache.org/jira/browse/HIVE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115932#comment-16115932 ] Hive QA commented on HIVE-17257: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880576/HIVE-17257.0.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10989 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=239) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6] (batchId=7) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6272/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6272/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6272/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12880576 - PreCommit-HIVE-Build > Hive should merge empty files > - > > Key: HIVE-17257 > URL: https://issues.apache.org/jira/browse/HIVE-17257 > Project: Hive > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17257.0.patch > > > Currently if merging file option is turned on and the dest dir contains large > number of empty files, Hive will not trigger merge task: > {code} > private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) { > AverageSize averageSize = getAverageSize(inpFs, dirPath); > if (averageSize.getTotalSize() <= 0) { > return -1; > } > if (averageSize.getNumFiles() <= 1) { > return -1; > } > if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) { > return averageSize.getTotalSize(); > } > return -1; > } > {code} > This logic doesn't seem right as it seems better to combine these empty > files into one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)