[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265584#comment-15265584 ]

Rui Li commented on HIVE-13572:
-------------------------------

Hi [~ashutoshc], I just logged HIVE-13662 to track that.

> Redundant setting full file status in Hive::copyFiles
> -----------------------------------------------------
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Rui Li
> Assignee: Rui Li
> Fix For: 2.1.0
> Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, fullDestStatus, destFs, destf);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
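The redundancy described in the issue can be illustrated with a minimal sketch. It is not the actual patch: `CopyFilesSketch`, `copyOne`, and `setStatusOnDestination` are hypothetical stand-ins, and the sketch assumes (as the `destf` argument in the snippet suggests) that every copy-file thread applies the status to the same destination. Under that assumption, calling it once after all copies have finished should be enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CopyFilesSketch {
    // Hypothetical stand-in for setFullFileStatus: it only counts invocations.
    static final AtomicInteger statusCalls = new AtomicInteger();

    static void setStatusOnDestination(String destDir) {
        statusCalls.incrementAndGet();
    }

    // Hypothetical placeholder for the real file copy.
    static void copyOne(String src, String destDir) {
    }

    // Copies all files on a thread pool and returns how many times the
    // destination status was set. perFile=true mimics the reported behavior;
    // perFile=false sets the status once, after every copy has completed.
    static int copyAll(List<String> files, String destDir,
                       boolean inheritPerms, boolean perFile) {
        statusCalls.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String file : files) {
                pending.add(pool.submit(() -> {
                    copyOne(file, destDir);
                    if (inheritPerms && perFile) {
                        setStatusOnDestination(destDir);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each copy and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        if (inheritPerms && !perFile) {
            setStatusOnDestination(destDir);
        }
        return statusCalls.get();
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 183; i++) {
            files.add("file-" + i);
        }
        System.out.println("per-file calls: " + copyAll(files, "/dest", true, true));
        System.out.println("hoisted calls: " + copyAll(files, "/dest", true, false));
    }
}
```

With 183 files, the per-file variant makes 183 status calls and the hoisted variant makes one, which is where the saving would come from if each call is expensive.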
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265560#comment-15265560 ]

Ashutosh Chauhan commented on HIVE-13572:
-----------------------------------------

[~lirui] If you are exploring setting permissions via the FileSink operator, can you create a follow-up JIRA to track that work?
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263427#comment-15263427 ]

Rui Li commented on HIVE-13572:
-------------------------------

Thanks [~ashutoshc] for the review.
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263390#comment-15263390 ]

Rui Li commented on HIVE-13572:
-------------------------------

[~ashutoshc] - Yeah, I think we can check this one in. The test failures either cannot be reproduced or also fail in other runs, so they are not related to this patch.
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262955#comment-15262955 ]

Ashutosh Chauhan commented on HIVE-13572:
-----------------------------------------

I think setting perms in FS might be a bigger change. [~ruili] Shall we check this one in while you work on the FS changes? I am +1 on the current patch.
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261933#comment-15261933 ]

Hive QA commented on HIVE-13572:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800729/HIVE-13572.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 71 failed/errored test(s), 9948 tests executed

*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_non_string_partition.q-delete_where_non_partitioned.q-auto_sortmerge_join_16.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_3
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_3
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testSimpleTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259412#comment-15259412 ]

Rui Li commented on HIVE-13572:
-------------------------------

Here's the time (in seconds) spent copying 183 files in my local test:

||W/O patch||Patch v1||Patch v2||
|16.36|3.6|0.72|
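To put the timings above in relative terms, here is a small worked calculation. The class and method names are made up for illustration; only the three measurements come from the comment. By this arithmetic, patch v1 is roughly 4.5x faster than the unpatched code, and patch v2 is roughly 22.7x faster than unpatched and 5x faster than v1.

```java
import java.util.Locale;

public class SpeedupSketch {
    // Speedup factor: how many times faster the patched run is than the baseline.
    static double speedup(double baselineSeconds, double patchedSeconds) {
        return baselineSeconds / patchedSeconds;
    }

    public static void main(String[] args) {
        // Seconds to copy 183 files, as reported in the local test.
        double withoutPatch = 16.36;
        double patchV1 = 3.6;
        double patchV2 = 0.72;

        System.out.printf(Locale.ROOT, "v1 vs. no patch: %.1fx%n", speedup(withoutPatch, patchV1));
        System.out.printf(Locale.ROOT, "v2 vs. no patch: %.1fx%n", speedup(withoutPatch, patchV2));
        System.out.printf(Locale.ROOT, "v2 vs. v1: %.1fx%n", speedup(patchV1, patchV2));
    }
}
```

These are single local measurements, so the ratios are only a rough indication.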
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259315#comment-15259315 ]

Ashutosh Chauhan commented on HIVE-13572:
-----------------------------------------

I see. Do you have numbers on how these two approaches compare? Just trying to figure out how much improvement we are getting.
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259306#comment-15259306 ]

Rui Li commented on HIVE-13572:
-------------------------------

Just created an RB for the v2 patch. The changes to sessionstate and the shim try to add more work to the threadpool and make the synchronization more efficient. They are not a bug fix.
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258743#comment-15258743 ]

Ashutosh Chauhan commented on HIVE-13572:
-----------------------------------------

Can you create an RB for this? Also, I see changes related to sessionstate and the shims. Was there any bug you encountered there?
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257765#comment-15257765 ]

Rui Li commented on HIVE-13572:
-------------------------------

Thanks [~ashutoshc] for the suggestion. The v2 patch sets the status using a threadpool, which provided better performance in my local tests. I'll come up with another patch to do it in FS. One concern is that we may unnecessarily set the permission on intermediate data this way. We can decide which way to go when the patch is ready.
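The general idea of setting status through a threadpool can be sketched as follows. This is only an illustration of the technique, not the v2 patch itself: `ParallelStatusSketch` and `applyStatus` are hypothetical names, and a concurrent map stands in for the real file system call so the example runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStatusSketch {
    // Applies the same permission string to every path, one task per path.
    // The returned map is a stand-in for the file system state (assumption).
    static Map<String, String> applyStatus(List<String> paths, String perms, int threads) {
        Map<String, String> applied = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String path : paths) {
                tasks.add(() -> {
                    applied.put(path, perms); // a real version would call the file system here
                    return null;
                });
            }
            // invokeAll blocks until every task is done; get() rethrows task failures.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return applied;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/dest/a", "/dest/b", "/dest/c");
        System.out.println(applyStatus(paths, "rwxr-x---", 2).size() + " paths updated");
    }
}
```

Whether this helps in practice depends on how expensive each status call is and on how many files there are, so it would need measuring against the workload in question.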
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251306#comment-15251306 ]

Ashutosh Chauhan commented on HIVE-13572:
-----------------------------------------

+1
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251214#comment-15251214 ]

Rui Li commented on HIVE-13572:
-------------------------------

[~ashutoshc] would you mind taking a look at this? Thanks.