[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-30 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265584#comment-15265584
 ] 

Rui Li commented on HIVE-13572:
---

Hi [~ashutoshc], I just logged HIVE-13662 to track that.

> Redundant setting full file status in Hive::copyFiles
> -
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Rui Li
> Assignee: Rui Li
> Fix For: 2.1.0
>
> Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and 
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, fullDestStatus, destFs, destf);
> }
> {code}
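
For readers of the archive, here is a minimal sketch of the redundancy described in the issue, assuming `destf` is the same destination for every copy task. It is not the Hive code or the attached patch: `CopyFilesSketch`, `runCopies`, and the counting `setFullFileStatus` stub are hypothetical names, and the file copy itself is omitted. It only contrasts calling the status setter inside every task with calling it once after all tasks finish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class CopyFilesSketch {
    // Hypothetical stand-in for setFullFileStatus: it only counts invocations.
    static final AtomicInteger statusCalls = new AtomicInteger();

    static void setFullFileStatus(String dest) {
        statusCalls.incrementAndGet();
    }

    // Runs one task per source file in a small pool; the copy itself is omitted.
    static void runCopies(List<String> srcs, Runnable afterEachCopy) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String src : srcs) {
                futures.add(pool.submit(afterEachCopy));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every copy task to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Pattern from the report: every task sets the status on the same destination.
    static int statusCallsPerTask(List<String> srcs, String destf, boolean inheritPerms) {
        statusCalls.set(0);
        runCopies(srcs, () -> {
            if (inheritPerms) {
                setFullFileStatus(destf);
            }
        });
        return statusCalls.get();
    }

    // Alternative: finish all copies first, then set the status once.
    static int statusCallsHoisted(List<String> srcs, String destf, boolean inheritPerms) {
        statusCalls.set(0);
        runCopies(srcs, () -> { });
        if (inheritPerms) {
            setFullFileStatus(destf);
        }
        return statusCalls.get();
    }

    public static void main(String[] args) {
        List<String> srcs = Arrays.asList("part-0", "part-1", "part-2");
        System.out.println("per task: " + statusCallsPerTask(srcs, "/warehouse/t", true));
        System.out.println("hoisted:  " + statusCallsHoisted(srcs, "/warehouse/t", true));
    }
}
```

With N source files the first variant makes N status calls against the same destination, while the second makes one regardless of N.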



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265560#comment-15265560
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

[~lirui] If you are exploring setting permissions via the FileSink operator, can you 
create a follow-up JIRA to track that work?


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-28 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263427#comment-15263427
 ] 

Rui Li commented on HIVE-13572:
---

Thanks [~ashutoshc] for the review.


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-28 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263390#comment-15263390
 ] 

Rui Li commented on HIVE-13572:
---

[~ashutoshc] - Yeah, I think we can check this one in. The test failures either 
cannot be reproduced or also fail in other runs, so they are not related to 
the patch here.


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262955#comment-15262955
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

I think setting perms in FS might be a bigger change. [~ruili] Shall we check 
this one in while you work on the FS changes? I am +1 on the current patch.


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261933#comment-15261933
 ] 

Hive QA commented on HIVE-13572:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800729/HIVE-13572.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 71 failed/errored test(s), 9948 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_non_string_partition.q-delete_where_non_partitioned.q-auto_sortmerge_join_16.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_3
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_3
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testSimpleTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout

[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259412#comment-15259412
 ] 

Rui Li commented on HIVE-13572:
---

Here's the time (in seconds) spent copying 183 files in my local test:
||W/O patch||Patch v1||Patch v2||
|16.36|3.6|0.72|
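
As a worked check of the figures in the table, the ratios come to roughly 4.5x for patch v1 and 22.7x for patch v2 relative to no patch, with v2 about 5x faster than v1. The small sketch below just reproduces that arithmetic; `SpeedupSketch` and `speedup` are hypothetical names, and the only inputs are the three timings quoted above.

```java
class SpeedupSketch {
    // Ratio of the baseline time to the improved time (both in seconds).
    static double speedup(double baselineSeconds, double improvedSeconds) {
        return baselineSeconds / improvedSeconds;
    }

    public static void main(String[] args) {
        // Timings for copying 183 files, taken from the table in this comment.
        double withoutPatch = 16.36;
        double patchV1 = 3.6;
        double patchV2 = 0.72;

        System.out.printf("v1 vs no patch: %.1fx%n", speedup(withoutPatch, patchV1));
        System.out.printf("v2 vs no patch: %.1fx%n", speedup(withoutPatch, patchV2));
        System.out.printf("v2 vs v1:       %.1fx%n", speedup(patchV1, patchV2));
    }
}
```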


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259315#comment-15259315
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

I see. Do you have numbers on how these two approaches compare? Just trying 
to figure out how much improvement we are getting.


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259306#comment-15259306
 ] 

Rui Li commented on HIVE-13572:
---

Just created an RB for the v2 patch. The changes to sessionstate and the shims are 
meant to move more work into the thread pool and make the synchronization more 
efficient. They do not fix a bug.


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258743#comment-15258743
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

Can you create an RB for this? Also, I see changes related to sessionstate and 
the shims. Was there any bug you encountered there?


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257765#comment-15257765
 ] 

Rui Li commented on HIVE-13572:
---

Thanks [~ashutoshc] for the suggestion. The v2 patch sets the status using a 
thread pool, which performed better in my local tests.
I'll come up with another patch to do it in FS. One concern is that we may 
unnecessarily set the permission on intermediate data this way. We can decide 
which way to go when that patch is ready.
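
The thread does not show how the v2 patch uses the thread pool, so the following is only a rough sketch of the general idea: submit one status-setting task per destination path and wait for all of them. `PooledStatusSketch`, `applyStatus`, and `applyStatusInPool` are hypothetical names, and the stub records paths instead of touching a file system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledStatusSketch {
    // Hypothetical stand-in for a per-path status call; it records the path it was given.
    static void applyStatus(String path, Set<String> applied) {
        applied.add(path);
    }

    // Submits one status task per destination path and waits for all of them.
    static Set<String> applyStatusInPool(List<String> destPaths, int threads) {
        Set<String> applied = ConcurrentHashMap.newKeySet(); // safe for concurrent adds
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String path : destPaths) {
                tasks.add(() -> {
                    applyStatus(path, applied);
                    return null;
                });
            }
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown inside a task
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return applied;
    }
}
```

Calling `applyStatusInPool(Arrays.asList("a", "b", "c"), 2)` returns a set containing each of the three paths once; a real version would replace the stub with the actual status call.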


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251306#comment-15251306
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

+1


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251214#comment-15251214
 ] 

Rui Li commented on HIVE-13572:
---

[~ashutoshc] would you mind taking a look at this? Thanks.
