[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-17 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129989#comment-16129989
 ] 

liyunzhang_intel commented on HIVE-17321:
-

[~lirui]: +1 

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-17 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129982#comment-16129982
 ] 

Rui Li commented on HIVE-17321:
---

[~kellyzly], yes, they're all related to ORC tables.



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129923#comment-16129923
 ] 

liyunzhang_intel commented on HIVE-17321:
-

[~lirui]: the patch looks good, but I have one question: why do the statistics 
of limitpushdown.q change? 
Before:
{code}
if ((OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) &&
    (noScan || partialScan)) {
{code}

Now:
{code}
if (OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) {
{code}
If the input format is TextFile, I think your patch will not change the result. 
Please correct me if my understanding is wrong.
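
To make the comparison concrete, here is a minimal, self-contained sketch of the two conditions quoted above. It is not the Hive code: the three *Stub classes are hypothetical stand-ins for OrcInputFormat, MapredParquetInputFormat and a text input format, and the class and method names are invented for illustration only.

```java
public class StatsConditionSketch {
    // Hypothetical stand-ins for the real input format classes (assumptions, not Hive's API).
    public static class OrcInputFormatStub {}
    public static class ParquetInputFormatStub {}
    public static class TextInputFormatStub {}

    // True when the input format is one of the two formats named in the quoted condition.
    public static boolean isOrcOrParquet(Class<?> inputFormat) {
        return OrcInputFormatStub.class.isAssignableFrom(inputFormat)
            || ParquetInputFormatStub.class.isAssignableFrom(inputFormat);
    }

    // Condition as quoted under "Before": also requires noscan or partialscan.
    public static boolean before(Class<?> inputFormat, boolean noScan, boolean partialScan) {
        return isOrcOrParquet(inputFormat) && (noScan || partialScan);
    }

    // Condition as quoted under "Now": the noscan/partialscan requirement is dropped.
    public static boolean after(Class<?> inputFormat) {
        return isOrcOrParquet(inputFormat);
    }

    public static void main(String[] args) {
        // Plain analyze (neither noscan nor partialscan) on an ORC-like format.
        System.out.println(before(OrcInputFormatStub.class, false, false)); // false
        System.out.println(after(OrcInputFormatStub.class));                // true

        // Plain analyze on a text-like format: neither condition matches.
        System.out.println(before(TextInputFormatStub.class, false, false)); // false
        System.out.println(after(TextInputFormatStub.class));                // false
    }
}
```

Under these assumptions, only the ORC/Parquet branch behaves differently for a plain analyze; a text input format takes the same path under both conditions.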



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129911#comment-16129911
 ] 

Xuefu Zhang commented on HIVE-17321:


+1 patch looks good to me. [~kellyzly], please let us know if you have more 
questions/comments.



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129891#comment-16129891
 ] 

Rui Li commented on HIVE-17321:
---

The latest failures are not related. The changes to the golden files are all 
about statistics, which is expected.
[~kellyzly], [~xuefuz], could you take a look? Thanks.



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129875#comment-16129875
 ] 

Hive QA commented on HIVE-17321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882252/HIVE-17321.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10977 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6430/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6430/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882252 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128407#comment-16128407
 ] 

Rui Li commented on HIVE-17321:
---

[~kellyzly], without the patch, analyze table without noscan/partialscan will 
launch a job containing only a TS (TableScan operator). Therefore there won't 
be an FS (FileSink operator) to update the stats.



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128375#comment-16128375
 ] 

liyunzhang_intel commented on HIVE-17321:
-

[~lirui]: understood, but I am curious why the raw data size of an ORC table 
is zero. When executing "INSERT OVERWRITE TABLE xxx SELECT * FROM xxx", Hive 
with ORC will update the statistics from the ORC footer in 
[FileSinkOperator#closeOp|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L1081]



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128332#comment-16128332
 ] 

Rui Li commented on HIVE-17321:
---

[~kellyzly], the problem is that if you run analyze table without 
noscan/partialscan, the raw data size will be set to 0. HIVE-9560 solved the 
issue, but only for MR and Tez, so Spark and MR have different query plans for 
the analyze command.



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-15 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128187#comment-16128187
 ] 

liyunzhang_intel commented on HIVE-17321:
-

[~lirui]: for ORC, we do not need to compute the raw data size by using 
noscan/partialscan, because the raw data size statistic is written to the 
metastore when the data load finishes. For more detail on how raw data 
statistics are collected, see HIVE-17018.



[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127089#comment-16127089
 ] 

Hive QA commented on HIVE-17321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881901/HIVE-17321.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 41 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_elt] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_10] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_11] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_12] (batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_17] (batchId=138)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_1] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_2] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_3] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_4] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_5] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_6] (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_div0] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_pushdown] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_mapjoin] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_math_funcs] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_nested_mapjoin] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_string_funcs] (batchId=125)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6397/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6397/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6397/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing