[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129989#comment-16129989 ]

liyunzhang_intel commented on HIVE-17321:
-----------------------------------------

[~lirui]: +1

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Rui Li
> Assignee: Rui Li
> Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129982#comment-16129982 ]

Rui Li commented on HIVE-17321:
-------------------------------

[~kellyzly], yes they're all related to orc tables.
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129923#comment-16129923 ]

liyunzhang_intel commented on HIVE-17321:
-----------------------------------------

[~lirui]: the patch looks good, but I have one question: why do the statistics in limit_pushdown.q change?

Before:
{code}
if ((OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) && (noScan || partialScan)) {
{code}
Now:
{code}
if (OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) {
{code}
If the InputFormat is TextFile, I think your patch will not change the result. If my understanding is wrong, please tell me.
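To make the question above concrete, here is a minimal, self-contained sketch of the two conditions quoted in the comment. It does not use the real Hive classes: the Format enum, the class name, and the method names are hypothetical stand-ins for the OrcInputFormat / MapredParquetInputFormat checks, and the sketch says nothing about what the guarded branch actually does inside Hive. It only shows for which inputs the guard evaluates differently before and after the change.

```java
public class AnalyzeGuardSketch {
    // Hypothetical stand-in for the input format classes named in the comment.
    public enum Format { ORC, PARQUET, TEXT }

    // Mirrors the (Orc || Parquet) part of the quoted condition.
    public static boolean isOrcOrParquet(Format format) {
        return format == Format.ORC || format == Format.PARQUET;
    }

    // Guard as quoted under "Before": also requires noscan or partialscan.
    public static boolean guardBefore(Format format, boolean noScan, boolean partialScan) {
        return isOrcOrParquet(format) && (noScan || partialScan);
    }

    // Guard as quoted under "Now": the scan flags no longer matter.
    public static boolean guardAfter(Format format) {
        return isOrcOrParquet(format);
    }

    public static void main(String[] args) {
        // Plain "analyze table" means neither noscan nor partialscan is set.
        for (Format format : Format.values()) {
            System.out.println(format
                + " plain analyze: before=" + guardBefore(format, false, false)
                + ", after=" + guardAfter(format));
        }
    }
}
```

Under these assumptions, the guard changes only for ORC and Parquet when neither flag is given; for a text format it is false both before and after, which matches the expectation stated in the comment.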
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129911#comment-16129911 ]

Xuefu Zhang commented on HIVE-17321:
------------------------------------

+1 patch looks good to me. [~kellyzly], please let us know if you have more questions/comments.
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129891#comment-16129891 ]

Rui Li commented on HIVE-17321:
-------------------------------

The latest failures are not related. The changes to the golden files are all about statistics, which is expected.
[~kellyzly], [~xuefuz], could you take a look? Thanks.
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129875#comment-16129875 ]

Hive QA commented on HIVE-17321:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882252/HIVE-17321.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10977 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6430/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6430/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882252 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128407#comment-16128407 ]

Rui Li commented on HIVE-17321:
-------------------------------

[~kellyzly], without the patch, analyze table without noscan/partialscan launches a job containing only a TS (TableScan operator). Therefore there is no FS (FileSink operator) to update the stats.
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128375#comment-16128375 ]

liyunzhang_intel commented on HIVE-17321:
-----------------------------------------

[~lirui]: understood, but I am curious why the raw data size of the ORC table is zero. When executing "INSERT OVERWRITE TABLE xxx SELECT * FROM xxx", Hive with ORC updates the statistics from the ORC footer in [FileSinkOperator#closeOp|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L1081].
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128332#comment-16128332 ]

Rui Li commented on HIVE-17321:
-------------------------------

[~kellyzly], the problem is that if you run analyze table without noscan/partialscan, the raw data size is set to 0. HIVE-9560 solved the issue, but only for MR and Tez, so Spark and MR end up with different query plans for the analyze command.
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128187#comment-16128187 ]

liyunzhang_intel commented on HIVE-17321:
-----------------------------------------

[~lirui]: for ORC, we do not need to compute the raw data size by using noscan/partialscan, because the raw data size statistic is written to the metastore when the data load finishes. For more detail on how the raw data statistic is collected, see HIVE-17018.
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127089#comment-16127089 ]

Hive QA commented on HIVE-17321:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881901/HIVE-17321.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 41 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_elt] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_10] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_11] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_12] (batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_17] (batchId=138)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_1] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_2] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_3] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_4] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_5] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_6] (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_div0] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_pushdown] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_mapjoin] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_math_funcs] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_nested_mapjoin] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_string_funcs] (batchId=125)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6397/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6397/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6397/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing