[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304472#comment-15304472 ]

Ashutosh Chauhan commented on HIVE-13773:
-----------------------------------------

[~pxiong] Is it the case that the row counts are correct and only the data size is incorrect? If so, data size has not been implemented in OrcRecordUpdater yet.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java#L385

> Stats state is not captured correctly in dynpart_sort_optimization_acid.q
> -------------------------------------------------------------------------
>
>                 Key: HIVE-13773
>                 URL: https://issues.apache.org/jira/browse/HIVE-13773
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-13773.01.patch, t.q, t.q.out, t.q.out.right
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298676#comment-15298676 ]

Ashutosh Chauhan commented on HIVE-13773:
-----------------------------------------

I agree with you in general. But in this particular test case, the stats are wrong even when the query is run in isolation.
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298628#comment-15298628 ]

Eugene Koifman commented on HIVE-13773:
---------------------------------------

I don't see how queries can be answered from stats for Acid tables. Acid tables are versioned and stats, afaik, are not. So unless the query is asking for approximate info, I would not rely on stats.
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297119#comment-15297119 ]

Prasanth Jayachandran commented on HIVE-13773:
----------------------------------------------

[~pxiong] I initially added it for ORC writers (not the ORC updaters used by ACID). ORC writers implement the StatsProvidingRecordWriter interface. This interface returns the internally gathered stats (row count and raw data size). ACID was added later, and I guess it does not implement the interface because it cannot provide reliable stats (because of deletes). I wanted to make sure this works for the non-ACID use case.

Also, this stats gathering should happen in both processOp() and closeOp(). The reason is that with hive.optimize.sort.dynamic.partition there is only one record writer open per reducer at any point. Before closing the previous writer in processOp() we need to collect its statistics, and for the last writer we gather the statistics in closeOp(). I am not clear why you are removing the stats collection from processOp().
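The processOp()/closeOp() argument in the comment above can be illustrated with a small, self-contained Java sketch. This is not Hive's FileSinkOperator or its StatsProvidingRecordWriter interface: SketchWriter, SketchSink, DynPartStatsSketch, and the collectInProcessOp flag are hypothetical names invented for this illustration, and only row counts are modeled. It assumes rows arrive sorted by partition key, so a single writer is open at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a record writer that tracks its own row count.
// NOT Hive's StatsProvidingRecordWriter; the name and shape are assumptions.
final class SketchWriter {
    private long rowCount;
    private boolean closed;

    void write(String row) {
        if (closed) {
            throw new IllegalStateException("writer already closed");
        }
        rowCount++;
    }

    long getRowCount() {
        return rowCount;
    }

    void close() {
        closed = true;
    }
}

// Hypothetical, heavily simplified sink. Input is assumed to be sorted by
// partition, so only one writer is open at any point.
final class SketchSink {
    private final boolean collectInProcessOp;
    private final Map<String, Long> rowsByPartition = new LinkedHashMap<>();
    private String currentPartition;
    private SketchWriter currentWriter;

    SketchSink(boolean collectInProcessOp) {
        this.collectInProcessOp = collectInProcessOp;
    }

    void processOp(String partition, String row) {
        if (!partition.equals(currentPartition)) {
            if (currentWriter != null) {
                // The previous writer is about to be closed and replaced:
                // this is the last chance to read its stats.
                if (collectInProcessOp) {
                    rowsByPartition.merge(currentPartition, currentWriter.getRowCount(), Long::sum);
                }
                currentWriter.close();
            }
            currentPartition = partition;
            currentWriter = new SketchWriter();
        }
        currentWriter.write(row);
    }

    Map<String, Long> closeOp() {
        // Only the last writer is still open here, so only its stats remain.
        if (currentWriter != null) {
            rowsByPartition.merge(currentPartition, currentWriter.getRowCount(), Long::sum);
            currentWriter.close();
            currentWriter = null;
        }
        return rowsByPartition;
    }
}

final class DynPartStatsSketch {
    static Map<String, Long> run(boolean collectInProcessOp) {
        SketchSink sink = new SketchSink(collectInProcessOp);
        String[][] sortedRows = {
            {"p=1", "a"}, {"p=1", "b"}, {"p=2", "c"}, {"p=3", "d"}, {"p=3", "e"}
        };
        for (String[] r : sortedRows) {
            sink.processOp(r[0], r[1]);
        }
        return sink.closeOp();
    }

    public static void main(String[] args) {
        System.out.println("collect in processOp and closeOp: " + run(true));
        System.out.println("collect in closeOp only:          " + run(false));
    }
}
```

In this simplified model, collecting only in closeOp() records stats for the last partition alone, because earlier writers have already been closed and replaced. Whether the same holds for the ACID code path discussed in this issue is not something the sketch can show.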
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290459#comment-15290459 ]

Hive QA commented on HIVE-13773:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12804536/HIVE-13773.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 92 failed/errored test(s), 10068 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join1.q-schema_evol_text_vec_mapwork_part_all_complex.q-vector_complex_join.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-order.q-auto_join18_multi_distinct.q-union2.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_env_var1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_fsstat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_with_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_shufflejoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_udf_max
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testPreemptionQueueComparator
org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority
org.apache.hadoop.hive.llap.tez.TestConverters.testFragmentSpecToTaskSpec
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289434#comment-15289434 ]

Pengcheng Xiong commented on HIVE-13773:
----------------------------------------

[~ashutoshc], what I have observed is this. In the q file that I attached, there is an insert into. It reads from a table and then inserts into a partitioned table. There are two configurations involved: hive.optimize.sort.dynamic.partition and ACID. If we turn on only one of them, the stats of the insert into work as expected. However, if we turn on both of them, the stats of the insert into are wrong. Note that the data itself is read correctly.

I suspect that HIVE-6455 introduced prevFsp and that it may be wrongly configured when ACID is on. Removing the related code makes the stats of the insert into work. But I need the original author [~prasanth_j] to confirm. Thanks.
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287965#comment-15287965 ]

Ashutosh Chauhan commented on HIVE-13773:
-----------------------------------------

[~pxiong] Can you describe the bug?
[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q
[ https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287806#comment-15287806 ]

Pengcheng Xiong commented on HIVE-13773:
----------------------------------------

The patch partially reverts "HIVE-6455: Scalable dynamic partitioning and bucketing optimization" by [~prasanth_j] and [~vikram.dixit]. Could you take a look? Thanks.