[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304472#comment-15304472
 ] 

Ashutosh Chauhan commented on HIVE-13773:
-

[~pxiong] Is it the case that rowCounts are correct and only datasize is 
incorrect? If so, datasize has not been implemented in OrcRecordUpdater yet:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java#L385
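For illustration only, here is a minimal hypothetical sketch (Java, not Hive code) of the situation the question describes: a writer that keeps a row count but never accumulates a data size. `PartialStatsWriter` and its methods are invented names. Reporting the unimplemented size as 0 is an assumption about how such a gap would surface.

```java
import java.util.List;

// Hypothetical illustration only; this is NOT OrcRecordUpdater.
class PartialStatsWriter {
    private long rowCount = 0;
    // Never updated: models a writer whose data-size tracking is not implemented.
    private long rawDataSize = 0;

    // Record one row; only the row count is maintained.
    void write(String row) {
        rowCount++;
        // A complete implementation would also add the row's raw size here, e.g.
        // rawDataSize += row.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    }

    long getRowCount() { return rowCount; }
    long getRawDataSize() { return rawDataSize; }

    public static void main(String[] args) {
        PartialStatsWriter w = new PartialStatsWriter();
        for (String r : List.of("a,1", "b,2", "c,3")) {
            w.write(r);
        }
        System.out.println("numRows=" + w.getRowCount() + " rawDataSize=" + w.getRawDataSize());
    }
}
```

Running `main` prints `numRows=3 rawDataSize=0`. The row count is right and the size is not, which is the pattern the question above asks about.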

> Stats state is not captured correctly in dynpart_sort_optimization_acid.q
> -
>
> Key: HIVE-13773
> URL: https://issues.apache.org/jira/browse/HIVE-13773
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13773.01.patch, t.q, t.q.out, t.q.out.right
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298676#comment-15298676
 ] 

Ashutosh Chauhan commented on HIVE-13773:
-

I agree with you in general. But in this particular test case, the stats are 
wrong even when the query is run in isolation.


[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-24 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298628#comment-15298628
 ] 

Eugene Koifman commented on HIVE-13773:
---

I don't see how queries can be answered from stats for Acid tables. Acid 
tables are versioned and stats, AFAIK, are not.
So unless the query is asking for approximate info, I would not rely on stats.
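The following toy model in Java illustrates this concern. It is hypothetical and does not reflect Hive's ACID or metastore code. It assumes three things: each commit produces a new table version, a reader scans one fixed snapshot, and a single unversioned row-count statistic is overwritten by every commit. `VersionedTable` and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical): versioned data, unversioned stats.
class VersionedTable {
    // rowsAtVersion.get(v) = number of rows visible to a reader at snapshot v
    private final List<Long> rowsAtVersion = new ArrayList<>();
    // A single statistic with no version attached; overwritten on every commit.
    private long numRowsStat = 0;

    VersionedTable() {
        rowsAtVersion.add(0L); // version 0: empty table
    }

    // Commit a transaction that inserts and deletes rows; returns the new version.
    int commit(long inserted, long deleted) {
        long current = rowsAtVersion.get(rowsAtVersion.size() - 1);
        long next = current + inserted - deleted;
        rowsAtVersion.add(next);
        numRowsStat = next; // stats only ever reflect the latest version
        return rowsAtVersion.size() - 1;
    }

    // Exact answer: what a reader at the given snapshot would count by scanning.
    long scanCount(int snapshot) {
        return rowsAtVersion.get(snapshot);
    }

    // Answer taken from stats, regardless of the reader's snapshot.
    long statsCount() {
        return numRowsStat;
    }

    public static void main(String[] args) {
        VersionedTable t = new VersionedTable();
        int v1 = t.commit(100, 0); // insert 100 rows
        t.commit(0, 30);           // a later transaction deletes 30 rows
        System.out.println("scan@v1=" + t.scanCount(v1) + " stats=" + t.statsCount());
    }
}
```

Running `main` prints `scan@v1=100 stats=70`. A reader still on snapshot v1 would count 100 rows, but an answer taken from the stats says 70.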


[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-23 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297119#comment-15297119
 ] 

Prasanth Jayachandran commented on HIVE-13773:
--

[~pxiong] I initially added it for ORC writers (not ORC updaters, i.e. ACID). ORC 
writers implement the StatsProvidingRecordWriter interface, which returns the 
internally gathered stats (row count and raw data size). ACID was added later, 
and I guess the updater does not implement the interface because it cannot 
provide reliable stats (due to deletes). I wanted to make sure this works for 
the non-ACID use case. Also, this stats gathering should happen in both 
processOp() and closeOp(). The reason is that with 
hive.optimize.sort.dynamic.partition, only one record writer is open per 
reducer at any point. Before closing the previous writer in processOp(), we 
need to collect its statistics, and for the last writer we gather statistics in 
closeOp(). I am not clear why you are removing the stats collection from 
processOp().
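The collection pattern described above can be sketched in Java. This is a simplified, hypothetical model, not FileSinkOperator code. `StatsWriter` stands in for a stats-providing record writer and `SortedPartitionSink` for the operator. Its `processRow()` and `close()` play the roles of processOp() and closeOp(). Because rows arrive sorted by partition, only one writer is open at a time. Its stats must be collected before it is replaced, and the last writer's stats are collected at close.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a writer that reports its own statistics.
class StatsWriter {
    private long rows = 0;
    private boolean closed = false;

    void write(String row) {
        if (closed) throw new IllegalStateException("writer already closed");
        rows++;
    }
    long rowCount() { return rows; }
    void close() { closed = true; }
}

// Simplified sink: rows arrive sorted by partition, so one writer is open at a time.
class SortedPartitionSink {
    private final Map<String, Long> statsByPartition = new LinkedHashMap<>();
    private String currentPartition = null;
    private StatsWriter currentWriter = null;

    // Plays the role of processOp(): rotate writers when the partition changes.
    void processRow(String partition, String row) {
        if (!partition.equals(currentPartition)) {
            if (currentWriter != null) {
                // Collect stats from the previous writer BEFORE closing it;
                // otherwise they are lost once the writer is replaced.
                collect(currentPartition, currentWriter);
                currentWriter.close();
            }
            currentPartition = partition;
            currentWriter = new StatsWriter();
        }
        currentWriter.write(row);
    }

    // Plays the role of closeOp(): the last open writer is handled here.
    void close() {
        if (currentWriter != null) {
            collect(currentPartition, currentWriter);
            currentWriter.close();
            currentWriter = null;
        }
    }

    private void collect(String partition, StatsWriter w) {
        statsByPartition.merge(partition, w.rowCount(), Long::sum);
    }

    Map<String, Long> stats() { return statsByPartition; }

    public static void main(String[] args) {
        SortedPartitionSink sink = new SortedPartitionSink();
        sink.processRow("ds=1", "a");
        sink.processRow("ds=1", "b");
        sink.processRow("ds=2", "c");
        sink.processRow("ds=3", "d");
        sink.processRow("ds=3", "e");
        sink.close();
        System.out.println(sink.stats());
    }
}
```

Running `main` prints `{ds=1=2, ds=2=1, ds=3=2}`. Suppose stats were collected only in `close()`. Then only the last partition (`ds=3`) would be recorded, which is why this model needs both call sites.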

> Stats state is not captured correctly in dynpart_sort_optimization_acid.q
> -
>
> Key: HIVE-13773
> URL: https://issues.apache.org/jira/browse/HIVE-13773
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13773.01.patch
>
>






[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290459#comment-15290459
 ] 

Hive QA commented on HIVE-13773:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12804536/HIVE-13773.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 92 failed/errored test(s), 10068 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join1.q-schema_evol_text_vec_mapwork_part_all_complex.q-vector_complex_join.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-order.q-auto_join18_multi_distinct.q-union2.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_env_var1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_fsstat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_with_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_shufflejoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_udf_max
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testPreemptionQueueComparator
org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority
org.apache.hadoop.hive.llap.tez.TestConverters.testFragmentSpecToTaskSpec

[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289434#comment-15289434
 ] 

Pengcheng Xiong commented on HIVE-13773:


[~ashutoshc], here is what I have observed. In the q file that I attached, 
there is an insert into. It reads from a table and then inserts into a 
partitioned table. Two configurations are involved: 
hive.optimize.sort.dynamic.partition and ACID. If we turn on only one of them, 
the stats of the insert into are as we expected. However, if we turn on both of 
them, the stats of the insert into are wrong. Note that the data is read 
correctly. I suspect that prevFsp, introduced by HIVE-6455, may be configured 
incorrectly when ACID is on. Removing the related code makes the stats of the 
insert into correct. But I need the original author [~prasanth_j] to confirm. 
Thanks.


[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287965#comment-15287965
 ] 

Ashutosh Chauhan commented on HIVE-13773:
-

[~pxiong] Can you describe the bug?



[jira] [Commented] (HIVE-13773) Stats state is not captured correctly in dynpart_sort_optimization_acid.q

2016-05-17 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287806#comment-15287806
 ] 

Pengcheng Xiong commented on HIVE-13773:


The patch partially reverts "HIVE-6455: Scalable dynamic partitioning and 
bucketing optimization" by [~prasanth_j] and [~vikram.dixit]. Could you 
take a look? Thanks.
