[jira] [Updated] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-09-12 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen updated HIVE-17261:
---
Attachment: HIVE-17261.11.patch

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
> Attachments: HIVE-17261.10.patch, HIVE-17261.11.patch, 
> HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.4.patch, 
> HIVE-17261.5.patch, HIVE-17261.6.patch, HIVE-17261.7.patch, 
> HIVE-17261.8.patch, HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the constructor definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes Parquet to skip 
> its dictionary filter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17474) Poor Performance about subquery like DS/query70 on HoS

2017-09-12 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164180#comment-16164180
 ] 

liyunzhang_intel commented on HIVE-17474:
-

[~lirui]: thanks for the reply. I am debugging whether there is a problem with 
the statistics.
By the way, can we solve the problem by converting the common join to a skew 
join?
Since every key in part2 carries a very large amount of data and there are very 
few distinct keys (fewer than 30), can we treat this as a skew case? I have 
tried setting hive.optimize.skewjoin to true and hive.skewjoin.key to 10, but 
it seems to have no effect. I am very curious why skew join does not help. The 
doc describes it as follows:
{code}
A join B on A.id=B.id 
And A skews for id=1. Then we perform the following two joins: 
1.  A join B on A.id=B.id and A.id!=1 
2.  A join B on A.id=B.id and A.id=1 
If B doesn’t skew on id=1, then #2 will be a map join.
{code}
I think that after enabling skew join, all keys in part2 will be treated as 
skewed keys, and part2 will be map-joined with part1.
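
For reference, a minimal HiveQL sketch of the settings described above (the 
value of hive.skewjoin.key is the one tried here, not the default):
{code}
-- Enable runtime skew join handling: any join key that appears in more rows
-- than the threshold below is treated as skewed and handled by a follow-up
-- map join.
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=10;
{code}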

> Poor Performance about subquery like DS/query70 on HoS
> --
>
> Key: HIVE-17474
> URL: https://issues.apache.org/jira/browse/HIVE-17474
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
> Attachments: explain.70.vec
>
>
> in 
> [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql].
>  {code}
> select  
> sum(ss_net_profit) as total_sum
>,s_state
>,s_county
>,grouping__id as lochierarchy
>, rank() over(partition by grouping__id, case when grouping__id == 2 then 
> s_state end order by sum(ss_net_profit)) as rank_within_parent
> from
> store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk
> join store s on s.s_store_sk  = ss.ss_store_sk
>  where
> d1.d_month_seq between 1193 and 1193+11
>  and s.s_state in
>  ( select s_state
>from  (select s_state as s_state, sum(ss_net_profit),
>  rank() over ( partition by s_state order by 
> sum(ss_net_profit) desc) as ranking
>   from   store_sales, store, date_dim
>   where  d_month_seq between 1193 and 1193+11
> and date_dim.d_date_sk = 
> store_sales.ss_sold_date_sk
> and store.s_store_sk  = store_sales.ss_store_sk
>   group by s_state
>  ) tmp1 
>where ranking <= 5
>  )
>  group by s_state,s_county with rollup
> order by
>lochierarchy desc
>   ,case when lochierarchy = 0 then s_state end
>   ,rank_within_parent
>  limit 100;
> {code}
>  Let's analyze the query:
> part1: the subquery computes the states whose rank by sum(ss_net_profit) is 
> within the top 5.
> part2: the big table store_sales is joined with the small tables date_dim and 
> store to produce the result.
> part3: part1 join part2.
> The problem is in part3, which is a common join. The cardinality of the join 
> key between part1 and part2 is low, as there are few distinct state values 
> (actually there are 30 distinct values in the store table). With a common 
> join, a large amount of data goes to only 30 reducers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-09-12 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen updated HIVE-17261:
---
Attachment: HIVE-17261.10.patch

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
> Attachments: HIVE-17261.10.patch, HIVE-17261.2.patch, 
> HIVE-17261.3.patch, HIVE-17261.4.patch, HIVE-17261.5.patch, 
> HIVE-17261.6.patch, HIVE-17261.7.patch, HIVE-17261.8.patch, HIVE-17261.diff, 
> HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the constructor definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes Parquet to skip 
> its dictionary filter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed

2017-09-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-17410:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch [~anishek] and for the review [~sankarh]!


> repl load task during subsequent DAG generation does not start from the last 
> partition processed
> 
>
> Key: HIVE-17410
> URL: https://issues.apache.org/jira/browse/HIVE-17410
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17410.1.patch, HIVE-17410.2.patch, 
> HIVE-17410.3.patch
>
>
> The DAG for the repl load task is supposed to be generated dynamically, so 
> that if the load breaks while loading a partition, subsequent runs start 
> after the last partition processed.
> We currently identify the point from which we have to process the event, but 
> we reinitialize the iterator to start from the beginning of all partitions to 
> process.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17514) Use SHA-256 for cookie signer to improve security

2017-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164166#comment-16164166
 ] 

Hive QA commented on HIVE-17514:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886662/HIVE-17514.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11039 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6790/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6790/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6790/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886662 - PreCommit-HIVE-Build

> Use SHA-256 for cookie signer to improve security
> -
>
> Key: HIVE-17514
> URL: https://issues.apache.org/jira/browse/HIVE-17514
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17514.1.patch
>
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed

2017-09-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164164#comment-16164164
 ] 

Thejas M Nair commented on HIVE-17410:
--

+1


> repl load task during subsequent DAG generation does not start from the last 
> partition processed
> 
>
> Key: HIVE-17410
> URL: https://issues.apache.org/jira/browse/HIVE-17410
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-17410.1.patch, HIVE-17410.2.patch, 
> HIVE-17410.3.patch
>
>
> The DAG for the repl load task is supposed to be generated dynamically, so 
> that if the load breaks while loading a partition, subsequent runs start 
> after the last partition processed.
> We currently identify the point from which we have to process the event, but 
> we reinitialize the iterator to start from the beginning of all partitions to 
> process.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17338) Utilities.get*Tasks multiple methods duplicate code

2017-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164117#comment-16164117
 ] 

Hive QA commented on HIVE-17338:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886654/HIVE-17338.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 414 failed/errored test(s), 4859 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestAcidInputFormat - did not produce a TEST-*.xml file (likely timed out) 
(batchId=262)
TestAcidOnTez - did not produce a TEST-*.xml file (likely timed out) 
(batchId=215)
TestAcidTableSerializer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)
TestAddResource - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
TestAdminUser - did not produce a TEST-*.xml file (likely timed out) 
(batchId=210)
TestAuthorizationPreEventListener - did not produce a TEST-*.xml file (likely 
timed out) (batchId=219)
TestAuthzApiEmbedAuthorizerInEmbed - did not produce a TEST-*.xml file (likely 
timed out) (batchId=214)
TestAuthzApiEmbedAuthorizerInRemote - did not produce a TEST-*.xml file (likely 
timed out) (batchId=214)
TestAutoPurgeTables - did not produce a TEST-*.xml file (likely timed out) 
(batchId=220)
TestAvroGenericRecordReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=262)
TestAvroHCatLoader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=183)
TestAvroHCatStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=183)
TestBeeLineDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=239)
TestBeeLineExceptionHandling - did not produce a TEST-*.xml file (likely timed 
out) (batchId=178)
TestBeeLineHistory - did not produce a TEST-*.xml file (likely timed out) 
(batchId=178)
TestBeeLineWithArgs - did not produce a TEST-*.xml file (likely timed out) 
(batchId=221)
TestBeelineArgParsing - did not produce a TEST-*.xml file (likely timed out) 
(batchId=178)
TestBeelineConnectionUsingHiveSite - did not produce a TEST-*.xml file (likely 
timed out) (batchId=221)
TestBeelinePasswordOption - did not produce a TEST-*.xml file (likely timed 
out) (batchId=221)
TestBeelineWithUserHs2ConnectionFile - did not produce a TEST-*.xml file 
(likely timed out) (batchId=221)
TestBlobstoreCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=242)
TestBlobstoreNegativeCliDriver - did not produce a TEST-*.xml file (likely 
timed out) (batchId=242)
TestBucketIdResolverImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)
TestCLIAuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=228)
TestClearDanglingScratchDir - did not produce a TEST-*.xml file (likely timed 
out) (batchId=220)
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=1)

[udf_upper.q,ctas_date.q,schema_evol_orc_acidvec_table_update.q,groupby_grouping_sets3.q,vector_decimal_5.q,bucket_map_join_spark4.q,timestamp_2.q,date_join1.q,constprog_type.q,timestamp_ints_casts.q,udf_negative.q,orc_merge_diff_fs.q,udf_substring_index.q,newline.q,diff_part_input_formats.q,auto_join_without_localtask.q,join46.q,ctas_uses_table_location.q,tez_bmj_schema_evolution.q,bucketmapjoin4.q,udf_context_aware.q,groupby2_noskew.q,authorization_non_id.q,sample_islocalmode_hook_hadoop20.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,input15.q,groupby6_noskew.q,udf_PI.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=10)

[vector_coalesce.q,join_emit_interval.q,udf5.q,udf_case_thrift.q,correlationoptimizer13.q,mapjoin2.q,ppd_repeated_alias.q,correlationoptimizer4.q,vector_windowing_navfn.q,vectorization_12.q,vector_number_compare_projection.q,ppd_windowing2.q,parquet_table_with_subschema.q,authorization_8.q,exim_07_all_part_over_nonoverlap.q,udf_locate.q,nullgroup4_multi_distinct.q,bucket6.q,udf_string.q,cbo_rp_insert.q,schema_evol_text_vecrow_table.q,udf_likeany.q,orc_ppd_char.q,udf_boolean.q,udf_xpath_double.q,index_compression.q,vector_if_expr.q,groupby_sort_skew_1.q,materialized_view_drop.q,acid_mapjoin.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=11)


[jira] [Updated] (HIVE-17522) cleanup old 'repl dump' dirs

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17522:
--
Status: Patch Available  (was: Open)

> cleanup old 'repl dump' dirs
> 
>
> Key: HIVE-17522
> URL: https://issues.apache.org/jira/browse/HIVE-17522
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17522.1.patch
>
>
> We want to clean up the old dump dirs to save space and reduce scan time when 
> needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17522) cleanup old 'repl dump' dirs

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17522:
--
Attachment: HIVE-17522.1.patch

> cleanup old 'repl dump' dirs
> 
>
> Key: HIVE-17522
> URL: https://issues.apache.org/jira/browse/HIVE-17522
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17522.1.patch
>
>
> We want to clean up the old dump dirs to save space and reduce scan time when 
> needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17474) Poor Performance about subquery like DS/query70 on HoS

2017-09-12 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164056#comment-16164056
 ] 

Rui Li commented on HIVE-17474:
---

[~kellyzly], I think CommonMergeJoinOperator is specific to Tez and HoS doesn't 
use it. But it seems CommonMergeJoinOperator is not the map join for Tez 
either: Tez also uses MapJoinOperator for map joins. You can also look at the 
edge type; I think a map join will use a BROADCAST_EDGE.

So the problem is that the estimated data size of Reducer12 is too big, right? 
The graph is something like {{Map9 -> Reducer10 -> Reducer11 -> Reducer12}}. Do 
you know at which step the statistics begin to go wrong?
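
A minimal HiveQL sketch of one way to check that (assuming column statistics 
can be gathered for the TPC-DS tables used in query70; add a partition spec if 
store_sales is partitioned):
{code}
-- Gather column statistics so the optimizer has accurate inputs.
analyze table store_sales compute statistics for columns;
analyze table store compute statistics for columns;
analyze table date_dim compute statistics for columns;
-- Then re-run EXPLAIN on query70 and compare the "Statistics: Num rows" values
-- reported for Map9, Reducer10, Reducer11 and Reducer12 with the actual counts.
{code}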

> Poor Performance about subquery like DS/query70 on HoS
> --
>
> Key: HIVE-17474
> URL: https://issues.apache.org/jira/browse/HIVE-17474
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
> Attachments: explain.70.vec
>
>
> in 
> [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql].
>  {code}
> select  
> sum(ss_net_profit) as total_sum
>,s_state
>,s_county
>,grouping__id as lochierarchy
>, rank() over(partition by grouping__id, case when grouping__id == 2 then 
> s_state end order by sum(ss_net_profit)) as rank_within_parent
> from
> store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk
> join store s on s.s_store_sk  = ss.ss_store_sk
>  where
> d1.d_month_seq between 1193 and 1193+11
>  and s.s_state in
>  ( select s_state
>from  (select s_state as s_state, sum(ss_net_profit),
>  rank() over ( partition by s_state order by 
> sum(ss_net_profit) desc) as ranking
>   from   store_sales, store, date_dim
>   where  d_month_seq between 1193 and 1193+11
> and date_dim.d_date_sk = 
> store_sales.ss_sold_date_sk
> and store.s_store_sk  = store_sales.ss_store_sk
>   group by s_state
>  ) tmp1 
>where ranking <= 5
>  )
>  group by s_state,s_county with rollup
> order by
>lochierarchy desc
>   ,case when lochierarchy = 0 then s_state end
>   ,rank_within_parent
>  limit 100;
> {code}
>  Let's analyze the query:
> part1: the subquery computes the states whose rank by sum(ss_net_profit) is 
> within the top 5.
> part2: the big table store_sales is joined with the small tables date_dim and 
> store to produce the result.
> part3: part1 join part2.
> The problem is in part3, which is a common join. The cardinality of the join 
> key between part1 and part2 is low, as there are few distinct state values 
> (actually there are 30 distinct values in the store table). With a common 
> join, a large amount of data goes to only 30 reducers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17473) implement workload management pools

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17473:

Attachment: HIVE-17473.01.only.patch
HIVE-17473.01.patch

Rebased after the changes in the base patch

> implement workload management pools
> ---
>
> Key: HIVE-17473
> URL: https://issues.apache.org/jira/browse/HIVE-17473
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17473.01.only.patch, HIVE-17473.01.patch, 
> HIVE-17473.only.patch, HIVE-17473.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17473) implement workload management pools

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17473:

Attachment: (was: HIVE-17473.WIP.patch)

> implement workload management pools
> ---
>
> Key: HIVE-17473
> URL: https://issues.apache.org/jira/browse/HIVE-17473
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17473.01.only.patch, HIVE-17473.01.patch, 
> HIVE-17473.only.patch, HIVE-17473.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.11.patch

Rebased and updated

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.10.patch, HIVE-15665.11.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally holds file stats, stripe stats, etc., which are 
> allocated on the heap. On large data sets, this can have an impact on heap 
> usage and on the memory used by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17524) instrument LLAP metadata cache with separate counters by format

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17524:
---


> instrument LLAP metadata cache with separate counters by format
> ---
>
> Key: HIVE-17524
> URL: https://issues.apache.org/jira/browse/HIVE-17524
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Follow-up from HIVE-15665. ORC file tails, Parquet file tails, and ORC stripe 
> tails should be counted separately for jmx/iomem output. Technically the 
> cache should know nothing about which bytebuffer is which, so perhaps the 
> counters should be keyed by key type, and the key types should be separated 
> between the three cases more explicitly (including replacing the Long 
> HDFS-inode-based keys with two separate wrappers for primitive long).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17523:
--
Comment: was deleted

(was: https://github.com/apache/hive/pull/249)

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>  Labels: pull-request-available
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163993#comment-16163993
 ] 

slim bouguerra commented on HIVE-17523:
---

https://github.com/apache/hive/pull/249

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>  Labels: pull-request-available
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-17523:
--
Labels: pull-request-available  (was: )

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>  Labels: pull-request-available
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163992#comment-16163992
 ] 

ASF GitHub Bot commented on HIVE-17523:
---

GitHub user b-slim opened a pull request:

https://github.com/apache/hive/pull/249

[HIVE-17523] Fix insert into bug

https://issues.apache.org/jira/browse/HIVE-17523

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/b-slim/hive fix_insert_into

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #249


commit 579483ee878a16491d61c0eada5891880f507302
Author: Slim Bouguerra 
Date:   2017-09-09T00:12:14Z

set max tries to zero to make test faster.

Change-Id: Ia1523f3b565f4b08c76067fa0bac32d5171fb46e

commit ca5c7ccb1cc41dc3c8c76ccefb2d05b7d096a4ca
Author: Slim Bouguerra 
Date:   2017-09-11T22:39:40Z

Make insert into use data segment pusher to avoid duplication of logic some 
refactoring of exeception logging and handeling

Change-Id: I7bd8f29a83720f4cfba338acf27fb85b9774eafe

commit 3f2dd91c04839df7c068e1608b3e2af98babdc11
Author: Slim Bouguerra 
Date:   2017-09-13T01:04:05Z

cleaning and refactor

Change-Id: I9e2b14e6e32af095d2c4d2f9f1fbe8a9cced30ad




> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>  Labels: pull-request-available
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163986#comment-16163986
 ] 

slim bouguerra commented on HIVE-17523:
---

https://reviews.apache.org/r/62262/

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-17523:
-

Assignee: Nishant Bangarwa

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17523:
--
Attachment: HIVE-17523.patch

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinit loop

2017-09-12 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17523:
--
Status: Patch Available  (was: Open)

> Insert into druid table  hangs Hive server2 in an infinit loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment pushing and naming logic by using the Druid 
> data segment pusher as much as possible.
> This patch also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17386:

Attachment: HIVE-17386.04.patch

Rebasing, modifying to integrate with AM registry, addressing RB feedback

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.04.patch, HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-13748) TypeInfoParser cannot handle the dash in the field name of a complex type

2017-09-12 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163879#comment-16163879
 ] 

Dan Osipov edited comment on HIVE-13748 at 9/12/17 11:47 PM:
-

Looks like it should be possible to support by adding {{'-'}} to the list on 
this line:
https://github.com/prongs/apache-hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L280

Any side effects of doing that?
cc [~prasanth_j]


was (Author: danospv):
Looks like it should be possible to support by adding {{"-"}} to the list on 
this line:
https://github.com/prongs/apache-hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L280

Any side effects of doing that?
cc [~prasanth_j]

> TypeInfoParser cannot handle the dash in the field name of a complex type
> -
>
> Key: HIVE-13748
> URL: https://issues.apache.org/jira/browse/HIVE-13748
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
>
> hive> create table y(col struct<`a-b`:double> COMMENT 'type field has a 
> dash');
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: 
> Error: : expected at the position 8 of 'struct' but '-' is found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13748) TypeInfoParser cannot handle the dash in the field name of a complex type

2017-09-12 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163879#comment-16163879
 ] 

Dan Osipov commented on HIVE-13748:
---

Looks like it should be possible to support by adding {{"-"}} to the list on 
this line:
https://github.com/prongs/apache-hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L280

Any side effects of doing that?
cc [~prasanth_j]

> TypeInfoParser cannot handle the dash in the field name of a complex type
> -
>
> Key: HIVE-13748
> URL: https://issues.apache.org/jira/browse/HIVE-13748
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
>
> hive> create table y(col struct<`a-b`:double> COMMENT 'type field has a 
> dash');
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: 
> Error: : expected at the position 8 of 'struct' but '-' is found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Status: Patch Available  (was: Open)

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT OUTER or RIGHT OUTER join.
> Currently the estimate is computed by estimating the row reduction that 
> occurred on the PK side and multiplying that reduction factor into the 
> FK-side row count. This reduction estimate does not distinguish between 
> INNER and OUTER joins; it could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.
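
For illustration, a hedged sketch of the current estimate described above (the 
notation is ours, not Hive's code):
{code}
-- reduction      = rows(PK side after its filters) / rows(PK side unfiltered)
-- estimated rows = rows(FK side) * reduction
-- e.g. if the PK side shrinks from 1,000 rows to 100 (reduction = 0.1) and the
-- FK side has 1,000,000 rows, the join is estimated at roughly 100,000 rows,
-- whether the join is INNER or OUTER, which is the limitation described here.
{code}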



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Attachment: HIVE-17493.2.patch

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT OUTER or RIGHT OUTER join.
> Currently the estimate is computed by estimating the row reduction that 
> occurred on the PK side and multiplying that reduction factor into the 
> FK-side row count. This reduction estimate does not distinguish between 
> INNER and OUTER joins; it could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Status: Open  (was: Patch Available)

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT OUTER or RIGHT OUTER join.
> Currently the estimate is computed by estimating the row reduction that 
> occurred on the PK side and multiplying that reduction factor into the 
> FK-side row count. This reduction estimate does not distinguish between 
> INNER and OUTER joins; it could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17275:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. Thank you, [~cdrome]!

> Auto-merge fails on writes of UNION ALL output to ORC file with dynamic 
> partitioning
> 
>
> Key: HIVE-17275
> URL: https://issues.apache.org/jira/browse/HIVE-17275
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-17275.2-branch-2.2.patch, 
> HIVE-17275.2-branch-2.patch, HIVE-17275.2.patch, HIVE-17275-branch-2.2.patch, 
> HIVE-17275-branch-2.patch, HIVE-17275.patch
>
>
> If dynamic partitioning is used to write the output of UNION or UNION ALL 
> queries into ORC files with hive.merge.tezfiles=true, the merge step fails as 
> follows:
> {noformat}
> 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] 
> SessionState: Vertex failed, vertexName=File Merge, 
> vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, 
> taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:225)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:216)
>   

[jira] [Commented] (HIVE-17479) Staging directories do not get cleaned up for update/delete queries

2017-09-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163840#comment-16163840
 ] 

Eugene Koifman commented on HIVE-17479:
---

+1

> Staging directories do not get cleaned up for update/delete queries
> ---
>
> Key: HIVE-17479
> URL: https://issues.apache.org/jira/browse/HIVE-17479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17479.patch
>
>
> When these queries are internally rewritten, a new context is created with a 
> new execution id. This id is used to create the scratch directories. However, 
> only the original context is cleared, and thus only the directories created 
> with the original execution id are removed.
> The solution is to pass the execution id to the new context when the queries 
> are internally rewritten.
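
As a rough illustration of the proposed fix (the class and method names below are hypothetical, not the actual patch), reusing the original execution id for the rewritten query's context might look like this:

{code:java}
// Sketch only: reuse the original execution id when a query is internally
// rewritten, so the staging/scratch dirs it names can be cleaned up later.
// These names are illustrative stand-ins, not Hive's actual API.
public class RewriteContextFactory {

  /** Carries the execution id that names the .hive-staging/scratch dirs. */
  public static class QueryContext {
    private final String executionId;
    public QueryContext(String executionId) { this.executionId = executionId; }
    public String getExecutionId() { return executionId; }
  }

  /**
   * Instead of generating a fresh execution id (which leaves the rewritten
   * query's staging dirs orphaned), propagate the original id so clearing
   * the original context also covers the rewritten query's directories.
   */
  public static QueryContext forRewrittenQuery(QueryContext original) {
    return new QueryContext(original.getExecutionId());
  }
}
{code}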



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17479) Staging directories do not get cleaned up for update/delete queries

2017-09-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163840#comment-16163840
 ] 

Eugene Koifman edited comment on HIVE-17479 at 9/12/17 11:24 PM:
-

+1 (assuming test failures are not related)


was (Author: ekoifman):
+1

> Staging directories do not get cleaned up for update/delete queries
> ---
>
> Key: HIVE-17479
> URL: https://issues.apache.org/jira/browse/HIVE-17479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17479.patch
>
>
> When these queries are internally rewritten, a new context is created with a 
> new execution id. This id is used to create the scratch directories. However, 
> only the original context is cleared, and thus only the directories created 
> with the original execution id are removed.
> The solution is to pass the execution id to the new context when the queries 
> are internally rewritten.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17479) Staging directories do not get cleaned up for update/delete queries

2017-09-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17479:
--
Component/s: Transactions

> Staging directories do not get cleaned up for update/delete queries
> ---
>
> Key: HIVE-17479
> URL: https://issues.apache.org/jira/browse/HIVE-17479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17479.patch
>
>
> When these queries are internally rewritten, a new context is created with a 
> new execution id. This id is used to create the scratch directories. However, 
> only the original context is cleared, and thus only the directories created 
> with the original execution id are removed.
> The solution is to pass the execution id to the new context when the queries 
> are internally rewritten.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17511) Error while populating orc cache in llap

2017-09-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163838#comment-16163838
 ] 

Sergey Shelukhin commented on HIVE-17511:
-

There's a trace dump in the logs which shows a cache buffer that doesn't match 
the file being read in the cache results. The processing in the trace itself 
was correct... trying to figure out how this can happen.

> Error while populating orc cache in llap
> 
>
> Key: HIVE-17511
> URL: https://issues.apache.org/jira/browse/HIVE-17511
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Ashutosh Chauhan
>Assignee: Sergey Shelukhin
>
> Observed that while querying an error is thrown while loading cache in llap 
> daemons



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17522) cleanup old 'repl dump' dirs

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li reassigned HIVE-17522:
-


> cleanup old 'repl dump' dirs
> 
>
> Key: HIVE-17522
> URL: https://issues.apache.org/jira/browse/HIVE-17522
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>
> We want to clean up the old dump dirs to save space and reduce scan time when 
> needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17422) Skip non-native/temporary tables for all major table/partition related scenarios

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17422:
--
Attachment: HIVE-17422.2.patch

> Skip non-native/temporary tables for all major table/partition related 
> scenarios
> 
>
> Key: HIVE-17422
> URL: https://issues.apache.org/jira/browse/HIVE-17422
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17422.1.patch, HIVE-17422.2.patch
>
>
> Currently during incremental dump, the non-native/temporary table info is 
> partially dumped in metadata file and will be ignored later by the repl load. 
> We can optimize it by moving the check (whether the table should be exported 
> or not) earlier so that we don't save any info to dump file for such types of 
> tables. CreateTableHandler already has this optimization, so we just need to 
> apply similar logic to other scenarios.
> The change is to apply the EximUtil.shouldExportTable check to all scenarios 
> (e.g. alter table) that call into the common dump method.
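
A rough sketch of the early-skip idea described above; the types and the shouldExport helper below are simplified stand-ins (the real check is EximUtil.shouldExportTable, whose exact signature is not shown here):

{code:java}
// Sketch of moving the export check to the front of the common dump path.
public class TableDumpSketch {

  static class Table {
    boolean nonNative;
    boolean temporary;
  }

  /** Assumed equivalent of the EximUtil.shouldExportTable check. */
  static boolean shouldExport(Table t) {
    return !t.nonNative && !t.temporary;
  }

  /** Common dump entry point: bail out before writing any metadata. */
  static void dumpTable(Table t) {
    if (!shouldExport(t)) {
      // Non-native/temporary tables are skipped entirely, so no partial
      // metadata lands in the dump for repl load to ignore later.
      return;
    }
    // ... write table metadata and data to the dump location ...
  }
}
{code}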



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163801#comment-16163801
 ] 

Daniel Dai commented on HIVE-17496:
---

+1 pending tests. Tao confirmed that driverMirror.close won't cause issues for 
follow-up tests, as it only closes the current statement.

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163793#comment-16163793
 ] 

Anthony Hsu commented on HIVE-17394:


Thanks, [~cwsteinbach] and [~rdsr] for the reviews!

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Fix For: 3.0.0
>
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the typeInfo corresponding to the nullable field 
> column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro records case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this IMO might be to make use of the column TypeInfo which 
> is already passed in the worker method.
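
A hedged, self-contained sketch (stub types, not Hive's real AvroDeserializer or SchemaToTypeInfo classes) of the two ways to avoid regenerating TypeInfo per row for nullable union fields:

{code:java}
import java.util.IdentityHashMap;
import java.util.Map;

// Stub types below are stand-ins for org.apache.avro.Schema and Hive's
// TypeInfo; the point is the reuse/memoization pattern, not the real code.
public class TypeInfoReuseSketch {

  static class Schema {}
  static class TypeInfo {}

  // Expensive path the description complains about: walks the whole schema.
  static TypeInfo generateTypeInfo(Schema s) {
    return new TypeInfo();
  }

  // Option 2: identity-keyed memo table, so lookups never need a deep
  // traversal of all record fields for equality checks.
  private static final Map<Schema, TypeInfo> MEMO = new IdentityHashMap<>();

  static TypeInfo typeInfoFor(Schema s) {
    return MEMO.computeIfAbsent(s, TypeInfoReuseSketch::generateTypeInfo);
  }

  // Option 1: the deserialization path simply threads through the TypeInfo
  // the caller (worker) already determined for this column.
  static Object deserializeNullableUnion(Object datum, Schema branch,
                                         TypeInfo columnTypeInfo) {
    // ... use columnTypeInfo directly instead of generateTypeInfo(branch) ...
    return datum;
  }
}
{code}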



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17422) Skip non-native/temporary tables for all major table/partition related scenarios

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17422:
--
Attachment: HIVE-17422.1.patch

> Skip non-native/temporary tables for all major table/partition related 
> scenarios
> 
>
> Key: HIVE-17422
> URL: https://issues.apache.org/jira/browse/HIVE-17422
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17422.1.patch
>
>
> Currently during incremental dump, the non-native/temporary table info is 
> partially dumped in metadata file and will be ignored later by the repl load. 
> We can optimize it by moving the check (whether the table should be exported 
> or not) earlier so that we don't save any info to dump file for such types of 
> tables. CreateTableHandler already has this optimization, so we just need to 
> apply similar logic to other scenarios.
> The change is to apply the EximUtil.shouldExportTable check to all scenarios 
> (e.g. alter table) that call into the common dump method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17488:
--
Attachment: HIVE-17488.3.patch

Patch 3 based on changes from the review comments.

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.2.patch, HIVE-17488.3.patch, HIVE-17488.patch
>
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17488:
--
Status: Patch Available  (was: Open)

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.2.patch, HIVE-17488.3.patch, HIVE-17488.patch
>
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17488:
--
Status: Open  (was: Patch Available)

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.2.patch, HIVE-17488.patch
>
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17474) Poor Performance about subquery like DS/query70 on HoS

2017-09-12 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163762#comment-16163762
 ] 

liyunzhang_intel commented on HIVE-17474:
-

[~lirui], [~xuefuz]: after debugging in Tez, I found that the part2 join part1 
is a common merge join (CommonMergeJoinOperator).
{code}
  Reducer 2 
Reduce Operator Tree:
  Merge Join Operator
condition map:
 Inner Join 0 to 1
keys:
  0 _col7 (type: string)
  1 _col0 (type: string)

{code}


Below is the class comment from the CommonMergeJoin implementation. Does Hive 
on Spark enable CommonMergeJoin?
{code}
/*
 * With an aim to consolidate the join algorithms to either hash based joins 
(MapJoinOperator) or
 * sort-merge based joins, this operator is being introduced. This operator 
executes a sort-merge
 * based algorithm. It replaces both the JoinOperator and the 
SMBMapJoinOperator for the tez side of
 * things. It works in either the map phase or reduce phase.
 *
 * The basic algorithm is as follows:
 *
 * 1. The processOp receives a row from a "big" table.
 * 2. In order to process it, the operator does a fetch for rows from the other 
tables.
 * 3. Once we have a set of rows from the other tables (till we hit a new key), 
more rows are
 *brought in from the big table and a join is performed.
 */
{code}
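
For reference, a toy sort-merge (inner) join over two key-sorted inputs; this only illustrates the algorithm described in that comment, not the CommonMergeJoinOperator code itself:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Rows are int[]{key, value}; both inputs must already be sorted by key.
public class SortMergeJoinSketch {

  public static List<String> join(List<int[]> big, List<int[]> small) {
    List<String> out = new ArrayList<>();
    int i = 0, j = 0;
    while (i < big.size() && j < small.size()) {
      int bk = big.get(i)[0], sk = small.get(j)[0];
      if (bk < sk) {
        i++;                       // advance the "big" side
      } else if (bk > sk) {
        j++;                       // fetch forward on the other side
      } else {
        // Collect the group of matching keys from the small side, then emit
        // the cross product with all matching big-side rows.
        int jStart = j;
        while (i < big.size() && big.get(i)[0] == bk) {
          for (j = jStart; j < small.size() && small.get(j)[0] == bk; j++) {
            out.add(bk + ":" + big.get(i)[1] + "x" + small.get(j)[1]);
          }
          i++;
        }
      }
    }
    return out;
  }
}
{code}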

> Poor Performance about subquery like DS/query70 on HoS
> --
>
> Key: HIVE-17474
> URL: https://issues.apache.org/jira/browse/HIVE-17474
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
> Attachments: explain.70.vec
>
>
> in 
> [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql].
>  {code}
> select  
> sum(ss_net_profit) as total_sum
>,s_state
>,s_county
>,grouping__id as lochierarchy
>, rank() over(partition by grouping__id, case when grouping__id == 2 then 
> s_state end order by sum(ss_net_profit)) as rank_within_parent
> from
> store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk
> join store s on s.s_store_sk  = ss.ss_store_sk
>  where
> d1.d_month_seq between 1193 and 1193+11
>  and s.s_state in
>  ( select s_state
>from  (select s_state as s_state, sum(ss_net_profit),
>  rank() over ( partition by s_state order by 
> sum(ss_net_profit) desc) as ranking
>   from   store_sales, store, date_dim
>   where  d_month_seq between 1193 and 1193+11
> and date_dim.d_date_sk = 
> store_sales.ss_sold_date_sk
> and store.s_store_sk  = store_sales.ss_store_sk
>   group by s_state
>  ) tmp1 
>where ranking <= 5
>  )
>  group by s_state,s_county with rollup
> order by
>lochierarchy desc
>   ,case when lochierarchy = 0 then s_state end
>   ,rank_within_parent
>  limit 100;
> {code}
>  Let's analyze the query:
> part1: it computes the sub-query and returns the states whose ss_net_profit 
> ranking is at most 5.
> part2: the big table store_sales joins the small tables date_dim and store, 
> producing the intermediate result.
> part3: part1 join part2
> The problem is in part3, which is a common join. The cardinality of part1 
> and part2 is low, as there are not many distinct values for states 
> (actually there are 30 distinct values in the table store). With a common 
> join, the big data set goes to only 30 reducers.
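
One thing that may be worth checking (a sketch under assumptions, not a fix proposed by this ticket) is whether auto map-join conversion is allowed to pick up the tiny part1 side; string config keys are used so the example does not depend on exact HiveConf.ConfVars names, and the threshold value is purely illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative settings that let Hive convert a join into a map join when the
// small side fits in memory.
public class MapJoinSettingsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("hive.auto.convert.join", "true");
    conf.set("hive.auto.convert.join.noconditionaltask", "true");
    // Small-table size threshold in bytes (assumed value, not a recommendation).
    conf.set("hive.auto.convert.join.noconditionaltask.size",
        String.valueOf(100L * 1024 * 1024));
    System.out.println(conf.get("hive.auto.convert.join"));
  }
}
{code}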



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-17394:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Anthony!

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Fix For: 3.0.0
>
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the typeInfo corresponding to the nullable field 
> column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro records case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this IMO might be to make use of the column TypeInfo which 
> is already passed in the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163756#comment-16163756
 ] 

Zhiyuan Yang commented on HIVE-17386:
-

Comments were posted on RB. Please take a look.

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Open  (was: Patch Available)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}
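
For context, a back-of-the-envelope sketch of what progressive filtering should yield under a simple column-independence assumption; the NDV values below are assumptions for a ~200-year date_dim, not statistics read from the table, and this is not Hive's actual estimation code:

{code:java}
// Each extra equality predicate should divide the estimate by that column's NDV.
public class SelectivitySketch {
  public static void main(String[] args) {
    double rows = 73049;              // date_dim row count from the explain output
    double ndvYear = 200, ndvMoy = 12, ndvDom = 31;   // assumed NDVs

    double afterYear = rows / ndvYear;                // ~365
    double afterYearMoy = afterYear / ndvMoy;         // ~30
    double afterYearMoyDom = afterYearMoy / ndvDom;   // ~1

    System.out.printf("d_year: %.0f, +d_moy: %.0f, +d_dom: %.0f%n",
        afterYear, afterYearMoy, afterYearMoyDom);
    // The explain output above shows all three estimates stuck at 363 instead.
  }
}
{code}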



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Attachment: HIVE-17465.3.patch

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Patch Available  (was: Open)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17521) Improve defaults for few runtime configs

2017-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17521:

Status: Patch Available  (was: Open)

> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-17521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17521) Improve defaults for few runtime configs

2017-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17521:

Attachment: HIVE-17521.patch

> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-17521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17521) Improve defaults for few runtime configs

2017-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-17521:
---


> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163734#comment-16163734
 ] 

Mithun Radhakrishnan commented on HIVE-17466:
-

I'm +1 on the API. I'll wait for tests to complete, and ensure that things look 
alright. The failures I've seen thus far are known and unrelated.

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch, HIVE-17466.3.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of table/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
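
A sketch of how such an API might be consumed; the interface and method names here are hypothetical, used only to illustrate the shape of the proposal, not necessarily what the attached patch adds:

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side view: ask the metastore for distinct partition-key
// values instead of full, heavyweight Partition objects.
public class PartitionValuesSketch {

  interface MetastoreKeyValueApi {
    /** Roughly "SELECT DISTINCT <keys> FROM <table partitions>", server-side. */
    List<List<String>> listDistinctPartitionValues(String db, String table,
                                                   List<String> partKeys);
  }

  static void printPublishedHours(MetastoreKeyValueApi client) {
    // e.g. which (dt, hour) combinations exist for a daily/hourly table,
    // without client-side parsing of partition names.
    for (List<String> kv :
         client.listDistinctPartitionValues("mydb", "events",
                                            Arrays.asList("dt", "hour"))) {
      System.out.println("dt=" + kv.get(0) + ", hour=" + kv.get(1));
    }
  }
}
{code}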



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17466:

Attachment: HIVE-17466.3.patch

Rebased for master.

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch, HIVE-17466.3.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of table/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163704#comment-16163704
 ] 

Carl Steinbach commented on HIVE-17394:
---

The four test failures were already present in previous builds, so this looks 
like a clean run.

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the typeInfo corresponding to the nullable field 
> column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro records case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this IMO might be to make use of the column TypeInfo which 
> is already passed in the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163631#comment-16163631
 ] 

Hive QA commented on HIVE-17394:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886650/HIVE-17394.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11036 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6788/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6788/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6788/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886650 - PreCommit-HIVE-Build

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the typeInfo corresponding to the nullable field 
> column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro records case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this IMO might be to make use of the column TypeInfo which 
> is already passed in the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163572#comment-16163572
 ] 

Eugene Koifman commented on HIVE-17482:
---

Are there any tests that prove this patch works?

The expectation on master is that all queries run in a transaction context.
(the system doesn't assert it, though; I've not gotten there yet)
If you look at Driver.acquireLocks(), it will bail if you don't have a 
transaction open.

Also, the caller is supposed to pass ValidTxnList to the reader in the 
Configuration. Where is that happening? The current acquireLocks does it, but 
it sets it on the Configuration; will this be clobbered by the next fragment?

I think the semantics that we discussed earlier are that a complex query is 
broken down into fragments corresponding to each scan, and each is executed as 
a separate query in a separate transaction. So each fragment would start a txn, 
lock in the current snapshot and acquire locks. This would create 
READ_COMMITTED semantics from the whole-query point of view. Is that still the 
intent? If so, it's worth documenting this clearly in this bug or somewhere 
else.

It's not clear based on the diffs that this is what is happening.

When _GenericUDTFGetSplits.createPlanFragment()_
calls _CommandProcessorResponse cpr = driver.compileAndRespond(query);_,
it will start a txn using the TransactionManager that belongs to the 
SessionState in the current ThreadLocal.
Then it seems like the code will create a new TransactionManager to acquire 
locks (which I think will do nothing... though it should be made to throw).

So if I'm reading this right, it doesn't do the right thing.
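
To make the expected per-fragment flow concrete, a sketch of the sequence being described (open txn, fix the snapshot, acquire locks, release on completion); the interface below is a stand-in, not Hive's actual HiveTxnManager signatures:

{code:java}
// Stand-in interface for illustration only; NOT Hive's real API.
public class FragmentTxnSketch {

  interface TxnManager {
    long openTxn(String user);
    String getValidTxnsSnapshot();           // snapshot to hand to readers
    void acquireReadLock(String db, String table);
    void commit();
  }

  /** One scan fragment = one short transaction with its own snapshot+locks. */
  static String runFragment(TxnManager txnMgr, String db, String table,
                            Runnable scan) {
    long txnId = txnMgr.openTxn("llap-external-client");
    String snapshot = txnMgr.getValidTxnsSnapshot();  // fix the snapshot first
    txnMgr.acquireReadLock(db, table);                // keeps the compactor honest
    try {
      // The snapshot must be passed to THIS fragment's reader config, so a
      // later fragment cannot clobber it.
      scan.run();
    } finally {
      txnMgr.commit();                                // releases the locks
    }
    return snapshot;
  }
}
{code}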




> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results to the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17422) Skip non-native/temporary tables for all major table related scenarios

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17422:
--
Summary: Skip non-native/temporary tables for all major table related 
scenarios  (was: Don't dump non-native/temporary tables during incremental dump)

> Skip non-native/temporary tables for all major table related scenarios
> --
>
> Key: HIVE-17422
> URL: https://issues.apache.org/jira/browse/HIVE-17422
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>
> Currently during incremental dump, the non-native/temporary table info is 
> partially dumped in metadata file and will be ignored later by the repl load. 
> We can optimize it by moving the check (whether the table should be exported 
> or not) earlier so that we don't save any info to dump file for such types of 
> tables. CreateTableHandler already has this optimization, so we just need to 
> apply similar logic to other scenarios.
> The change is to apply the EximUtil.shouldExportTable check to all scenarios 
> (e.g. alter table) that call into the common dump method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17422) Skip non-native/temporary tables for all major table/partition related scenarios

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17422:
--
Summary: Skip non-native/temporary tables for all major table/partition 
related scenarios  (was: Skip non-native/temporary tables for all major table 
related scenarios)

> Skip non-native/temporary tables for all major table/partition related 
> scenarios
> 
>
> Key: HIVE-17422
> URL: https://issues.apache.org/jira/browse/HIVE-17422
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>
> Currently during incremental dump, the non-native/temporary table info is 
> partially dumped in metadata file and will be ignored later by the repl load. 
> We can optimize it by moving the check (whether the table should be exported 
> or not) earlier so that we don't save any info to dump file for such types of 
> tables. CreateTableHandler already has this optimization, so we just need to 
> apply similar logic to other scenarios.
> The change is to apply the EximUtil.shouldExportTable check to all scenarios 
> (e.g. alter table) that call into the common dump method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17494) Bootstrap REPL DUMP throws exception if a partitioned table is dropped while reading partitions.

2017-09-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17494:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

+1. Patch pushed to master.

> Bootstrap REPL DUMP throws exception if a partitioned table is dropped while 
> reading partitions.
> 
>
> Key: HIVE-17494
> URL: https://issues.apache.org/jira/browse/HIVE-17494
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17494.01.patch
>
>
> When a table is dropped between fetching the table and fetching its 
> partitions, bootstrap dump throws an exception.
> 1. Fetch table names.
> 2. Get table
> 3. Dump table object
> 4. Drop table from another thread.
> 5. Fetch partitions (throws exception from fireReadTablePreEvent)
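
A sketch of the kind of guard implied by that sequence: treat a table disappearing between steps as a skip, not a failure. The exception type and store interface below are generic placeholders, not the actual patch:

{code:java}
// Generic placeholder sketch: if the table vanishes between "get table" and
// "fetch partitions", skip it instead of failing the whole bootstrap dump.
public class BootstrapDumpRaceSketch {

  static class TableGoneException extends RuntimeException {}

  interface Store {
    Object getTable(String name);                 // step 2
    Iterable<String> getPartitions(String name);  // step 5, may throw
  }

  static void dumpOneTable(Store store, String name) {
    Object table = store.getTable(name);
    if (table == null) {
      return;                                     // dropped before step 2
    }
    try {
      for (String part : store.getPartitions(name)) {
        // ... dump partition metadata ...
      }
    } catch (TableGoneException e) {
      // Dropped concurrently (step 4); the dump simply skips this table.
    }
  }
}
{code}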



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17504) Skip ACID table for replication

2017-09-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17504:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

+1. Patch pushed to master.

> Skip ACID table for replication
> ---
>
> Key: HIVE-17504
> URL: https://issues.apache.org/jira/browse/HIVE-17504
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Fix For: 3.0.0
>
> Attachments: HIVE-17504.1.patch
>
>
> Currently we do not support replicating ACID tables (this will be future 
> work).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17422) Don't dump non-native/temporary tables during incremental dump

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17422:
--
Description: 
Currently during incremental dump, the non-native/temporary table info is 
partially dumped in metadata file and will be ignored later by the repl load. 
We can optimize it by moving the check (whether the table should be exported or 
not) earlier so that we don't save any info to dump file for such types of 
tables. CreateTableHandler already has this optimization, so we just need to 
apply similar logic to other scenarios.

The change is to apply the EximUtil.shouldExportTable check to all scenarios 
(e.g. alter table) that call into the common dump method.

  was:Currently during incremental dump, the non-native/temporary table info is 
partially dumped in metadata file and will be ignored later by the repl load. 
We can optimize it by moving the check (whether the table should be exported or 
not) earlier so that we don't save any info to dump file for such types of 
tables. CreateTableHandler already has this optimization, so we just need to 
apply similar logic to other scenarios.


> Don't dump non-native/temporary tables during incremental dump
> --
>
> Key: HIVE-17422
> URL: https://issues.apache.org/jira/browse/HIVE-17422
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>
> Currently during incremental dump, the non-native/temporary table info is 
> partially dumped in metadata file and will be ignored later by the repl load. 
> We can optimize it by moving the check (whether the table should be exported 
> or not) earlier so that we don't save any info to dump file for such types of 
> tables. CreateTableHandler already has this optimization, so we just need to 
> apply similar logic to other scenarios.
> The change is to apply the EximUtil.shouldExportTable check to all scenarios 
> (e.g. alter table) that call into the common dump method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17495) CachedStore: prewarm improvement (avoid multiple sql calls to read partition column stats), refactoring and caching some aggregate stats

2017-09-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163523#comment-16163523
 ] 

Gopal V commented on HIVE-17495:


bq. Are these tools you mentioned using embedded metastore ?

Yes, they are - the metastore thrift API is broken for cross-access between 
Hive1 and Hive2.

> CachedStore: prewarm improvement (avoid multiple sql calls to read partition 
> column stats), refactoring and caching some aggregate stats
> 
>
> Key: HIVE-17495
> URL: https://issues.apache.org/jira/browse/HIVE-17495
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-17495.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17495) CachedStore: prewarm improvement (avoid multiple sql calls to read partition column stats), refactoring and caching some aggregate stats

2017-09-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163518#comment-16163518
 ] 

Thejas M Nair commented on HIVE-17495:
--

[~sershe] [~gopalv]
Are these tools you mentioned using an embedded metastore? If not, I don't see 
how they would trigger pre-warm.

hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore 
is the config that enables caching. That config should be used only by the 
metastore server, not by clients.


> CachedStore: prewarm improvement (avoid multiple sql calls to read partition 
> column stats), refactoring and caching some aggregate stats
> 
>
> Key: HIVE-17495
> URL: https://issues.apache.org/jira/browse/HIVE-17495
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-17495.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163517#comment-16163517
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886642/HIVE-13567.22.patch

{color:green}SUCCESS:{color} +1 due to 41 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 248 failed/errored test(s), 11043 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_11] 
(batchId=239)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=239)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=239)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_insert_overwrite] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_10] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_3a] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal_native] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket1] (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnStatsUpdateForStatsOptimizer_2]
 (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer5] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_limit] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_multi_single_reducer]
 (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input14_limit] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input3_limit] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby2] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby3] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_exception] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[outer_reference_windowed]
 (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_12] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_20] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats15a] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_noscan_2a] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[transform_acid] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_character_length] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_octet_length] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union33] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts]
 (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts_special_characters]
 (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_multi_insert] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_character_length]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_octet_length] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_4] 
(batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_simple] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing] 
(batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_expressions]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_context] 
(batchId=31)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[except_distinct] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_all] 
(batchId=143)

[jira] [Resolved] (HIVE-17520) Temp table with CTAS tries to create staging directory in the actual database directory

2017-09-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere resolved HIVE-17520.
---
Resolution: Duplicate

Actually, a similar solution was already done in HIVE-15367.

> Temp table with CTAS tries to create staging directory in the actual database 
> directory
> ---
>
> Key: HIVE-17520
> URL: https://issues.apache.org/jira/browse/HIVE-17520
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> User does not have FS permissions on the database directory.
> Note that normal CREATE TEMPORARY TABLE (no CTAS) on that database does work.
> However trying temp table with CTAS fails with the following error:
> {noformat}
> hive> create temporary table jdere_temp as select * from 
> tpch_text_1000.nation;
> FAILED: SemanticException 0:0 Error creating temporary folder on: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpch_text_1000.db.
>  Error encountered near token 'TOK_TMP_FILE'
> {noformat}
> Simple fix would be to set the staging directory to the temp table location 
> for the case of temp tables.
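
As a hedged illustration of that suggestion, here is a minimal sketch; every name in it is hypothetical and it is not the actual Hive planner code, only the idea of staging under the temp table's own location instead of the database directory.

{code:java}
// Sketch only (hypothetical names): pick a staging directory for a CTAS target.
// For temporary tables, stage under the temp table's own location so no write
// access to the database directory is required.
import org.apache.hadoop.fs.Path;

public final class StagingDirSketch {
  private static final String STAGING_SUFFIX = ".hive-staging";

  public static Path chooseStagingDir(boolean isTempTable,
                                      Path tempTableLocation,
                                      Path databaseDir) {
    if (isTempTable) {
      // e.g. /tmp/hive/<session>/_tmp_space.db/jdere_temp/.hive-staging
      return new Path(tempTableLocation, STAGING_SUFFIX);
    }
    // current behaviour: staging lives under the database directory,
    // which fails when the user lacks FS permissions on it
    return new Path(databaseDir, STAGING_SUFFIX);
  }
}
{code}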



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15212) merge branch into master

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15212:

Status: Patch Available  (was: Open)

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch, HIVE-15212.11.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163468#comment-16163468
 ] 

Mithun Radhakrishnan commented on HIVE-17472:
-

For the record, the failing tests on {{branch-2.2}} seem to fail independently 
of this patch. :/

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch, HIVE-17472.3-branch-2.2.patch, 
> HIVE-17472.3-branch-2.patch, HIVE-17472.3.patch, 
> HIVE-17472.4-branch-2.2.patch, HIVE-17472.4-branch-2.patch, HIVE-17472.4.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: HIVE-17489.2.patch

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch, 
> HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).
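
A rough sketch of what the decoupling could look like on the server side, assuming two separate configuration keys. The two principal property names below are hypothetical (only illustrative of the split); {{UserGroupInformation}} and {{SecurityUtil}} are the standard Hadoop security APIs.

{code:java}
// Sketch only: log the metastore server in with a host-specific principal
// while advertising the VIP-based principal to clients.
// The "server.principal" and "client.principal" keys are hypothetical.
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;

public final class PrincipalSplitSketch {
  public static void loginServer(Configuration conf) throws IOException {
    // e.g. hcat/_HOST@GRID.MYTH.NET -> _HOST resolves to this box's hostname
    String serverPrincipal = SecurityUtil.getServerPrincipal(
        conf.get("hive.metastore.kerberos.server.principal"),   // hypothetical key
        InetAddress.getLocalHost().getCanonicalHostName());
    String keytab = conf.get("hive.metastore.kerberos.keytab.file");
    UserGroupInformation.loginUserFromKeytab(serverPrincipal, keytab);
  }

  public static String clientFacingPrincipal(Configuration conf) {
    // e.g. hcat/mycluster-hcat.blue.myth.net@GRID.MYTH.NET, bound to the VIP name
    return conf.get("hive.metastore.kerberos.client.principal"); // hypothetical key
  }
}
{code}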



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17472:

Attachment: HIVE-17472.4.patch
HIVE-17472.4-branch-2.patch
HIVE-17472.4-branch-2.2.patch

Dummy patches, to test the tests.

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch, HIVE-17472.3-branch-2.2.patch, 
> HIVE-17472.3-branch-2.patch, HIVE-17472.3.patch, 
> HIVE-17472.4-branch-2.2.patch, HIVE-17472.4-branch-2.patch, HIVE-17472.4.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Comment: was deleted

(was: Dummy patches to run tests.)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: (was: HIVE-17472.4-branch-2.2.patch)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: (was: HIVE-17472.4.patch)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: (was: HIVE-17472.4-branch-2.patch)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: (was: HIVE-17489.1.patch)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: HIVE-17472.4.patch
HIVE-17472.4-branch-2.patch
HIVE-17472.4-branch-2.2.patch

Dummy patches to run tests.

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17472.4-branch-2.2.patch, 
> HIVE-17472.4-branch-2.patch, HIVE-17472.4.patch, HIVE-17489.1.patch, 
> HIVE-17489.2-branch-2.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (.e.g 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in its 
> principal when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17520) Temp table with CTAS tries to create staging directory in the actual database directory

2017-09-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-17520:
-


> Temp table with CTAS tries to create staging directory in the actual database 
> directory
> ---
>
> Key: HIVE-17520
> URL: https://issues.apache.org/jira/browse/HIVE-17520
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> User does not have FS permissions on the database directory.
> Note that normal CREATE TEMPORARY TABLE (no CTAS) on that database does work.
> However trying temp table with CTAS fails with the following error:
> {noformat}
> hive> create temporary table jdere_temp as select * from 
> tpch_text_1000.nation;
> FAILED: SemanticException 0:0 Error creating temporary folder on: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpch_text_1000.db.
>  Error encountered near token 'TOK_TMP_FILE'
> {noformat}
> Simple fix would be to set the staging directory to the temp table location 
> for the case of temp tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17495) CachedStore: prewarm improvement (avoid multiple sql calls to read partition column stats), refactoring and caching some aggregate stats

2017-09-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163396#comment-16163396
 ] 

Gopal V commented on HIVE-17495:


bq. So it isn't actually triggered from these tools.

Throwing the CachedStore configs into hive-site.xml does trigger these issues. 
Putting them in hiveserver2-site.xml + hivemetastore-site.xml won't (for obvious 
reasons).
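
For illustration, a hedged sketch of what enabling CachedStore for the metastore process amounts to; the property and class names are assumptions based on common usage, not taken from this patch, and the placement point is that the same property in hive-site.xml would be picked up by every tool that reads that file.

{code:java}
// Sketch: enabling CachedStore programmatically, equivalent to a
// hivemetastore-site.xml entry. Property and class names are assumptions.
import org.apache.hadoop.hive.conf.HiveConf;

public final class CachedStoreConfSketch {
  public static HiveConf metastoreOnlyConf() {
    HiveConf conf = new HiveConf();
    // If this lived in hive-site.xml, any client reading that file would also
    // initialize CachedStore via the generic setConf path discussed above.
    conf.set("hive.metastore.rawstore.impl",
             "org.apache.hadoop.hive.metastore.cache.CachedStore");
    return conf;
  }
}
{code}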

> CachedStore: prewarm improvement (avoid multiple sql calls to read partition 
> column stats), refactoring and caching some aggregate stats
> 
>
> Key: HIVE-17495
> URL: https://issues.apache.org/jira/browse/HIVE-17495
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-17495.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17513) Refactor PathUtils to not contain instance fields

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17513:
--
Status: Patch Available  (was: Open)

> Refactor PathUtils to not contain instance fields
> -
>
> Key: HIVE-17513
> URL: https://issues.apache.org/jira/browse/HIVE-17513
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17513.1.patch
>
>
> This util class should just provide the static helper methods.
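
As a hedged illustration of the intended shape (names are hypothetical, not the actual PathUtils API): a stateless utility class holds no instance fields, is not instantiable, and exposes only static helpers.

{code:java}
// Sketch only, hypothetical names: stateless utility class pattern.
import org.apache.hadoop.fs.Path;

public final class PathUtilsSketch {
  private PathUtilsSketch() { }  // not instantiable; nothing to keep as state

  public static Path child(Path parent, String name) {
    return new Path(parent, name);
  }
}
{code}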



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17513) Refactor PathUtils to not contain instance fields

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17513:
--
Attachment: HIVE-17513.1.patch

> Refactor PathUtils to not contain instance fields
> -
>
> Key: HIVE-17513
> URL: https://issues.apache.org/jira/browse/HIVE-17513
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17513.1.patch
>
>
> This util class should just provide the static helper methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17494) Bootstrap REPL DUMP throws exception if a partitioned table is dropped while reading partitions.

2017-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163375#comment-16163375
 ] 

Hive QA commented on HIVE-17494:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886639/HIVE-17494.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11037 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=54)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6786/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6786/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6786/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886639 - PreCommit-HIVE-Build

> Bootstrap REPL DUMP throws exception if a partitioned table is dropped while 
> reading partitions.
> 
>
> Key: HIVE-17494
> URL: https://issues.apache.org/jira/browse/HIVE-17494
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17494.01.patch
>
>
> When a table is dropped between fetching table and fetching partitions, then 
> bootstrap dump throws exception.
> 1. Fetch table names.
> 2. Get table
> 3. Dump table object
> 4. Drop table from another thread.
> 5. Fetch partitions (throws exception from fireReadTablePreEvent)
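
A minimal sketch of one way to tolerate this race, assuming the metastore surfaces the concurrently dropped table as a {{NoSuchObjectException}}; the dump-side helper methods here are hypothetical stand-ins, not the actual REPL DUMP code.

{code:java}
// Sketch only: if the table disappears between "get table" and "fetch
// partitions", skip it instead of failing the whole bootstrap dump.
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;

public final class BootstrapDumpSketch {
  public void dumpTableSafely(String db, String table) throws Exception {
    dumpTable(db, table);                 // step 3: dump the table object
    try {
      fetchPartitions(db, table);         // step 5: may hit a dropped table
    } catch (NoSuchObjectException e) {
      // table was dropped concurrently (step 4); treat as "nothing to dump"
      System.out.println("Table " + db + "." + table
          + " dropped during dump, skipping");
    }
  }

  private void dumpTable(String db, String table) { /* elided */ }

  private void fetchPartitions(String db, String table)
      throws NoSuchObjectException { /* elided */ }
}
{code}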



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17513) Refactor PathUtils to not contain instance fields

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17513:
--
Summary: Refactor PathUtils to not contain instance fields  (was: Refactor 
PathUtils to not contain state (instance fields))

> Refactor PathUtils to not contain instance fields
> -
>
> Key: HIVE-17513
> URL: https://issues.apache.org/jira/browse/HIVE-17513
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
>
> This util class should just provide the static helper methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17495) CachedStore: prewarm improvement (avoid multiple sql calls to read partition column stats), refactoring and caching some aggregate stats

2017-09-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163371#comment-16163371
 ] 

Sergey Shelukhin commented on HIVE-17495:
-

It is, also according to [~gopalv]... it's triggered in setConf, which is a 
generic initialization method.

> CachedStore: prewarm improvement (avoid multiple sql calls to read partition 
> column stats), refactoring and caching some aggregate stats
> 
>
> Key: HIVE-17495
> URL: https://issues.apache.org/jira/browse/HIVE-17495
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-17495.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17511) Error while populating orc cache in llap

2017-09-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17511:
---

Assignee: Sergey Shelukhin

> Error while populating orc cache in llap
> 
>
> Key: HIVE-17511
> URL: https://issues.apache.org/jira/browse/HIVE-17511
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Ashutosh Chauhan
>Assignee: Sergey Shelukhin
>
> Observed that while querying an error is thrown while loading cache in llap 
> daemons



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163365#comment-16163365
 ] 

Sergey Shelukhin commented on HIVE-17482:
-

Doesn't really matter... +1 from my side, maybe [~ekoifman] should also take a 
look

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results to the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Status: Patch Available  (was: Open)

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Status: Open  (was: Patch Available)

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-12 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163337#comment-16163337
 ] 

Jason Dere commented on HIVE-17482:
---

Attempting to release locks seems to be valid, as this is also done during 
Driver.rollback(), which can be called during a failed compilation. If there are 
no locks held, the cleanup does not seem to do much.
The alternative would be to skip the cleanup object in the case that acquiring 
the lock fails; if you think that is better, I can change it to do that.
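
A hedged sketch of the defensive pattern described above; the lock-manager interface is a stand-in, not Hive's actual transaction/lock API, and it only illustrates releasing in a finally block while tolerating the "no locks held" case.

{code:java}
// Sketch only: acquire read locks before the LLAP-side scan and always attempt
// release, mirroring the Driver.rollback() behaviour of tolerating "no locks held".
public final class LockCleanupSketch {
  interface LockManager {                      // hypothetical stand-in
    void acquireReadLocks(String table) throws Exception;
    void releaseLocks();                       // assumed cheap no-op when nothing is held
  }

  public static void scanWithLocks(LockManager lm, String table, Runnable scan)
      throws Exception {
    try {
      lm.acquireReadLocks(table);
      scan.run();
    } finally {
      // also runs when acquisition itself failed; releasing with no locks
      // held is treated as harmless
      lm.releaseLocks();
    }
  }
}
{code}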

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results to the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17519) Transpose column stats display

2017-09-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163336#comment-16163336
 ] 

Vineet Garg commented on HIVE-17519:


I agree. It is a pain to read the 'desc formatted' output.

> Transpose column stats display
> --
>
> Key: HIVE-17519
> URL: https://issues.apache.org/jira/browse/HIVE-17519
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>
> currently {{describe formatted table1 insert_num}} shows the column 
> information in a table-like format...which is very hard to read, because 
> there are too many columns
> {code}
> # col_name    data_type   min   max   num_nulls   distinct_count   avg_col_len   max_col_len   num_trues   num_falses   comment             bitVector
> insert_num    int                                                                                                        from deserializer
> {code}
> I think it would be better to show the same information like this:
> {code}
> col_name  insert_num  
> data_type int 
> min   
> max   
> num_nulls 
> distinct_count
> avg_col_len   
> max_col_len   
> num_trues 
> num_falses
> comment   from deserializer   
> bitVector 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-09-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163326#comment-16163326
 ] 

Ashutosh Chauhan commented on HIVE-13567:
-

[~kgyrtkirk] I had some comments on the last version of the patch at 
https://reviews.apache.org/r/57614/ Does this new patch address those?
Also, can you please create an RB for your change.

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Zoltan Haindrich
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch, HIVE-13567.19.patch, HIVE-13567.20.patch, 
> HIVE-13567.21.patch, HIVE-13567.22.patch
>
>
> in phase 2, we are going to set auto-gather column on as default. This needs 
> to update golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-17394:
---
Description: 
The following methods in {{AvroDeserializer}} keeps regenerating {{TypeInfo}} 
objects for every nullable  field in a row.

This is happening in the following methods.

{code}
private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema 
recordSchema) throws AvroSerdeException {
// elided
line 312:  return worker(datum, fileSchema, newRecordSchema,
SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
}
..
private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
recordSchema)
// elided
line 357: return worker(datum, currentFileSchema, schema,
  SchemaToTypeInfo.generateTypeInfo(schema, null));
{code}

This is really bad in terms of performance. I'm not sure why we didn't use the 
TypeInfo we already have instead of regenerating it for each nullable field.  
If you look at the {{worker}} method which calls the method 
{{deserializeNullableUnion}} the typeInfo corresponding to the nullable field 
column is already determined. 
Moreover the cache in {{SchemaToTypeInfo}} class does not help in nullable Avro 
records case as checking if an Avro record schema object already exists in the 
cache requires traversing all the fields in the record schema.

I've attached profiling snapshot which shows maximum time is being spent in the 
cache.

One way of fixing this IMO might be to make use of the column TypeInfo which is 
already passed in the worker method.

  was:
The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
objects for every nullable  field in a row.

This is happening in the following methods.

{code}
private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema 
recordSchema) throws AvroSerdeException {
// elided
line 312:  return worker(datum, fileSchema, newRecordSchema,
SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
}
..
private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
recordSchema)
// elided
line 357: return worker(datum, currentFileSchema, schema,
  SchemaToTypeInfo.generateTypeInfo(schema, null));
{code}

This is really bad in terms of performance. I'm not sure why didn't we use the 
TypeInfo we already have instead of generating again for each nullable field.  
If you look at the {{worker}} method which calls the method 
{{deserializeNullableUnion}} the typeInfo corresponding to the nullable field 
column is already determined. 
Moreover the cache in {{SchmaToTypeInfo}} class does not help in nullable Avro 
records case as checking if an Avro record schema object already exists in the 
cache requires traversing all the fields in the record schema.

I've attached profiling snapshot which shows maximum time is being spent in the 
cache.

One way of fixing this IMO might be to make use of the column TypeInfo which is 
already passed in the worker method.


> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keeps regenerating {{TypeInfo}} 
> objects for every nullable  field in a row.
> This is happening in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field.  If you look at the {{worker}} method which calls the method 
> {{deserializeNullableUnion}} the typeInfo corresponding to the nullable field 
> column is already determined. 
> Moreover the cache in {{SchemaToTypeInfo}} class does not help in nullable 
> Avro records case as checking if an Avro record schema object already exists 
> in the cache requires traversing all the fields in the record schema.
> I've attached profiling snapshot 

[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163287#comment-16163287
 ] 

Ratandeep Ratti commented on HIVE-17394:


The patch looks good to me Anthony.

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable  field in a row.
> This is happening in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field.  If you look at the {{worker}} method which calls the method 
> {{deserializeNullableUnion}} the typeInfo corresponding to the nullable field 
> column is already determined. 
> Moreover the cache in {{SchmaToTypeInfo}} class does not help in nullable 
> Avro records case as checking if an Avro record schema object already exists 
> in the cache requires traversing all the fields in the record schema.
> I've attached profiling snapshot which shows maximum time is being spent in 
> the cache.
> One way of fixing this IMO might be to make use of the column TypeInfo which 
> is already passed in the worker method.
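
A hedged sketch of the proposed direction, reusing the TypeInfo the caller already holds instead of regenerating it per row; the signatures below are simplified stand-ins, not the actual AvroDeserializer code.

{code:java}
// Sketch only: thread the already-resolved column TypeInfo down into the
// nullable-union path instead of calling SchemaToTypeInfo.generateTypeInfo()
// for every row. Types and method bodies are simplified stand-ins.
import org.apache.avro.Schema;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

public final class AvroTypeInfoReuseSketch {
  // before (per row): worker(datum, fileSchema, newRecordSchema,
  //                          SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
  // after (sketch): the caller already holds columnType, so pass it through.
  Object deserializeNullableUnion(Object datum, Schema fileSchema,
                                  Schema recordSchema, TypeInfo columnType) {
    Schema newRecordSchema = resolveNonNullBranch(recordSchema);
    return worker(datum, fileSchema, newRecordSchema, columnType);
  }

  // elided: would pick the non-null branch of the union schema
  private Schema resolveNonNullBranch(Schema union) { return union; }

  // elided: would do the actual per-type deserialization
  private Object worker(Object datum, Schema fileSchema, Schema s, TypeInfo t) {
    return datum;
  }
}
{code}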



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17515) Use SHA-256 for GenericUDFMaskHash to improve security

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17515:
--
Status: Patch Available  (was: Open)

> Use SHA-256 for GenericUDFMaskHash to improve security
> --
>
> Key: HIVE-17515
> URL: https://issues.apache.org/jira/browse/HIVE-17515
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17515.1.patch
>
>
> See HIVE-17226 for detailed description.
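
For reference, a minimal sketch of SHA-256 hashing with the standard JDK MessageDigest API; this only illustrates the mechanism the title refers to, not the actual GenericUDFMaskHash change.

{code:java}
// Sketch only: SHA-256 digest of a string, rendered as hex.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Sha256Sketch {
  public static String sha256Hex(String value) throws NoSuchAlgorithmException {
    byte[] digest = MessageDigest.getInstance("SHA-256")
        .digest(value.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
{code}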



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17514) Use SHA-256 for cookie signer to improve security

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17514:
--
Attachment: HIVE-17514.1.patch

> Use SHA-256 for cookie signer to improve security
> -
>
> Key: HIVE-17514
> URL: https://issues.apache.org/jira/browse/HIVE-17514
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17514.1.patch
>
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17514) Use SHA-256 for cookie signer to improve security

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17514:
--
Status: Patch Available  (was: Open)

> Use SHA-256 for cookie signer to improve security
> -
>
> Key: HIVE-17514
> URL: https://issues.apache.org/jira/browse/HIVE-17514
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17514.1.patch
>
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17515) Use SHA-256 for GenericUDFMaskHash to improve security

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17515:
--
Attachment: HIVE-17515.1.patch

> Use SHA-256 for GenericUDFMaskHash to improve security
> --
>
> Key: HIVE-17515
> URL: https://issues.apache.org/jira/browse/HIVE-17515
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17515.1.patch
>
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15899) check CTAS over acid table

2017-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163211#comment-16163211
 ] 

Hive QA commented on HIVE-15899:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886563/HIVE-15899.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11041 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion02 
(batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testToAcidConversionMultiBucket 
(batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testUnionRemove (batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6785/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6785/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6785/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886563 - PreCommit-HIVE-Build

> check CTAS over acid table 
> ---
>
> Key: HIVE-15899
> URL: https://issues.apache.org/jira/browse/HIVE-15899
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15899.01.patch, HIVE-15899.02.patch, 
> HIVE-15899.03.patch, HIVE-15899.04.patch
>
>
> need to add a test to check if create table as works correctly with acid 
> tables



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17515) Use SHA-256 for GenericUDFMaskHash to improve security

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17515:
--
Description: See HIVE-17226 for detailed description.

> Use SHA-256 for GenericUDFMaskHash to improve security
> --
>
> Key: HIVE-17515
> URL: https://issues.apache.org/jira/browse/HIVE-17515
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Tao Li
>Assignee: Tao Li
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17514) Use SHA-256 for cookie signer to improve security

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17514:
--
Description: See HIVE-17226 for detailed description.

> Use SHA-256 for cookie signer to improve security
> -
>
> Key: HIVE-17514
> URL: https://issues.apache.org/jira/browse/HIVE-17514
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Tao Li
>Assignee: Tao Li
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-12 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163195#comment-16163195
 ] 

Tao Li commented on HIVE-17496:
---

Uploaded "HIVE-17496.3.patch" to fix test issue.

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-12 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Attachment: HIVE-17496.3.patch

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17338) Utilities.get*Tasks multiple methods duplicate code

2017-09-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163129#comment-16163129
 ] 

Gergely Hajós commented on HIVE-17338:
--

Applied the suggested changes. Thank you for the review!

> Utilities.get*Tasks multiple methods duplicate code
> ---
>
> Key: HIVE-17338
> URL: https://issues.apache.org/jira/browse/HIVE-17338
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Gergely Hajós
> Attachments: HIVE-17338.1.patch, HIVE-17338.2.patch
>
>
> As discussed in https://github.com/apache/hive/pull/212/files, the 3 
> functions can share a more general function.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

