[jira] [Updated] (HIVE-15473) Progress Bar on Beeline client

2017-02-02 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15473:
---
Attachment: HIVE-15473.9.patch

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Attachments: HIVE-15473.2.patch, HIVE-15473.3.patch, 
> HIVE-15473.4.patch, HIVE-15473.5.patch, HIVE-15473.6.patch, 
> HIVE-15473.7.patch, HIVE-15473.8.patch, HIVE-15473.9.patch, 
> screen_shot_beeline.jpg
>
>
> Hive CLI can show a progress bar for the Tez execution engine, as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when a user is 
> connecting via the Beeline command-line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15473) Progress Bar on Beeline client

2017-02-02 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15473:
---
Attachment: HIVE-15473.8.patch

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Attachments: HIVE-15473.2.patch, HIVE-15473.3.patch, 
> HIVE-15473.4.patch, HIVE-15473.5.patch, HIVE-15473.6.patch, 
> HIVE-15473.7.patch, HIVE-15473.8.patch, screen_shot_beeline.jpg
>
>
> Hive CLI can show a progress bar for the Tez execution engine, as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when a user is 
> connecting via the Beeline command-line client as well. 





[jira] [Commented] (HIVE-15792) Hive should raise SemanticException when LPAD/RPAD pad character's length is 0

2017-02-02 Thread Nandakumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851180#comment-15851180
 ] 

Nandakumar commented on HIVE-15792:
---

The following are the results from different databases for LPAD/RPAD with an empty pad string:
{{SELECT LPAD('x', 5, '')}}

||Database|| Output||
|Oracle | NULL |
| MySQL | NULL |
| Postgres |  |

> Hive should raise SemanticException when LPAD/RPAD pad character's length is 0
> --
>
> Key: HIVE-15792
> URL: https://issues.apache.org/jira/browse/HIVE-15792
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Chovan
>Assignee: Nandakumar
>Priority: Minor
>
> For example SELECT LPAD('A', 2, ''); will cause an infinite loop and the 
> running query will hang without any error.
> It would be great if this could be prevented by checking the pad character's 
> length and if it's 0 then throw a SemanticException.
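
The guard requested above can be sketched as follows. This is a minimal, standalone illustration and not Hive's actual UDF code: the class name PadGuard, the method lpad, and the truncation behaviour for short target lengths are assumptions made for the example, and IllegalArgumentException stands in for Hive's SemanticException so the snippet compiles without Hive on the classpath.

```java
// Hypothetical sketch, not the Hive implementation: reject an empty pad
// string up front instead of looping forever while trying to fill the gap.
final class PadGuard {
    static String lpad(String s, int len, String pad) {
        if (s == null || pad == null) {
            return null;
        }
        // The check the issue asks for; Hive would raise SemanticException here.
        if (pad.isEmpty()) {
            throw new IllegalArgumentException("pad string must not be empty");
        }
        if (len <= 0) {
            return "";
        }
        if (len <= s.length()) {
            // Assumed behaviour: truncate when the target length is shorter.
            return s.substring(0, len);
        }
        int need = len - s.length();
        StringBuilder sb = new StringBuilder(len);
        // Terminates because pad is non-empty, so each pass adds at least one char.
        while (sb.length() < need) {
            sb.append(pad, 0, Math.min(pad.length(), need - sb.length()));
        }
        return sb.append(s).toString();
    }
}
```

With this check, a call such as PadGuard.lpad("A", 2, "") fails immediately with an exception rather than hanging.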





[jira] [Commented] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851161#comment-15851161
 ] 

Hive QA commented on HIVE-15797:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850743/HIVE-15797.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11012 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=122)

[auto_sortmerge_join_13.q,join4.q,join35.q,udf_percentile.q,join_reorder3.q,subquery_in.q,auto_join19.q,stats14.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,vector_groupby_3.q,vectorized_ptf.q,auto_join2.q,groupby1_map_skew.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cp_sel] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_stats] (batchId=76)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby] (batchId=154)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=282)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3345/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3345/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3345/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850743 - PreCommit-HIVE-Build

> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15797.01.patch, HIVE-15797.patch
>
>






[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Patch Available  (was: Open)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Attachment: HIVE-15160.05.patch

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Open  (was: Patch Available)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851121#comment-15851121
 ] 

Lefty Leverenz commented on HIVE-15784:
---

Looks good, thanks Matt.

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851114#comment-15851114
 ] 

Hive QA commented on HIVE-15784:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850740/HIVE-15784.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 34 failed/errored test(s), 11027 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[structin] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_orderby_5] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types] (batchId=61)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_vector_dynpart_hashjoin_2] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_binary_join_groupby] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_2] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby4] (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby6] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_orderby_5] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join1] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join2] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join4] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_join46] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join3] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join4] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] (batchId=112)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=282)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3344/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3344/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3344/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 34 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850740 - PreCommit-HIVE-Build

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, 

[jira] [Commented] (HIVE-15728) Empty table returns result when querying partitioned field with a function

2017-02-02 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851092#comment-15851092
 ] 

Fei Hui commented on HIVE-15728:


Hi [~therandomsuit],
I have tested this, and there is no problem in the latest Hive version.

Total MapReduce CPU Time Spent: 1 seconds 620 msec
OK
NULL

> Empty table returns result when querying partitioned field with a function
> --
>
> Key: HIVE-15728
> URL: https://issues.apache.org/jira/browse/HIVE-15728
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Jared Leable
>
> If a partitioned table contained data and is then truncated a query will 
> still return a result when using a function on a partitioned field.
> create table test1 (
>   field string
> )
> partitioned by(dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into test1 
> partition(dt)
> select 'a','2017-01-01';
> -- to view inserted records
> select * from test1;
> -- to delete all records from the table
> truncate table test1;
> -- to view 0 records in the table
> select * from test1;
> -- still returns a result of '2017-01-01'
> select max(dt) from test1;





[jira] [Assigned] (HIVE-15728) Empty table returns result when querying partitioned field with a function

2017-02-02 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui reassigned HIVE-15728:
--

Assignee: Fei Hui

> Empty table returns result when querying partitioned field with a function
> --
>
> Key: HIVE-15728
> URL: https://issues.apache.org/jira/browse/HIVE-15728
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Jared Leable
>Assignee: Fei Hui
>
> If a partitioned table contained data and is then truncated a query will 
> still return a result when using a function on a partitioned field.
> create table test1 (
>   field string
> )
> partitioned by(dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into test1 
> partition(dt)
> select 'a','2017-01-01';
> -- to view inserted records
> select * from test1;
> -- to delete all records from the table
> truncate table test1;
> -- to view 0 records in the table
> select * from test1;
> -- still returns a result of '2017-01-01'
> select max(dt) from test1;





[jira] [Commented] (HIVE-15798) LLAP run.sh should use stop --force

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851076#comment-15851076
 ] 

Hive QA commented on HIVE-15798:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850738/HIVE-15798.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11027 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=282)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite (batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3343/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3343/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3343/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850738 - PreCommit-HIVE-Build

> LLAP run.sh should use stop --force
> ---
>
> Key: HIVE-15798
> URL: https://issues.apache.org/jira/browse/HIVE-15798
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15798.patch
>
>
> It's both faster, and avoids slider issues when the app survives across 
> kerberization and cannot be stopped by regular stop, which assumes it should 
> have some token or other because the cluster is now secure.





[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-02 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Attachment: HIVE-15796.wip.1.patch

Set default to false for testing.

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.wip.1.patch, HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes will generate 
> extremely poor reducer parallelism, and cause HoS query to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
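
The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the class ParallelismSketch, both method names, and the parameters tableScanBytes, bytesPerReducer and maxReducers are hypothetical, and the size-based formula for MapWork is an assumed stand-in for "similar to how MR does".

```java
import java.util.List;

// Hypothetical sketch of the proposed approach; names and formula are assumptions.
final class ParallelismSketch {
    // Rule 1: derive MapWork parallelism from the TableScan data size,
    // as ceil(size / bytesPerReducer), clamped to [1, maxReducers].
    static int forMapWork(long tableScanBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        long wanted = (tableScanBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min((long) maxReducers, wanted));
    }

    // Rule 2: a ReduceWork takes the maximum parallelism of all its parents.
    static int forReduceWork(List<Integer> parentParallelism) {
        int max = 1;
        for (int p : parentParallelism) {
            max = Math.max(max, p);
        }
        return max;
    }
}
```

Neither rule depends on per-operator estimates, so inaccurate operator stats would no longer shrink the reducer count in this scheme.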





[jira] [Resolved] (HIVE-15800) hive --database dbName throws NPE

2017-02-02 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-15800.
---
Resolution: Invalid

Fixed by a full rebuild.

> hive --database dbName throws NPE
> -
>
> Key: HIVE-15800
> URL: https://issues.apache.org/jira/browse/HIVE-15800
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> branch: master
> {noformat}
> 2017-02-02T20:59:28,610 ERROR [be61c23b-5435-4287-8f3e-d894d81ac871 main] 
> ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.estimateRowSizeFromSchema(StatsUtils.java:538)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getNumRows(StatsUtils.java:178)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:202)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:152)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:140)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:97)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:302)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:96)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11136)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:275)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:513)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1305)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1445)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1225)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1215)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processSelectDatabase(CliDriver.java:547)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {noformat}





[jira] [Commented] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

2017-02-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851055#comment-15851055
 ] 

Siddharth Seth commented on HIVE-11687:
---

The patch essentially makes the waitQueue accept additional fragments if other 
fragments are about to complete, or have not yet been scheduled (total allowed 
capacity = executors + wait queue).
Given that other threads are in the process of terminating, we could set up 
additional threads to start processing the new work.
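
As a rough illustration of the capacity rule described above (total allowed capacity = executors + wait queue), the admission decision could look like the sketch below. It is not the code in the attached WIP patch: the class AdmissionSketch and its methods are hypothetical, and it tracks a single counter of outstanding fragments rather than the real executor and queue state.

```java
// Hypothetical sketch: admit work based on total outstanding fragments
// (running plus waiting) instead of the wait queue size alone.
final class AdmissionSketch {
    private final int totalCapacity;
    private int outstanding;

    AdmissionSketch(int numExecutors, int waitQueueSize) {
        this.totalCapacity = numExecutors + waitQueueSize;
    }

    // Accept a fragment if running + waiting is still below the total capacity.
    synchronized boolean tryAccept() {
        if (outstanding >= totalCapacity) {
            return false;
        }
        outstanding++;
        return true;
    }

    // Called when a fragment finishes and frees one slot.
    synchronized void fragmentFinished() {
        if (outstanding > 0) {
            outstanding--;
        }
    }
}
```

With 2 executors and a wait queue of 1, three fragments are accepted and a fourth is rejected until one of the earlier fragments finishes.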

> TaskExecutorService can reject work even if capacity is available
> -
>
> Key: HIVE-11687
> URL: https://issues.apache.org/jira/browse/HIVE-11687
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-11687.WIP.txt
>
>
> The waitQueue has a fixed capacity - which is the wait queue size. Addition 
> of new work does not factor in the capacity available to execute work. This 
> ends up being left to the race between work getting scheduled for execution 
> and added to the waitQueue.
> cc [~prasanth_j]





[jira] [Updated] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-11687:
--
Fix Version/s: (was: llap)
Affects Version/s: (was: llap)
 Target Version/s: 2.2.0
   Status: Patch Available  (was: Open)

> TaskExecutorService can reject work even if capacity is available
> -
>
> Key: HIVE-11687
> URL: https://issues.apache.org/jira/browse/HIVE-11687
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-11687.WIP.txt
>
>
> The waitQueue has a fixed capacity - which is the wait queue size. Addition 
> of new work does not factor in the capacity available to execute work. This 
> ends up being left to the race between work getting scheduled for execution 
> and added to the waitQueue.
> cc [~prasanth_j]





[jira] [Updated] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-11687:
--
Attachment: HIVE-11687.WIP.txt

[~rajesh.balamohan] reported lots of KILLED fragments for a non-concurrent run. 
(Double the number of fragments at times)

This is for the reported case, as well as another race where fragment 
completions reported to the AM can cause the AM to schedule another fragment on 
the same node before the thread running the previous fragment falls off.

WIP patch. Will add a few tests and try getting some numbers on the delays in 
reporting to the AM and the executor actually becoming available.

Tested for non-concurrent jobs.

[~prasanth_j], [~rajesh.balamohan] - could you please take a look.

> TaskExecutorService can reject work even if capacity is available
> -
>
> Key: HIVE-11687
> URL: https://issues.apache.org/jira/browse/HIVE-11687
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: llap
>
> Attachments: HIVE-11687.WIP.txt
>
>
> The waitQueue has a fixed capacity - which is the wait queue size. Addition 
> of new work does not factor in the capacity available to execute work. This 
> ends up being left to the race between work getting scheduled for execution 
> and added to the waitQueue.
> cc [~prasanth_j]





[jira] [Assigned] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-11687:
-

Assignee: Siddharth Seth  (was: Prasanth Jayachandran)

> TaskExecutorService can reject work even if capacity is available
> -
>
> Key: HIVE-11687
> URL: https://issues.apache.org/jira/browse/HIVE-11687
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: llap
>
>
> The waitQueue has a fixed capacity - which is the wait queue size. Addition 
> of new work does not factor in the capacity available to execute work. This 
> ends up being left to the race between work getting scheduled for execution 
> and added to the waitQueue.
> cc [~prasanth_j]





[jira] [Updated] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-02-02 Thread Mike Fagan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Fagan updated HIVE-15795:
--
Status: Patch Available  (was: Open)

Adds support for index tables for Hive-Accumulo queries. Default index entries 
follow the Presto format.

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Attachments: HIVE-15795.1.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates





[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Attachment: HIVE-11394.0991.patch

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.0991.patch, 
> HIVE-11394.099.patch, HIVE-11394.09.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] 
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators.  E.g. Filter 
> Vectorization.  It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions.  E.g. 
> predicateExpression.  It includes all information of SUMMARY and OPERATOR, 
> too.
> DETAIL shows very detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> The optional clause defaults are not ONLY and SUMMARY.
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION 
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for 
> GROUPBY operator: Data type struct of 
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:": 
> "false" which says a node has a GROUP BY with an AVG or some other aggregator 
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators 
> are row-mode.  I.e. not vector output.
> If "usesVectorUDFAdaptor:": "false" were true, it would say that at 
> least one vectorized expression is using VectorUDFAdaptor.
> And, "allNative:": "false" will be true when all operators are native.  
> Today, GROUP BY and FILE SINK are not native.  MAP JOIN and REDUCE SINK are 
> conditionally native.  FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> ...
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: alltypesorc
>   Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: cint (type: int)
> outputColumnNames: cint
> Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Group By Operator
>   keys: cint (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> 

[jira] [Updated] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-02-02 Thread Mike Fagan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Fagan updated HIVE-15795:
--
Attachment: HIVE-15795.1.patch

First patch for index table support 

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Attachments: HIVE-15795.1.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates





[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: Patch Available  (was: In Progress)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.0991.patch, 
> HIVE-11394.099.patch, HIVE-11394.09.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] 
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators.  E.g. Filter 
> Vectorization.  It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions.  E.g. 
> predicateExpression.  It includes all information of SUMMARY and OPERATOR, 
> too.
> DETAIL shows very detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> The optional clause defaults are not ONLY and SUMMARY.
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION 
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for 
> GROUPBY operator: Data type struct of 
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:": 
> "false" which says a node has a GROUP BY with an AVG or some other aggregator 
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators 
> are row-mode.  I.e. not vector output.
> If "usesVectorUDFAdaptor:": "false" were true, it would say that at 
> least one vectorized expression is using VectorUDFAdaptor.
> And, "allNative:": "false" will be true when all operators are native.  
> Today, GROUP BY and FILE SINK are not native.  MAP JOIN and REDUCE SINK are 
> conditionally native.  FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> ...
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: alltypesorc
>   Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: cint (type: int)
> outputColumnNames: cint
> Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Group By Operator
>   keys: cint (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> 

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: In Progress  (was: Patch Available)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.099.patch, 
> HIVE-11394.09.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] 
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators.  E.g. Filter 
> Vectorization.  It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions.  E.g. 
> predicateExpression.  It includes all information of SUMMARY and OPERATOR, 
> too.
> DETAIL shows very detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> The optional clause defaults are not ONLY and SUMMARY.
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION 
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for 
> GROUPBY operator: Data type struct of 
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:": 
> "false" which says a node has a GROUP BY with an AVG or some other aggregator 
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators 
> are row-mode.  I.e. not vector output.
> If "usesVectorUDFAdaptor:": "false" were true, it would say that at 
> least one vectorized expression is using VectorUDFAdaptor.
> And, "allNative:": "false" will be true when all operators are native.  
> Today, GROUP BY and FILE SINK are not native.  MAP JOIN and REDUCE SINK are 
> conditionally native.  FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> ...
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: alltypesorc
>   Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: cint (type: int)
> outputColumnNames: cint
> Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Group By Operator
>   keys: cint (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> 

[jira] [Commented] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851034#comment-15851034
 ] 

Hive QA commented on HIVE-15796:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850735/HIVE-15796.wip.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11027 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] 
(batchId=78)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_12]
 (batchId=109)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cross_product_check_2]
 (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_mapjoin] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_25] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multiinsert]
 (batchId=131)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption
 (batchId=282)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3342/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3342/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3342/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850735 - PreCommit-HIVE-Build

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes will generate 
> extremely poor reducer parallelism, and cause HoS query to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
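
The two rules proposed above can be sketched roughly as follows. This is an illustration in Python, not the code in HIVE-15796.wip.patch; the function names and the bytes-per-reducer divisor are assumptions made for the example.

```python
def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def map_work_parallelism(table_scan_bytes, bytes_per_reducer):
    # Rule 1: size the work from the TableScan's data size rather than from
    # downstream operator stats. The divisor is an assumed tuning value.
    return max(1, ceil_div(table_scan_bytes, bytes_per_reducer))


def reduce_work_parallelism(parent_parallelisms):
    # Rule 2: a ReduceWork takes the maximum parallelism of its parents.
    return max(parent_parallelisms)


MB = 1024 ** 2
GB = 1024 ** 3

big_scan = map_work_parallelism(10 * GB, 256 * MB)
small_scan = map_work_parallelism(1 * GB, 256 * MB)
joined = reduce_work_parallelism([big_scan, small_scan])
print(big_scan, small_scan, joined)  # 40 4 40
```

Because the estimate starts from the table scan size, an inaccurate row-count estimate further down the operator tree can no longer shrink the reducer count to an unreasonably small number.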





[jira] [Updated] (HIVE-15801) Some logging improvements in LlapTaskScheduler

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15801:
--
Attachment: HIVE-15801.01.patch

Patch fixes the initial log line - which was misleading. Removes 
memoryPerExecutor etc being read from configs since that is read from the 
registry.

Moves some log lines to DEBUG level.

Logs dagStats every 10s while a dag is running.
Logs host status occasionally.

One non-logging fix: Contains a fix to propagate the wait queue size to the AM.

cc [~rajesh.balamohan], [~prasanth_j] for review.

> Some logging improvements in LlapTaskScheduler
> --
>
> Key: HIVE-15801
> URL: https://issues.apache.org/jira/browse/HIVE-15801
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15801.01.patch
>
>
> Excessive logging in some places. Not enough otherwise.





[jira] [Updated] (HIVE-15801) Some logging improvements in LlapTaskScheduler

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15801:
--
Target Version/s: 2.2.0

> Some logging improvements in LlapTaskScheduler
> --
>
> Key: HIVE-15801
> URL: https://issues.apache.org/jira/browse/HIVE-15801
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15801.01.patch
>
>
> Excessive logging in some places. Not enough otherwise.





[jira] [Updated] (HIVE-15801) Some logging improvements in LlapTaskScheduler

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15801:
--
Status: Patch Available  (was: Open)

> Some logging improvements in LlapTaskScheduler
> --
>
> Key: HIVE-15801
> URL: https://issues.apache.org/jira/browse/HIVE-15801
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15801.01.patch
>
>
> Excessive logging in some places. Not enough otherwise.





[jira] [Updated] (HIVE-15801) Some logging improvements in LlapTaskScheduler

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15801:
--
Description: Excessive logging in some places. Not enough otherwise.

> Some logging improvements in LlapTaskScheduler
> --
>
> Key: HIVE-15801
> URL: https://issues.apache.org/jira/browse/HIVE-15801
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>
> Excessive logging in some places. Not enough otherwise.





[jira] [Assigned] (HIVE-15801) Some logging improvements in LlapTaskScheduler

2017-02-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-15801:
-


> Some logging improvements in LlapTaskScheduler
> --
>
> Key: HIVE-15801
> URL: https://issues.apache.org/jira/browse/HIVE-15801
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>






[jira] [Commented] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-02-02 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851014#comment-15851014
 ] 

Alexander Pivovarov commented on HIVE-15746:


str_to_map documentation was fixed as well (LanguageManual UDF wiki) 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}
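
To make the difference between the two calls concrete, here is a small Python model of the parsing. It is only a sketch, not GenericUDFStringToMap itself: the function and parameter names are illustrative, the default of ":" for the second delimiter follows the code line cited above, and mapping a pair without the key-value delimiter to None is an assumption of this model.

```python
def str_to_map(text, delimiter1=",", delimiter2=":"):
    """Split text into pairs on delimiter1, then each pair on delimiter2."""
    result = {}
    for pair in text.split(delimiter1):
        key, sep, value = pair.partition(delimiter2)
        # If delimiter2 is absent, the whole pair becomes the key (assumption).
        result[key] = value if sep else None
    return result


print(str_to_map("a:1,b:2"))            # {'a': '1', 'b': '2'}
print(str_to_map("a=1,b=2"))            # {'a=1': None, 'b=2': None}
print(str_to_map("a=1,b=2", ",", "="))  # {'a': '1', 'b': '2'}
```

With ":" as the default, the "a=1,b=2" form only parses into key/value pairs when "=" is passed explicitly as the second delimiter.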





[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-02-02 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}





[jira] [Updated] (HIVE-15746) Fix default delimiter2 in str_to_map UDF or in method description

2017-02-02 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-15746:
---
Affects Version/s: (was: 2.2.0)
   2.1.1
Fix Version/s: 2.2.0

> Fix default delimiter2 in str_to_map UDF or in method description
> -
>
> Key: HIVE-15746
> URL: https://issues.apache.org/jira/browse/HIVE-15746
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15746.1.patch
>
>
> According to UDF wiki and to GenericUDFStringToMap.java class comments 
> default delimiter 2 should be '='.
> But in the code default_del2 = ":"
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java#L53
> We need to fix code or fix the method description and UDF wiki
> Let me know what you think?
> {code}
> str_to_map("a=1,b=2")
> vs
> str_to_map("a:1,b:2")
> {code}





[jira] [Commented] (HIVE-15779) LLAP: WaitQueue comparators should return 0 when tasks of the same DAG are of same priority

2017-02-02 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850999#comment-15850999
 ] 

Prasanth Jayachandran commented on HIVE-15779:
--

There is a test failure in TestTaskExecutorService

> LLAP: WaitQueue comparators should return 0 when tasks of the same DAG are of 
> same priority
> ---
>
> Key: HIVE-15779
> URL: https://issues.apache.org/jira/browse/HIVE-15779
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 2.2.0
>
> Attachments: HIVE-15779.1.patch, HIVE-15779.2.patch
>
>
> Observed cases wherein tasks within the same vertex were competing with each 
> other and getting killed
> {noformat}
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003877_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003179_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003877_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_002959_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003832_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_002959_7 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003723_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003254_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003723_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_003560_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003076_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003560_7 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_003775_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_004011_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003775_7 because 
> of lower priority
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003842_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_004045_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003842_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_003953_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003915_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003953_7 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003819_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003919_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003819_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_002074_8)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003790_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_002074_8 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_003670_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003736_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003670_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_003153_8)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003877_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003153_8 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003328_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003775_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003328_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_003817_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003842_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003817_7 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_004065_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003723_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_004065_7 because 
> of lower priority
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003902_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003560_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003902_7 because 
> of lower priority
> {noformat}
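
The behaviour asked for in the issue title can be sketched as a comparator that reports equality for tasks of the same DAG with the same priority, so that neither evicts the other. The Python below is an illustration only: the Task fields, the "lower number means higher priority" convention, and the ordering between different DAGs are assumptions, not the LLAP comparators themselves.

```python
from collections import namedtuple

# Hypothetical task record for this sketch.
Task = namedtuple("Task", ["dag_id", "priority", "attempt"])


def compare(a, b):
    """Negative if a ranks ahead of b, positive if behind, 0 if equal."""
    if a.dag_id == b.dag_id and a.priority == b.priority:
        return 0  # same DAG, same priority: no preference either way
    if a.dag_id == b.dag_id:
        # Assumption: a smaller priority value ranks first.
        return -1 if a.priority < b.priority else 1
    # Assumption: an arbitrary but stable ordering across different DAGs.
    return -1 if a.dag_id < b.dag_id else 1


def should_evict(queued, incoming):
    # Evict only when the incoming task strictly outranks the queued one.
    return compare(incoming, queued) < 0


queued = Task("dag_4855", 7, "003179")
incoming = Task("dag_4855", 7, "003877")
higher = Task("dag_4855", 3, "000001")

print(compare(incoming, queued))       # 0
print(should_evict(queued, incoming))  # False
print(should_evict(queued, higher))    # True
```

If the comparator instead broke ties on some secondary field, two sibling attempts of the same vertex could keep evicting each other, which matches the pattern in the log above.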





[jira] [Commented] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-02-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851000#comment-15851000
 ] 

Lefty Leverenz commented on HIVE-15774:
---

Thanks for the docs, [~wzheng].  Here's the link for 
*hive.txn.strict.locking.mode*:

* [hive.txn.strict.locking.mode | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.strict.locking.mode]

> Ensure DbLockManager backward compatibility for non-ACID resources
> --
>
> Key: HIVE-15774
> URL: https://issues.apache.org/jira/browse/HIVE-15774
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15774.1.patch, HIVE-15774.2.patch, 
> HIVE-15774.3.patch
>
>
> In pre-ACID days, users perform operations such as INSERT with either 
> ZooKeeperHiveLockManager or no lock manager at all. If their workflow is 
> designed to take advantage of no locking and they take care of the control of 
> concurrency, this works well with good performance.
> With ACID, if users enable transactions (i.e. using DbTxnManager & 
> DbLockManager), then for all the operations, different types of locks will be 
> acquired accordingly by DbLockManager, even for non-ACID resources. This may 
> impact the performance of some workflows designed for pre-ACID use cases.
> A viable solution would be to differentiate the locking mode for ACID and 
> non-ACID resources, so that DbLockManager will continue its current behavior 
> for ACID tables, but will be able to acquire a less strict lock type for 
> non-ACID resources, thus avoiding the performance loss for those workflows.





[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850995#comment-15850995
 ] 

Hive QA commented on HIVE-15160:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850727/HIVE-15160.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11026 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_limit]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query70] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_limit] 
(batchId=133)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteSmallint 
(batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3341/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3341/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3341/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850727 - PreCommit-HIVE-Build

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>        ,i_category
>        ,i_class
>        ,i_current_price
>        ,sum(cs_ext_sales_price) as itemrevenue
>        ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>            (partition by i_class) as revenueratio
>   from catalog_sales
>       ,item
>       ,date_dim
>  where cs_item_sk = i_item_sk
>    and i_category in ('Jewelry', 'Sports', 'Books')
>    and cs_sold_date_sk = d_date_sk
>    and d_date between cast('2001-01-12' as date)
>                   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>          ,i_item_desc
>          ,i_category
>          ,i_class
>          ,i_current_price
>  order by i_category
>          ,i_class
>          ,i_item_id
>          ,i_item_desc
>          ,revenueratio
>  limit 100;
> {code}
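
For comparison only, here is a minimal sketch of the same pattern (ordering by a grouping key that is not in the select list) run against an in-memory SQLite database through Python's sqlite3 module. It says nothing about how Hive or Postgres handle the query; the table item_sales, its columns, and the sample rows are made up for illustration.

```python
import sqlite3


def order_by_unselected_key():
    """Group by i_item_id and order by it without selecting it (SQLite only)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and rows, not taken from the TPC-DS schema above.
    conn.executescript(
        """
        create table item_sales (i_item_id text, i_item_desc text, price real);
        insert into item_sales values
            ('B', 'widget', 2.0),
            ('A', 'widget', 1.0),
            ('A', 'widget', 4.0),
            ('C', 'gadget', 3.0);
        """
    )
    rows = conn.execute(
        """
        select i_item_desc, sum(price) as itemrevenue
          from item_sales
         group by i_item_id, i_item_desc
         order by i_item_id
        """
    ).fetchall()
    conn.close()
    return rows
```

SQLite accepts the statement and returns the rows sorted by i_item_id even though that column is absent from the output.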





[jira] [Updated] (HIVE-15779) LLAP: WaitQueue comparators should return 0 when tasks of the same DAG are of same priority

2017-02-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15779:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~sseth], [~prasanth_j], [~sershe]. Committed to master.

> LLAP: WaitQueue comparators should return 0 when tasks of the same DAG are of 
> same priority
> ---
>
> Key: HIVE-15779
> URL: https://issues.apache.org/jira/browse/HIVE-15779
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 2.2.0
>
> Attachments: HIVE-15779.1.patch, HIVE-15779.2.patch
>
>
> Observed cases wherein tasks within the same vertex were competing with each 
> other and getting killed
> {noformat}
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003877_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003179_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003877_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_002959_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003832_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_002959_7 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003723_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003254_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003723_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_003560_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003076_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003560_7 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_003775_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_004011_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003775_7 because 
> of lower priority
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003842_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_004045_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003842_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_003953_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003915_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003953_7 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003819_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003919_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003819_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_002074_8)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003790_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_002074_8 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_003670_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003736_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003670_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_003153_8)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003877_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003153_8 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003328_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003775_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003328_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_003817_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003842_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003817_7 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_004065_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003723_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_004065_7 because 
> of lower priority
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003902_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003560_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003902_7 because 
> of lower priority
> {noformat}
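
The idea in the issue title can be sketched as follows. This is an illustration in Python, not the Java comparator from the patch: the Task fields, the ordering key, and the rule that a smaller priority number is more urgent are all assumptions made for the example. The point it shows is that when two tasks of the same DAG have equal priority the comparator returns 0, so neither one evicts the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    attempt_id: str
    dag_start_time: int  # hypothetical: identical for all tasks of one DAG
    priority: int        # assumption: smaller number means more urgent


def compare_tasks(a, b):
    """Return <0 if a should run before b, >0 if after, 0 if they are equivalent."""
    key_a = (a.dag_start_time, a.priority)
    key_b = (b.dag_start_time, b.priority)
    # The attempt id is deliberately not used as a tie-breaker, so tasks of the
    # same DAG with the same priority compare as equal.
    return (key_a > key_b) - (key_a < key_b)


def should_evict(queued, incoming):
    """Evict the queued task only when the incoming one strictly outranks it."""
    return compare_tasks(incoming, queued) < 0
```

With a comparator that never returns 0 for distinct tasks, two peers from the same vertex could keep displacing each other, which is the pattern the log lines above appear to show.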





[jira] [Commented] (HIVE-15779) LLAP: WaitQueue comparators should return 0 when tasks of the same DAG are of same priority

2017-02-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850991#comment-15850991
 ] 

Siddharth Seth commented on HIVE-15779:
---

+1 for the new patch.

> LLAP: WaitQueue comparators should return 0 when tasks of the same DAG are of 
> same priority
> ---
>
> Key: HIVE-15779
> URL: https://issues.apache.org/jira/browse/HIVE-15779
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-15779.1.patch, HIVE-15779.2.patch
>
>
> Observed cases wherein tasks within the same vertex were competing with each 
> other and getting killed
> {noformat}
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003877_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003179_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003877_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_002959_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003832_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_002959_7 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003723_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003254_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003723_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_003560_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003076_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003560_7 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_003775_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_004011_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003775_7 because 
> of lower priority
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003842_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_004045_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003842_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_003953_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003915_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003953_7 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003819_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003919_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003819_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_002074_8)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003790_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_002074_8 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_003670_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003736_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003670_7 because 
> of lower priority
> [IPC Server handler 1 on 44598 (1484282558103_4855_1_00_003153_8)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003877_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003153_8 because 
> of lower priority
> [IPC Server handler 0 on 44598 (1484282558103_4855_1_00_003328_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003775_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003328_7 because 
> of lower priority
> [IPC Server handler 4 on 44598 (1484282558103_4855_1_00_003817_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003842_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003817_7 because 
> of lower priority
> [IPC Server handler 2 on 44598 (1484282558103_4855_1_00_004065_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003723_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_004065_7 because 
> of lower priority
> [IPC Server handler 3 on 44598 (1484282558103_4855_1_00_003902_7)] 
> impl.TaskExecutorService: attempt_1484282558103_4855_1_00_003560_7 evicted 
> from wait queue in favor of attempt_1484282558103_4855_1_00_003902_7 because 
> of lower priority
> {noformat}





[jira] [Updated] (HIVE-15509) Add back the script + transform tests to minitez

2017-02-02 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15509:
-
Attachment: HIVE-15509.3.patch

> Add back the script + transform tests to minitez
> 
>
> Key: HIVE-15509
> URL: https://issues.apache.org/jira/browse/HIVE-15509
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15509.1.patch, HIVE-15509.1.patch, 
> HIVE-15509.2.patch, HIVE-15509.3.patch
>
>
> Script operator cannot run in minillap and so was removed from the minillap 
> test suite. But tez supports script + transform. Add the removed tests back 
> to minitez test suite. 





[jira] [Commented] (HIVE-15777) propagate LLAP app ID to ATS and log it

2017-02-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850978#comment-15850978
 ] 

Sergey Shelukhin commented on HIVE-15777:
-

Will test on cluster and commit tomorrow.

> propagate LLAP app ID to ATS and log it 
> 
>
> Key: HIVE-15777
> URL: https://issues.apache.org/jira/browse/HIVE-15777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15777.01.patch, HIVE-15777.patch
>
>






[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Attachment: HIVE-15672.05.patch

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, 
> HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Attachment: (was: HIVE-15672.05.patch)

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, 
> HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Attachment: HIVE-15672.05.patch

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, 
> HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Commented] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850957#comment-15850957
 ] 

Lefty Leverenz commented on HIVE-15765:
---

Thanks [~hagleitn], that's a good first draft for a Comments section in the DDL 
doc.

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs. 
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}





[jira] [Assigned] (HIVE-15799) LLAP: rename VertorDeserializeOrcWriter

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15799:
---


> LLAP: rename VertorDeserializeOrcWriter
> ---
>
> Key: HIVE-15799
> URL: https://issues.apache.org/jira/browse/HIVE-15799
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> As convenient as it is to grep for, based on continuous RB comments I am not 
> sure the world is yet ready for vertorized execution.





[jira] [Updated] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15797:

Attachment: HIVE-15797.01.patch

Rebased, fixed the typo. Hardcoding constants offends my engineering 
sensibilities :)

> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15797.01.patch, HIVE-15797.patch
>
>






[jira] [Commented] (HIVE-15675) ql.hooks.TestQueryHooks failure

2017-02-02 Thread Jun He (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850946#comment-15850946
 ] 

Jun He commented on HIVE-15675:
---

The failures are not related.
Thanks to [~pvary] for reviewing.
[~aihuaxu], could you please help commit this to master? Thanks.

> ql.hooks.TestQueryHooks failure
> ---
>
> Key: HIVE-15675
> URL: https://issues.apache.org/jira/browse/HIVE-15675
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jun He
>Assignee: Jun He
> Attachments: HIVE-15675.0.patch, HIVE-15675.1.patch, 
> HIVE-15675.2.patch
>
>
> ql.parse.TestQBCompact creates table "foo" during initialization but doesn't 
> clean it up after its test cases finish. This causes 
> ql.hooks.TestQueryHooks::testCompileFailure to fail, because testCompileFailure 
> expects that "foo" doesn't exist.
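
The failure mode described above is a general test-isolation problem. Below is a minimal sketch using Python's unittest rather than Hive's JUnit tests; CATALOG, CompactTest, and QueryHooksTest are hypothetical stand-ins for the shared metastore and the two test classes. It shows a class-level fixture that creates "foo" and removes it again, so a later test that expects "foo" to be absent still passes.

```python
import unittest

# Hypothetical stand-in for state shared between test classes.
CATALOG = set()


class CompactTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        CATALOG.add("foo")      # create the table once for this class

    @classmethod
    def tearDownClass(cls):
        CATALOG.discard("foo")  # clean up so later tests see no "foo"

    def test_table_exists(self):
        self.assertIn("foo", CATALOG)


class QueryHooksTest(unittest.TestCase):
    def test_compile_failure(self):
        # This test relies on "foo" not existing.
        self.assertNotIn("foo", CATALOG)


def run_suite():
    """Run CompactTest first, then QueryHooksTest, and return the result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(CompactTest))
    suite.addTests(loader.loadTestsFromTestCase(QueryHooksTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Removing the tearDownClass method in this sketch makes QueryHooksTest fail whenever it runs after CompactTest, which mirrors the ordering dependence reported in this issue.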





[jira] [Commented] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850941#comment-15850941
 ] 

Gunther Hagleitner commented on HIVE-15765:
---

[~leftylev] the COMMENT keyword is used to attach comments (explanations) to 
database objects, I think. So you can add a comment to a table or column to 
explain what it does. This particular jira implements a SQL-standard way of 
commenting your SQL statements in general. The COMMENT keyword can only appear 
in certain places in DML/DDL statements, but we now have 2 ways of commenting 
any piece of SQL you write.

Line comments (already in the code before this jira):

Example:
select
bla -- bla bla bla
from foo;

Basically everything from "--" to the end of the line is considered a comment 
and ignored by the compiler.

Bracketed comments (new addition):

Example:
select bla /* bla bla bla */ from bla;

Everything between /* and */ is considered a comment and ignored by the 
compiler. (It can span multiple lines, be part of a line, etc.)

There's an exception. "/*+" marks the beginning of a compiler hint and can be 
used to send "hints" to the compiler.

Example:
select /*+ MAPJOIN(a) */ * from a join b on (a.key = b.key)

Does that make it clearer?

/* also i've updated the fix version */
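
The three rules above (line comments, bracketed comments, and the "/*+" hint exception) can be sketched in a few lines of Python. This is only an illustration and not how the Hive parser handles comments; the function name strip_comments is made up, and the handling of single-quoted strings is a simplification that ignores escaped quotes.

```python
import re

# Order matters: string literals are matched first so comment markers inside
# them are left alone, then bracketed comments (hints included), then line
# comments running to the end of the line.
_TOKEN = re.compile(r"'[^']*'|/\*.*?\*/|--[^\n]*", re.DOTALL)


def strip_comments(sql):
    """Remove -- and /* */ comments, keeping /*+ ... */ hints and strings."""
    def replace(match):
        text = match.group(0)
        if text.startswith("'") or text.startswith("/*+"):
            return text  # string literal or compiler hint: keep as written
        return " "       # ordinary comment: replace with a single space
    return _TOKEN.sub(replace, sql)
```

Applied to the examples in this comment, the line comment and the bracketed comment are dropped, while the MAPJOIN hint is left untouched.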

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs. 
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}





[jira] [Commented] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850942#comment-15850942
 ] 

Hive QA commented on HIVE-15797:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850726/HIVE-15797.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3340/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3340/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3340/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-02-03 02:18:54.724
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3340/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-02-03 02:18:54.727
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at feaa65f HIVE-15765: Support bracketed comments (Gunther 
Hagleitner, reviewed by Pengcheng Xiong)
+ git clean -f -d
Removing 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/VectorizerReason.java
Removing ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g.orig
Removing 
ql/src/java/org/apache/hadoop/hive/ql/plan/OperatorExplainVectorization.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/plan/VectorAppMasterEventDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFileSinkDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFilterDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorLimitDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinInfo.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSMBJoinDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSelectDesc.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSparkHashTableSinkDesc.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSparkPartitionPruningSinkDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTableScanDesc.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/VectorizationCondition.java
Removing ql/src/test/results/clientpositive/llap/vector_const.q.out
Removing ql/src/test/results/clientpositive/llap/vector_empty_where.q.out
Removing ql/src/test/results/clientpositive/llap/vector_join.q.out
Removing 
ql/src/test/results/clientpositive/llap/vector_non_constant_in_expr.q.out
Removing 
ql/src/test/results/clientpositive/llap/vector_orc_string_reader_empty_dict.q.out
Removing ql/src/test/results/clientpositive/llap/vector_string_decimal.q.out
Removing ql/src/test/results/clientpositive/llap/vector_tablesample_rows.q.out
Removing ql/src/test/results/clientpositive/llap/vector_udf2.q.out
Removing 
ql/src/test/results/clientpositive/llap/vectorization_offset_limit.q.out
Removing ql/src/test/results/clientpositive/llap/vectorized_mapjoin2.q.out
Removing ql/src/test/results/clientpositive/tez/vector_acid3.q.out
Removing ql/src/test/results/clientpositive/tez/vector_adaptor_usage_mode.q.out
Removing ql/src/test/results/clientpositive/tez/vector_aggregate_9.q.out
Removing 
ql/src/test/results/clientpositive/tez/vector_aggregate_without_gby.q.out
Removing ql/src/test/results/clientpositive/tez/vector_auto_smb_mapjoin_14.q.out
Removing ql/src/test/results/clientpositive/tez/vector_between_columns.q.out
Removing ql/src/test/results/clientpositive/tez/vector_between_in.q.out
Removing ql/src/test/results/clientpositive/tez/vector_binary_join_groupby.q.out
Removing ql/src/test/results/clientpositive/tez/vector_bround.q.out
Removing ql/src/test/results/clientpositive/tez/vector_bucket.q.out
Removing ql/src/test/results/clientpositive/tez/vector_cast_constant.q.out
Removing ql/src/test/results/clientpositive/tez/vector_char_2.q.out
Removing ql/src/test/results/clientpositive/tez/vector_char_4.q.out
Removing ql/src/test/results/clientpositive/tez/vector_char_cast.q.out
Removing 

[jira] [Commented] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850939#comment-15850939
 ] 

Hive QA commented on HIVE-11394:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850722/HIVE-11394.099.patch

{color:green}SUCCESS:{color} +1 due to 161 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=42)

[orc_llap.q,load_local_dir_test.q,alter_db_owner.q,auto_sortmerge_join_1.q,udf_isnotnull.q,topn.q,alter_concatenate_indexed_table.q,partition_wise_fileformat7.q,escape_sortby1.q,vector_struct_in.q,list_bucket_query_multiskew_1.q,current_date_timestamp.q,vectorized_join46.q,cbo_rp_simple_select.q,multiMapJoin1.q,reduceSinkDeDuplication_pRS_key_empty.q,except_all.q,vector_char_simple.q,index_auto.q,drop_partitions_filter4.q,type_widening.q,statsfs.q,parquet_thrift_array_of_primitives.q,groupby_sort_5.q,leftsemijoin.q,join_on_varchar.q,rcfile_default_format.q,special_character_in_tabnames_1.q,authorization_6.q,udf_bin.q]
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp_funcs]
 (batchId=28)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_div0]
 (batchId=94)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit]
 (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testExprNodeBetweenWithDynamicValue
 (batchId=259)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3339/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3339/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3339/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850722 - PreCommit-HIVE-Build

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.099.patch, 
> HIVE-11394.09.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] 
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators.  E.g. Filter 
> Vectorization.  It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions.  E.g. 
> predicateExpression.  It includes all information of SUMMARY and OPERATOR, 
> too.
> DETAIL shows very detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION, too.
> The defaults for the optional clauses are: not ONLY, and SUMMARY.
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION 
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: 

[jira] [Commented] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850936#comment-15850936
 ] 

Lefty Leverenz commented on HIVE-15797:
---

Typo alert:  "(deprecacted)" -> "(deprecated)" in the description for 
hive.groupby.orderby.position.alias.

Also, why does the deprecation advice for hive.groupby.orderby.position.alias 
refer to HIVE_ORDERBY_POSITION_ALIAS.varname and 
HIVE_GROUPBY_POSITION_ALIAS.varname rather than simply 
hive.orderby.position.alias and hive.groupby.position.alias?  Do you expect 
those varnames to change?

> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15797.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-15765:
--
Fix Version/s: 2.2.0

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs. 
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}
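
To make the distinction in the example concrete, here is a minimal sketch (not Hive's actual parser) of removing bracketed comments while keeping `/*+ ... */` hints. The helper name `strip_bracketed_comments` is hypothetical, and the sketch assumes comment markers never appear inside string literals.

```python
import re

# Hypothetical helper, not part of Hive: matches /* ... */ but skips
# optimizer hints, which start with "/*+". Assumes no comment markers
# occur inside quoted string literals.
_BRACKETED_COMMENT = re.compile(r"/\*(?!\+).*?\*/", re.DOTALL)


def strip_bracketed_comments(sql: str) -> str:
    """Return sql with bracketed comments removed and hints left in place."""
    return _BRACKETED_COMMENT.sub("", sql)


query = """select
/*+ MAPJOIN(a) */ /* mapjoin hint */
a /* column */
from foo join bar;"""

cleaned = strip_bracketed_comments(query)
```

A real lexer would also have to handle quoted strings and unterminated comments, which this regular expression does not attempt.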





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850919#comment-15850919
 ] 

Matt McCline commented on HIVE-15784:
-

[~leftylev] Thank you for noticing the out-of-date comment.

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Status: Patch Available  (was: In Progress)

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Attachment: HIVE-15784.02.patch

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Status: In Progress  (was: Patch Available)

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Commented] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850909#comment-15850909
 ] 

Lefty Leverenz commented on HIVE-15765:
---

Doc note:  This should be documented in the wiki, with version information, but 
I'm not sure where it belongs.

How do these comments compare with the COMMENT keyword -- are they 
interchangeable or is /*blahblah */ more like a code comment?

The COMMENT keyword is documented in these places:
* Tutorial
** [Creating Tables | 
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-CreatingTables]
** [Altering Tables | 
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-AlteringTables]
** [Loading Data | 
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-LoadingData]
* DDL
** [Create Database | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateDatabase]
** [Create Table | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable]
** [Partitioned Tables (examples only) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables]
** [External Tables (example only) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExternalTables]
** [Bucketed Sorted Tables (example only) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables]
** [Change Column Name/Type/Position/Comment | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment]
** [Add/Replace Columns | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns]
** [Create View | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateView]
** [Create Index | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateIndex]
* DML and Select docs:  no comment

Other comment doc:
* [DDL -- Alter Table Properties -- Alter Table Comment | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableComment]

Perhaps we need a section on comments in each of the major SQL docs (DDL, DML, 
and Select).


/* Fix Version/s needs to be updated. */

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>  Labels: TODOC2.2
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs. 
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}





[jira] [Comment Edited] (HIVE-15798) LLAP run.sh should use stop --force

2017-02-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850904#comment-15850904
 ] 

Sergey Shelukhin edited comment on HIVE-15798 at 2/3/17 1:54 AM:
-

[~gopalv] [~sseth] one-line patch. In the API patch, I already set the force 
flag I think (will double check and fix on an iteration/commit if not).


was (Author: sershe):
[~gopalv] [~sseth] one-line patch

> LLAP run.sh should use stop --force
> ---
>
> Key: HIVE-15798
> URL: https://issues.apache.org/jira/browse/HIVE-15798
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15798.patch
>
>
> It's both faster, and avoids slider issues when the app survives across 
> kerberization and cannot be stopped by regular stop, which assumes it should 
> have some token or other because the cluster is now secure.





[jira] [Updated] (HIVE-15798) LLAP run.sh should use stop --force

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15798:

Status: Patch Available  (was: Open)

> LLAP run.sh should use stop --force
> ---
>
> Key: HIVE-15798
> URL: https://issues.apache.org/jira/browse/HIVE-15798
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15798.patch
>
>
> It's both faster, and avoids slider issues when the app survives across 
> kerberization and cannot be stopped by regular stop, which assumes it should 
> have some token or other because the cluster is now secure.





[jira] [Updated] (HIVE-15798) LLAP run.sh should use stop --force

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15798:

Attachment: HIVE-15798.patch

[~gopalv] [~sseth] one-line patch

> LLAP run.sh should use stop --force
> ---
>
> Key: HIVE-15798
> URL: https://issues.apache.org/jira/browse/HIVE-15798
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15798.patch
>
>
> It's both faster, and avoids slider issues when the app survives across 
> kerberization and cannot be stopped by regular stop, which assumes it should 
> have some token or other because the cluster is now secure.





[jira] [Assigned] (HIVE-15798) LLAP run.sh should use stop --force

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15798:
---


> LLAP run.sh should use stop --force
> ---
>
> Key: HIVE-15798
> URL: https://issues.apache.org/jira/browse/HIVE-15798
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15798.patch
>
>
> It's both faster, and avoids slider issues when the app survives across 
> kerberization and cannot be stopped by regular stop, which assumes it should 
> have some token or other because the cluster is now secure.





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-02-02 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850899#comment-15850899
 ] 

Matt McCline commented on HIVE-15784:
-

Query result differences:

vector_binary_join_groupby.q
vector_orderby_5.q
tez_join_hash.q
tez_vector_dynpart_hashjoin_2.q
vector_join30.q
vector_outer_join1.q
vector_outer_join2.q
vector_outer_join4.q
vectorized_join46.q


> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15765:
--
Labels: TODOC2.2  (was: )

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>  Labels: TODOC2.2
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs. 
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}





[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-02 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Status: Patch Available  (was: Open)

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes will generate 
> extremely poor reducer parallelism, and cause HoS query to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
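
The two rules in the description can be sketched as follows. This is an illustration only, not the patch's code: the `Work` class, the `parallelism` function, and the `BYTES_PER_REDUCER` constant are assumptions made up for the example, and the 256 MB figure is arbitrary.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import List

# Assumed target input size per task; chosen only for illustration.
BYTES_PER_REDUCER = 256 * 1024 * 1024


@dataclass
class Work:
    """Hypothetical stand-in for a MapWork or ReduceWork node."""
    name: str
    kind: str                      # "map" or "reduce"
    table_scan_bytes: int = 0      # TableScan stats, used for map work only
    parents: List["Work"] = field(default_factory=list)


def parallelism(work: Work) -> int:
    # Rule 1: a MapWork is sized from the stats of its TableScan operator.
    if work.kind == "map":
        return max(1, ceil(work.table_scan_bytes / BYTES_PER_REDUCER))
    # Rule 2: a ReduceWork takes the maximum parallelism of its parents.
    return max((parallelism(p) for p in work.parents), default=1)


map1 = Work("Map 1", "map", table_scan_bytes=1024 * 1024 * 1024)
map2 = Work("Map 2", "map", table_scan_bytes=100 * 1024 * 1024)
reducer = Work("Reducer 3", "reduce", parents=[map1, map2])
```

Because only TableScan sizes feed the computation, inaccurate operator stats further down the plan no longer affect the result in this sketch.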





[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-02 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Attachment: HIVE-15796.wip.patch

Attaching WIP patch to go through all the tests.

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes will generate 
> extremely poor reducer parallelism, and cause HoS query to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.





[jira] [Commented] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850876#comment-15850876
 ] 

Ashutosh Chauhan commented on HIVE-15797:
-

LGTM. My guess is that it will require updating a few golden files.

> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15797.patch
>
>






[jira] [Commented] (HIVE-15672) LLAP text cache: improve first query perf II

2017-02-02 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850873#comment-15850873
 ] 

Prasanth Jayachandran commented on HIVE-15672:
--

Will have to look at it again. Don't have the full context of the text cache 
yet.

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, 
> HIVE-15672.03.patch, HIVE-15672.04.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Commented] (HIVE-15509) Add back the script + transform tests to minitez

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850872#comment-15850872
 ] 

Hive QA commented on HIVE-15509:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850576/HIVE-15509.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3338/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3338/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3338/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850576 - PreCommit-HIVE-Build

> Add back the script + transform tests to minitez
> 
>
> Key: HIVE-15509
> URL: https://issues.apache.org/jira/browse/HIVE-15509
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15509.1.patch, HIVE-15509.1.patch, 
> HIVE-15509.2.patch
>
>
> Script operator cannot run in minillap and so was removed from the minillap 
> test suite. But tez supports script + transform. Add the removed tests back 
> to minitez test suite. 





[jira] [Commented] (HIVE-14445) upgrade maven surefire to 2.19.1

2017-02-02 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850827#comment-15850827
 ] 

Wei Zheng commented on HIVE-14445:
--

I have exactly the same problem that [~djaiswal] is facing.

> upgrade maven surefire to 2.19.1
> 
>
> Key: HIVE-14445
> URL: https://issues.apache.org/jira/browse/HIVE-14445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14445.1.patch
>
>
> newer maven surefire has a great feature:
> * it is possible to select testmethods by regular expressions...and there are 
> also improvements in using '#' to address testmethods
> i've looked into this earlier...the upgrade is "almost" seamless...i'm 
> already using 2.19.1, but the spark modules don't really like the empty 
> spark.home variable





[jira] [Commented] (HIVE-15777) propagate LLAP app ID to ATS and log it

2017-02-02 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850824#comment-15850824
 ] 

Jason Dere commented on HIVE-15777:
---

+1

> propagate LLAP app ID to ATS and log it 
> 
>
> Key: HIVE-15777
> URL: https://issues.apache.org/jira/browse/HIVE-15777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15777.01.patch, HIVE-15777.patch
>
>






[jira] [Updated] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-15765:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

test failures are unrelated. committed to master. thank you [~pxiong] for the 
review!

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs. 
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}





[jira] [Updated] (HIVE-13667) Improve performance for ServiceInstanceSet.getByHost

2017-02-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-13667:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~sseth], [~prasanth_j], [~sershe]. Committed to master.

> Improve performance for ServiceInstanceSet.getByHost
> 
>
> Key: HIVE-13667
> URL: https://issues.apache.org/jira/browse/HIVE-13667
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Fix For: 2.2.0
>
> Attachments: HIVE-13667.1.patch, HIVE-13667.2.patch, 
> HIVE-13667.3.patch, HIVE-13667.4.patch, HIVE-13667.5.patch
>
>
> ServiceInstanceSet.getByHost is used for scheduling local tasks as well as 
> constructing the log URL.
> It ends up traversing all hosts on each lookup. This should be avoided.
> cc [~prasanth_j]
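
One common way to avoid scanning every host on each lookup is to keep an index keyed by host name. The sketch below shows that idea only; it is not the committed patch, and `HostIndexedInstanceSet`, `ServiceInstance`, and their methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ServiceInstance(NamedTuple):
    """Hypothetical minimal record for a registered daemon."""
    host: str
    port: int


class HostIndexedInstanceSet:
    """Keeps instances grouped by host so a lookup is a single dict access."""

    def __init__(self) -> None:
        self._by_host: Dict[str, List[ServiceInstance]] = defaultdict(list)

    def add(self, instance: ServiceInstance) -> None:
        self._by_host[instance.host].append(instance)

    def get_by_host(self, host: str) -> List[ServiceInstance]:
        # .get() avoids creating an empty entry for unknown hosts.
        return list(self._by_host.get(host, ()))


instances = HostIndexedInstanceSet()
instances.add(ServiceInstance("node1", 15001))
instances.add(ServiceInstance("node2", 15001))
instances.add(ServiceInstance("node1", 15002))
```

The trade-off is that the index must be kept in sync whenever instances register or go away, which the sketch leaves out.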





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Patch Available  (was: Open)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Open  (was: Patch Available)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Attachment: HIVE-15160.04.patch

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15797:

Status: Patch Available  (was: Open)

> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15797.patch
>
>






[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850789#comment-15850789
 ] 

Hive QA commented on HIVE-14007:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850678/HIVE-14007.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 146 failed/errored test(s), 10224 tests 
executed
*Failed tests:*
{noformat}
TestBitFieldReader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestBitPack - did not produce a TEST-*.xml file (likely timed out) (batchId=238)
TestColumnStatistics - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestColumnStatisticsImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestDataReaderProperties - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestDynamicArray - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestFileDump - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestInStream - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestIntegerCompressionReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=237)
TestJsonFileDump - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestMemoryManager - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestNewIntegerEncoding - did not produce a TEST-*.xml file (likely timed out) 
(batchId=239)
TestOrcNullOptimization - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestOrcTimezone1 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestOrcTimezone2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestOrcTimezone3 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestOrcWideTable - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestOutStream - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestRLEv2 - did not produce a TEST-*.xml file (likely timed out) (batchId=237)
TestReaderImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestRecordReaderImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestRunLengthByteReader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRunLengthIntegerReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=238)
TestSchemaEvolution - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestSerializationUtils - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestStreamName - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestStringDictionary - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestStringRedBlackTree - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestTypeDescription - did not produce a TEST-*.xml file (likely timed out) 
(batchId=239)
TestUnrolledBitPack - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestVectorOrcFile - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestZlib - did not produce a TEST-*.xml file (likely timed out) (batchId=238)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.llap.cache.TestIncrementalObjectSizeEstimator.testMetadata
 (batchId=282)
org.apache.hadoop.hive.llap.cache.TestOrcMetadataCache.testGetPut (batchId=282)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeDeleteUpdate (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeUpdateDelete (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands2.testACIDwithSchemaEvolutionAndCompaction
 (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testAcidWithSchemaEvolution 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge2 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testInsertOverwriteWithSelfJoin 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge2 (batchId=263)

[jira] [Updated] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15797:

Attachment: HIVE-15797.patch

[~ashutoshc] can you take a look? Thanks

> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15797.patch
>
>






[jira] [Assigned] (HIVE-15797) separate the configs for gby and oby position alias usage

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15797:
---


> separate the configs for gby and oby position alias usage
> -
>
> Key: HIVE-15797
> URL: https://issues.apache.org/jira/browse/HIVE-15797
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>






[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850783#comment-15850783
 ] 

Ashutosh Chauhan commented on HIVE-15388:
-

This brings in two changes:
* Parentheses are now mandatory for expressions in predicates: select 1 = 2 IN 
(true, false) needs to be written as select (1=2) in (true, false).
* For intervals, only constant literals are supported, not expressions: 
select date '2012-01-01' - interval (-dt*dt) day won't be supported, only date 
'2012-01-01' - interval 2 day.

I think it's OK to bring in these changes: the first is not supported syntax on 
Postgres or Oracle, so it's likely non-standard to begin with. The second has 
not shipped in any release, so we can choose to revert it now. 
[~hagleitn] thoughts?

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in parsing phase when the number of expressions are 
> high.
> e.g
> {noformat}
> SELECT `iata`,
>        `airport`,
>        `city`,
>        `state`,
>        `country`,
>        `lat`,
>        `lon`
> FROM airports
> WHERE 
> ((`airports`.`airport` = "Thigpen"
>   OR `airports`.`airport` = "Astoria Regional")
>   OR `airports`.`airport` = "Warsaw Municipal")
>   OR `airports`.`airport` = "John F Kennedy Memorial")
>   OR `airports`.`airport` = "Hall-Miller Municipal")
>   OR `airports`.`airport` = "Atqasuk")
>   OR `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR `airports`.`airport` = "Artesia Municipal")
>   OR `airports`.`airport` = "Outagamie County Regional")
>   OR `airports`.`airport` = "Watertown Municipal")
>   OR `airports`.`airport` = "Augusta State")
>   OR `airports`.`airport` = "Aurora Municipal")
>   OR `airports`.`airport` = "Alakanuk")
>   OR `airports`.`airport` = "Austin Municipal")
>   OR `airports`.`airport` = "Auburn Municipal")
>   OR `airports`.`airport` = "Auburn-Opelik")
>   OR `airports`.`airport` = "Austin-Bergstrom International")
>   OR `airports`.`airport` = "Wausau Municipal")
>   OR `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR `airports`.`airport` = "Alva Regional")
>   OR `airports`.`airport` = "Asheville Regional")
>   OR `airports`.`airport` = "Avon Park Municipal")
>   OR `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR `airports`.`airport` = "Marana Northwest Regional")
>   OR `airports`.`airport` = "Catalina")
>   OR `airports`.`airport` = "Washington Municipal")
>   

[jira] [Assigned] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-02 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-15796:
---


> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, operator stats are often inaccurate, 
> especially if column stats are not available. This sometimes generates 
> extremely poor reducer parallelism and causes HoS queries to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does it. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
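
The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the Work class, the bytes_per_reducer and max_reducers defaults, and the size-based formula are all hypothetical stand-ins.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Work:
    """Hypothetical stand-in for a MapWork or ReduceWork node in the plan."""
    name: str
    # Data size reported by the TableScan operator; set only for map works.
    table_scan_bytes: Optional[int] = None
    parents: List["Work"] = field(default_factory=list)


def parallelism(work: Work,
                bytes_per_reducer: int = 256 * 1024 * 1024,
                max_reducers: int = 1009) -> int:
    """Estimate parallelism; both default values are illustrative assumptions."""
    if work.table_scan_bytes is not None:
        # Rule 1: a map work is sized from its TableScan stats.
        estimate = math.ceil(work.table_scan_bytes / bytes_per_reducer)
        return max(1, min(max_reducers, estimate))
    if not work.parents:
        raise ValueError(f"{work.name} has neither scan stats nor parents")
    # Rule 2: a reduce work takes the maximum over all of its parents.
    return max(parallelism(p, bytes_per_reducer, max_reducers)
               for p in work.parents)


GIB = 1024 ** 3
map1 = Work("Map 1", table_scan_bytes=1 * GIB)
map2 = Work("Map 2", table_scan_bytes=10 * GIB)
reducer2 = Work("Reducer 2", parents=[map1, map2])
reducer3 = Work("Reducer 3", parents=[reducer2])
print(parallelism(reducer3))
```

Because a reduce work only inherits the maximum from its parents, an inaccurate intermediate operator estimate never enters the calculation; only the table-scan sizes do.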





[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Attachment: HIVE-11394.099.patch

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.099.patch, 
> HIVE-11394.09.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] 
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators.  E.g. Filter 
> Vectorization.  It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions.  E.g. 
> predicateExpression.  It includes all information of SUMMARY and OPERATOR, 
> too.
> DETAIL shows detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> The optional clauses default to not ONLY and to SUMMARY.
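
As a rough illustration of the syntax and the cumulative levels described above, here is a small sketch. The helper names explain_vectorization and included_levels are hypothetical, and placing the query directly after the level keyword is an assumption based on how EXPLAIN is normally written.

```python
# Detail levels in increasing order; each level includes the ones before it.
LEVELS = ["SUMMARY", "OPERATOR", "EXPRESSION", "DETAIL"]


def explain_vectorization(query: str, only: bool = False,
                          level: str = "SUMMARY") -> str:
    """Build an EXPLAIN VECTORIZATION statement (hypothetical helper)."""
    level = level.upper()
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    parts = ["EXPLAIN VECTORIZATION"]
    if only:
        # ONLY suppresses most non-vectorization elements.
        parts.append("ONLY")
    parts.append(level)
    parts.append(query)
    return " ".join(parts)


def included_levels(level: str) -> list:
    """Return every level whose information the given level also shows."""
    return LEVELS[: LEVELS.index(level.upper()) + 1]


print(explain_vectorization("SELECT cint FROM alltypesorc GROUP BY cint"))
print(included_levels("EXPRESSION"))
```

The sketch always writes the level out explicitly; per the description, omitting it is equivalent to SUMMARY.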
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION 
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for 
> GROUPBY operator: Data type struct of 
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:": 
> "false" which says a node has a GROUP BY with an AVG or some other aggregator 
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators 
> are row-mode.  I.e. not vector output.
> If "usesVectorUDFAdaptor:": "false" were true, it would mean that at 
> least one vectorized expression is using VectorUDFAdaptor.
> And, "allNative:": "false" will be true when all operators are native.  
> Today, GROUP BY and FILE SINK are not native.  MAP JOIN and REDUCE SINK are 
> conditionally native.  FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> ...
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: alltypesorc
>                   Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: cint (type: int)
>                     outputColumnNames: cint
>                     Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: cint (type: int)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> 

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: In Progress  (was: Patch Available)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.099.patch, 
> HIVE-11394.09.patch
>
>

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2017-02-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: Patch Available  (was: In Progress)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, 
> HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, 
> HIVE-11394.097.patch, HIVE-11394.098.patch, HIVE-11394.099.patch, 
> HIVE-11394.09.patch
>
>

[jira] [Updated] (HIVE-15688) LlapServiceDriver - an option to start the cluster immediately

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15688:

Summary: LlapServiceDriver - an option to start the cluster immediately  
(was: LlapServiceDriver - an option to get rid of run.sh)

> LlapServiceDriver - an option to start the cluster immediately
> --
>
> Key: HIVE-15688
> URL: https://issues.apache.org/jira/browse/HIVE-15688
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15688.01.patch, HIVE-15688.patch
>
>
> run.sh is very slow because it makes 4 calls to slider, which means 4 JVMs, 4 
> connections to RM and other crap, for 2-5 sec. of overhead per call, 
> depending on the machine/cluster.
> What we need is a mode for LlapServiceDriver that would not generate run.sh, 
> but would rather run the cluster immediately by calling the corresponding 4 
> slider APIs. Should probably be the default, too. For compat with scripts we 
> might generate a blank run.sh for now.





[jira] [Updated] (HIVE-15688) LlapServiceDriver - an option to get rid of run.sh

2017-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15688:

Attachment: HIVE-15688.01.patch

Changed the default

> LlapServiceDriver - an option to get rid of run.sh
> --
>
> Key: HIVE-15688
> URL: https://issues.apache.org/jira/browse/HIVE-15688
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15688.01.patch, HIVE-15688.patch
>
>





[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.11

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850718#comment-15850718
 ] 

Hive QA commented on HIVE-15708:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850667/HIVE-15708.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 90 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ambiguitycheck] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_timestamp] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast1] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast_on_constant] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_outer_join_ppr] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_udaf_percentile_approx_23]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constantfolding] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog2] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_2] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_eq_with_case_when] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fouter_join_ppr] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_unused] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_merging] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[louter_join_ppr] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoins] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ops_comparison] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_char] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[outer_join_ppr] 
(batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_date] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_timestamp] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_varchar] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_type_check] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_type_in_plan] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_outer_join1] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[router_join_ppr] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp] (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_comparison2] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[type_conversions_1] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_percentile_approx_23]
 (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf3] (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_reflect2] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_cast] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_mapjoin] 
(batchId=35)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)

[jira] [Comment Edited] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-02-02 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850605#comment-15850605
 ] 

Wei Zheng edited comment on HIVE-15774 at 2/2/17 10:40 PM:
---

Wiki doc updated:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Transactions
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions


was (Author: wzheng):
Wiki doc updated: 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Transactions

> Ensure DbLockManager backward compatibility for non-ACID resources
> --
>
> Key: HIVE-15774
> URL: https://issues.apache.org/jira/browse/HIVE-15774
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15774.1.patch, HIVE-15774.2.patch, 
> HIVE-15774.3.patch
>
>
> In pre-ACID days, users performed operations such as INSERT with either 
> ZooKeeperHiveLockManager or no lock manager at all. If their workflow is 
> designed to take advantage of no locking and they handle concurrency control 
> themselves, this works well with good performance.
> With ACID, if users enable transactions (i.e. using DbTxnManager & 
> DbLockManager), then for all the operations, different types of locks will be 
> acquired accordingly by DbLockManager, even for non-ACID resources. This may 
> impact the performance of some workflows designed for pre-ACID use cases.
> A viable solution would be to differentiate the locking mode for ACID and 
> non-ACID resources, so that DbLockManager will continue its current behavior 
> for ACID tables, but will be able to acquire a less strict lock type for 
> non-ACID resources, thus avoiding the performance loss for those workflows.
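
The idea of differentiating the locking mode can be sketched as below. This is a simplified illustration, not DbLockManager's actual logic: the LockType enum, the lock_for function, the relax_non_acid flag, and the choice of which lock type to downgrade to are all assumptions made for the example.

```python
from enum import Enum


class LockType(Enum):
    """Illustrative lock types, ordered from least to most strict."""
    SHARED_READ = 1
    SHARED_WRITE = 2
    EXCLUSIVE = 3


def lock_for(requested: LockType, is_acid: bool,
             relax_non_acid: bool) -> LockType:
    """Pick the lock to acquire for a resource (hypothetical policy)."""
    if is_acid or not relax_non_acid:
        # ACID tables, or strict mode: keep the current behavior unchanged.
        return requested
    if requested is LockType.EXCLUSIVE:
        # Non-ACID resource in relaxed mode: use a less strict lock so that
        # concurrent writers managed by the user's own workflow do not block.
        return LockType.SHARED_READ
    return requested


print(lock_for(LockType.EXCLUSIVE, is_acid=False, relax_non_acid=True))
print(lock_for(LockType.EXCLUSIVE, is_acid=True, relax_non_acid=True))
```

Keeping the decision in one function means ACID tables never see a change in behavior, which is the backward-compatibility property the description asks for.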





[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850615#comment-15850615
 ] 

Hive QA commented on HIVE-15388:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850665/HIVE-15388.05.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11009 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.ql.parse.TestQBSubQuery.testExtractSubQueries 
(batchId=258)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3335/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3335/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3335/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850665 - PreCommit-HIVE-Build

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions
> is high.
> e.g.
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick 

[jira] [Updated] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-02-02 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15774:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Wiki doc updated: 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Transactions

> Ensure DbLockManager backward compatibility for non-ACID resources
> --
>
> Key: HIVE-15774
> URL: https://issues.apache.org/jira/browse/HIVE-15774
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15774.1.patch, HIVE-15774.2.patch, 
> HIVE-15774.3.patch
>
>
> In pre-ACID days, users performed operations such as INSERT with either
> ZooKeeperHiveLockManager or no lock manager at all. If their workflow was
> designed to run without locking and they managed concurrency themselves,
> this worked well with good performance.
> With ACID, if users enable transactions (i.e. using DbTxnManager & 
> DbLockManager), then for all the operations, different types of locks will be 
> acquired accordingly by DbLockManager, even for non-ACID resources. This may 
> impact the performance of some workflows designed for pre-ACID use cases.
> A viable solution would be to differentiate the locking mode for ACID and 
> non-ACID resources, so that DbLockManager will continue its current behavior 
> for ACID tables, but will be able to acquire a less strict lock type for 
> non-ACID resources, thus avoiding the performance loss for those workflows.
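
The proposal in the description amounts to a per-resource decision. A minimal sketch of that decision follows, in Python for brevity. The lock names, the `relax_non_acid` switch, and the function itself are hypothetical illustrations and are not taken from the patch; the wiki page linked above is the authority on the actual configuration.

```python
from enum import Enum


class LockType(Enum):
    # Hypothetical lock strengths, weakest first; not Hive's real lock types.
    SHARED = 1
    EXCLUSIVE = 2


def choose_lock(current, is_acid, relax_non_acid):
    """Return the lock to request for one resource.

    ACID resources always keep the lock that would be requested today.
    Non-ACID resources are downgraded to SHARED only when the assumed
    relax_non_acid switch is on.
    """
    if is_acid or not relax_non_acid:
        return current
    return LockType.SHARED


# A non-ACID table with the switch on gets the weaker lock.
print(choose_lock(LockType.EXCLUSIVE, is_acid=False, relax_non_acid=True))
```

With the switch off, every resource keeps its current lock, which matches the backward-compatibility goal in the issue title.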





[jira] [Commented] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-02-02 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850590#comment-15850590
 ] 

Wei Zheng commented on HIVE-15774:
--

Committed to master

> Ensure DbLockManager backward compatibility for non-ACID resources
> --
>
> Key: HIVE-15774
> URL: https://issues.apache.org/jira/browse/HIVE-15774
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15774.1.patch, HIVE-15774.2.patch, 
> HIVE-15774.3.patch
>
>
> In pre-ACID days, users performed operations such as INSERT with either
> ZooKeeperHiveLockManager or no lock manager at all. If their workflow was
> designed to run without locking and they managed concurrency themselves,
> this worked well with good performance.
> With ACID, if users enable transactions (i.e. using DbTxnManager & 
> DbLockManager), then for all the operations, different types of locks will be 
> acquired accordingly by DbLockManager, even for non-ACID resources. This may 
> impact the performance of some workflows designed for pre-ACID use cases.
> A viable solution would be to differentiate the locking mode for ACID and 
> non-ACID resources, so that DbLockManager will continue its current behavior 
> for ACID tables, but will be able to acquire a less strict lock type for 
> non-ACID resources, thus avoiding the performance loss for those workflows.





[jira] [Commented] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-02-02 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850583#comment-15850583
 ] 

Wei Zheng commented on HIVE-15774:
--

Test failures not related

> Ensure DbLockManager backward compatibility for non-ACID resources
> --
>
> Key: HIVE-15774
> URL: https://issues.apache.org/jira/browse/HIVE-15774
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15774.1.patch, HIVE-15774.2.patch, 
> HIVE-15774.3.patch
>
>
> In pre-ACID days, users performed operations such as INSERT with either
> ZooKeeperHiveLockManager or no lock manager at all. If their workflow was
> designed to run without locking and they managed concurrency themselves,
> this worked well with good performance.
> With ACID, if users enable transactions (i.e. using DbTxnManager & 
> DbLockManager), then for all the operations, different types of locks will be 
> acquired accordingly by DbLockManager, even for non-ACID resources. This may 
> impact the performance of some workflows designed for pre-ACID use cases.
> A viable solution would be to differentiate the locking mode for ACID and 
> non-ACID resources, so that DbLockManager will continue its current behavior 
> for ACID tables, but will be able to acquire a less strict lock type for 
> non-ACID resources, thus avoiding the performance loss for those workflows.





[jira] [Commented] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850532#comment-15850532
 ] 

Hive QA commented on HIVE-15774:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850662/HIVE-15774.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] 
(batchId=78)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3334/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3334/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3334/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850662 - PreCommit-HIVE-Build

> Ensure DbLockManager backward compatibility for non-ACID resources
> --
>
> Key: HIVE-15774
> URL: https://issues.apache.org/jira/browse/HIVE-15774
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15774.1.patch, HIVE-15774.2.patch, 
> HIVE-15774.3.patch
>
>
> In pre-ACID days, users performed operations such as INSERT with either
> ZooKeeperHiveLockManager or no lock manager at all. If their workflow was
> designed to run without locking and they managed concurrency themselves,
> this worked well with good performance.
> With ACID, if users enable transactions (i.e. using DbTxnManager & 
> DbLockManager), then for all the operations, different types of locks will be 
> acquired accordingly by DbLockManager, even for non-ACID resources. This may 
> impact the performance of some workflows designed for pre-ACID use cases.
> A viable solution would be to differentiate the locking mode for ACID and 
> non-ACID resources, so that DbLockManager will continue its current behavior 
> for ACID tables, but will be able to acquire a less strict lock type for 
> non-ACID resources, thus avoiding the performance loss for those workflows.





[jira] [Updated] (HIVE-15700) BytesColumnVector can get stuck trying to resize byte buffer

2017-02-02 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15700:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> BytesColumnVector can get stuck trying to resize byte buffer
> 
>
> Key: HIVE-15700
> URL: https://issues.apache.org/jira/browse/HIVE-15700
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 2.2.0
>
> Attachments: HIVE-15700.1.patch, HIVE-15700.2.patch, 
> HIVE-15700.3.patch, HIVE-15700.4.patch
>
>
> While looking at HIVE-15698, hit an issue where one of the reducers was stuck 
> in the following stack trace:
> {noformat}
> Thread 12735: (state = IN_JAVA)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(int)
>  @bci=22, line=245 (Compiled frame; information may be imprecise)
>  - org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(int, 
> byte[], int, int) @bci=18, line=150 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int, int, boolean) @bci=536, line=442 (Compiled frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int) @bci=110, line=761 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(org.apache.hadoop.io.BytesWritable,
>  java.lang.Iterable, byte) @bci=184, line=444 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector() 
> @bci=119, line=388 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=8, 
> line=239 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=124, 
> line=319 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map,
>  java.util.Map) @bci=30, line=185 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, 
> java.util.Map) @bci=159, line=168 (Interpreted frame)
>  - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65, 
> line=370 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=133, line=73 
> (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=1, line=61 
> (Interpreted frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1724 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=38, 
> line=61 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=1, 
> line=37 (Interpreted frame)
>  - org.apache.tez.common.CallableWithNdc.call() @bci=8, line=36 (Interpreted 
> frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {noformat}
> The reducer's input was 167 binary values of 9MB each, coming from the
> previous map job. Per [~gopalv], the BytesColumnVector is stuck trying to
> reallocate/copy all of these values into the same memory buffer.
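
The sizes quoted above can be checked with some arithmetic. The sketch below is in Python and rests on two assumptions that the message does not confirm: that the shared buffer's capacity is a signed 32-bit integer, and that it grows by doubling. Under those assumptions, 167 values of 9 MB need about 1.5 GB, and doubling a 1 GiB capacity wraps around instead of reaching that size. The helper names are illustrative only.

```python
INT32_MAX = 2**31 - 1
MB = 1024 * 1024


def to_int32(n):
    """Wrap n to a signed 32-bit value, as Java int arithmetic would."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n > INT32_MAX else n


def doubling_steps(start, needed, max_steps=8):
    """Capacities produced by repeated 32-bit doubling, capped at max_steps."""
    capacities = []
    cap = start
    while cap < needed and len(capacities) < max_steps:
        cap = to_int32(cap * 2)
        capacities.append(cap)
    return capacities


needed = 167 * 9 * MB                    # bytes if every value shares one buffer
steps = doubling_steps(1 << 30, needed)  # start from a 1 GiB capacity
print(needed, steps)
```

In this model the capacity goes negative and then stays at zero, so a loop that waits for it to exceed `needed` would not finish without the `max_steps` cap. Whether this is the actual failure in BytesColumnVector is not stated in the thread.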





[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-02 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850480#comment-15850480
 ] 

Pengcheng Xiong commented on HIVE-15388:


Sure here it is https://reviews.apache.org/r/56240/

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions
> is high.
> e.g.
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>   OR 
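
To reproduce the slowdown without a BI tool, a predicate of the same left-nested shape can be generated programmatically. A minimal sketch in Python follows; the function name is hypothetical, and the values are inserted without escaping, so it is only suitable for test strings that contain no double quotes.

```python
def nested_or_predicate(column, values):
    """Build ((col = "a" OR col = "b") OR col = "c") ... for the given values."""
    if not values:
        raise ValueError("at least one value is required")
    expr = f'{column} = "{values[0]}"'
    for value in values[1:]:
        # Each extra value wraps the whole previous expression in one more pair
        # of parentheses, so the query opens with len(values) - 1 "(" characters.
        expr = f'({expr} OR {column} = "{value}")'
    return expr


airports = ["Thigpen", "Astoria Regional", "Warsaw Municipal"]
where = nested_or_predicate("`airports`.`airport`", airports)
print("SELECT `iata` FROM airports WHERE " + where)
```

Raising the number of values increases the parenthesis depth linearly, which gives a simple knob for timing the parser at different depths.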

[jira] [Commented] (HIVE-15765) Support bracketed comments

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850436#comment-15850436
 ] 

Hive QA commented on HIVE-15765:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850660/HIVE-15765.3.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testAbandonedSessionMetrics
 (batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build//testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build//console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850660 - PreCommit-HIVE-Build

> Support bracketed comments
> --
>
> Key: HIVE-15765
> URL: https://issues.apache.org/jira/browse/HIVE-15765
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-15765.1.patch, HIVE-15765.1.patch, 
> HIVE-15765.2.patch, HIVE-15765.3.patch
>
>
> C-style comments are in the SQL spec as well as supported by all major DBs.
> They are useful for inline annotation of the SQL. We should have them too.
> Example:
> {noformat}
> select
> /*+ MAPJOIN(a) */ /* mapjoin hint */
> a /* column */
> from foo join bar;
> {noformat}
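
The example distinguishes ordinary bracketed comments from the `/*+ ... */` hint form, which has to survive. A minimal sketch of that distinction in Python follows. It is a regex-based illustration under the assumption that comments do not nest and that string literals use backslash escapes; it is not how Hive's lexer handles comments, and the function name is hypothetical.

```python
import re

# Match string literals first so comment markers inside them are left alone.
_TOKEN = re.compile(
    r"""
      '(?:[^'\\]|\\.)*'     # single-quoted string literal
    | "(?:[^"\\]|\\.)*"     # double-quoted string literal
    | /\*.*?\*/             # bracketed comment (non-nesting)
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_bracketed_comments(sql):
    """Remove /* ... */ comments but keep /*+ ... */ hints and string literals."""
    def keep_or_drop(match):
        token = match.group(0)
        if token.startswith("/*") and not token.startswith("/*+"):
            return ""
        return token
    return _TOKEN.sub(keep_or_drop, sql)


query = (
    "select\n"
    "/*+ MAPJOIN(a) */ /* mapjoin hint */\n"
    "a /* column */\n"
    "from foo join bar;"
)
print(strip_bracketed_comments(query))
```

Matching string literals in the same pattern is what keeps text such as `'/* not a comment */'` intact.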





[jira] [Comment Edited] (HIVE-15778) DROP INDEX (non-existent) throws NPE when using DbNotificationListener

2017-02-02 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850426#comment-15850426
 ] 

Aihua Xu edited comment on HIVE-15778 at 2/2/17 8:24 PM:
-

Pushed to master. Thanks Vamsee for the work.


was (Author: aihuaxu):
Pushed to master. Thanks Vansee for the work.

> DROP INDEX (non-existent) throws NPE when using DbNotificationListener 
> ---
>
> Key: HIVE-15778
> URL: https://issues.apache.org/jira/browse/HIVE-15778
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vamsee Yarlagadda
>Assignee: Vamsee Yarlagadda
> Fix For: 2.2.0
>
> Attachments: HIVE-15778.v0.patch
>
>
> Trying to execute a DROP INDEX operation on a non-existent index throws an NPE.
> {code}
> 0: jdbc:hive2://nightly-unsecure-1.gce.cloude> DROP INDEX IF EXISTS vamsee1 
> ON sample_07;
> INFO  : Compiling 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4): 
> DROP INDEX IF EXISTS vamsee1 ON sample_07
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4); 
> Time taken: 0.238 seconds
> INFO  : Executing 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4): 
> DROP INDEX IF EXISTS vamsee1 ON sample_07
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException
> INFO  : Completed executing 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4); 
> Time taken: 0.061 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException 
> (state=08S01,code=1)
> {code}
> HMS log:
> {code}
> 2017-01-31 16:27:29,421 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-3]: 
> MetaException(message:java.lang.NullPointerException)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5823)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_index_by_name(HiveMetaStore.java:4403)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>   at com.sun.proxy.$Proxy16.drop_index_by_name(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_index_by_name.getResult(ThriftHiveMetastore.java:10803)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_index_by_name.getResult(ThriftHiveMetastore.java:10787)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hive.hcatalog.messaging.json.JSONDropIndexMessage.(JSONDropIndexMessage.java:46)
>   at 
> org.apache.hive.hcatalog.messaging.json.JSONMessageFactory.buildDropIndexMessage(JSONMessageFactory.java:159)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onDropIndex(DbNotificationListener.java:280)
>   at 
> 

[jira] [Updated] (HIVE-15778) DROP INDEX (non-existent) throws NPE when using DbNotificationListener

2017-02-02 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15778:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Vamsee for the work.

> DROP INDEX (non-existent) throws NPE when using DbNotificationListener 
> ---
>
> Key: HIVE-15778
> URL: https://issues.apache.org/jira/browse/HIVE-15778
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vamsee Yarlagadda
>Assignee: Vamsee Yarlagadda
> Fix For: 2.2.0
>
> Attachments: HIVE-15778.v0.patch
>
>
> Trying to execute a DROP INDEX operation on a non-existent index throws an NPE.
> {code}
> 0: jdbc:hive2://nightly-unsecure-1.gce.cloude> DROP INDEX IF EXISTS vamsee1 
> ON sample_07;
> INFO  : Compiling 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4): 
> DROP INDEX IF EXISTS vamsee1 ON sample_07
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4); 
> Time taken: 0.238 seconds
> INFO  : Executing 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4): 
> DROP INDEX IF EXISTS vamsee1 ON sample_07
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException
> INFO  : Completed executing 
> command(queryId=hive_20170131162727_663a0909-2a82-44f9-a800-f4a35abaeaa4); 
> Time taken: 0.061 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException 
> (state=08S01,code=1)
> {code}
> HMS log:
> {code}
> 2017-01-31 16:27:29,421 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-3]: 
> MetaException(message:java.lang.NullPointerException)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5823)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_index_by_name(HiveMetaStore.java:4403)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>   at com.sun.proxy.$Proxy16.drop_index_by_name(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_index_by_name.getResult(ThriftHiveMetastore.java:10803)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_index_by_name.getResult(ThriftHiveMetastore.java:10787)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hive.hcatalog.messaging.json.JSONDropIndexMessage.(JSONDropIndexMessage.java:46)
>   at 
> org.apache.hive.hcatalog.messaging.json.JSONMessageFactory.buildDropIndexMessage(JSONMessageFactory.java:159)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onDropIndex(DbNotificationListener.java:280)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_index_by_name_core(HiveMetaStore.java:4469)
>   at 
> 

[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-02-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850407#comment-15850407
 ] 

Josh Elser commented on HIVE-15795:
---

Made you the assignee here, Mike. Happy to help review this one -- agreed that 
this would be a good improvement.

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates





[jira] [Assigned] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-02-02 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reassigned HIVE-15795:
-

Assignee: Mike Fagan  (was: Josh Elser)

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates





[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850399#comment-15850399
 ] 

Ashutosh Chauhan commented on HIVE-15388:
-

Can you create a Review Board (RB) request for this?

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated by tools can contain many "(" around "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.
> {noformat}
> SELECT `iata`,
>        `airport`,
>        `city`,
>        `state`,
>        `country`,
>        `lat`,
>        `lon`
> FROM airports
> WHERE ((`airports`.`airport` = "Thigpen"
>     OR `airports`.`airport` = "Astoria Regional")
>     OR `airports`.`airport` = "Warsaw Municipal")
>     OR `airports`.`airport` = "John F Kennedy Memorial")
>     OR `airports`.`airport` = "Hall-Miller Municipal")
>     OR `airports`.`airport` = "Atqasuk")
>     OR `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>     OR `airports`.`airport` = "Artesia Municipal")
>     OR `airports`.`airport` = "Outagamie County Regional")
>     OR `airports`.`airport` = "Watertown Municipal")
>     OR `airports`.`airport` = "Augusta State")
>     OR `airports`.`airport` = "Aurora Municipal")
>     OR `airports`.`airport` = "Alakanuk")
>     OR `airports`.`airport` = "Austin Municipal")
>     OR `airports`.`airport` = "Auburn Municipal")
>     OR `airports`.`airport` = "Auburn-Opelik")
>     OR `airports`.`airport` = "Austin-Bergstrom International")
>     OR `airports`.`airport` = "Wausau Municipal")
>     OR `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>     OR `airports`.`airport` = "Alva Regional")
>     OR `airports`.`airport` = "Asheville Regional")
>     OR `airports`.`airport` = "Avon Park Municipal")
>     OR `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>     OR `airports`.`airport` = "Marana Northwest Regional")
>     OR `airports`.`airport` = "Catalina")
>     OR `airports`.`airport` = "Washington Municipal")
>     OR `airports`.`airport` = "Wainwright")
>     OR `airports`.`airport` = "West Memphis Municipal")
>     OR `airports`.`airport` = "Arlington Municipal")
>     OR `airports`.`airport` = "Algona Municipal")
>     OR `airports`.`airport` = "Chandler")
>     OR `airports`.`airport` = 
