[jira] [Updated] (HIVE-14067) Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.

2016-07-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14067:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks for the patch [~zxu]!


> Rename pendingCount to activeCalls in HiveSessionImpl  for easier 
> understanding.
> 
>
> Key: HIVE-14067
> URL: https://issues.apache.org/jira/browse/HIVE-14067
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: HIVE-14067.000.patch, HIVE-14067.000.patch, 
> HIVE-14067.001.patch
>
>
> Rename pendingCount to activeCalls in HiveSessionImpl  for easier 
> understanding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14067) Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.

2016-07-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14067:
-
Fix Version/s: 2.2.0

> Rename pendingCount to activeCalls in HiveSessionImpl  for easier 
> understanding.
> 
>
> Key: HIVE-14067
> URL: https://issues.apache.org/jira/browse/HIVE-14067
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14067.000.patch, HIVE-14067.000.patch, 
> HIVE-14067.001.patch
>
>
> Rename pendingCount to activeCalls in HiveSessionImpl  for easier 
> understanding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383607#comment-15383607
 ] 

Ashutosh Chauhan commented on HIVE-13995:
-

[~hsubramaniyan] Can you create a RB for your .4 patch?

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing large IN clauses, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on the table and column names will generate the same result set, 
> as long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 secs in MySQL. Following is the output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it is also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here. It is not clear whether 
> statistics of these columns are required for Hive query optimization.
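
A minimal sketch of the range-based rewrite described above, assuming the caller
already holds the sorted partition name list (the class and method below are
illustrative, not Hive's actual direct-SQL generation code):

{noformat}
import java.util.List;

class PartitionFilterSketch {
  // Illustrative sketch: build the partition-name predicate for PART_COL_STATS.
  // Assumes partNames is sorted and covers a contiguous range (e.g. all
  // partitions selected), which is when the rewrite is equivalent to the IN list.
  static String buildPartitionNameFilter(List<String> partNames) {
    if (partNames.isEmpty()) {
      return "1 = 0"; // no partitions selected, match nothing
    }
    // Instead of "PARTITION_NAME" IN ('p1', ..., 'pN') with N in the thousands,
    // emit a closed range over the sorted list; the RDBMS can serve this with a
    // simple index range scan. (Real code would bind parameters, not inline them.)
    String first = partNames.get(0);
    String last = partNames.get(partNames.size() - 1);
    return "\"PARTITION_NAME\" >= '" + first + "'"
        + " and \"PARTITION_NAME\" <= '" + last + "'";
  }
}
{noformat}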



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383593#comment-15383593
 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:27 AM:


[~stakiar] I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
decouple the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).


was (Author: taoli-hwx):
I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
decouple the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option, 
> the formatting can look badly misaligned.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal column width for {{TableOutputFormat}} (it can't, because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" should be configurable, with a 
> default of 1000).
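
A rough sketch of that re-calculation scheme, using the
{{IncrementalRowsWithNormalization}} name floated in the comments; the fields
and methods below are illustrative, not Beeline's actual Rows API:

{noformat}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: hold at most batchSize rows, recompute column widths
// over each batch, then print the batch. Memory stays bounded while the
// formatting stays close to what BufferedRows produces.
class IncrementalRowsWithNormalization {
  private final int batchSize;                  // the configurable "x", e.g. 1000
  private final List<String[]> buffer = new ArrayList<>();

  IncrementalRowsWithNormalization(int batchSize) {
    this.batchSize = batchSize;
  }

  void addRow(String[] row) {
    buffer.add(row);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    int cols = buffer.get(0).length;
    int[] widths = new int[cols];
    Arrays.fill(widths, 1);                     // avoid zero-width format specifiers
    // Recalculate the optimal width of each column over this batch only.
    for (String[] row : buffer) {
      for (int i = 0; i < cols; i++) {
        widths[i] = Math.max(widths[i], row[i] == null ? 4 : row[i].length());
      }
    }
    for (String[] row : buffer) {
      StringBuilder line = new StringBuilder("|");
      for (int i = 0; i < cols; i++) {
        String cell = row[i] == null ? "NULL" : row[i];
        line.append(' ').append(String.format("%-" + widths[i] + "s", cell)).append(" |");
      }
      System.out.println(line);
    }
    buffer.clear();
  }
}
{noformat}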



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14262:
-
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Test failures are not related.
Thanks for the review [~sushanth]!
Patch committed to master and branch-2.1

> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from the 
> semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.
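
A one-line sketch of the intended fix, with the caveat that the exact
{{WriteEntity}} constructor and accessors here are assumptions, not verified
against the patch:

{noformat}
// Illustrative sketch: create the table-level entity with the partition
// entity's write type instead of the DDL_NO_TASK default.
WriteEntity tableEntity =
    new WriteEntity(partitionEntity.getTable(), partitionEntity.getWriteType());
{noformat}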



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383593#comment-15383593
 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:22 AM:


I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
decouple the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).


was (Author: taoli-hwx):
I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
isolate the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option, 
> the formatting can look badly misaligned.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal column width for {{TableOutputFormat}} (it can't, because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" should be configurable, with a 
> default of 1000).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383593#comment-15383593
 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:20 AM:


I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
isolate the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).


was (Author: taoli-hwx):
~Sahil Takiar I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
isolate the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option, 
> the formatting can look badly misaligned.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal column width for {{TableOutputFormat}} (it can't, because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" should be configurable, with a 
> default of 1000).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383593#comment-15383593
 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:19 AM:


~Sahil Takiar I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
isolate the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).


was (Author: taoli-hwx):
[~Sahil Takiar] I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set incremental to true by default. Maybe it's even 
better to deprecate the buffered row mode completely due to the OOM issue. I 
don't think this is a breaking change, since it does not affect the query 
result. I am not sure about the correct behavior with "--incremental=false" 
though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep the IncrementalRows class unchanged and define a subclass 
(e.g. IncrementalRowsWithNormalization). The reason is that the non-table 
formats don't require column width normalization at all, so it's better to 
isolate the normalization-related code from these formats. Without any code 
change (other than setting the default of incremental to true), the non-table 
formats should just work fine. Only the table format will involve the 
normalization code path (e.g. your incremental normalization code).

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option, 
> the formatting can look badly misaligned.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal column width for {{TableOutputFormat}} (it can't, because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" should be configurable, with a 
> default of 1000).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-18 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Attachment: HIVE-14214.04.patch

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema, 
> which is used to evaluate predicate push down.
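
The description is terse; as a conceptual illustration only (none of the names
below are Hive's or ORC's actual APIs), the hazard is that a predicate expressed
over the reader schema gets evaluated against file-schema column statistics
without translation, so every row group can be skipped:

{noformat}
import java.util.Map;

// Conceptual sketch: before using file statistics to skip row groups, resolve
// the predicate's column in the *file* schema. ColumnStats is a hypothetical
// stand-in for per-column min/max statistics.
class PpdEvolutionSketch {
  static boolean mayContainMatches(String predicateColumn,
                                   Map<String, ColumnStats> fileStatsByName) {
    ColumnStats stats = fileStatsByName.get(predicateColumn);
    if (stats == null) {
      // Column absent from this (older) file: the predicate cannot be
      // disproved from stats, so the row group must be read, not skipped.
      return true;
    }
    return stats.overlapsPredicateRange();
  }

  interface ColumnStats {
    boolean overlapsPredicateRange();
  }
}
{noformat}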



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-18 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Status: In Progress  (was: Patch Available)

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema, 
> which is used to evaluate predicate push down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-18 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Status: Patch Available  (was: In Progress)

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema, 
> which is used to evaluate predicate push down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete

2016-07-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383533#comment-15383533
 ] 

Hive QA commented on HIVE-14224:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818652/HIVE-14224.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10335 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/570/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/570/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-570/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818652 - PreCommit-HIVE-MASTER-Build

> LLAP rename query specific log files once a query is complete
> -
>
> Key: HIVE-14224
> URL: https://issues.apache.org/jira/browse/HIVE-14224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, 
> HIVE-14224.wip.01.patch
>
>
> Once a query is complete, rename the query-specific log file so that YARN can 
> aggregate the logs (once it is configured to do so).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format

2016-07-18 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383523#comment-15383523
 ] 

Chaoyu Tang commented on HIVE-14205:


[~Yibing] Thanks, I was able to apply the patch and am now doing the review. In 
the meantime, let's wait for the test results from the precommit build.

> Hive doesn't support union type with AVRO file format
> -
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, 
> HIVE-14205.3.patch, HIVE-14205.4.patch
>
>
> Steps to reproduce:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> >"type":"record",
> >"name":"nullUnionTest",
> >"fields":[
> >   {
> >  "name":"value",
> >  "type":[
> > "null",
> > "int",
> > "long"
> >  ],
> >  "default":null
> >   }
> >]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype) stored as 
> avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> CREATE TABLE `avro_union_test2`(
>   `value` uniontype COMMENT '')
> ROW FORMAT SERDE
>   

[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.4.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing large IN clauses, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on the table and column names will generate the same result set, 
> as long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 secs in MySQL. Following is the output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it is also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here. It is not clear whether 
> statistics of these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive

2016-07-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383479#comment-15383479
 ] 

Rui Li commented on HIVE-10022:
---

[~sushanth], could you please verify if the patch works with the case 
[~niklaus.xiao] mentioned? Thanks.

> Authorization checks for non existent file/directory should not be recursive
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>Assignee: Pankit Thapar
> Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, HIVE-10022.patch
>
>
> I am testing a query like: 
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, 
> we end up calling doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object we are trying 
> to authorize, or with an ancestor of that object if it does not exist. 
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a depth-first 
> search (DFS).
> Now assume we have a path a/b/c/d that we are trying to authorize.
> In case a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
> also does not exist.
> If we have millions of files under the subtree at a/b, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects. 
> I do not completely understand why we have to check file permissions on all 
> the objects in a branch of the tree that we are not trying to read from or 
> write to. 
> We could check the file permission on the ancestor that exists and, if it 
> matches what we expect, return true.
> Please confirm whether this is a bug so that I can submit a patch; otherwise, 
> let me know what I am missing.
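
A minimal sketch of the reporter's suggestion, assuming a Hadoop {{FileSystem}}
handle; the {{checkAccess}} helper is a simplified placeholder for a
single-directory permission check, not the actual FileUtils code:

{noformat}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

// Illustrative sketch: walk *up* to the deepest existing ancestor and check
// only that directory, instead of depth-first searching the whole subtree.
class AncestorAuthSketch {
  static boolean isActionPermittedOnExistingAncestor(FileSystem fs, Path path,
      FsAction action) throws IOException {
    Path current = path;
    while (current != null && !fs.exists(current)) {
      current = current.getParent();            // a/b/c/d -> a/b/c -> a/b -> ...
    }
    if (current == null) {
      return false;                             // nothing exists up to the root
    }
    // One permission check on the existing ancestor; the millions of files
    // elsewhere under a/b are never touched.
    return checkAccess(fs, current, action);
  }

  static boolean checkAccess(FileSystem fs, Path dir, FsAction action)
      throws IOException {
    // Placeholder: only inspects the owner bits; real code would also
    // consider group/other permissions and the calling user.
    return fs.getFileStatus(dir).getPermission().getUserAction().implies(action);
  }
}
{noformat}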



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Open  (was: Patch Available)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing large IN clauses, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on the table and column names will generate the same result set, 
> as long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 secs in MySQL. Following is the output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it is also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here. It is not clear whether 
> statistics of these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.3.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing large IN clauses, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on the table and column names will generate the same result set, 
> as long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 secs in MySQL. Following is the output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it is also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here. It is not clear whether 
> statistics of these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Patch Available  (was: Open)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing large IN clauses, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on the table and column names will generate the same result set, 
> as long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 secs in MySQL. Following is the output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it is also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here. It is not clear whether 
> statistics of these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format

2016-07-18 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-14205:
--
Attachment: HIVE-14205.4.patch

I have verified that this patch can be applied:
{noformat}
➜  repo git:(master) patch -p0 <~/Downloads/HIVE-14205.4.patch
File data/files/union_non_nullable.avro: git binary diffs are not supported.
File data/files/union_nullable.avro: git binary diffs are not supported.
patching file ql/src/test/queries/clientnegative/avro_non_nullable_union.q
patching file ql/src/test/queries/clientpositive/avro_nullable_union.q
patching file ql/src/test/results/clientnegative/avro_non_nullable_union.q.out
patching file ql/src/test/results/clientpositive/avro_nullable_union.q.out
patching file 
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
patching file 
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
patching file 
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
patching file 
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
{noformat}

> Hive doesn't support union type with AVRO file format
> -
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, 
> HIVE-14205.3.patch, HIVE-14205.4.patch
>
>
> Steps to reproduce:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> >"type":"record",
> >"name":"nullUnionTest",
> >"fields":[
> >   {
> >  "name":"value",
> >  "type":[
> > "null",
> > "int",
> > "long"
> >  ],
> >  "default":null
> >   }
> >]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at 
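To make the failure mode above concrete: the SerDe has to distinguish a union that merely makes one type nullable from a genuine multi-type union like the {{["null","int","long"]}} in the schema literal above. Below is a minimal sketch of that distinction against the plain Avro API; the class and method names are illustrative, not taken from HIVE-14205.4.patch.

{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

public class UnionSchemaCheck {
  // A union is merely "nullable" when stripping the null branch leaves
  // exactly one type; anything else is a real union, which is what this
  // JIRA has to map onto Hive's uniontype instead of failing.
  public static boolean isNullableSingleType(Schema schema) {
    if (schema.getType() != Schema.Type.UNION) {
      return false;
    }
    List<Schema> nonNull = new ArrayList<>();
    for (Schema branch : schema.getTypes()) {
      if (branch.getType() != Schema.Type.NULL) {
        nonNull.add(branch);
      }
    }
    return nonNull.size() == 1;
  }

  public static void main(String[] args) {
    Schema nullable = new Schema.Parser().parse("[\"null\", \"int\"]");
    Schema realUnion = new Schema.Parser().parse("[\"null\", \"int\", \"long\"]");
    System.out.println(isNullableSingleType(nullable));  // true
    System.out.println(isNullableSingleType(realUnion)); // false
  }
}
{noformat}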

[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:45 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for \-\-incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the \-\-incremental is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
\-\-incremental for non-table formats. What if the user specifies a non-table 
format and also \-\-incremental=false? Do we want to do buffered rows in this 
case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for \-\-incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
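The decision the description proposes boils down to a single predicate. A minimal sketch follows; the class and method names are illustrative, not Beeline's actual API.

{noformat}
final class RowsSelection {
  // BufferedRows pays off only when TableOutputFormat needs the global
  // column widths; every other OutputFormat can stream via IncrementalRows.
  static boolean useBufferedRows(boolean incrementalFlag, boolean isTableFormat) {
    return isTableFormat && !incrementalFlag;
  }

  private RowsSelection() { }
}
{noformat}

Under this sketch, Tao Li's second question above corresponds to the {{isTableFormat == false}} branch, where the flag is simply ignored.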


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:44 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for \-\-incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for \--incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:44 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:44 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for \--incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:39 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:38 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for {--incremental} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also {{--incremental=false}}? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for {{--}}incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also {{--incremental=false}}? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:38 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
{{--incremental}} for non-table formats. What if the user specifies a non-table 
format and also {{--incremental=false}}? Do we want to do buffered rows in this 
case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for {--incremental} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also {{--incremental=false}}? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:36 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also {{--incremental=false}}? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:37 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for {{--}}incremental is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also {{--incremental=false}}? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also {{--incremental=false}}? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:36 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for {{--incremental}} is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the {{--incremental}} is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for "{{--incremental}}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:35 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for "{{--incremental}}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for "{--incremental}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:33 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for "{--incremental}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for "{{--incremental}}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:32 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for "{{--incremental}}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for "--incremental" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "--incremental" is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"--incremental" for non-table formats. What if the user specifies a non-table 
format and also "--incremental=false"? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li edited comment on HIVE-14169 at 7/19/16 12:33 AM:
-

Hi Sahil,

2 quick questions:

1. I think the default setting for "{{--incremental}}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.


was (Author: taoli-hwx):
Hi Sahil,

2 quick questions:

1. I think the default setting for "{{--incremental}}" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "{{--incremental}}" is not specified, which I think is a typical use 
case.

2. Looks like we always do the incremental rows regardless of the setting for 
"{{--incremental}}" for non-table formats. What if the user specifies a 
non-table format and also "{{--incremental}}=false"? Do we want to do buffered 
rows in this case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14277) Disable StatsOptimizer for all ACID tables

2016-07-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383364#comment-15383364
 ] 

Ashutosh Chauhan commented on HIVE-14277:
-

+1

> Disable StatsOptimizer for all ACID tables
> --
>
> Key: HIVE-14277
> URL: https://issues.apache.org/jira/browse/HIVE-14277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14277.01.patch
>
>
> We have observed lots of cases where an ACID table is created for HCat 
> streaming. Streaming will directly insert data into the table, but the stats 
> of the table are not updated (or there is no good way to update them). We would 
> like to disable StatsOptimizer for all ACID tables so that it will at least 
> not give wrong results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
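A minimal sketch of the kind of guard the change implies: bail out of the metadata-only answer whenever the table is transactional. The property key "transactional" is the standard ACID table property; the class and method names are illustrative, not the actual StatsOptimizer code.

{noformat}
import java.util.Map;

final class AcidStatsGuard {
  // HCat streaming inserts into ACID tables without refreshing stats, so
  // answering a query purely from stats can silently return wrong results.
  static boolean statsMayBeStale(Map<String, String> tableParameters) {
    return "true".equalsIgnoreCase(tableParameters.get("transactional"));
  }

  private AcidStatsGuard() { }
}
{noformat}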


[jira] [Commented] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383363#comment-15383363
 ] 

Tao Li commented on HIVE-14169:
---

Hi Sahil,

2 quick questions:

1. I think the default setting for "--incremental" is still false with your 
change, right? If that's true, we still go into the code path of buffered rows 
if the "--incremental" is not specified, which I think is a typical use case.

2. Looks like we always do the incremental rows regardless of the setting for 
"--incremental" for non-table formats. What if the user specifies a non-table 
format and also "--incremental=false"? Do we want to do buffered rows in this 
case?

Thanks.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11918) Implement/Enable constant related optimization rules in Calcite

2016-07-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-11918.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> Implement/Enable constant related optimization rules in Calcite
> ---
>
> Key: HIVE-11918
> URL: https://issues.apache.org/jira/browse/HIVE-11918
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
>
> Right now, the Hive optimizer (Calcite) lacks constant-related optimization 
> rules: for example, constant folding, constant propagation, and constant 
> transitivity rules. Although Hive later provides those rules in the 
> logical optimizer, we would like to implement them inside Calcite. This will 
> benefit the current optimization as well as the optimization based on return 
> path that we are planning to use in the future. This JIRA is the umbrella 
> JIRA to implement/enable those rules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12773) Address current test case failures in Hive

2016-07-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-12773.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

All sub-tasks for this have been done. New failures are tracked in new JIRAs :)

> Address current test case failures in Hive
> --
>
> Key: HIVE-12773
> URL: https://issues.apache.org/jira/browse/HIVE-12773
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
>
> We have around 17-20 test case failures on master. We would like to 
> investigate them and remove all the failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14277) Disable StatsOptimizer for all ACID tables

2016-07-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383351#comment-15383351
 ] 

Pengcheng Xiong commented on HIVE-14277:


[~ashutoshc], could you take a look? Thanks.

> Disable StatsOptimizer for all ACID tables
> --
>
> Key: HIVE-14277
> URL: https://issues.apache.org/jira/browse/HIVE-14277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14277.01.patch
>
>
> We have observed lots of cases where an ACID table is created for HCat 
> streaming. Streaming will directly insert data into the table, but the stats 
> of the table are not updated (or there is no good way to update them). We would 
> like to disable StatsOptimizer for all ACID tables so that it will at least 
> not give wrong results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables

2016-07-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14277:
---
Status: Patch Available  (was: Open)

> Disable StatsOptimizer for all ACID tables
> --
>
> Key: HIVE-14277
> URL: https://issues.apache.org/jira/browse/HIVE-14277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14277.01.patch
>
>
> We have observed lots of cases where an ACID table is created for HCat 
> streaming. Streaming will directly insert data into the table, but the stats 
> of the table are not updated (or there is no good way to update them). We would 
> like to disable StatsOptimizer for all ACID tables so that it will at least 
> not give wrong results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-07-18 Thread Andrew Sears (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1538#comment-1538
 ] 

Andrew Sears commented on HIVE-13985:
-

Doc'd in the wiki!

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC1.3, TODOC2.1.1, TODOC2.2
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> application master (AM) during split generation. This jira also address other 
> issues concerning the AM cache. The local cache used by AM is soft reference 
> cache. This can introduce unpredictability across multiple runs of the same 
> query. We can cache the serialized footer in the local cache and also use 
> strong reference cache which should avoid memory pressure and will have 
> better predictability.
> One other improvement we can make: when hive.orc.splits.include.file.footer 
> is set to false, the task side makes one additional file system call just to 
> learn the size of the file. Serializing the file length in the orc split 
> avoids this call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
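As a rough illustration of the minimal split payload the description calls for, the sketch below shows an assumed shape only; it is not the actual OrcSplit layout from the patch.

{noformat}
import java.io.Serializable;
import java.util.List;

// Assumed shape of a slimmed-down per-split payload: stripe boundaries,
// the type tree, compression details, and the file length, so the task
// side needs neither a footer read nor an extra getFileStatus() call.
class MinimalOrcSplitInfo implements Serializable {
  static class StripeInfo implements Serializable {
    long offset;
    long length;
    long rowCount;
  }

  List<StripeInfo> stripes;    // stripe information only
  byte[] serializedTypes;      // types, without column statistics
  String compressionKind;      // e.g. "ZLIB"
  int compressionBufferSize;
  long fileLength;             // saves one file system call per task
}
{noformat}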


[jira] [Updated] (HIVE-13708) Create table should verify datatypes supported by the serde

2016-07-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13708:

Status: Open  (was: Patch Available)

> Create table should verify datatypes supported by the serde
> ---
>
> Key: HIVE-13708
> URL: https://issues.apache.org/jira/browse/HIVE-13708
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Attachments: HIVE-13708.1.patch, HIVE-13708.2.patch, 
> HIVE-13708.3.patch, HIVE-13708.4.patch
>
>
> As [~Goldshuv] mentioned in HIVE-.
> Create table with a serde such as OpenCSVSerde allows for the creation of a 
> table with columns of arbitrary types. But 'describe table' would still return 
> string datatypes, and so do selects on the table.
> This is misleading and would result in users not getting intended results.
> Create table should ideally disallow the creation of such tables with 
> unsupported types.
> Example posted by [~Goldshuv] in HIVE- -
> {noformat}
> CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10)) 
> ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' with 
> serdeproperties ("separatorChar" = ",","quoteChar"= "'","escapeChar"= "\\") 
> STORED AS TEXTFILE 
> LOCATION '' 
> tblproperties ("skip.header.line.count"="1");
> {noformat}
> Now consider this sql:
> hive> select min(totalprice) from test;
> In this case, given my data, the result should have been 874.89, but the 
> actual result became 11.57 (as it is first according to the byte ordering of 
> a string type). This is a wrong result.
> hive> desc extended test;
> OK
> o_totalprice  string  from deserializer
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
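A minimal sketch of the verification being asked for: reject the CREATE TABLE when a declared column type is outside what the serde can honor. The supported-type set (OpenCSVSerde only materializes strings) and all class and method names here are illustrative, not Hive's actual API.

{noformat}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class SerdeTypeCheck {
  // OpenCSVSerde deserializes every column as a string, so string is the
  // only declared type it can honor in this illustrative check.
  private static final Set<String> CSV_SERDE_TYPES =
      new HashSet<>(Arrays.asList("string"));

  static void verify(List<String> declaredTypes) {
    for (String t : declaredTypes) {
      if (!CSV_SERDE_TYPES.contains(t.toLowerCase())) {
        throw new IllegalArgumentException(
            "Column type " + t + " is not supported by this serde");
      }
    }
  }

  public static void main(String[] args) {
    verify(Arrays.asList("string"));          // passes
    verify(Arrays.asList("decimal(38,10)"));  // throws, as create table should
  }
}
{noformat}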


[jira] [Commented] (HIVE-13708) Create table should verify datatypes supported by the serde

2016-07-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383315#comment-15383315
 ] 

Ashutosh Chauhan commented on HIVE-13708:
-

[~hsubramaniyan] you may want to upload your .1 patch on HIVE-13709 which makes 
sense for that issue.

> Create table should verify datatypes supported by the serde
> ---
>
> Key: HIVE-13708
> URL: https://issues.apache.org/jira/browse/HIVE-13708
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Attachments: HIVE-13708.1.patch, HIVE-13708.2.patch, 
> HIVE-13708.3.patch, HIVE-13708.4.patch
>
>
> As [~Goldshuv] mentioned in HIVE-.
> Create table with a serde such as OpenCSVSerde allows for the creation of a 
> table with columns of arbitrary types. But 'describe table' would still return 
> string datatypes, and so do selects on the table.
> This is misleading and would result in users not getting intended results.
> Create table should ideally disallow the creation of such tables with 
> unsupported types.
> Example posted by [~Goldshuv] in HIVE- -
> {noformat}
> CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10)) 
> ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' with 
> serdeproperties ("separatorChar" = ",","quoteChar"= "'","escapeChar"= "\\") 
> STORED AS TEXTFILE 
> LOCATION '' 
> tblproperties ("skip.header.line.count"="1");
> {noformat}
> Now consider this sql:
> hive> select min(totalprice) from test;
> In this case, given my data, the result should have been 874.89, but the 
> actual result became 11.57 (as it is first according to the byte ordering of 
> a string type). This is a wrong result.
> hive> desc extended test;
> OK
> o_totalprice  string  from deserializer
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-18 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383272#comment-15383272
 ] 

Owen O'Malley commented on HIVE-13974:
--

Comments:
* This is looking much better, thanks.
* This still needs comprehensive unit tests.
* The new RecordReaderImpl.isStructCategory() should be replaced with the 
negation of TypeDescription.Category.isPrimitive().
* The new RecordReaderImpl.primitiveTypeNeedsConversion is the same as 
TypeDescription.equals() after HIVE-14242.
* Don't use guava in the ORC module.
* The SchemaEvolution.fake field got left in after debugging.
* Instead of using OrcUtils.getOrcTypes to convert TypeDescription into 
List<OrcProto.Type>, use the TypeDescription directly in genIncludedColumns and 
setSearchArgument.

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch, HIVE-13974.092.patch, 
> HIVE-13974.093.patch, HIVE-13974.094.patch, HIVE-13974.095.patch, 
> HIVE-13974.096.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema, which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
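The second review point above translates to a one-liner against the ORC {{TypeDescription}} API. A sketch of the suggested replacement; the helper class and method names are illustrative.

{noformat}
import org.apache.orc.TypeDescription;

final class CategoryCheck {
  // Compound categories (STRUCT, LIST, MAP, UNION) are exactly the
  // non-primitive ones, so negating isPrimitive() replaces the bespoke
  // isStructCategory() helper.
  static boolean isCompound(TypeDescription type) {
    return !type.getCategory().isPrimitive();
  }

  private CategoryCheck() { }
}
{noformat}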


[jira] [Commented] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383270#comment-15383270
 ] 

Hive QA commented on HIVE-14262:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818612/HIVE-14262.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10336 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/568/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/568/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-568/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818612 - PreCommit-HIVE-MASTER-Build

> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from the 
> semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
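A sketch of the shape of the fix described above (an assumed shape, not the committed patch): build the table-level entity from the partition entity's WriteType rather than letting it default to DDL_NO_TASK. The helper class is illustrative.

{noformat}
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.metadata.Partition;

final class InheritWriteType {
  // Propagate the partition entity's WriteType to the new table-level
  // WriteEntity instead of using the DDL_NO_TASK default.
  static WriteEntity tableEntityFor(Partition p, WriteEntity partitionEntity) {
    return new WriteEntity(p.getTable(), partitionEntity.getWriteType());
  }

  private InheritWriteType() { }
}
{noformat}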


[jira] [Updated] (HIVE-14039) HiveServer2: Make the usage of server with JDBC thirft serde enabled, backward compatible for older clients

2016-07-18 Thread Ziyang Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziyang Zhao updated HIVE-14039:
---
Attachment: HIVE-14039.2.patch

> HiveServer2: Make the usage of server with JDBC thirft serde enabled, 
> backward compatible for older clients
> ---
>
> Key: HIVE-14039
> URL: https://issues.apache.org/jira/browse/HIVE-14039
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.1
>Reporter: Vaibhav Gumashta
>Assignee: Ziyang Zhao
> Attachments: HIVE-14039.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14039) HiveServer2: Make the usage of server with JDBC thirft serde enabled, backward compatible for older clients

2016-07-18 Thread Ziyang Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziyang Zhao updated HIVE-14039:
---
Attachment: (was: HIVE-14039.1.patch)

> HiveServer2: Make the usage of server with JDBC thirft serde enabled, 
> backward compatible for older clients
> ---
>
> Key: HIVE-14039
> URL: https://issues.apache.org/jira/browse/HIVE-14039
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.1
>Reporter: Vaibhav Gumashta
>Assignee: Ziyang Zhao
> Attachments: HIVE-14039.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13723) Executing join query on type Float using Thrift Serde will result in Float cast to Double error

2016-07-18 Thread Ziyang Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383246#comment-15383246
 ] 

Ziyang Zhao commented on HIVE-13723:


org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
The four tests above passed in my local environment.

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
The six tests above also failed in previous builds (505-509).

So none of these failures are related to this patch.

> Executing join query on type Float using Thrift Serde will result in Float 
> cast to Double error
> ---
>
> Key: HIVE-13723
> URL: https://issues.apache.org/jira/browse/HIVE-13723
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, Serializers/Deserializers
>Affects Versions: 2.1.0
>Reporter: Ziyang Zhao
>Assignee: Ziyang Zhao
>Priority: Critical
> Attachments: HIVE-13723.4.patch.txt
>
>
> After enabling the thrift Serde, execute the following queries in beeline:
> >create table test1 (a int);
> >create table test2 (b float);
> >insert into test1 values (1);
> >insert into test2 values (1);
> >select * from test1 join test2 on test1.a=test2.b;
> this will give the error:
> java.lang.Exception: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"b":1.0}
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
> ~[hadoop-mapreduce-client-common-2.7.1.2.4.0.0-169.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
> [hadoop-mapreduce-client-common-2.7.1.2.4.0.0-169.jar:?]
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"b":1.0}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:168) 
> ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.1.2.4.0.0-169.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[?:1.7.0_95]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[?:1.7.0_95]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[?:1.7.0_95]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[?:1.7.0_95]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_95]
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"b":1.0}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:568) 
> ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:159) 
> ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.1.2.4.0.0-169.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[?:1.7.0_95]
> at 

[jira] [Updated] (HIVE-14242) Backport ORC-53 to Hive

2016-07-18 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14242:
-
Attachment: HIVE-14242.patch

> Backport ORC-53 to Hive
> ---
>
> Key: HIVE-14242
> URL: https://issues.apache.org/jira/browse/HIVE-14242
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-14242.patch
>
>
> ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem 
> in TypeDescription that should be backported to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14242) Backport ORC-53 to Hive

2016-07-18 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14242:
-
Status: Patch Available  (was: Open)

> Backport ORC-53 to Hive
> ---
>
> Key: HIVE-14242
> URL: https://issues.apache.org/jira/browse/HIVE-14242
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem 
> in TypeDescription that should be backported to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13560) Adding Omid as connection manager for HBase Metastore

2016-07-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13560:
--
Attachment: HIVE-13560.8.patch

Fixing UT failures.

> Adding Omid as connection manager for HBase Metastore
> -
>
> Key: HIVE-13560
> URL: https://issues.apache.org/jira/browse/HIVE-13560
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13560.1.patch, HIVE-13560.2.patch, 
> HIVE-13560.3.patch, HIVE-13560.4.patch, HIVE-13560.5.patch, 
> HIVE-13560.6.patch, HIVE-13560.7.patch, HIVE-13560.8.patch
>
>
> Adding Omid as a transaction manager to HBase Metastore. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14273) branch1 test

2016-07-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14273:
--
Status: Patch Available  (was: Open)

> branch1 test
> 
>
> Key: HIVE-14273
> URL: https://issues.apache.org/jira/browse/HIVE-14273
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14273-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13027) Configuration changes to improve logging performance

2016-07-18 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383196#comment-15383196
 ] 

Prasanth Jayachandran commented on HIVE-13027:
--

Yes. I updated the Hive Logging section with some description about 
asynchronous logging. Thanks [~leftylev] for catching this!

> Configuration changes to improve logging performance
> 
>
> Key: HIVE-13027
> URL: https://issues.apache.org/jira/browse/HIVE-13027
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.1.0
>
> Attachments: HIVE-13027.1.patch, HIVE-13027.2.patch, 
> HIVE-13027.3.patch, HIVE-13027.3.patch, HIVE-13027.4.patch
>
>
> For LLAP and HS2, some configuration changes can be made to improve logging 
> performance:
> 1) Log4j2's async logger claims to have 6-68 times better performance than the 
> synchronous logger. https://logging.apache.org/log4j/2.x/manual/async.html
> 2) Replace File appenders with RandomAccessFileAppender that claims to be 
> 20-200% more performant.
> https://logging.apache.org/log4j/2.x/manual/appenders.html#RandomAccessFileAppender
> Also make async logging configurable.
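
A minimal sketch of enabling Log4j2's async loggers globally, per the manual linked above (assumes the LMAX Disruptor jar is on the classpath):

{code:java}
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncLoggingDemo {
  public static void main(String[] args) {
    // Must be set before the first Logger is created; all loggers become async.
    System.setProperty("Log4jContextSelector",
        "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
    Logger log = LogManager.getLogger(AsyncLoggingDemo.class);
    log.info("all loggers are now asynchronous");
  }
}
{code}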



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14039) HiveServer2: Make the usage of server with JDBC thrift serde enabled, backward compatible for older clients

2016-07-18 Thread Ziyang Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziyang Zhao updated HIVE-14039:
---
Attachment: HIVE-14039.1.patch

Added a new member in SessionState to track the client protocol version, and 
took the protocol version into consideration when deciding which serde to use.
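
A rough sketch of the decision this implies (names are illustrative; the patch wires the negotiated version through SessionState):

{code:java}
// Hive 2.x package; older releases had this enum under
// org.apache.hive.service.cli.thrift.
import org.apache.hive.service.rpc.thrift.TProtocolVersion;

// Only use the new thrift serde when it is enabled in the config AND the
// client negotiated a protocol version that understands it.
static boolean useThriftSerde(boolean serdeEnabledInConf,
    TProtocolVersion clientVersion, TProtocolVersion minVersionForSerde) {
  return serdeEnabledInConf
      && clientVersion.getValue() >= minVersionForSerde.getValue();
}
{code}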

> HiveServer2: Make the usage of server with JDBC thrift serde enabled, 
> backward compatible for older clients
> ---
>
> Key: HIVE-14039
> URL: https://issues.apache.org/jira/browse/HIVE-14039
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.1
>Reporter: Vaibhav Gumashta
>Assignee: Ziyang Zhao
> Attachments: HIVE-14039.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2016-07-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14274:

Attachment: HIVE-14274.1.patch

This solution allows for columns to be added to the end of structs. It looks 
like adding support for arbitrary column-schema evolution in structs would be 
very tricky.

(Note: The solution doesn't change {{HCatRecordReader}} at all, since the 
entire struct is projected correctly by the reader.)
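
A hypothetical illustration of the tolerated evolution (not the actual patch): when the data was written with fewer struct fields than the current schema declares, pad the trailing fields with nulls instead of indexing past the end of the value list:

{code:java}
import java.util.ArrayList;
import java.util.List;

static List<Object> padStruct(List<Object> values, int readerFieldCount) {
  // values: struct fields as read from the old partition data
  List<Object> padded = new ArrayList<>(values);
  while (padded.size() < readerFieldCount) {
    padded.add(null);  // fields added to the struct after the data was written
  }
  return padded;
}
{code}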

> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14274.1.patch
>
>
> Consider this sequence of table/partition creation and schema evolution:
> {code:sql}
> -- Create table.
> CREATE EXTERNAL TABLE `simple_text` (
> foo STRING,
> bar STRUCT<goo:STRING>
> )
> PARTITIONED BY ( dt STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ':'
> STORED AS TEXTFILE ;
> -- Add partition.
> ALTER TABLE simple_text ADD PARTITION ( dt='0' );
> -- Alter the struct-column to add a new sub-field.
> ALTER TABLE simple_text CHANGE COLUMN bar bar STRUCT<goo:STRING, zoo:STRING>;
> {code}
> The {{dt='0'}} partition's schema indicates 2 fields in {{bar}}. The data can 
> be read using Hive, but not through HCatLoader. The error looks as follows:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: data_raw: 
> Store(hdfs://dilithiumblue-nn1.blue.ygrid.yahoo.com:8020/tmp/temp-643668868/tmp-1639945319:org.apache.pig.impl.io.TFileStorage)
>  - scope-1 Operator Key: scope-1): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:160)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>   ... 16 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
> Error converting read value to tuple
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>   at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:63)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:118)
>   at 
> 

[jira] [Updated] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2016-07-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14274:

Status: Patch Available  (was: Open)

> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14274.1.patch
>
>
> Consider this sequence of table/partition creation and schema evolution:
> {code:sql}
> -- Create table.
> CREATE EXTERNAL TABLE `simple_text` (
> foo STRING,
> bar STRUCT<goo:STRING>
> )
> PARTITIONED BY ( dt STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ':'
> STORED AS TEXTFILE ;
> -- Add partition.
> ALTER TABLE simple_text ADD PARTITION ( dt='0' );
> -- Alter the struct-column to add a new sub-field.
> ALTER TABLE simple_text CHANGE COLUMN bar bar STRUCT<goo:STRING, zoo:STRING>;
> {code}
> The {{dt='0'}} partition's schema indicates 2 fields in {{bar}}. The data can 
> be read using Hive, but not through HCatLoader. The error looks as follows:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: data_raw: 
> Store(hdfs://dilithiumblue-nn1.blue.ygrid.yahoo.com:8020/tmp/temp-643668868/tmp-1639945319:org.apache.pig.impl.io.TFileStorage)
>  - scope-1 Operator Key: scope-1): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:160)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>   ... 16 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
> Error converting read value to tuple
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>   at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:63)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:118)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140)
>   ... 17 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>   at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>   at java.util.ArrayList.get(ArrayList.java:429)
>   at 

[jira] [Updated] (HIVE-11516) Fix JDBC compliance issues

2016-07-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11516:
-
Assignee: Tao Li

> Fix JDBC compliance issues
> --
>
> Key: HIVE-11516
> URL: https://issues.apache.org/jira/browse/HIVE-11516
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Reporter: Thejas M Nair
>Assignee: Tao Li
>
> There are several methods in the JDBC driver implementation that still throw 
> UnSupportedException. This and other JDBC-spec non-compliant behavior causes 
> issues when the JDBC driver is used with external tools and libraries.
> For example, JMeter calls HiveStatement.setQueryTimeout, and this was 
> resulting in an exception. HIVE-10726 makes a workaround for this possible.
> Creating this jira for ease of tracking such issues. Please mark new jiras as 
> blocking this one.
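
For instance, a client as simple as the following used to fail at the setQueryTimeout call (a sketch; the connection details are illustrative):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class JdbcComplianceDemo {
  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "user", "");
    Statement stmt = conn.createStatement();
    // Pre-HIVE-10726 drivers threw "Method not supported" here, breaking
    // tools such as JMeter that call it unconditionally.
    stmt.setQueryTimeout(30);
    stmt.close();
    conn.close();
  }
}
{code}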



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14165) Enable faster S3 Split Computation by listing files in blocks

2016-07-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14165:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-14269

> Enable faster S3 Split Computation by listing files in blocks
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>
> During split computation when a large number of files are required to be 
> listed from S3, instead of executing 1 API call per file, one can optimize by 
> listing 1000 files in each API call. This would reduce the amount of time 
> required for listing files.
> Qubole has this optimization in place as detailed here: 
> https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0
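
A sketch of the batching idea against the AWS SDK (v1); the bucket and prefix are illustrative, and Hive would normally go through the Hadoop S3A filesystem rather than calling the SDK directly:

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

static void listAll(String bucket, String prefix) {
  AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
  ListObjectsRequest req = new ListObjectsRequest()
      .withBucketName(bucket)
      .withPrefix(prefix)
      .withMaxKeys(1000);              // up to 1000 keys per API call
  ObjectListing listing;
  do {
    listing = s3.listObjects(req);
    for (S3ObjectSummary obj : listing.getObjectSummaries()) {
      // feed obj.getKey() / obj.getSize() into split computation
    }
    req.setMarker(listing.getNextMarker());
  } while (listing.isTruncated());
}
{code}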



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14067) Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.

2016-07-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383175#comment-15383175
 ] 

zhihai xu commented on HIVE-14067:
--

The test failures are not related to my change.

> Rename pendingCount to activeCalls in HiveSessionImpl  for easier 
> understanding.
> 
>
> Key: HIVE-14067
> URL: https://issues.apache.org/jira/browse/HIVE-14067
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: HIVE-14067.000.patch, HIVE-14067.000.patch, 
> HIVE-14067.001.patch
>
>
> Rename pendingCount to activeCalls in HiveSessionImpl  for easier 
> understanding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14258) Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress

2016-07-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383159#comment-15383159
 ] 

zhihai xu commented on HIVE-14258:
--

Thanks [~jxiang] for review and committing the patch!

> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too 
> long to finish without reporting progress
> ---
>
> Key: HIVE-14258
> URL: https://issues.apache.org/jira/browse/HIVE-14258
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.1.0
>
> Attachments: HIVE-14258.patch
>
>
> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too 
> long to finish without reporting progress.
> This timeout happens when reducer.close() is called in ReduceTask.java.
> CommonJoinOperator.genUniqueJoinObject() called by reducer.close() will loop 
> over every row in the AbstractRowContainer. This can take a long time if 
> there are a large number of rows, and during this time, it does not report 
> progress. If this runs for longer than "mapreduce.task.timeout", the 
> ApplicationMaster will kill the task for failing to report progress.
> We configured "mapreduce.task.timeout" as 10 minutes. I captured stack 
> traces in the 10 minutes before the AM killed the reduce task at 2016-07-15 
> 07:19:11.
> The following three stack traces can prove it:
> at 2016-07-15 07:09:42:
> {code}
> "main" prio=10 tid=0x7f90ec017000 nid=0xd193 runnable [0x7f90f62e5000]
>java.lang.Thread.State: RUNNABLE
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:272)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:154)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x0007deecefb0> (a 
> org.apache.hadoop.fs.BufferedFSInputStream)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at 
> org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:252)
> at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
> at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:214)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:232)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
> - locked <0x0007deecb978> (a 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at 
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
> at 
> org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2359)
> - locked <0x0007deec8f70> (a 
> org.apache.hadoop.io.SequenceFile$Reader)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2491)
> - locked <0x0007deec8f70> (a 
> org.apache.hadoop.io.SequenceFile$Reader)
> at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
> - locked <0x0007deec82f0> (a 
> org.apache.hadoop.mapred.SequenceFileRecordReader)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:267)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:750)
> at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at 
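
A hypothetical sketch of the mitigation the description points to (names are illustrative): report progress periodically while looping over the row container so "mapreduce.task.timeout" never fires.

{code:java}
import org.apache.hadoop.mapred.Reporter;

static void drainRows(Iterable<Object> rows, Reporter reporter) {
  int sinceLastReport = 0;
  for (Object row : rows) {
    forwardRow(row);                   // hand the joined row downstream
    if (++sinceLastReport >= 10000 && reporter != null) {
      reporter.progress();             // heartbeat to the ApplicationMaster
      sinceLastReport = 0;
    }
  }
}

static void forwardRow(Object row) {
  // placeholder for the operator's real row processing
}
{code}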

[jira] [Resolved] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-13369.
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   1.3.0

Committed to branch-1, branch-2.1 and master.

Thanks Owen for the review.

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, 
> HIVE-13369.6.patch, HIVE-13369.branch-1.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has the result of txn 16.  So it should 
> choose base_5 "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by 
> some multi-stmt txn that started some time ago).  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other
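
A sketch of option 1 (illustrative, not necessarily the committed patch): reject a base_x when the snapshot excludes any txn <= x.

{code:java}
import java.io.IOException;
import org.apache.hadoop.hive.common.ValidTxnList;

static void checkBase(long baseTxnId, ValidTxnList snapshot) throws IOException {
  // For ValidTxnList(20:16) and base_17 this throws, because excluded
  // txn 16 <= 17 means base_17 may contain its data.
  for (long excluded : snapshot.getInvalidTransactions()) {
    if (excluded <= baseTxnId) {
      throw new IOException("base_" + baseTxnId
          + " may contain data from excluded txn " + excluded);
    }
  }
}
{code}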



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-18 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383150#comment-15383150
 ] 

Mohit Sabharwal commented on HIVE-14229:


This patch is about hive.reloadable.aux.jars.path (and not hive.aux.jars.path), 
correct?

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath while those in hive.aux.jar.paths are not. 
> Then a local task like 'select udf(x) from src' will fail to find the needed 
> UDF class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete

2016-07-18 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383122#comment-15383122
 ] 

Prasanth Jayachandran commented on HIVE-14224:
--

+1. I think we should get rid of dagId-based routing since it seems redundant 
now that the file name is queryId-dagId.log. Just have a single "routing" appender 
with queryId-dagId.log. We can do pattern-based log aggregation when YARN 
supports it.

> LLAP rename query specific log files once a query is complete
> -
>
> Key: HIVE-14224
> URL: https://issues.apache.org/jira/browse/HIVE-14224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, 
> HIVE-14224.wip.01.patch
>
>
> Once a query is complete, rename the query specific log file so that YARN can 
> aggregate the logs (once it's configured to do so).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13894) Fix more json related JDK8 test failures Part 2

2016-07-18 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-13894:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-13547

> Fix more json related JDK8 test failures Part 2
> ---
>
> Key: HIVE-13894
> URL: https://issues.apache.org/jira/browse/HIVE-13894
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: 2.2.0
>
> Attachments: HIVE-13894.patch
>
>
> After the merge of the java8 branch to master, some more JSON ordering-related 
> failures appeared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete

2016-07-18 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14224:
--
Attachment: HIVE-14224.03.patch

Updated patch with RB comments addressed. Also added logic to handle the case 
where a filename collision could happen.

> LLAP rename query specific log files once a query is complete
> -
>
> Key: HIVE-14224
> URL: https://issues.apache.org/jira/browse/HIVE-14224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, 
> HIVE-14224.wip.01.patch
>
>
> Once a query is complete, rename the query specific log file so that YARN can 
> aggregate the logs (once it's configured to do so).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14229:

Status: Patch Available  (was: Open)

Patch 1: adds the aux jars to the classpath as well, so that a local task (for 
example, select udf(x) from src;) will also be able to run.

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath while those in hive.aux.jar.paths are not. 
> Then a local task like 'select udf(x) from src' will fail to find the needed 
> UDF class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14269) Performance optimizations for data on S3

2016-07-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383018#comment-15383018
 ] 

Sergio Peña commented on HIVE-14269:


Thanks [~ashutoshc]. I'll take a look at them.

> Performance optimizations for data on S3
> 
>
> Key: HIVE-14269
> URL: https://issues.apache.org/jira/browse/HIVE-14269
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> Working with tables that reside on Amazon S3 (or any other object store) 
> has several performance impacts when reading or writing data, as well as 
> consistency issues.
> This JIRA is an umbrella task to monitor all the performance improvements 
> that can be done in Hive to work better with S3 data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14229:

Attachment: HIVE-14229.1.patch

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath while those in hive.aux.jar.paths are not. 
> Then a local task like 'select udf(x) from src' will fail to find the needed 
> UDF class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14271) FileSinkOperator should not rename files to final paths when S3 is the default destination

2016-07-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14271:
---
Assignee: Abdullah Yousufi  (was: Sergio Peña)

> FileSinkOperator should not rename files to final paths when S3 is the 
> default destination
> --
>
> Key: HIVE-14271
> URL: https://issues.apache.org/jira/browse/HIVE-14271
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
>
> FileSinkOperator does a rename of {{outPaths -> finalPaths}} when it finishes 
> writing all rows to a temporary path. The problem is that S3 does not support 
> renaming.
> Two options can be considered:
> a. Use a copy operation instead. After FileSinkOperator writes all rows to 
> outPaths, then the commit method will do a copy() call instead of move().
> b. Write row by row directly to the S3 path (see HIVE-1620). This may perform 
> better, but we should take care of the cleanup part in case of writing 
> errors.
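
A minimal sketch of option (a) using the Hadoop FileSystem API (paths are illustrative):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

static void commitFile(Path outPath, Path finalPath, Configuration conf)
    throws IOException {
  FileSystem fs = outPath.getFileSystem(conf);
  // On S3 a "rename" is a client-side copy + delete anyway; doing the copy
  // explicitly avoids depending on rename semantics.
  FileUtil.copy(fs, outPath, fs, finalPath, /* deleteSource */ true, conf);
}
{code}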



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382997#comment-15382997
 ] 

Aihua Xu commented on HIVE-14267:
-

+1.

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However, the OPEN_OPERATIONS 
> counter is not decremented. 
> This can result in an inaccurate open-operations metric value being 
> reported. Especially when submitting queries to Hive from Hue with the 
> close_queries=false option, this results in misleading HS2 metrics charts.
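
A sketch of the likely shape of the fix, assuming the metrics helpers HS2 uses elsewhere (not necessarily the attached patch):

{code:java}
import org.apache.hadoop.hive.common.metrics.common.Metrics;
import org.apache.hadoop.hive.common.metrics.common.MetricsConstant;
import org.apache.hadoop.hive.common.metrics.common.MetricsFactory;

static void onOperationTimedOut() {
  Metrics metrics = MetricsFactory.getInstance();
  if (metrics != null) {
    // Mirror the decrement done on a normal close so the gauge stays accurate.
    metrics.decrementCounter(MetricsConstant.OPEN_OPERATIONS);
  }
}
{code}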



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382991#comment-15382991
 ] 

Sushanth Sowmyan commented on HIVE-14262:
-

(We should still let the test runs finish with .2.patch though, I think, for 
verification)

> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from 
> the semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382989#comment-15382989
 ] 

Sushanth Sowmyan commented on HIVE-14262:
-

Ah, makes sense. +1. I remember now that I had a similar issue with that bit a 
while back.

> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from 
> the semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14229:

Attachment: (was: HIVE-14229.1.patch)

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath while those in hive.aux.jar.paths are not. 
> Then a local task like 'select udf(x) from src' will fail to find the needed 
> UDF class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14269) Performance optimizations for data on S3

2016-07-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382978#comment-15382978
 ] 

Ashutosh Chauhan commented on HIVE-14269:
-

HIVE-14128 & HIVE-14129 might be of interest to you. Both of them led to 
substantial gains when I did some initial testing using those.

> Performance optimizations for data on S3
> 
>
> Key: HIVE-14269
> URL: https://issues.apache.org/jira/browse/HIVE-14269
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> Working with tables that reside on Amazon S3 (or any other object store) 
> has several performance impacts when reading or writing data, as well as 
> consistency issues.
> This JIRA is an umbrella task to monitor all the performance improvements 
> that can be done in Hive to work better with S3 data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14229:

Attachment: HIVE-14229.1.patch

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath while those in hive.aux.jar.paths are not. 
> Then a local task like 'select udf(x) from src' will fail to find the needed 
> UDF class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382952#comment-15382952
 ] 

Thejas M Nair commented on HIVE-14262:
--

The 2.patch won't need the .q.out files to be updated. In 1.patch, I changed 
the WriteEntity.equals method thinking it was necessary to prevent any 
WriteEntity added as part of the loop over partitions from overwriting 
ones already added by SemanticAnalyzer, in case those have the more accurate 
writeType.
But on taking a closer look at the Sets.union method, that wasn't necessary, 
and I added a comment about that in 2.patch.
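
For reference, a small illustration of the Guava behavior in question (the sets are hypothetical): the union view keeps each element once, and for entries present in both sets the instance from the first argument is the one exposed, so entities from SemanticAnalyzer are not overwritten.

{code:java}
import com.google.common.collect.Sets;
import java.util.Set;

static void unionDemo() {
  Set<String> fromSemanticAnalyzer = Sets.newHashSet("db.tbl");
  Set<String> fromPartitionLoop = Sets.newHashSet("db.tbl", "db.tbl@p=1");
  // "db.tbl" appears once in the view, taken from fromSemanticAnalyzer.
  Set<String> outputs = Sets.union(fromSemanticAnalyzer, fromPartitionLoop);
  System.out.println(outputs);  // [db.tbl, db.tbl@p=1]
}
{code}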


> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from 
> the semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14229:

Summary: the jars in hive.aux.jar.paths are not added to session classpath  
 (was: the jars in hive.aux.jar.paths are not added to HiveServer2 classpath )

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath while those in hive.aux.jar.paths are not. 
> Then a local task like 'select udf(x) from src' will fail to find the needed 
> UDF class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.branch-1.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, 
> HIVE-13369.6.patch, HIVE-13369.branch-1.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has the result of txn 16.  So it should 
> choose base_5 "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by 
> some multi-stmt txn that started some time ago).  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Open  (was: Patch Available)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, HIVE-13369.6.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has the result of txn 16.  So it should 
> choose base_5 "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by 
> some multi-stmt txn that started some time ago).  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other
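
For illustration, here is a minimal, self-contained sketch of option 1 above: reject the newest base_x when the snapshot's exception list contains a txn id <= x. The class and method names are invented for the example; the real logic lives in AcidUtils and ValidTxnList.

{code}
import java.util.Arrays;

// Simplified stand-ins for the ValidTxnList/AcidUtils logic; illustrative only.
public class BestBaseSketch {

  // Returns the txn id of a usable base_x, or throws if the newest base
  // may contain a txn that the reader's snapshot must exclude (option 1).
  static long chooseBase(long latestBaseTxn, long highWatermark, long[] exceptions) {
    for (long excluded : exceptions) {
      if (excluded <= latestBaseTxn) {
        throw new IllegalStateException("base_" + latestBaseTxn
            + " may contain excluded txn " + excluded + " for snapshot ValidTxnList("
            + highWatermark + ":" + Arrays.toString(exceptions) + ")");
      }
    }
    return latestBaseTxn;
  }

  public static void main(String[] args) {
    long[] exceptions = {16};
    try {
      chooseBase(17, 20, exceptions);   // base_17 covers txn 16 -> rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    System.out.println("usable: base_" + chooseBase(5, 20, exceptions));  // safe
  }
}
{code}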



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14198:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Mohit for reviewing.

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14198.1.patch, HIVE-14198.2.patch, 
> HIVE-14198.3.patch
>
>
> There are some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths and also between MR and spark. 
> Refactor the code to reuse the same code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382925#comment-15382925
 ] 

Aihua Xu commented on HIVE-14198:
-

The tests are not related.

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch, HIVE-14198.2.patch, 
> HIVE-14198.3.patch
>
>
> There are some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths and also between MR and spark. 
> Refactor the code to reuse the same code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382919#comment-15382919
 ] 

Hive QA commented on HIVE-14198:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818568/HIVE-14198.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10334 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/567/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/567/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-567/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818568 - PreCommit-HIVE-MASTER-Build

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch, HIVE-14198.2.patch, 
> HIVE-14198.3.patch
>
>
> There are some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths and also between MR and spark. 
> Refactor the code to reuse the same code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382905#comment-15382905
 ] 

Sushanth Sowmyan commented on HIVE-14262:
-

+1 for the code change, but it looks like this impacts a lot of .q.out files 
that also need to be updated.

> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from 
> semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.
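
A hedged sketch of the described change, with simplified stand-ins for Hive's WriteEntity and WriteType (the real classes in org.apache.hadoop.hive.ql.hooks carry much more state):

{code}
// Simplified stand-ins; illustrative only.
enum WriteType { DDL_NO_TASK, INSERT, INSERT_OVERWRITE, UPDATE, DELETE }

class WriteEntity {
  final String name;
  final WriteType writeType;
  WriteEntity(String name, WriteType writeType) {
    this.name = name;
    this.writeType = writeType;
  }
}

public class InheritWriteTypeSketch {
  // Sketch of the fix: derive the table entity's write type from the
  // partition entity instead of defaulting to DDL_NO_TASK.
  static WriteEntity tableEntityFor(String tableName, WriteEntity partitionEntity) {
    return new WriteEntity(tableName, partitionEntity.writeType);
  }

  public static void main(String[] args) {
    WriteEntity part = new WriteEntity("t@ds=1", WriteType.INSERT_OVERWRITE);
    // Authorization now sees INSERT_OVERWRITE on the table, not DDL_NO_TASK.
    System.out.println(tableEntityFor("t", part).writeType);
  }
}
{code}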



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14269) Performance optimizations for data on S3

2016-07-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382903#comment-15382903
 ] 

Sergio Peña commented on HIVE-14269:


HIVE-1620 is an old patch (back to 2011) that contains some work to improve 
access to S3.

> Performance optimizations for data on S3
> 
>
> Key: HIVE-14269
> URL: https://issues.apache.org/jira/browse/HIVE-14269
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> Working with tables that reside on Amazon S3 (or any other object store) 
> has several performance impacts when reading or writing data, and also 
> consistency issues.
> This JIRA is an umbrella task to monitor all the performance improvements 
> that can be done in Hive to work better with S3 data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9478) Beeline CLI Creating and Selecting On Tables Without Passing in -n Parameter

2016-07-18 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-9478.
---
Resolution: Cannot Reproduce
  Assignee: Vihang Karajgaonkar

> Beeline CLI Creating and Selecting On Tables Without Passing in -n Parameter
> 
>
> Key: HIVE-9478
> URL: https://issues.apache.org/jira/browse/HIVE-9478
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.13.1
>Reporter: Mac Noland
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> When I first got started with beeline, since I was already the cloud user on 
> my edge node (not using Kerberos), I was logging into beeline without passing 
> in a user via –n.  Not sure if that was right or not, but it seemed to let me 
> create tables.  However, when doing this and trying to do a select, I would 
> get the error below.  See Appendix #1 for me creating a table and then doing 
> a select to get the error.
> I spent some time messing around with different things and stumbled upon the 
> fact that if I passed in the –n parameter, the select command completed 
> successfully.  See Appendix #2 for that output.
>  
> I’m not sure why I could create tables but not select on them when not 
> passing in –n, but could then select on the table when passing in my username 
> via –n.  I’m not using Kerberos authentication, so my expert contact 
> suggested that this setup might give me an unusual experience.
> Anyway, I’m off and running using the –n parameter but wanted to share my 
> experience as I cut over to beeline.  Thanks again in advance for everyone's 
> help and great work on Hive and Beeline.
> Appendix 1
> cloud@c-10-206-76-8:~>  beeline -u 
> jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default --verbose=true
> issuing: !connect jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default '' 
> ''
> scan complete in 4ms
> Connecting to jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
> Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.13.1-cdh5.2.0 by Apache Hive
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> create table test123 (test123 
> int);
> No rows affected (0.217 seconds)
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> select avg(test123) from 
> test123;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
> java.sql.SQLException: Error while processing statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:277)
> at org.apache.hive.beeline.Commands.execute(Commands.java:736)
> at org.apache.hive.beeline.Commands.sql(Commands.java:657)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:908)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:770)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:732)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:467)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:450)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> !quit
> Appendix #2
> Closing: 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> cloud@c-10-206-76-8:~>  beeline -u 
> jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default --verbose=true -n 
> cloud
> issuing: !connect jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default 
> cloud ''
> scan complete in 3ms
> Connecting to jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
> Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.13.1-cdh5.2.0 by Apache Hive
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> select avg(test123) from 
> test123;
> +---+--+
> |  _c0  |
> +---+--+
> | NULL  |
> +---+--+
> 1 row selected (34.084 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-18 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382902#comment-15382902
 ] 

Sahil Takiar commented on HIVE-14169:
-

[~thejas] any comments on this JIRA or HIVE-14170?

Thanks!

--Sahil

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either an {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column widths; however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.
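
A minimal sketch of the proposed behavior, assuming simplified stand-ins for BeeLine's output formats and row wrappers (the real org.apache.hive.beeline classes differ):

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for BeeLine's OutputFormat hierarchy.
interface OutputFormat {}
class TableOutputFormat implements OutputFormat {}
class CsvOutputFormat implements OutputFormat {}

public class IncrementalFlagSketch {
  // Buffer the rows (to compute global column widths) only when the table
  // format is in use and --incremental was not requested; every other
  // format streams rows incrementally regardless of the flag.
  static <T> Iterator<T> wrapRows(Iterator<T> rows, OutputFormat fmt, boolean incremental) {
    if (fmt instanceof TableOutputFormat && !incremental) {
      List<T> buffered = new ArrayList<>();   // BufferedRows analogue
      rows.forEachRemaining(buffered::add);
      return buffered.iterator();
    }
    return rows;                              // IncrementalRows analogue
  }
}
{code}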



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14254) Correct the hive version by changing "svn" to "git"

2016-07-18 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382890#comment-15382890
 ] 

Tao Li commented on HIVE-14254:
---

Sergio Peña, thanks for the review.

> Correct the hive version by changing "svn" to "git"
> ---
>
> Key: HIVE-14254
> URL: https://issues.apache.org/jira/browse/HIVE-14254
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-14254.1.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When running "hive --version", "subversion" is displayed below, which should 
> be "git".
> $ hive --version
> ​Hive 2.1.0-SNAPSHOT
> ​Subversion git://



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported

2016-07-18 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382878#comment-15382878
 ] 

Chris Nauroth commented on HIVE-13990:
--

[~thejas], is this possibly a duplicate of HIVE-9182?  There was an uncommitted 
patch on that one.  During code review for that patch, I gave feedback that you 
can avoid {{getAclStatus}} calls by checking {{FsPermission#getAclBit}}.  For 
any {{FileSystem}} that doesn't implement ACLs, the ACL bit will always be 
false.  I expect this would work for all {{FileSystem}} implementations and 
avoid tight coupling to HDFS server-side configuration.  I think the same 
feedback applies here.
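
For illustration, a small sketch of that feedback against the Hadoop FileSystem API (error handling elided; the class and method names are invented for the example):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclStatus;

public class AclBitSketch {
  // Consult the ACL bit on the permission before issuing a getAclStatus RPC;
  // a FileSystem that doesn't implement ACLs always reports the bit as false,
  // which avoids coupling to the NN's dfs.namenode.acls.enabled setting.
  static AclStatus aclsIfPresent(FileSystem fs, Path path) throws IOException {
    FileStatus status = fs.getFileStatus(path);
    if (status.getPermission().getAclBit()) {
      return fs.getAclStatus(path);   // extended ACLs exist
    }
    return null;                      // nothing to propagate
  }
}
{code}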

> Client should not check dfs.namenode.acls.enabled to determine if extended 
> ACLs are supported
> -
>
> Key: HIVE-13990
> URL: https://issues.apache.org/jira/browse/HIVE-13990
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13990-branch-1.patch, HIVE-13990.1-branch-1.patch, 
> HIVE-13990.1.patch
>
>
> dfs.namenode.acls.enabled is a server side configuration and the client 
> should not presume to know how the server is configured. Barring a method for 
> querying the NN about whether ACLs are supported, the client should try and 
> catch the appropriate exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9478) Beeline CLI Creating and Selecting On Tables Without Passing in -n Parameter

2016-07-18 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382885#comment-15382885
 ] 

Vihang Karajgaonkar commented on HIVE-9478:
---

Thanks [~mcdonaldnoland] for confirming.

> Beeline CLI Creating and Selecting On Tables Without Passing in -n Parameter
> 
>
> Key: HIVE-9478
> URL: https://issues.apache.org/jira/browse/HIVE-9478
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.13.1
>Reporter: Mac Noland
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> When I first got started with beeline, since I was already the cloud user on 
> my edge node (not using Kerberos), I was logging into beeline without passing 
> in a user via –n.  Not sure if that was right or not, but it seemed to let me 
> create tables.  However, when doing this and trying to do a select, I would 
> get the error below.  See Appendix #1 for me creating a table and then doing 
> a select to get the error.
> I spent some time messing around with different things and stumbled upon the 
> fact that if I passed in the –n parameter, the select command completed 
> successfully.  See Appendix #2 for that output.
>  
> I’m not sure why I could create tables but not select on them when not 
> passing in –n, but could then select on the table when passing in my username 
> via –n.  I’m not using Kerberos authentication, so my expert contact 
> suggested that this setup might give me an unusual experience.
> Anyway, I’m off and running using the –n parameter but wanted to share my 
> experience as I cut over to beeline.  Thanks again in advance for everyone's 
> help and great work on Hive and Beeline.
> Appendix 1
> cloud@c-10-206-76-8:~>  beeline -u 
> jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default --verbose=true
> issuing: !connect jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default '' 
> ''
> scan complete in 4ms
> Connecting to jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
> Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.13.1-cdh5.2.0 by Apache Hive
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> create table test123 (test123 
> int);
> No rows affected (0.217 seconds)
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> select avg(test123) from 
> test123;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
> java.sql.SQLException: Error while processing statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:277)
> at org.apache.hive.beeline.Commands.execute(Commands.java:736)
> at org.apache.hive.beeline.Commands.sql(Commands.java:657)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:908)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:770)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:732)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:467)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:450)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> !quit
> Appendix #2
> Closing: 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> cloud@c-10-206-76-8:~>  beeline -u 
> jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default --verbose=true -n 
> cloud
> issuing: !connect jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default 
> cloud ''
> scan complete in 3ms
> Connecting to jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
> Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.13.1-cdh5.2.0 by Apache Hive
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> select avg(test123) from 
> test123;
> +---+--+
> |  _c0  |
> +---+--+
> | NULL  |
> +---+--+
> 1 row selected (34.084 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9478) Beeline CLI Creating and Selecting On Tables Without Passing in -n Parameter

2016-07-18 Thread Mac Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382857#comment-15382857
 ] 

Mac Noland commented on HIVE-9478:
--

Hi Vihang.  We can probably close the issue.  I'm not able to reproduce it any 
longer either.

> Beeline CLI Creating and Selecting On Tables Without Passing in -n Parameter
> 
>
> Key: HIVE-9478
> URL: https://issues.apache.org/jira/browse/HIVE-9478
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.13.1
>Reporter: Mac Noland
>Priority: Minor
>
> When I first got started with beeline, since I was already the cloud user on 
> my edge node (not using Kerberos), I was logging into beeline without passing 
> in a user via –n.  Not sure if that was right or not, but it seemed to let me 
> create tables.  However, when doing this and trying to do a select, I would 
> get the error below.  See Appendix #1 for me creating a table and then doing 
> a select to get the error.
> I spent some time messing around with different things and stumbled upon the 
> fact that if I passed in the –n parameter, the select command completed 
> successfully.  See Appendix #2 for that output.
>  
> I’m not sure why I could create tables but not select on them when not 
> passing in –n, but could then select on the table when passing in my username 
> via –n.  I’m not using Kerberos authentication, so my expert contact 
> suggested that this setup might give me an unusual experience.
> Anyway, I’m off and running using the –n parameter but wanted to share my 
> experience as I cut over to beeline.  Thanks again in advance for everyone's 
> help and great work on Hive and Beeline.
> Appendix 1
> cloud@c-10-206-76-8:~>  beeline -u 
> jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default --verbose=true
> issuing: !connect jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default '' 
> ''
> scan complete in 4ms
> Connecting to jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
> Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.13.1-cdh5.2.0 by Apache Hive
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> create table test123 (test123 
> int);
> No rows affected (0.217 seconds)
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> select avg(test123) from 
> test123;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
> java.sql.SQLException: Error while processing statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:277)
> at org.apache.hive.beeline.Commands.execute(Commands.java:736)
> at org.apache.hive.beeline.Commands.sql(Commands.java:657)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:908)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:770)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:732)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:467)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:450)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> !quit
> Appendix #2
> Closing: 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> cloud@c-10-206-76-8:~>  beeline -u 
> jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default --verbose=true -n 
> cloud
> issuing: !connect jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default 
> cloud ''
> scan complete in 3ms
> Connecting to jdbc:hive2://c-10-206-76-8.int.cis.trcloud:1/default
> Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
> Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.13.1-cdh5.2.0 by Apache Hive
> 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> select avg(test123) from 
> test123;
> +---+--+
> |  _c0  |
> +---+--+
> | NULL  |
> +---+--+
> 1 row selected (34.084 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Status: Patch Available  (was: Open)

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.patch
>
>
> When an operation gets timed out, it is removed from handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS 
> counter is not decremented. 
> This can result in an inaccurate open operations metrics value being 
> reported. Especially when submitting queries to Hive from Hue with 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Attachment: HIVE-14267.patch

When operations are timed out, the metric open_operations does not get 
decremented. 
The attached patch decrements the counter in 
HiveSessionImpl.closeTimedOutOperations(). 

I have also verified that the counter is decremented when sessions are timed 
out (idle session timeout value < idle operation timeout value) and when 
timed-out sessions are closed.
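
For illustration, a hypothetical, much-simplified sketch of the fix; the real code goes through Hive's Metrics facade rather than a bare counter:

{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class TimedOutOpsSketch {
  static final AtomicLong OPEN_OPERATIONS = new AtomicLong();

  // Sketch of closeTimedOutOperations(): the timeout path must mirror the
  // normal close path and decrement the gauge for every operation it reaps.
  static void closeTimedOutOperations(List<Object> timedOutOperations) {
    for (Object op : timedOutOperations) {
      // ... actual operation cleanup would happen here ...
      OPEN_OPERATIONS.decrementAndGet();   // the previously missing decrement
    }
  }
}
{code}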

 

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.patch
>
>
> When an operation gets timed out, it is removed from handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS 
> counter is not decremented. 
> This can result in an inaccurate open operations metrics value being 
> reported. Especially when submitting queries to Hive from Hue with 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14262) Inherit writetype from partition WriteEntity for table WriteEntity

2016-07-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14262:
-
Attachment: HIVE-14262.2.patch

> Inherit writetype from partition WriteEntity for table WriteEntity
> --
>
> Key: HIVE-14262
> URL: https://issues.apache.org/jira/browse/HIVE-14262
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14262.1.patch, HIVE-14262.2.patch
>
>
> For partitioned table operations, a Table WriteEntity is being added to the 
> list to be authorized if there is a partition in the output list from 
> semantic analyzer. 
> However, it is being added with a default WriteType of DDL_NO_TASK.
> The new Table WriteEntity should be created with the WriteType of the 
> partition WriteEntity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14258) Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress

2016-07-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382785#comment-15382785
 ] 

zhihai xu commented on HIVE-14258:
--

The test failures are not related to my change.

> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too 
> long to finish without reporting progress
> ---
>
> Key: HIVE-14258
> URL: https://issues.apache.org/jira/browse/HIVE-14258
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-14258.patch
>
>
> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too 
> long to finish without reporting progress.
> This timeout happened when reducer.close() is called in ReduceTask.java.
> CommonJoinOperator.genUniqueJoinObject() called by reducer.close() will loop 
> over every row in the AbstractRowContainer. This can take a long time if 
> there are a large number of rows, and during this time, it does not report 
> progress. If this runs for longer than "mapreduce.task.timeout", the 
> ApplicationMaster will kill the task for failing to report progress.
> We configured "mapreduce.task.timeout" as 10 minutes. I captured the stack 
> trace in the 10 minutes before the AM killed the reduce task at 2016-07-15 
> 07:19:11.
> The following three stack traces can prove it:
> at 2016-07-15 07:09:42:
> {code}
> "main" prio=10 tid=0x7f90ec017000 nid=0xd193 runnable [0x7f90f62e5000]
>java.lang.Thread.State: RUNNABLE
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:272)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:154)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x0007deecefb0> (a 
> org.apache.hadoop.fs.BufferedFSInputStream)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at 
> org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:252)
> at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
> at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:214)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:232)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
> - locked <0x0007deecb978> (a 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at 
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
> at 
> org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2359)
> - locked <0x0007deec8f70> (a 
> org.apache.hadoop.io.SequenceFile$Reader)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2491)
> - locked <0x0007deec8f70> (a 
> org.apache.hadoop.io.SequenceFile$Reader)
> at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
> - locked <0x0007deec82f0> (a 
> org.apache.hadoop.mapred.SequenceFileRecordReader)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:267)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:750)
> at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> 
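
A hedged sketch of one way to avoid such a timeout, reporting progress every N rows while draining the row container; Reporter is Hadoop's org.apache.hadoop.mapred.Reporter, and the loop and names are simplified for the example:

{code}
import org.apache.hadoop.mapred.Reporter;

class ProgressReportSketch {
  static final int REPORT_INTERVAL = 10_000;   // illustrative value

  // Simplified analogue of genUniqueJoinObject()'s row loop: pinging the
  // Reporter periodically keeps the AM from killing the task when the loop
  // runs longer than mapreduce.task.timeout.
  static void drainRows(Iterable<Object[]> rowContainer, Reporter reporter) {
    long seen = 0;
    for (Object[] row : rowContainer) {
      // ... forward the joined row downstream ...
      if (++seen % REPORT_INTERVAL == 0) {
        reporter.progress();
      }
    }
  }
}
{code}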

[jira] [Updated] (HIVE-14265) Partial stats in Join operator may lead to data size estimate of 0

2016-07-18 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14265:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master, branch-2.1. Thanks for the review [~ashutoshc]!

> Partial stats in Join operator may lead to data size estimate of 0
> --
>
> Key: HIVE-14265
> URL: https://issues.apache.org/jira/browse/HIVE-14265
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Nita Dembla
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14265.patch
>
>
> For some tables, we might not have the column stats available. However, if 
> the table is partitioned, we will have the stats for partition columns.
> When we estimate the size of the data produced by a join operator, we end up 
> using only the columns that are available for the calculation, e.g. partition 
> columns in this case.
> However, even in these cases, we should add the data size for those columns 
> for which we do not have stats (_default size for the column type x estimated 
> number of rows_).
> To reproduce, the following example can be used:
> {noformat}
> create table sample_partitioned (x int) partitioned by (y int);
> insert into sample_partitioned partition(y=1) values (1),(2);
> create temporary table sample as select * from sample_partitioned;
> analyze table sample compute statistics for columns;
> explain select sample_partitioned.x from sample_partitioned, sample where 
> sample.y = sample_partitioned.y;
> {noformat}
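
For illustration, a hypothetical sketch of the corrected estimate (the map and helper names are invented, not the real StatsUtils API): columns without statistics contribute their type's default size per row instead of 0:

{code}
import java.util.Map;

class DataSizeEstimateSketch {
  // avgSizeWithStats: average size per column, for columns that have stats.
  // defaultSizeNoStats: default size per column type, for columns without stats.
  static long estimateDataSize(long numRows,
                               Map<String, Long> avgSizeWithStats,
                               Map<String, Long> defaultSizeNoStats) {
    long perRow = 0;
    for (long s : avgSizeWithStats.values()) perRow += s;
    for (long s : defaultSizeNoStats.values()) perRow += s;  // previously omitted
    return numRows * perRow;  // no longer collapses to 0 on partial stats
  }
}
{code}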



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-18 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382697#comment-15382697
 ] 

Owen O'Malley commented on HIVE-13369:
--

+1 on HIVE-13369.6.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, HIVE-13369.6.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has the result of txn 16.  So it should 
> choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first, and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by 
> some multi-stmt txn that started some time ago).  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState().  This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> and (2) locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for the duration of 
> the transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta with an id range that includes 
> this open txn id, for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods, so this 
> should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt

2016-07-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382629#comment-15382629
 ] 

Hive QA commented on HIVE-14123:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818555/HIVE-14123.7.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10343 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_skewtable
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/566/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/566/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-566/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818555 - PreCommit-HIVE-MASTER-Build

> Add beeline configuration option to show database in the prompt
> ---
>
> Key: HIVE-14123
> URL: https://issues.apache.org/jira/browse/HIVE-14123
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, CLI
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, 
> HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, 
> HIVE-14123.7.patch, HIVE-14123.patch
>
>
> There are several jira issues complaining that Beeline does not respect 
> hive.cli.print.current.db.
> This is partially true: since HIVE-10511, in embedded mode it uses 
> hive.cli.print.current.db to change the prompt.
> In beeline mode, I think this function should use a beeline command line 
> option instead, like the showHeader option, emphasizing that this is a 
> client-side option.
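
For illustration, a hypothetical sketch of such a client-side option, loosely modeled on how showHeader-style options are toggled (names are invented, not the actual patch):

{code}
class BeeLineOptsSketch {
  private boolean showDbInPrompt = false;   // client-side default: off

  public void setShowDbInPrompt(boolean show) { this.showDbInPrompt = show; }
  public boolean getShowDbInPrompt() { return showDbInPrompt; }

  // Prompt rendering consults only this client-side option, never a
  // server-side setting such as hive.cli.print.current.db.
  String prompt(String url, String currentDb) {
    return showDbInPrompt ? url + " (" + currentDb + ")> " : url + "> ";
  }
}
{code}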



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths

2016-07-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14135:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~vihangk1] for your contribution. I committed this to 2.2

> beeline output not formatted correctly for large column widths
> --
>
> Key: HIVE-14135
> URL: https://issues.apache.org/jira/browse/HIVE-14135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, 
> HIVE-14135.3.patch, csv.txt, csv2.txt, dsv.txt, longKeyValues.txt, 
> output_after.txt, output_before.txt, table.txt, tsv.txt, tsv2.txt, 
> vertical.txt
>
>
> If a column width is too large, then beeline uses the maximum column width 
> when normalizing all the column widths. In order to reproduce the issue, run 
> set -v; 
> one of the configuration variables is classpath, which can have an extremely 
> large width (41k characters in my environment).
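
For illustration, a hypothetical sketch of one way to normalize widths: cap the per-column display width so a single oversized value (such as a 41k-character classpath) cannot dictate the layout of every column. The cap value and names are invented:

{code}
class ColumnWidthSketch {
  static final int MAX_COLUMN_WIDTH = 50;   // illustrative cap

  // Width used for padding/truncation in table output: the longest value
  // seen in the column, bounded so one huge value cannot blow up the layout.
  static int normalizedWidth(int longestValueInColumn) {
    return Math.min(longestValueInColumn, MAX_COLUMN_WIDTH);
  }
}
{code}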



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14227) Investigate invalid SessionHandle and invalid OperationHandle

2016-07-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382488#comment-15382488
 ] 

Aihua Xu commented on HIVE-14227:
-

I spent some time understanding the issue. It seems the problem comes from the 
disconnect between connections and sessions. For the beeline client, we create 
a connection and then open a session on top of it. Then we close the session 
and destroy the connection. That's fine. It is also legal for another 
connection (connection2) to be created and make calls with an existing 
session. But if the connection (connection1) in which the session was opened 
gets destroyed, the session gets destroyed as well, and we will see an invalid 
sessionHandle from connection2.  Hope my understanding is correct. 

I'm thinking of the following approach: similar to the openSession() call, we 
can add a bindSession(sessionHandle) call for a new connection with an 
existing session handle. Then each connection will be aware of the session, 
and subsequent calls can be made without a sessionHandle. Also, the session 
will remember the connections associated with it, and the session can really 
be closed only when there are no connections left.
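
A minimal sketch of the bindSession idea under these assumptions: simple reference counting per session handle, not HiveServer2's real SessionManager.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SessionBindingSketch {
  private final Map<String, AtomicInteger> bindings = new ConcurrentHashMap<>();

  // openSession()/bindSession(handle) would both register the calling connection.
  void bindSession(String sessionHandle) {
    bindings.computeIfAbsent(sessionHandle, h -> new AtomicInteger()).incrementAndGet();
  }

  // Returns true only when the last bound connection goes away, i.e. when
  // the session can really be closed.
  boolean unbindSession(String sessionHandle) {
    AtomicInteger refs = bindings.get(sessionHandle);
    return refs != null && refs.decrementAndGet() == 0;
  }
}
{code}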

> Investigate invalid SessionHandle and invalid OperationHandle
> -
>
> Key: HIVE-14227
> URL: https://issues.apache.org/jira/browse/HIVE-14227
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> There are the following warnings. 
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-55]: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Invalid SessionHandle: 
> SessionHandle [1bc00251-64e9-4a95-acb7-a7f53f773528]
> at 
> org.apache.hive.service.cli.session.SessionManager.getSession(SessionManager.java:318)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:258)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> {noformat}
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-1060]: Error closing operation:
> org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: 
> OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=13d930dd-316c-4c09-9f44-fee5f483e73d]
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:185)
> at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:408)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:664)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1513)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1498)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14254) Correct the hive version by changing "svn" to "git"

2016-07-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382510#comment-15382510
 ] 

Sergio Peña commented on HIVE-14254:


Patch looks good.
+1

> Correct the hive version by changing "svn" to "git"
> ---
>
> Key: HIVE-14254
> URL: https://issues.apache.org/jira/browse/HIVE-14254
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-14254.1.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When running "hive --version", "subversion" is displayed below, which should 
> be "git".
> $ hive --version
> ​Hive 2.1.0-SNAPSHOT
> ​Subversion git://



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14261) Support set/unset partition parameters

2016-07-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382506#comment-15382506
 ] 

Sergio Peña commented on HIVE-14261:


This patch looks good. I ran a few other tests with non-partitioned tables, and 
they worked well.
+1

> Support set/unset partition parameters
> --
>
> Key: HIVE-14261
> URL: https://issues.apache.org/jira/browse/HIVE-14261
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14261.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

