[jira] [Work logged] (HIVE-22232) NPE when hive.order.columnalignment is set to false

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22232?focusedWorklogId=319915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319915
 ]

ASF GitHub Bot logged work on HIVE-22232:
-

Author: ASF GitHub Bot
Created on: 28/Sep/19 00:45
Start Date: 28/Sep/19 00:45
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on issue #783: HIVE-22232
URL: https://github.com/apache/hive/pull/783#issuecomment-536135626
 
 
   Addressed comment in follow-up commit. @vineetgarg02, can you take another 
look? Thanks
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319915)
Time Spent: 40m  (was: 0.5h)

> NPE when hive.order.columnalignment is set to false
> ---
>
> Key: HIVE-22232
> URL: https://issues.apache.org/jira/browse/HIVE-22232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22232.01.patch, HIVE-22232.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When {{hive.order.columnalignment}} is disabled and the plan contains an 
> Aggregate operator, we hit an NPE.
> {code}
>  java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:163)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1555)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:483)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12630)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:357)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1385)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1332)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1327)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:124)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:217)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
> ...
> {code}
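
A minimal repro sketch for the above (the table and query are illustrative rather than taken from the report; any query whose CBO plan contains an Aggregate operator should exercise the same ASTConverter path):

{code:sql}
-- disable sort column alignment, as in the issue title
set hive.order.columnalignment=false;
-- illustrative table; the NPE surfaces while converting the optimized plan back to an AST
create table t (a int, b int);
explain select a, count(b) from t group by a;
{code}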



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22232) NPE when hive.order.columnalignment is set to false

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22232:
---
Attachment: HIVE-22232.01.patch

> NPE when hive.order.columnalignment is set to false
> ---
>
> Key: HIVE-22232
> URL: https://issues.apache.org/jira/browse/HIVE-22232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22232.01.patch, HIVE-22232.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When {{hive.order.columnalignment}} is disabled and the plan contains an 
> Aggregate operator, we hit an NPE.
> {code}
>  java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:163)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1555)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:483)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12630)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:357)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1385)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1332)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1327)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:124)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:217)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22265) Ordinals in view are not being picked up in materialized view

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22265:
---
Component/s: Materialized views

> Ordinals in view are not being picked up in materialized view
> -
>
> Key: HIVE-22265
> URL: https://issues.apache.org/jira/browse/HIVE-22265
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Minor
> Attachments: lv-exp-unalias.sql
>
>
> There is a conf that allows ordinals to be used in GROUP BY, i.e. 
> hive.groupby.position.alias.
> This isn't being picked up by materialized views when set in a view. The 
> workaround is not to use ordinals. A script is attached.
> Example:
> create view campaigns.campaign_data_lview_bad as SELECT platform, 
> platform_version, currency, sum(amount) as sum_amount, sum(duration) as 
> sum_duration, count(user_id) count_user_id, min(amount) min_amount, 
> max(amount) max_amount, year, month FROM `campaigns`.`campaign_data` GROUP 
> BY 1, 2, 3, 9, 10;
> create materialized view aview620_bad  stored as orc as select platform, 
> platform_version, currency, sum_amount, sum_duration, count_user_id, 
> min_amount, max_amount, year, month from 
> `campaigns`.`campaign_data_lview_bad`;
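
The ordinal GROUP BY syntax in the example above only takes effect when the conf is enabled; a minimal sketch of the session setting the repro assumes:

{code:sql}
-- resolve GROUP BY 1, 2, ... against select-list positions
set hive.groupby.position.alias=true;
{code}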



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22263) MV has distinct on columns and query has count(distinct) on one of the columns, we do not trigger rewriting

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22263:
---
Component/s: Materialized views

> MV has distinct on columns and query has count(distinct) on one of the 
> columns, we do not trigger rewriting
> --
>
> Key: HIVE-22263
> URL: https://issues.apache.org/jira/browse/HIVE-22263
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: count-distinct.sql, count-distinct2.sql
>
>
> Count distinct issues with materialized views.  Two scripts attached
> 1) 
> create materialized view base_aview stored as orc as select distinct c1 c1, 
> c2 c2 from base;
> explain extended select count(distinct c1) from base group by c2 ;
> 2)
> create materialized view base_aview stored as orc as SELECT c1 c1, c2 c2, 
> sum(c2) FROM base group by 1,2;
> explain extended select count(distinct c1) from base group by c2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22259) Rewriting fails for `BETWEEN` clauses with different ranges in MV and query

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22259:
---
Component/s: Materialized views

> Rewriting fails for `BETWEEN` clauses with different ranges in MV and query
> ---
>
> Key: HIVE-22259
> URL: https://issues.apache.org/jira/browse/HIVE-22259
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Steve Carlin
>Priority: Major
> Attachments: expr5.sql
>
>
> Script attached.
> The following query does not rewrite:
> create materialized view view9 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id >= 1 and 
> prod_id < 31);
>  
> -- this is not ok
> explain extended select  * from sales where cust_id between 1 and 20 and 
> prod_id < 31;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22261) Materialized view rewriting does not support window functions

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22261:
---
Component/s: Materialized views

> Materialized view rewriting does not support window functions
> -
>
> Key: HIVE-22261
> URL: https://issues.apache.org/jira/browse/HIVE-22261
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: af2.sql
>
>
> Materialized views don't support window functions.  At a minimum, we should 
> print a friendlier message when the rewrite fails (it can still be created 
> with a "disable rewrite")
> Script is attached
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22260) Materialized view rewriting does not support `UNION` operator, exact match can work under view

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22260:
---
Component/s: Materialized views

> Materialized view rewriting does not support `UNION` operator, exact match 
> can work under view
> --
>
> Key: HIVE-22260
> URL: https://issues.apache.org/jira/browse/HIVE-22260
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: complex0.sql
>
>
> In this case, a view can be created that hides some nastier syntax like a 
> "union".  
> A materialized view can contain the view with a simple query.  So if the end 
> query just uses the view, it should rewrite to the materialized view. 
> Furthermore, an exception is thrown while creating the materialized view when 
> the underlying view contains the "union".  At a minimum, we should print a 
> friendlier message when the rewrite fails.
> A script is attached.
> An example of this:
> create view logical_complex0 as
> with t as
> (select c1 as a, c2 as b from tab1 where c2 in (select f from logical_simple 
> where g > 0)
> union
> select tab3.c1 as c, tab4.c2 as d from tab3, tab4 where tab3.c2 = tab4.c2)
> select a, b
> from t;
>  
> -- query separator
>  
> create materialized view aview_complex0 stored as orc as
> select a as x, b as y, count(*)
> from logical_complex0
> group by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22259) Rewriting fails for `BETWEEN` clauses with different ranges in MV and query

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22259:
---
Component/s: CBO

> Rewriting fails for `BETWEEN` clauses with different ranges in MV and query
> ---
>
> Key: HIVE-22259
> URL: https://issues.apache.org/jira/browse/HIVE-22259
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Reporter: Steve Carlin
>Priority: Major
> Attachments: expr5.sql
>
>
> Script attached.
> The following query does not rewrite:
> create materialized view view9 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id >= 1 and 
> prod_id < 31);
>  
> -- this is not ok
> explain extended select  * from sales where cust_id between 1 and 20 and 
> prod_id < 31;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22262:
---
Component/s: Materialized views

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22264) Degenerate case where mv not being used: computing aggregate on group by field

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22264:
---
Component/s: Materialized views

> Degenerate case where mv not being used: computing aggregate on group by field
> --
>
> Key: HIVE-22264
> URL: https://issues.apache.org/jira/browse/HIVE-22264
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: compensable.sql
>
>
> This is a degenerate case, but still should work.  There is no reason to do a 
> min(userid) when grouping by userid (should just use "userid" directly), but 
> it should rewrite regardless.
> Script is attached.
> Example:
> create materialized view view1 stored as orc as select userid, sum(sales_amt) 
> from base group by 1;
> explain extended select min(userid) from base group by userid;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22253) General task tracking improvements for materialized views

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22253:
---
Component/s: Materialized views

> General task tracking improvements for materialized views
> -
>
> Key: HIVE-22253
> URL: https://issues.apache.org/jira/browse/HIVE-22253
> Project: Hive
>  Issue Type: Task
>  Components: Materialized views
>Reporter: Steve Carlin
>Priority: Major
>
> We have a whole lot of tests from a different system that created and tested 
> materialized views.
> This Jira serves as the parent task for all the shortcomings that were found 
> when running these tests on Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22256) Rewriting fails when `IN` clause has items in different order in MV and query.

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22256:
---
Component/s: Materialized views

> Rewriting fails when `IN` clause has items in different order in MV and query.
> --
>
> Key: HIVE-22256
> URL: https://issues.apache.org/jira/browse/HIVE-22256
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: expr2.sql
>
>
> Rewriting fails on following materialized view and query (script is also 
> attached):
> create materialized view view2 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id in (1,2,3,4,5));
> explain extended select prod_id, cust_id  from sales where cust_id in 
> (5,1,2,3,4);



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22254) Mappings.NoElementException: no target in mapping, in `MaterializedViewAggregateRule

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22254:
---
Component/s: Materialized views

> Mappings.NoElementException: no target in mapping, in 
> `MaterializedViewAggregateRule
> 
>
> Key: HIVE-22254
> URL: https://issues.apache.org/jira/browse/HIVE-22254
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Minor
> Attachments: ojoin_full.sql
>
>
> A Mappings.NoElementException happens on an edge condition for a query using 
> a materialized view.
> The query contains a "group by" clause which contains fields from both sides 
> of a join.  There is no real reason to group by this same field twice, but 
> there is also no reason that this shouldn't succeed.
> Attached is a script which causes this failure.  The query causing the 
> problem looks like this:
> explain extended select sum(1)
> from fact inner join dim1
> on fact.f1 = dim1.pk1
> group by f1, pk1;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22257) Commutativity of operations is not taken into account, e.g., '+'

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22257:
---
Component/s: Materialized views

> Commutativity of operations is not taken into account, e.g., '+'
> 
>
> Key: HIVE-22257
> URL: https://issues.apache.org/jira/browse/HIVE-22257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: expr9.sql
>
>
> ...as stated in subject.  Script to reproduce is attached.
> Query and materialized view are as follows:
> create materialized view view5 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id + prod_id > 1 + 
> 2);
> explain extended select  prod_id, cust_id  from sales where prod_id + cust_id 
> > 1 + 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22258) Rewriting fails for `IN` clauses in MV and query when we use equals or subset in the query

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22258:
---
Component/s: Materialized views

> Rewriting fails for `IN` clauses in MV and query when we use equals or subset 
> in the query
> --
>
> Key: HIVE-22258
> URL: https://issues.apache.org/jira/browse/HIVE-22258
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: in-pred.sql
>
>
> ...as stated in title.  Script is attached.  The issue can be seen with these 
> queries:
>  
> create materialized view av1 stored as orc as select state, year, 
> sum(population) from census_pop where year IN (2010, 2018) group by state, 
> year;
> -- this is ok
> explain extended select state, year, sum(population) from census_pop where 
> year IN (2010, 2018) group by state, year;
> -- this is not ok
> explain extended select state, year, sum(population) from census_pop where 
> year = 2010 group by state, year;
> -- this is not ok
> explain extended select state, year, sum(population) from census_pop where 
> year in (2010) group by state, year;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22266) Addendum fix to have HS2 pom add explicit curator dependency

2019-09-27 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-22266:
-
Status: Patch Available  (was: Open)

> Addendum fix to have HS2 pom add explicit curator dependency
> 
>
> Key: HIVE-22266
> URL: https://issues.apache.org/jira/browse/HIVE-22266
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-22266.patch
>
>
> It might be better to add an explicit dependency on apache-curator in the 
> service/pom.xml.
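
For illustration, such an explicit dependency would look roughly like this in service/pom.xml (a sketch only; the exact artifact and version property are assumptions, as the attached patch is not reproduced here):

{code:xml}
<!-- hypothetical explicit curator dependency for the HS2 service module -->
<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-framework</artifactId>
  <version>${curator.version}</version>
</dependency>
{code}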



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22266) Addendum fix to have HS2 pom add explicit curator dependency

2019-09-27 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-22266:
-
Attachment: HIVE-22266.patch

> Addendum fix to have HS2 pom add explicit curator dependency
> 
>
> Key: HIVE-22266
> URL: https://issues.apache.org/jira/browse/HIVE-22266
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-22266.patch
>
>
> It might be better to add an explicit dependency on apache-curator in the 
> service/pom.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22255) Hive don't trigger Major Compaction automatically if table contains only base files

2019-09-27 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia updated HIVE-22255:
-
Summary: Hive don't trigger Major Compaction automatically if table 
contains only base files   (was: Hive don't trigger Major Compaction 
automatically if table contains all base files )

> Hive don't trigger Major Compaction automatically if table contains only base 
> files 
> 
>
> Key: HIVE-22255
> URL: https://issues.apache.org/jira/browse/HIVE-22255
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 3.1.2
> Environment: Hive-3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>
> A user may run into this issue if the table consists only of base files with 
> no deltas; in that case the following condition yields false and automatic 
> major compaction is skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # create Acid table 
> {code:java}
> //  create table myacid(id int);
> {code}
>  # Run multiple insert table 
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid 
> values(2),(3),(4){code}
>  # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> ++
> |                     DFS Output                     |
> ++
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
> ++{code}
>  
> You will see that major compaction is not triggered until you manually run 
> ALTER TABLE ... COMPACT 'major'.
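
For reference, a sketch of that manual workaround against the myacid table from the repro above, using standard Hive ACID commands:

{code:sql}
-- explicitly request a major compaction, since the Initiator skips this table
ALTER TABLE myacid COMPACT 'major';
-- confirm the request was queued and eventually ran
SHOW COMPACTIONS;
{code}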



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22266) Addendum fix to have HS2 pom add explicit curator dependency

2019-09-27 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-22266:



> Addendum fix to have HS2 pom add explicit curator dependency
> 
>
> Key: HIVE-22266
> URL: https://issues.apache.org/jira/browse/HIVE-22266
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>
> It might be better to add an explicit dependency on apache-curator in the 
> service/pom.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22255) Hive don't trigger Major Compaction automatically if table contains all base files

2019-09-27 Thread Dinesh Chitlangia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939771#comment-16939771
 ] 

Dinesh Chitlangia commented on HIVE-22255:
--

[~Rajkumar Singh] Thanks for filing this issue. Isn't {{insert overwrite}} 
supposed to wipe out the existing base file and create a new one?

> Hive don't trigger Major Compaction automatically if table contains all base 
> files 
> ---
>
> Key: HIVE-22255
> URL: https://issues.apache.org/jira/browse/HIVE-22255
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 3.1.2
> Environment: Hive-3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>
> A user may run into this issue if the table consists only of base files with 
> no deltas; in that case the following condition yields false and automatic 
> major compaction is skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # create Acid table 
> {code:java}
> //  create table myacid(id int);
> {code}
>  # Run multiple insert table 
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid 
> values(2),(3),(4){code}
>  # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> ++
> |                     DFS Output                     |
> ++
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
> ++{code}
>  
> You will see that major compaction is not triggered until you manually run 
> ALTER TABLE ... COMPACT 'major'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22255) Hive don't trigger Major Compaction automatically if table contains all base files

2019-09-27 Thread Rajkumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh reassigned HIVE-22255:
-

Assignee: Rajkumar Singh

> Hive don't trigger Major Compaction automatically if table contains all base 
> files 
> ---
>
> Key: HIVE-22255
> URL: https://issues.apache.org/jira/browse/HIVE-22255
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 3.1.2
> Environment: Hive-3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>
> A user may run into this issue if the table consists only of base files with 
> no deltas; in that case the following condition yields false and automatic 
> major compaction is skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # create Acid table 
> {code:java}
> //  create table myacid(id int);
> {code}
>  # Run multiple insert table 
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid 
> values(2),(3),(4){code}
>  # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> ++
> |                     DFS Output                     |
> ++
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
> ++{code}
>  
> You will see that major compaction is not triggered until you manually run 
> ALTER TABLE ... COMPACT 'major'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2019-09-27 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin updated HIVE-22262:

Parent: HIVE-22253
Issue Type: Sub-task  (was: Bug)

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22260) Materialized view rewriting does not support `UNION` operator, exact match can work under view

2019-09-27 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin updated HIVE-22260:

Description: 
In this case, a view can be created that hides some nastier syntax like a 
"union".  

A materialized view can contain the view with a simple query.  So if the end 
query just uses the view, it should rewrite to the materialized view. 

Furthermore, an exception is thrown while creating the materialized view when 
the underlying view contains the "union".  At a minimum, we should print a 
friendlier message when the rewrite fails.

A script is attached.

An example of this:

create view logical_complex0 as

with t as

(select c1 as a, c2 as b from tab1 where c2 in (select f from logical_simple 
where g > 0)

union

select tab3.c1 as c, tab4.c2 as d from tab3, tab4 where tab3.c2 = tab4.c2)

select a, b

from t;

 

-- query separator

 

create materialized view aview_complex0 stored as orc as

select a as x, b as y, count(*)

from logical_complex0

group by 1, 2;

  was:
In this case, a view can be created that hides some nastier syntax like a 
"union".  

A materialized view can contain the view with a simple query.  So if the end 
query just uses the view, it should rewrite to the materialized view.  A script 
is attached.

An example of this:

create view logical_complex0 as

with t as

(select c1 as a, c2 as b from tab1 where c2 in (select f from logical_simple 
where g > 0)

union

select tab3.c1 as c, tab4.c2 as d from tab3, tab4 where tab3.c2 = tab4.c2)

select a, b

from t;

 

-- query separator

 

create materialized view aview_complex0 stored as orc as

select a as x, b as y, count(*)

from logical_complex0

group by 1, 2;


> Materialized view rewriting does not support `UNION` operator, exact match 
> can work under view
> --
>
> Key: HIVE-22260
> URL: https://issues.apache.org/jira/browse/HIVE-22260
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: complex0.sql
>
>
> In this case, a view can be created that hides some nastier syntax like a 
> "union".  
> A materialized view can contain the view with a simple query.  So if the end 
> query just uses the view, it should rewrite to the materialized view. 
> Furthermore, an exception is thrown while creating the materialized view when 
> the underlying view contains the "union".  At a minimum, we should print a 
> friendlier message when the rewrite fails.
> A script is attached.
> An example of this:
> create view logical_complex0 as
> with t as
> (select c1 as a, c2 as b from tab1 where c2 in (select f from logical_simple 
> where g > 0)
> union
> select tab3.c1 as c, tab4.c2 as d from tab3, tab4 where tab3.c2 = tab4.c2)
> select a, b
> from t;
>  
> -- query separator
>  
> create materialized view aview_complex0 stored as orc as
> select a as x, b as y, count(*)
> from logical_complex0
> group by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22257) Commutativity of operations is not taken into account, e.g., '+'

2019-09-27 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin updated HIVE-22257:

Attachment: expr9.sql

> Commutativity of operations is not taken into account, e.g., '+'
> 
>
> Key: HIVE-22257
> URL: https://issues.apache.org/jira/browse/HIVE-22257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: expr9.sql
>
>
> ...as stated in subject.  Script to reproduce is attached.
> Query and materialized view are as follows:
> create materialized view view5 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id + prod_id > 1 + 
> 2);
> explain extended select  prod_id, cust_id  from sales where prod_id + cust_id 
> > 1 + 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22255) Hive don't trigger Major Compaction automatically if table contains all base files

2019-09-27 Thread Rajkumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh updated HIVE-22255:
--
Description: 
A user may run into this issue if the table consists only of base files with 
no deltas; in that case the following condition yields false and automatic 
major compaction is skipped.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]

 

Steps to Reproduce:
 # create Acid table 
{code:java}
//  create table myacid(id int);
{code}

 # Run multiple insert table 
{code:java}
// insert overwrite table myacid values(1);insert overwrite table myacid 
values(2),(3),(4){code}

 # DFS ls output
{code:java}
// dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
++
|                     DFS Output                     |
++
| drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
/warehouse/tablespace/managed/hive/myacid/base_001 |
| -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
/warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
| -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
/warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
| drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
/warehouse/tablespace/managed/hive/myacid/base_002 |
| -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
/warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
| -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
/warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
++{code}
 
You will see that major compaction is not triggered until you manually run 
ALTER TABLE ... COMPACT 'major'.

  was:
A user may run into this issue if the table consists only of base files with 
no deltas; in that case the following condition yields false and automatic 
major compaction is skipped.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]

 

Steps to Reproduce:
 # create Acid table 
{code:java}
//  create table myacid(id int);
{code}

 # Run multiple insert table 
{code:java}
// insert overwrite table myacid values(1);insert overwrite table myacid 
values(2),(3),(4){code}

 # DFS ls output
{code:java}
// dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
++
|                     DFS Output                     |
++
| drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
/warehouse/tablespace/managed/hive/myacid/base_001 |
| -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
/warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
| -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
/warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
| drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
/warehouse/tablespace/managed/hive/myacid/base_002 |
| -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
/warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
| -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
/warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
++
{code}


> Hive don't trigger Major Compaction automatically if table contains all base 
> files 
> ---
>
> Key: HIVE-22255
> URL: https://issues.apache.org/jira/browse/HIVE-22255
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 3.1.2
> Environment: Hive-3.1.1
>Reporter: Rajkumar Singh
>Priority: Major
>
> A user may run into this issue if the table consists only of base files with 
> no deltas; in that case the following condition yields false and automatic 
> major compaction is skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # create Acid table 
> {code:java}
> //  create table myacid(id int);
> {code}
>  # Run multiple insert table 
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid 
> values(2),(3),(4){code}
>  # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> ++
> |                     DFS Output                     |
> ++
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001 |
> | -rw-rw+  3 

[jira] [Updated] (HIVE-22244) Added default ACLs for znodes on a non-kerberized cluster

2019-09-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-22244:
--
Attachment: HIVE-22244.4.patch

> Added default ACLs for znodes on a non-kerberized cluster
> -
>
> Key: HIVE-22244
> URL: https://issues.apache.org/jira/browse/HIVE-22244
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-22244.1.patch, HIVE-22244.2.patch, 
> HIVE-22244.3.patch, HIVE-22244.4.patch
>
>
> Set default ACLs for znodes on a non-kerberized cluster: 
> Create/Read/Delete/Write/Admin to the world



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21975) Fix incremental compilation

2019-09-27 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin updated HIVE-21975:

Attachment: HIVE-21975.3.patch

> Fix incremental compilation
> ---
>
> Key: HIVE-21975
> URL: https://issues.apache.org/jira/browse/HIVE-21975
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21975.1.patch, HIVE-21975.2.patch, 
> HIVE-21975.3.patch, HIVE-21975.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> we have an incremental compilation issue around SA; mostly because of "? 
> extends Serializable"
> it could be reproduced with:
> {code}
> git clean -dfx
> mvn install -pl ql -am -DskipTests
> touch `find . -name Sema*A*java` `find . -name Task*Factory.java`
> mvn install -pl ql  -DskipTests
> {code}
> error is:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile 
> (default-compile) on project hive-exec: Compilation failure: Compilation 
> failure: 
> [ERROR] 
> /mnt/work/hwx/hive/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:[12573,60]
>  incompatible types: java.util.List<Task<? extends java.io.Serializable>> 
> cannot be converted to java.util.List<Task<?>>
> [ERROR] 
> /mnt/work/hwx/hive/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:[15187,49]
>  incompatible types: java.util.List<Task<?>> 
> cannot be converted to java.util.List<Task<? extends java.io.Serializable>>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22209) Creating a materialized view with no tables should be handled more gracefully

2019-09-27 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin updated HIVE-22209:

Attachment: HIVE-22209.2.patch

> Creating a materialized view with no tables should be handled more gracefully
> -
>
> Key: HIVE-22209
> URL: https://issues.apache.org/jira/browse/HIVE-22209
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-22209.1.patch, HIVE-22209.2.patch, HIVE-22209.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, materialized views without a table reference are not supported. 
> However, instead of printing a clear message about it, when a materialized 
> view is created without a table reference, we fail with an unclear message.
> {code}
> > create materialized view mv_test1 as select 5;
> (...)
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request 
> failed :
> INSERT INTO MV_TABLES_USED (MV_CREATION_METADATA_ID,TBL_ID) VALUES (?,?) )
> INFO : Completed executing 
> command(queryId=hive_20190916203511_b609cccf-f5e3-45dd-abfd-6e869d94e39a); 
> Time taken: 10.469 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaExcep
> tion(message:Add request failed : INSERT INTO MV_TABLES_USED 
> (MV_CREATION_METADATA_ID,TBL_ID) VALUES (?,?) ) (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-21924:

Attachment: HIVE-21924.2.patch
Status: Patch Available  (was: Open)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.2.patch, HIVE-21924.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> This piece of code makes CSV files (or any text files with a header/footer) 
> non-splittable if a header or footer is present.
> If only a header is present, we can find the offset after the first line break 
> and use that to split. Similarly for the footer, we may read a few KBs of data 
> at the end and find the last line break offset, then use that to determine the 
> data range which can be used for splitting. A few reads during split 
> generation are cheaper than not splitting the file at all.
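
For context, a sketch of a table that takes this code path (the table name and columns are illustrative; the header/footer settings are the standard Hive text-table properties):

{code:sql}
-- text table whose files carry a one-line header and a one-line footer;
-- with the code above, such files are currently never split
CREATE TABLE csv_data (id INT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1", "skip.footer.line.count"="1");
{code}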



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-21924:

Status: Open  (was: Patch Available)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.2.patch, HIVE-21924.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> This piece of code makes CSV files (or any text files with a header/footer) 
> non-splittable if a header or footer is present.
> If only a header is present, we can find the offset after the first line break 
> and use that to split. Similarly for the footer, we may read a few KBs of data 
> at the end and find the last line break offset, then use that to determine the 
> data range which can be used for splitting. A few reads during split 
> generation are cheaper than not splitting the file at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22246) Beeline reflector should handle map types

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939685#comment-16939685
 ] 

Hive QA commented on HIVE-22246:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981503/HIVE-22246.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17009 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18763/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18763/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18763/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981503 - PreCommit-HIVE-Build

> Beeline reflector should handle map types
> -
>
> Key: HIVE-22246
> URL: https://issues.apache.org/jira/browse/HIVE-22246
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-22246.1.patch, HIVE-22246.2.patch
>
>
> Since beeline {{Reflector}} is not handling Map types, it ends up converting 
> values from {{beeline.properties}} to "null" and throws an NPE with 
> {{beeline --hivevar x=1 --hivevar y=1}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22252) Fix caught NullPointerExceptions generated during EXPLAIN

2019-09-27 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman updated HIVE-22252:

Attachment: HIVE-22252.1.patch

> Fix caught NullPointerExceptions generated during EXPLAIN
> -
>
> Key: HIVE-22252
> URL: https://issues.apache.org/jira/browse/HIVE-22252
> Project: Hive
>  Issue Type: Bug
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Minor
> Attachments: HIVE-22252.1.patch
>
>
> While debugging an issue I noticed that during EXPLAIN the following methods 
> throw a NullPointerException:
> VectorColumnOutputMapping::finalize
> AbstractOperatorDesc::getUserLevelStatistics
> AbstractOperatorDesc::getColumnExprMapForExplain
> The exceptions do end up getting caught, but we should add null checks and 
> handle these cases gracefully, both to be less wasteful and to aid future 
> debugging.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22252) Fix caught NullPointerExceptions generated during EXPLAIN

2019-09-27 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman updated HIVE-22252:

Status: Patch Available  (was: Open)

> Fix caught NullPointerExceptions generated during EXPLAIN
> -
>
> Key: HIVE-22252
> URL: https://issues.apache.org/jira/browse/HIVE-22252
> Project: Hive
>  Issue Type: Bug
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Minor
> Attachments: HIVE-22252.1.patch
>
>
> While debugging an issue I noticed that during EXPLAIN the following methods 
> throw a NullPointerException:
> VectorColumnOutputMapping::finalize
> AbstractOperatorDesc::getUserLevelStatistics
> AbstractOperatorDesc::getColumnExprMapForExplain
> The exceptions do end up getting caught, but we should add null checks and 
> handle these cases gracefully, both to be less wasteful and to aid future 
> debugging.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22252) Fix caught NullPointerExceptions generated during EXPLAIN

2019-09-27 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman reassigned HIVE-22252:
---


> Fix caught NullPointerExceptions generated during EXPLAIN
> -
>
> Key: HIVE-22252
> URL: https://issues.apache.org/jira/browse/HIVE-22252
> Project: Hive
>  Issue Type: Bug
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Minor
>
> While debugging an issue I noticed that during EXPLAIN the following methods 
> throw a NullPointerException:
> VectorColumnOutputMapping::finalize
> AbstractOperatorDesc::getUserLevelStatistics
> AbstractOperatorDesc::getColumnExprMapForExplain
> The exceptions do end up getting caught, but we should add null checks and 
> handle these cases gracefully, both to be less wasteful and to aid future 
> debugging.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22246) Beeline reflector should handle map types

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939635#comment-16939635
 ] 

Hive QA commented on HIVE-22246:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} beeline in master has 48 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18763/dev-support/hive-personality.sh
 |
| git revision | master / b53521a |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: beeline U: beeline |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18763/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Beeline reflector should handle map types
> -
>
> Key: HIVE-22246
> URL: https://issues.apache.org/jira/browse/HIVE-22246
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-22246.1.patch, HIVE-22246.2.patch
>
>
> Since the beeline {{Reflector}} does not handle Map types, it ends up converting 
> values from {{beeline.properties}} to "null" and throws an NPE with {{beeline 
> --hivevar x=1 --hivevar y=1}}.
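A hedged sketch of the missing special case; the class and lookup logic here are invented for illustration and are not Beeline's actual {{Reflector}} code:

{code:java}
import java.lang.reflect.Method;
import java.util.Map;

public final class MapAwareReflector {
  // If the target property is Map-typed (e.g. hive variables), parse "x=1"
  // into the existing map instead of coercing the whole string to a Map,
  // which yields null and a later NullPointerException.
  public static void set(Object bean, String prop, String value) throws Exception {
    Method getter = bean.getClass().getMethod(
        "get" + Character.toUpperCase(prop.charAt(0)) + prop.substring(1));
    if (Map.class.isAssignableFrom(getter.getReturnType())) {
      @SuppressWarnings("unchecked")
      Map<String, String> map = (Map<String, String>) getter.invoke(bean);
      String[] kv = value.split("=", 2);
      map.put(kv[0], kv.length > 1 ? kv[1] : "");
      return;
    }
    // otherwise fall through to the usual string/boolean/number conversion
  }
}
{code}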



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22163) CBO: Enabling CBO turns on stats estimation, even when the estimation is disabled

2019-09-27 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-22163:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks, Krisztian!

> CBO: Enabling CBO turns on stats estimation, even when the estimation is 
> disabled
> -
>
> Key: HIVE-22163
> URL: https://issues.apache.org/jira/browse/HIVE-22163
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Gopal Vijayaraghavan
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22163.1.patch, HIVE-22163.1.patch, 
> HIVE-22163.1.patch, HIVE-22163.2.patch, HIVE-22163.3.patch, 
> HIVE-22163.4.patch, HIVE-22163.4.patch, HIVE-22163.5.patch, 
> HIVE-22163.5.patch, HIVE-22163.5.patch, HIVE-22163.5.patch, 
> HIVE-22163.5.patch, HIVE-22163.5.patch, HIVE-22163.5.patch
>
>
> {code}
> create table claims(claim_rec_id bigint, claim_invoice_num string, typ_c int);
> alter table claims update statistics set 
> ('numRows'='1154941534','rawDataSize'='1135307527922');
> set hive.stats.estimate=false;
> explain extended select count(1) from claims where typ_c=3;
> set hive.stats.ndv.estimate.percent=5e-7;
> explain extended select count(1) from claims where typ_c=3;
> {code}
> Expecting the standard numRows/2 estimate for the single filter, but we instead get 5 rows.
> {code}
> 'Map Operator Tree:'
> 'TableScan'
> '  alias: claims'
> '  filterExpr: (typ_c = 3) (type: boolean)'
> '  Statistics: Num rows: 1154941534 Data size: 4388777832 
> Basic stats: COMPLETE Column stats: NONE'
> '  GatherStats: false'
> '  Filter Operator'
> 'isSamplingPred: false'
> 'predicate: (typ_c = 3) (type: boolean)'
> 'Statistics: Num rows: 5 Data size: 19 Basic stats: 
> COMPLETE Column stats: NONE'
> {code}
> The estimation is in effect, as changing the estimate.percent changes this.
> {code}
> '  filterExpr: (typ_c = 3) (type: boolean)'
> '  Statistics: Num rows: 1154941534 Data size: 4388777832 
> Basic stats: COMPLETE Column stats: NONE'
> '  GatherStats: false'
> '  Filter Operator'
> 'isSamplingPred: false'
> 'predicate: (typ_c = 3) (type: boolean)'
> 'Statistics: Num rows: 230988307 Data size: 877755567 
> Basic stats: COMPLETE Column stats: NONE'
> {code}
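For the record, the expected estimate is numRows/2 = 1154941534 / 2 = 577470767 rows; the first plan instead shows 5 rows, and raising hive.stats.ndv.estimate.percent moves the estimate to 230988307 ≈ numRows/5, which is what demonstrates the NDV estimator is still active despite hive.stats.estimate=false.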



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22235) CommandProcessorResponse should not be an exception

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939619#comment-16939619
 ] 

Hive QA commented on HIVE-22235:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981517/HIVE-22235.02.patch

{color:green}SUCCESS:{color} +1 due to 73 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1007 failed/errored test(s), 16782 tests 
executed
*Failed tests:*
{noformat}
TestAuthzApiEmbedAuthorizerInRemote - did not produce a TEST-*.xml file (likely 
timed out) (batchId=246)
TestCLIAuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=286)
TestCreateUdfEntities - did not produce a TEST-*.xml file (likely timed out) 
(batchId=246)
TestDDLWithRemoteMetastoreSecondNamenode - did not produce a TEST-*.xml file 
(likely timed out) (batchId=246)
TestDFSErrorHandling - did not produce a TEST-*.xml file (likely timed out) 
(batchId=281)
TestHiveMetaStoreAlterColumnPar - did not produce a TEST-*.xml file (likely 
timed out) (batchId=246)
TestHiveProtoEventsCleanerTask - did not produce a TEST-*.xml file (likely 
timed out) (batchId=246)
TestHs2Hooks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=246)
TestJdbcDriver2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=284)
TestJdbcGenericUDTFGetSplits - did not produce a TEST-*.xml file (likely timed 
out) (batchId=281)
TestJdbcGenericUDTFGetSplits2 - did not produce a TEST-*.xml file (likely timed 
out) (batchId=281)
TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file (likely timed 
out) (batchId=286)
TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=286)
TestJdbcWithMiniHS2ErasureCoding - did not produce a TEST-*.xml file (likely 
timed out) (batchId=286)
TestJdbcWithMiniLlapArrow - did not produce a TEST-*.xml file (likely timed 
out) (batchId=283)
TestJdbcWithSQLAuthUDFBlacklist - did not produce a TEST-*.xml file (likely 
timed out) (batchId=286)
TestJdbcWithSQLAuthorization - did not produce a TEST-*.xml file (likely timed 
out) (batchId=286)
TestMetaStoreLimitPartitionRequest - did not produce a TEST-*.xml file (likely 
timed out) (batchId=246)
TestMultiSessionsHS2WithLocalClusterSpark - did not produce a TEST-*.xml file 
(likely timed out) (batchId=286)
TestNoSaslAuth - did not produce a TEST-*.xml file (likely timed out) 
(batchId=286)
TestRestrictedList - did not produce a TEST-*.xml file (likely timed out) 
(batchId=283)
TestServiceDiscovery - did not produce a TEST-*.xml file (likely timed out) 
(batchId=283)
TestTriggersMoveWorkloadManager - did not produce a TEST-*.xml file (likely 
timed out) (batchId=283)
TestTriggersNoTezSessionPool - did not produce a TEST-*.xml file (likely timed 
out) (batchId=281)
TestTriggersTezSessionPoolManager - did not produce a TEST-*.xml file (likely 
timed out) (batchId=283)
TestXSRFFilter - did not produce a TEST-*.xml file (likely timed out) 
(batchId=281)
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testCliDriver[case_with_row_sequence]
 (batchId=295)
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testCliDriver[invalid_row_sequence]
 (batchId=295)
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testCliDriver[serde_regex]
 (batchId=295)
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testCliDriver[udtf_explode2]
 (batchId=295)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=296)
org.apache.hadoop.hive.cli.TestKuduNegativeCliDriver.testCliDriver[kudu_config] 
(batchId=289)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[retry_failure]
 (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[retry_failure_oom]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[retry_failure_reorder]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[retry_failure_stat_changes]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_retry_failure]
 (batchId=179)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[add_partition_with_whitelist]
 (batchId=103)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[addpart1] 
(batchId=101)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[allow_change_col_type_par_neg]
 (batchId=101)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_external_acid]
 (batchId=101)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_external_with_default_constraint]
 (batchId=101)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_external_with_notnull_constraint]
 (batchId=101)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_file_format]
 (batchId=102)

[jira] [Commented] (HIVE-22235) CommandProcessorResponse should not be an exception

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939611#comment-16939611
 ] 

Hive QA commented on HIVE-22235:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
40s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
2s{color} | {color:blue} ql in master has 1566 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
41s{color} | {color:blue} service in master has 49 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
30s{color} | {color:blue} cli in master has 9 extant Findbugs warnings. {color} 
|
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
37s{color} | {color:blue} hcatalog/core in master has 36 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
28s{color} | {color:blue} hcatalog/hcatalog-pig-adapter in master has 2 extant 
Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
28s{color} | {color:blue} hcatalog/server-extensions in master has 3 extant 
Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
30s{color} | {color:blue} hcatalog/webhcat/java-client in master has 3 extant 
Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
30s{color} | {color:blue} hcatalog/streaming in master has 11 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
28s{color} | {color:blue} streaming in master has 2 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
40s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
49s{color} | {color:blue} itests/util in master has 53 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
1s{color} | {color:red} ql: The patch generated 83 new + 1747 unchanged - 149 
fixed = 1830 total (was 1896) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
13s{color} | {color:red} service: The patch generated 2 new + 40 unchanged - 0 
fixed = 42 total (was 40) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} cli: The patch generated 0 new + 39 unchanged - 2 
fixed = 39 total (was 41) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} hcatalog/core: The patch generated 0 new + 78 
unchanged - 13 fixed = 78 total (was 91) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} hcatalog/hcatalog-pig-adapter: The patch generated 4 
new + 186 unchanged - 8 fixed = 190 total (was 194) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} The patch server-extensions passed checkstyle 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} 

[jira] [Updated] (HIVE-22241) Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar

2019-09-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22241:
---
Attachment: HIVE-22241.02.patch

> Implement UDF to interpret date/timestamp using its internal representation 
> and Gregorian-Julian hybrid calendar
> 
>
> Key: HIVE-22241
> URL: https://issues.apache.org/jira/browse/HIVE-22241
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22241.01.patch, HIVE-22241.02.patch, 
> HIVE-22241.02.patch, HIVE-22241.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDF that converts a date/timestamp to the new *proleptic Gregorian calendar* 
> (ISO 8601 standard), which is produced by extending the Gregorian calendar 
> backward to dates preceding its official introduction in 1582. It assumes that 
> the internal days/milliseconds since epoch were calculated using the legacy 
> *Gregorian-Julian hybrid* calendar, i.e., a calendar that supports both the 
> Julian and Gregorian systems with a single discontinuity, which corresponds by 
> default to the date when the Gregorian calendar was instituted.
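For intuition, a standalone JDK-only sketch of the hybrid-vs-proleptic discrepancy the UDF compensates for (this is not the UDF's code; the sample date is arbitrary):

{code:java}
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Date;
import java.util.TimeZone;

public class HybridVsProleptic {
  public static void main(String[] args) {
    // Days since epoch for 1200-01-01 under the proleptic Gregorian calendar.
    long days = LocalDate.of(1200, 1, 1).toEpochDay();

    // SimpleDateFormat/GregorianCalendar use the hybrid Julian/Gregorian
    // calendar (Julian before the 1582 cutover), so the same epoch offset
    // renders as a different calendar date.
    SimpleDateFormat hybrid = new SimpleDateFormat("yyyy-MM-dd");
    hybrid.setTimeZone(TimeZone.getTimeZone("UTC"));
    System.out.println("proleptic: 1200-01-01, hybrid: "
        + hybrid.format(new Date(days * 86_400_000L)));
  }
}
{code}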



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22234) Hive replication fails with table already exist error when replicating from old version of hive.

2019-09-27 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22234:
---
Status: Patch Available  (was: Open)

> Hive replication fails with table already exist error when replicating from 
> old version of hive.
> 
>
> Key: HIVE-22234
> URL: https://issues.apache.org/jira/browse/HIVE-22234
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22234.01.patch, HIVE-22234.02.patch, 
> HIVE-22234.03.patch, HIVE-22234.04.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive replication from an old version where HIVE-22046 is not patched will not 
> have engine column set in the table column stats. This causes "ERROR: null 
> value in column "ENGINE" violates not-null constraint" error during create 
> table while updating the column stats. As the column stats are updated after 
> the create table txn is committed, the next retry by HMS client throws table 
> already exist error. Need to update the ENGINE column to default value while 
> importing the table if the column value is not set. The column stat and 
> create table in same txn can be done as part of separate Jira.
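A minimal sketch of the proposed defaulting, assuming the thrift-generated ColumnStatistics object carries the optional engine field added by HIVE-22046 (the call site and default value are illustrative):

{code:java}
// During repl load / import, before persisting stats that came from a dump
// produced by an old Hive that never wrote the engine column:
if (!colStats.isSetEngine()) {
  colStats.setEngine("hive"); // assume the default engine when unset
}
{code}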



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22234) Hive replication fails with table already exist error when replicating from old version of hive.

2019-09-27 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22234:
---
Attachment: HIVE-22234.04.patch

> Hive replication fails with table already exist error when replicating from 
> old version of hive.
> 
>
> Key: HIVE-22234
> URL: https://issues.apache.org/jira/browse/HIVE-22234
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22234.01.patch, HIVE-22234.02.patch, 
> HIVE-22234.03.patch, HIVE-22234.04.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive replication from an old version where HIVE-22046 is not patched will not 
> have engine column set in the table column stats. This causes "ERROR: null 
> value in column "ENGINE" violates not-null constraint" error during create 
> table while updating the column stats. As the column stats are updated after 
> the create table txn is committed, the next retry by HMS client throws table 
> already exist error. Need to update the ENGINE column to default value while 
> importing the table if the column value is not set. The column stat and 
> create table in same txn can be done as part of separate Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22234) Hive replication fails with table already exist error when replicating from old version of hive.

2019-09-27 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22234:
---
Status: Open  (was: Patch Available)

> Hive replication fails with table already exist error when replicating from 
> old version of hive.
> 
>
> Key: HIVE-22234
> URL: https://issues.apache.org/jira/browse/HIVE-22234
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22234.01.patch, HIVE-22234.02.patch, 
> HIVE-22234.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive replication from an old version where HIVE-22046 is not patched will not 
> have engine column set in the table column stats. This causes "ERROR: null 
> value in column "ENGINE" violates not-null constraint" error during create 
> table while updating the column stats. As the column stats are updated after 
> the create table txn is committed, the next retry by HMS client throws table 
> already exist error. Need to update the ENGINE column to default value while 
> importing the table if the column value is not set. The column stat and 
> create table in same txn can be done as part of separate Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22244) Added default ACLs for znodes on a non-kerberized cluster

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939553#comment-16939553
 ] 

Hive QA commented on HIVE-22244:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981496/HIVE-22244.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17009 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.llap.cache.TestBuddyAllocator.testMTT[2] (batchId=363)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18761/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18761/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18761/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981496 - PreCommit-HIVE-Build

> Added default ACLs for znodes on a non-kerberized cluster
> -
>
> Key: HIVE-22244
> URL: https://issues.apache.org/jira/browse/HIVE-22244
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-22244.1.patch, HIVE-22244.2.patch, 
> HIVE-22244.3.patch
>
>
> Set default ACLs for znodes on a non-kerberized cluster: 
> Create/Read/Delete/Write/Admin to the world
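A hedged sketch of what that amounts to with the plain ZooKeeper client; the znode path and data are invented, while OPEN_ACL_UNSAFE is ZooKeeper's built-in world:anyone/ALL ACL:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class WorldAclSketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
    // world:anyone gets Create/Read/Delete/Write/Admin on the new znode.
    zk.create("/hiveserver2/serverUri", "hs2-host:10000".getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    zk.close();
  }
}
{code}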



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Jay Green-Stevens (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Green-Stevens updated HIVE-22249:
-
Attachment: HIVE-22249.branch-2.3.patch
Status: Patch Available  (was: Open)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
> Attachments: HIVE-22249.branch-2.3.patch
>
>
> HIVE-8838 added Parquet support to HCatalog for Hive 3.0.0. We would like to 
> backport this functionality to Hive 2.x (primarily 2.3.x) for users who are 
> currently unable to migrate to Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Jay Green-Stevens (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Green-Stevens updated HIVE-22249:
-
Attachment: (was: HIVE-22249.branch-2.3.patch)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
> Attachments: HIVE-22249.branch-2.3.patch
>
>
> HIVE-8838 added Parquet support to HCatalog for Hive 3.0.0. We would like to 
> backport this functionality to Hive 2.x (primarily 2.3.x) for users who are 
> currently unable to migrate to Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Mass Dosage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mass Dosage updated HIVE-22249:
---
Target Version/s: 2.4.0, 2.3.7

> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
> Attachments: HIVE-22249.branch-2.3.patch
>
>
> HIVE-8838 added Parquet support to HCatalog for Hive 3.0.0. We would like to 
> backport this functionality to Hive 2.x (primarily 2.3.x) for users who are 
> currently unable to migrate to Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22244) Added default ACLs for znodes on a non-kerberized cluster

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939513#comment-16939513
 ] 

Hive QA commented on HIVE-22244:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
10s{color} | {color:blue} standalone-metastore/metastore-server in master has 
170 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18761/dev-support/hive-personality.sh
 |
| git revision | master / 6ca8397 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: standalone-metastore/metastore-server U: 
standalone-metastore/metastore-server |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18761/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Added default ACLs for znodes on a non-kerberized cluster
> -
>
> Key: HIVE-22244
> URL: https://issues.apache.org/jira/browse/HIVE-22244
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-22244.1.patch, HIVE-22244.2.patch, 
> HIVE-22244.3.patch
>
>
> Set default ACLs for znodes on a non-kerberized cluster: 
> Create/Read/Delete/Write/Admin to the world



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Mass Dosage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mass Dosage updated HIVE-22249:
---
Description: 
HIVE-8838 added Parquet support to HCatalog for Hive 3.0.0. We would like to 
backport this functionality to Hive 2.x (primarily 2.3.x) for users who are 
currently unable to migrate to Hive 3.


  was:
Want to add a patch to hive version 2.3.x to support parquet.

 

Relevant to previous ticket: HIVE-8838.


> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
> Attachments: HIVE-22249.branch-2.3.patch
>
>
> HIVE-8838 added Parquet support to HCatalog for Hive 3.0.0. We would like to 
> backport this functionality to Hive 2.x (primarily 2.3.x) for users who are 
> currently unable to migrate to Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22250) Describe function does not provide description for rank functions

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22250:
--
Attachment: HIVE-22250.1.patch

> Describe function does not provide description for rank functions
> -
>
> Key: HIVE-22250
> URL: https://issues.apache.org/jira/browse/HIVE-22250
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22250.1.patch
>
>
> {code}
> DESC FUNCTION dense_rank;
> {code}
> {code}
> PREHOOK: query: DESC FUNCTION dense_rank
> PREHOOK: type: DESCFUNCTION
> POSTHOOK: query: DESC FUNCTION dense_rank
> POSTHOOK: type: DESCFUNCTION
> There is no documentation for function 'dense_rank'
> {code}
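DESC FUNCTION prints the {{@Description}} annotation on the function's class, so the likely fix is annotating the rank evaluators; a sketch with invented documentation text:

{code:java}
import org.apache.hadoop.hive.ql.exec.Description;

@Description(name = "dense_rank",
    value = "_FUNC_(x) - rank of each row within its partition, without gaps",
    extended = "Rows with equal ordering values receive the same rank.")
public class DenseRankSketch {
}
{code}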



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22250) Describe function does not provide description for rank functions

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22250:
--
Assignee: Krisztian Kasa
  Status: Patch Available  (was: Open)

> Describe function does not provide description for rank functions
> -
>
> Key: HIVE-22250
> URL: https://issues.apache.org/jira/browse/HIVE-22250
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22250.1.patch
>
>
> {code}
> DESC FUNCTION dense_rank;
> {code}
> {code}
> PREHOOK: query: DESC FUNCTION dense_rank
> PREHOOK: type: DESCFUNCTION
> POSTHOOK: query: DESC FUNCTION dense_rank
> POSTHOOK: type: DESCFUNCTION
> There is no documentation for function 'dense_rank'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21884) Scheduled query support

2019-09-27 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21884:

Attachment: HIVE-21884.20.patch

> Scheduled query support
> ---
>
> Key: HIVE-21884
> URL: https://issues.apache.org/jira/browse/HIVE-21884
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21844.04.patch, HIVE-21844.05.patch, 
> HIVE-21844.06.patch, HIVE-21844.07.patch, HIVE-21844.08.patch, 
> HIVE-21844.09.patch, HIVE-21844.15.patch, HIVE-21844.19.patch, 
> HIVE-21884.01.patch, HIVE-21884.02.patch, HIVE-21884.03.patch, 
> HIVE-21884.09.patch, HIVE-21884.10.patch, HIVE-21884.10.patch, 
> HIVE-21884.11.patch, HIVE-21884.12.patch, HIVE-21884.13.patch, 
> HIVE-21884.14.patch, HIVE-21884.14.patch, HIVE-21884.14.patch, 
> HIVE-21884.16.patch, HIVE-21884.17.patch, HIVE-21884.18.patch, 
> HIVE-21884.20.patch, Scheduled queries2.pdf
>
>
> design document:
> https://docs.google.com/document/d/1mJSFdJi_1cbxJTXC9QvGw2rQ3zzJkNfxOO6b5esmyCE/edit#
> in case the google doc is not reachable:  [^Scheduled queries2.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22234) Hive replication fails with table already exist error when replicating from old version of hive.

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939498#comment-16939498
 ] 

Hive QA commented on HIVE-22234:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
20s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 
40s{color} | {color:blue} standalone-metastore/metastore-common in master has 
32 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
12s{color} | {color:blue} standalone-metastore/metastore-server in master has 
170 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
15s{color} | {color:blue} ql in master has 1566 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m 
17s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} The patch metastore-common passed checkstyle {color} 
|
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} standalone-metastore/metastore-server: The patch 
generated 0 new + 403 unchanged - 1 fixed = 403 total (was 404) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} The patch ql passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 3s{color} | {color:green} root: The patch generated 0 new + 417 unchanged - 1 
fixed = 417 total (was 418) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18760/dev-support/hive-personality.sh
 |
| git revision | master / 6ca8397 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18760/yetus/whitespace-tabs.txt
 |
| modules | C: standalone-metastore/metastore-common 
standalone-metastore/metastore-server ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18760/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Hive replication fails with table already exist error when replicating from 
> old version of hive.
> 
>
> Key: HIVE-22234
> URL: 

[jira] [Commented] (HIVE-22234) Hive replication fails with table already exist error when replicating from old version of hive.

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939497#comment-16939497
 ] 

Hive QA commented on HIVE-22234:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981509/HIVE-22234.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 17010 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[import_exported_table]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[import_exported_table]
 (batchId=195)
org.apache.hadoop.hive.metastore.TestObjectStore.catalogs (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testDatabaseOps (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testDeprecatedConfigIsOverwritten
 (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testDirectSQLDropParitionsCleanup
 (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testDirectSQLDropPartitionsCacheCrossSession
 (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testDirectSqlErrorMetrics 
(batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testEmptyTrustStoreProps 
(batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testMasterKeyOps (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testMaxEventResponse 
(batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testPartitionOps (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testQueryCloseOnError 
(batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testRoleOps (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testTableOps (batchId=233)
org.apache.hadoop.hive.metastore.TestObjectStore.testUseSSLProperty 
(batchId=233)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18760/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18760/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18760/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981509 - PreCommit-HIVE-Build

> Hive replication fails with table already exist error when replicating from 
> old version of hive.
> 
>
> Key: HIVE-22234
> URL: https://issues.apache.org/jira/browse/HIVE-22234
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22234.01.patch, HIVE-22234.02.patch, 
> HIVE-22234.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive replication from an old version where HIVE-22046 is not patched will not 
> have engine column set in the table column stats. This causes "ERROR: null 
> value in column "ENGINE" violates not-null constraint" error during create 
> table while updating the column stats. As the column stats are updated after 
> the create table txn is committed, the next retry by HMS client throws table 
> already exist error. Need to update the ENGINE column to default value while 
> importing the table if the column value is not set. The column stat and 
> create table in same txn can be done as part of separate Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Attachment: HIVE-21449.7.patch

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch, HIVE-21449.7.patch
>
>
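The issue has no description; for context, the standard-SQL shape of the clause being implemented (table and column names invented):

{code}
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary)
FROM emps
GROUP BY dept;
{code}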




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Status: Open  (was: Patch Available)

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Status: Patch Available  (was: Open)

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22250) Describe function does not provide description for rank functions

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22250:
--
Description: 
{code}
DESC FUNCTION dense_rank;
{code}

{code}
PREHOOK: query: DESC FUNCTION dense_rank
PREHOOK: type: DESCFUNCTION
POSTHOOK: query: DESC FUNCTION dense_rank
POSTHOOK: type: DESCFUNCTION
There is no documentation for function 'dense_rank'
{code}

> Describe function does not provide description for rank functions
> -
>
> Key: HIVE-22250
> URL: https://issues.apache.org/jira/browse/HIVE-22250
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Priority: Minor
> Fix For: 4.0.0
>
>
> {code}
> DESC FUNCTION dense_rank;
> {code}
> {code}
> PREHOOK: query: DESC FUNCTION dense_rank
> PREHOOK: type: DESCFUNCTION
> POSTHOOK: query: DESC FUNCTION dense_rank
> POSTHOOK: type: DESCFUNCTION
> There is no documentation for function 'dense_rank'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Component/s: UDF

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22240) Function percentile_cont fails when array parameter passed

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22240:
--
Component/s: UDF

> Function percentile_cont fails when array parameter passed
> --
>
> Key: HIVE-22240
> URL: https://issues.apache.org/jira/browse/HIVE-22240
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> {code}
> SELECT
> percentile_cont(array(0.2, 0.5, 0.9)) WITHIN GROUP (ORDER BY value)
> FROM t_test;
> {code}
> hive.log:
> {code}
> 2019-09-24T21:00:43,203 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:793)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>   ... 11 more
> Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast 
> to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileCont$PercentileContEvaluator.iterate(GenericUDAFPercentileCont.java:259)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:214)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:639)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:814)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:720)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:788)
>   ... 17 more
> {code}
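For contrast, the scalar form of the same call is the working path; only the array argument reaches the failing cast in {{PercentileContEvaluator.iterate}}:

{code}
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) FROM t_test;
{code}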



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Component/s: Parser

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22137) Implement alter/rename partition related methods on temporary tables

2019-09-27 Thread Laszlo Pinter (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-22137:
-
Attachment: HIVE-22137.04.patch

> Implement alter/rename partition related methods on temporary tables
> 
>
> Key: HIVE-22137
> URL: https://issues.apache.org/jira/browse/HIVE-22137
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-22137.01.patch, HIVE-22137.02.patch, 
> HIVE-22137.03.patch, HIVE-22137.04.patch
>
>
> IMetaStoreClient exposes the following methods related to altering of 
> partitions:
> {code:java}
> void alter_partition(String dbName, String tblName, Partition newPart);
> void alter_partition(String catName, String dbName, String tblName, Partition 
> newPart);
> void alter_partition(String dbName, String tblName, Partition newPart, 
> EnvironmentContext environmentContext);
> void alter_partition(String catName, String dbName, String tblName, Partition 
> newPart, EnvironmentContext environmentContext, String writeIdList);
> void alter_partition(String catName, String dbName, String tblName, Partition 
> newPart, EnvironmentContext environmentContext);
> void alter_partitions(String dbName, String tblName, List<Partition> newParts);
> void alter_partitions(String dbName, String tblName, List<Partition> newParts, 
> EnvironmentContext environmentContext);
> void alter_partitions(String dbName, String tblName, List<Partition> newParts, 
> EnvironmentContext environmentContext, String writeIdList, long writeId);
> void alter_partitions(String catName, String dbName, String tblName, 
> List<Partition> newParts);
> void alter_partitions(String catName, String dbName, String tblName, 
> List<Partition> newParts, EnvironmentContext environmentContext, String 
> writeIdList, long writeId);
> void renamePartition(final String dbname, final String tableName, final 
> List<String> part_vals, final Partition newPart);
> void renamePartition(String catName, String dbname, String tableName, 
> List<String> part_vals, Partition newPart, String validWriteIds){code}
> These should be implemented in order to completely support partitions on 
> temporary tables.
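A hedged usage sketch against the simplest of the overloads above; the database, table, and partition values are invented for illustration:

{code:java}
import java.util.Arrays;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class TempTableAlterPartitionSketch {
  // Move one partition of a (temporary) table to a new location using the
  // alter_partition(String, String, Partition) overload listed above.
  static void relocate(IMetaStoreClient client, String newLoc) throws Exception {
    Partition p = client.getPartition("default", "tmp_sales",
        Arrays.asList("2019-09-27"));
    p.getSd().setLocation(newLoc);
    client.alter_partition("default", "tmp_sales", p);
  }
}
{code}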



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-7145) Remove dependence on apache commons-lang

2019-09-27 Thread David Lavati (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939446#comment-16939446
 ] 

David Lavati commented on HIVE-7145:


Thrift switched to commons-lang3 in 0.9.1 according to THRIFT-1956, and we're 
using 0.9.3-1, so I'm giving this another shot.

> Remove dependence on apache commons-lang
> 
>
> Key: HIVE-7145
> URL: https://issues.apache.org/jira/browse/HIVE-7145
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: David Lavati
>Priority: Major
>
> We currently depend on both Apache commons-lang and commons-lang3. They are 
> the same project, just at version 2.x vs 3.x. I propose that we move all of 
> the references in Hive to commons-lang3 and remove the v2 usage.
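In practice the migration is largely an import swap, since the v3 API kept most of the v2 signatures; for example:

{code:java}
// Before (commons-lang v2): import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.StringUtils;

public class LangMigrationExample {
  public static void main(String[] args) {
    System.out.println(StringUtils.isBlank("  ")); // same call shape in v2 and v3
  }
}
{code}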



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-7145) Remove dependence on apache commons-lang

2019-09-27 Thread David Lavati (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7145 started by David Lavati.
--
> Remove dependence on apache commons-lang
> 
>
> Key: HIVE-7145
> URL: https://issues.apache.org/jira/browse/HIVE-7145
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: David Lavati
>Priority: Major
>
> We currently depend on both Apache commons-lang and commons-lang3. They are 
> the same project, just at version 2.x vs 3.x. I propose that we move all of 
> the references in Hive to commons-lang3 and remove the v2 usage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Status: Patch Available  (was: Open)

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-7145) Remove dependence on apache commons-lang

2019-09-27 Thread David Lavati (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati reassigned HIVE-7145:
--

Assignee: David Lavati

> Remove dependence on apache commons-lang
> 
>
> Key: HIVE-7145
> URL: https://issues.apache.org/jira/browse/HIVE-7145
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: David Lavati
>Priority: Major
>
> We currently depend on both Apache commons-lang and commons-lang3. They are 
> the same project, just at version 2.x vs 3.x. I propose that we move all of 
> the references in Hive to commons-lang3 and remove the v2 usage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Attachment: HIVE-21449.7.patch

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21449) implement 'WITHIN GROUP' clause

2019-09-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-21449:
--
Status: Open  (was: Patch Available)

> implement 'WITHIN GROUP' clause
> ---
>
> Key: HIVE-21449
> URL: https://issues.apache.org/jira/browse/HIVE-21449
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21449.1.patch, HIVE-21449.2.patch, 
> HIVE-21449.3.patch, HIVE-21449.4.patch, HIVE-21449.5.patch, 
> HIVE-21449.5.patch, HIVE-21449.6.patch, HIVE-21449.6.patch, 
> HIVE-21449.6.patch, HIVE-21449.6.patch, HIVE-21449.7.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22245) Make qtest feature parser reuseable

2019-09-27 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939435#comment-16939435
 ] 

Zoltan Haindrich commented on HIVE-22245:
-

[~abstractdog] Could you please take a look?

> Make qtest feature parser reuseable
> ---
>
> Key: HIVE-22245
> URL: https://issues.apache.org/jira/browse/HIVE-22245
> Project: Hive
>  Issue Type: Improvement
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22245.01.patch, HIVE-22245.02.patch, 
> HIVE-22245.03.patch, HIVE-22245.03.patch, HIVE-22245.04.patch
>
>
> Right now we have a parser for {{--! qt:dataset}}; making it reusable would 
> enable further additions (I would like to run the scheduled query service for 
> some qtests).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Jay Green-Stevens (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Green-Stevens updated HIVE-22249:
-
Attachment: HIVE-22249.branch-2.3.patch

> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
> Attachments: HIVE-22249.branch-2.3.patch
>
>
> We want to add a patch to Hive version 2.3.x to support Parquet.
>  
> Relevant to previous ticket: HIVE-8838.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Jay Green-Stevens (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Green-Stevens updated HIVE-22249:
-
Description: 
We want to add a patch to Hive version 2.3.x to support Parquet.

 

Relevant to previous ticket: HIVE-8838.

  was:
We want to add a patch to Hive version 2.3.x to support Parquet.

[HIVE-8838|https://issues.apache.org/jira/browse/HIVE-8838#].


> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
>
> We want to add a patch to Hive version 2.3.x to support Parquet.
>  
> Relevant to previous ticket: HIVE-8838.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22249) Support Parquet through HCatalog

2019-09-27 Thread Jay Green-Stevens (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Green-Stevens reassigned HIVE-22249:



> Support Parquet through HCatalog
> 
>
> Key: HIVE-22249
> URL: https://issues.apache.org/jira/browse/HIVE-22249
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jay Green-Stevens
>Assignee: Jay Green-Stevens
>Priority: Major
> Fix For: 2.3.6
>
>
> We want to add a patch to Hive version 2.3.x to support Parquet.
> [HIVE-8838|https://issues.apache.org/jira/browse/HIVE-8838#].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939419#comment-16939419
 ] 

Hive QA commented on HIVE-21924:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981504/HIVE-21924.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17010 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[skiphf_aggr]
 (batchId=179)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18759/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18759/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18759/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981504 - PreCommit-HIVE-Build

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> This piece of code makes CSV files (or any text files with a header/footer) 
> not splittable if a header or footer is present. 
> If only a header is present, we can find the offset after the first line break 
> and use that to split. Similarly for the footer, we can read a few KBs of data 
> at the end and find the last line break offset, and use that to determine the 
> data range that can be used for splitting. A few reads during split generation 
> are cheaper than not splitting the file at all.  
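
A minimal standalone sketch of the offset arithmetic described above (plain
java.io; '\n' line endings and a trailing newline at EOF are assumptions here,
and the actual patch wires this into the input format instead):

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;

public class HeaderFooterOffsets {

  /** First byte after the header lines; no split should start before this. */
  static long dataStart(RandomAccessFile f, int headerLines) throws IOException {
    f.seek(0);
    for (int i = 0; i < headerLines && f.readLine() != null; i++) {
      // readLine() advances the pointer past each header line
    }
    return f.getFilePointer();
  }

  /** First byte of the footer lines; every split must end before this.
   *  Only the last tailBytes of the file are read. */
  static long dataEnd(RandomAccessFile f, int footerLines, int tailBytes) throws IOException {
    long len = f.length();
    long from = Math.max(0, len - tailBytes);
    byte[] tail = new byte[(int) (len - from)];
    f.seek(from);
    f.readFully(tail);
    int seen = 0;
    // With a trailing newline, the (footerLines + 1)-th '\n' from the end
    // terminates the last data line.
    for (int i = tail.length - 1; i >= 0; i--) {
      if (tail[i] == '\n' && ++seen == footerLines + 1) {
        return from + i + 1;
      }
    }
    // Whole file scanned and everything is footer, or the tail window was
    // too small; a real implementation would enlarge the window and retry.
    return from == 0 ? 0 : -1;
  }

  public static void main(String[] args) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
      System.out.println("data starts at " + dataStart(f, 1));
      System.out.println("data ends at   " + dataEnd(f, 2, 1024));
    }
  }
}
{code}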



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22137) Implement alter/rename partition related methods on temporary tables

2019-09-27 Thread Laszlo Pinter (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-22137:
-
Attachment: HIVE-22137.03.patch

> Implement alter/rename partition related methods on temporary tables
> 
>
> Key: HIVE-22137
> URL: https://issues.apache.org/jira/browse/HIVE-22137
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-22137.01.patch, HIVE-22137.02.patch, 
> HIVE-22137.03.patch
>
>
> IMetaStoreClient exposes the following methods related to altering of 
> partitions:
> {code:java}
> void alter_partition(String dbName, String tblName, Partition newPart);
> void alter_partition(String catName, String dbName, String tblName, Partition 
> newPart);
> void alter_partition(String dbName, String tblName, Partition newPart, 
> EnvironmentContext environmentContext);
> void alter_partition(String catName, String dbName, String tblName, Partition 
> newPart, EnvironmentContext environmentContext, String writeIdList);
> void alter_partition(String catName, String dbName, String tblName, Partition 
> newPart, EnvironmentContext environmentContext);
> void alter_partitions(String dbName, String tblName, List<Partition> 
> newParts);
> void alter_partitions(String dbName, String tblName, List<Partition> 
> newParts, EnvironmentContext environmentContext);
> void alter_partitions(String dbName, String tblName, List<Partition> 
> newParts, EnvironmentContext environmentContext, String writeIdList, long 
> writeId);
> void alter_partitions(String catName, String dbName, String tblName, 
> List<Partition> newParts);
> void alter_partitions(String catName, String dbName, String tblName, 
> List<Partition> newParts, EnvironmentContext environmentContext, String 
> writeIdList, long writeId);
> void renamePartition(final String dbname, final String tableName, final 
> List<String> part_vals, final Partition newPart);
> void renamePartition(String catName, String dbname, String tableName, 
> List<String> part_vals, Partition newPart, String validWriteIds){code}
> These should be implemented in order to completely support partitions on 
> temporary tables.
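
For reference, a caller-side sketch of what complete support means once these
are implemented (the database, table, and partition values below are
hypothetical; the signatures are the ones listed above):

{code:java}
import java.util.Arrays;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

class TempTablePartitionCalls {
  /** Both calls should behave the same whether web_logs is temporary or not. */
  static void alterAndRename(IMetaStoreClient client, Partition part) throws Exception {
    // Alter in place: e.g. point the partition at a new location.
    part.getSd().setLocation(part.getSd().getLocation() + "_moved");
    client.alter_partition("default", "web_logs", part);

    // Rename: move the old partition values onto the new Partition object.
    client.renamePartition("default", "web_logs", Arrays.asList("2019-09-27"), part);
  }
}
{code}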



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939411#comment-16939411
 ] 

Hive QA commented on HIVE-21924:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
47s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
26s{color} | {color:blue} ql in master has 1566 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
19s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 2 new + 17 unchanged - 0 fixed 
= 19 total (was 17) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
3s{color} | {color:red} root: The patch generated 2 new + 17 unchanged - 0 
fixed = 19 total (was 17) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
12s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18759/dev-support/hive-personality.sh
 |
| git revision | master / 6ca8397 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18759/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18759/yetus/diff-checkstyle-root.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18759/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18759/yetus.txt |
| Powered by | Apache Yetus   http://yetus.apache.org |


This message was automatically generated.



> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = 

[jira] [Updated] (HIVE-22097) Incompatible java.util.ArrayList for java 11

2019-09-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-22097:
-
Status: Patch Available  (was: Open)

> Incompatible java.util.ArrayList for java 11
> 
>
> Key: HIVE-22097
> URL: https://issues.apache.org/jira/browse/HIVE-22097
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 3.1.1, 3.0.0
>Reporter: Yuming Wang
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HIVE-22097.1.patch, JDK1.8.png, JDK11.png
>
>
> {noformat}
> export JAVA_HOME=/usr/lib/jdk-11.0.3
> export PATH=${JAVA_HOME}/bin:${PATH}
> hive> create table t(id int);
> Time taken: 0.035 seconds
> hive> insert into t values(1);
> Query ID = root_20190811155400_7c0e0494-eecb-4c54-a9fd-942ab52a0794
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.lang.NoSuchFieldException: parentOffset
>   at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384)
>   ... 29 more
> Job Submission failed with exception 
> 'java.lang.RuntimeException(java.lang.NoSuchFieldException: parentOffset)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.lang.NoSuchFieldException: 
> parentOffset
> {noformat}
> The reason is Java removed {{parentOffset}}:
>  !JDK1.8.png! 
>  !JDK11.png! 
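
A reflection-tolerant sketch of the kind of fix this implies: probe for the
field instead of hard-coding the JDK 8 name (the JDK 9+ field name below is an
assumption for illustration; the actual patch should be consulted):

{code:java}
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubListFieldProbe {
  static Field findOffsetField() throws NoSuchFieldException {
    List<Integer> sub = new ArrayList<>(Arrays.asList(1, 2, 3)).subList(1, 2);
    Class<?> c = sub.getClass(); // java.util.ArrayList$SubList
    try {
      return c.getDeclaredField("parentOffset"); // JDK 8 layout
    } catch (NoSuchFieldException e) {
      return c.getDeclaredField("offset");       // JDK 9+ layout (assumption)
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("resolved: " + findOffsetField().getName());
  }
}
{code}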



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22097) Incompatible java.util.ArrayList for java 11

2019-09-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-22097:
-
Attachment: HIVE-22097.1.patch

> Incompatible java.util.ArrayList for java 11
> 
>
> Key: HIVE-22097
> URL: https://issues.apache.org/jira/browse/HIVE-22097
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Yuming Wang
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HIVE-22097.1.patch, JDK1.8.png, JDK11.png
>
>
> {noformat}
> export JAVA_HOME=/usr/lib/jdk-11.0.3
> export PATH=${JAVA_HOME}/bin:${PATH}
> hive> create table t(id int);
> Time taken: 0.035 seconds
> hive> insert into t values(1);
> Query ID = root_20190811155400_7c0e0494-eecb-4c54-a9fd-942ab52a0794
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.lang.NoSuchFieldException: parentOffset
>   at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384)
>   ... 29 more
> Job Submission failed with exception 
> 'java.lang.RuntimeException(java.lang.NoSuchFieldException: parentOffset)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.lang.NoSuchFieldException: 
> parentOffset
> {noformat}
> The reason is Java removed {{parentOffset}}:
>  !JDK1.8.png! 
>  !JDK11.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22097) Incompatible java.util.ArrayList for java 11

2019-09-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-22097:


Assignee: Attila Magyar

> Incompatible java.util.ArrayList for java 11
> 
>
> Key: HIVE-22097
> URL: https://issues.apache.org/jira/browse/HIVE-22097
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Yuming Wang
>Assignee: Attila Magyar
>Priority: Major
> Attachments: JDK1.8.png, JDK11.png
>
>
> {noformat}
> export JAVA_HOME=/usr/lib/jdk-11.0.3
> export PATH=${JAVA_HOME}/bin:${PATH}
> hive> create table t(id int);
> Time taken: 0.035 seconds
> hive> insert into t values(1);
> Query ID = root_20190811155400_7c0e0494-eecb-4c54-a9fd-942ab52a0794
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.lang.NoSuchFieldException: parentOffset
>   at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384)
>   ... 29 more
> Job Submission failed with exception 
> 'java.lang.RuntimeException(java.lang.NoSuchFieldException: parentOffset)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.lang.NoSuchFieldException: 
> parentOffset
> {noformat}
> The reason is Java removed {{parentOffset}}:
>  !JDK1.8.png! 
>  !JDK11.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22241) Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939341#comment-16939341
 ] 

Hive QA commented on HIVE-22241:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981495/HIVE-22241.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 17012 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.llap.cache.TestBuddyAllocator.testMTT[2] (batchId=363)
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValues
 (batchId=225)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18758/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18758/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18758/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981495 - PreCommit-HIVE-Build

> Implement UDF to interpret date/timestamp using its internal representation 
> and Gregorian-Julian hybrid calendar
> 
>
> Key: HIVE-22241
> URL: https://issues.apache.org/jira/browse/HIVE-22241
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22241.01.patch, HIVE-22241.02.patch, 
> HIVE-22241.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDF that converts a date/timestamp to the new *proleptic Gregorian calendar* (ISO 
> 8601 standard), which is produced by extending the Gregorian calendar 
> backward to dates preceding its official introduction in 1582, assuming that 
> its internal days/milliseconds since epoch is calculated using legacy 
> *Gregorian-Julian hybrid* calendar, i.e., calendar that supports both the 
> Julian and Gregorian calendar systems with the support of a single 
> discontinuity, which corresponds by default to the Gregorian date when the 
> Gregorian calendar was instituted.
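
The shift the UDF accounts for can be reproduced with the JDK's own calendar
classes (a simplified sketch of the semantics, not the patch itself):

{code:java}
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

class HybridVsProleptic {
  public static void main(String[] args) {
    TimeZone utc = TimeZone.getTimeZone("UTC");
    // Default GregorianCalendar is the hybrid calendar with the 1582 cutover.
    GregorianCalendar hybrid = new GregorianCalendar(utc);
    hybrid.clear();
    hybrid.set(1200, Calendar.JANUARY, 1); // parsed under Julian rules

    long epochDay = TimeUnit.MILLISECONDS.toDays(hybrid.getTimeInMillis());
    // java.time is proleptic Gregorian, so re-reading the same epoch-day
    // shows the date the UDF would report:
    System.out.println(LocalDate.ofEpochDay(epochDay));
    // expected: 1200-01-08 (7-day Julian/Gregorian drift in that century)
  }
}
{code}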



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22241) Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar

2019-09-27 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939318#comment-16939318
 ] 

Hive QA commented on HIVE-22241:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
15s{color} | {color:blue} ql in master has 1566 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 38 new + 83 unchanged - 0 
fixed = 121 total (was 83) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18758/dev-support/hive-personality.sh
 |
| git revision | master / 6ca8397 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18758/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18758/yetus.txt |
| Powered by | Apache Yetus   http://yetus.apache.org |


This message was automatically generated.



> Implement UDF to interpret date/timestamp using its internal representation 
> and Gregorian-Julian hybrid calendar
> 
>
> Key: HIVE-22241
> URL: https://issues.apache.org/jira/browse/HIVE-22241
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22241.01.patch, HIVE-22241.02.patch, 
> HIVE-22241.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDF that converts a date/timestamp to the new *proleptic Gregorian calendar* (ISO 
> 8601 standard), which is produced by extending the Gregorian calendar 
> backward to dates preceding its official introduction in 1582, assuming that 
> its internal days/milliseconds since epoch is calculated using legacy 
> *Gregorian-Julian hybrid* calendar, i.e., calendar that supports both the 
> Julian and Gregorian calendar systems with the support of a single 
> discontinuity, which corresponds by default to the Gregorian date when the 
> Gregorian calendar was instituted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-21146) Enforce TransactionBatch size=1 for blob stores

2019-09-27 Thread David Lavati (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati reassigned HIVE-21146:
---

Assignee: David Lavati

> Enforce TransactionBatch size=1 for blob stores
> ---
>
> Key: HIVE-21146
> URL: https://issues.apache.org/jira/browse/HIVE-21146
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: David Lavati
>Priority: Major
>
> Streaming Ingest API supports a concept of {{TransactionBatch}} where N 
> transactions can be opened at once and the data in all of them will be 
> written to the same delta_x_y directory where each transaction in the batch 
> can be committed/aborted independently.  The implementation relies on 
> {{FSDataOutputStream.hflush()}} (called from {{OrcRecordUpdater}}), which is 
> available on HDFS but is often implemented as a no-op in blob store backed 
> {{FileSystem}} objects.
> Need to add a check to {{HiveStreamingConnection()}} constructor to raise an 
> error if {{builder.transactionBatchSize > 1}} and the target table/partitions 
> are backed by something that doesn't support {{hflush()}}.
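
A standalone sketch of the requested guard (not the actual patch; it probes a
throwaway file, and the real check may consult table metadata instead):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class TransactionBatchGuard {
  static void check(FileSystem fs, Path target, int transactionBatchSize) throws IOException {
    if (transactionBatchSize <= 1) {
      return; // batch size 1 is always safe
    }
    Path probe = new Path(target, "_hflush_probe");
    try (FSDataOutputStream out = fs.create(probe, true)) {
      // StreamCapabilities (Hadoop 2.9+) lets callers ask about hflush support.
      if (!out.hasCapability("hflush")) {
        throw new IllegalArgumentException("transactionBatchSize > 1 requires hflush(), "
            + "which the '" + fs.getUri().getScheme() + "' FileSystem does not support");
      }
    } finally {
      fs.delete(probe, false);
    }
  }
}
{code}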



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-21146) Enforce TransactionBatch size=1 for blob stores

2019-09-27 Thread David Lavati (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21146 started by David Lavati.
---
> Enforce TransactionBatch size=1 for blob stores
> ---
>
> Key: HIVE-21146
> URL: https://issues.apache.org/jira/browse/HIVE-21146
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: David Lavati
>Priority: Major
>
> Streaming Ingest API supports a concept of {{TransactionBatch}} where N 
> transactions can be opened at once and the data in all of them will be 
> written to the same delta_x_y directory where each transaction in the batch 
> can be committed/aborted independently.  The implementation relies on 
> {{FSDataOutputStream.hflush()}} (called from {{OrcRecordUpdater}}), which is 
> available on HDFS but is often implemented as a no-op in blob store backed 
> {{FileSystem}} objects.
> Need to add a check to {{HiveStreamingConnection()}} constructor to raise an 
> error if {{builder.transactionBatchSize > 1}} and the target table/partitions 
> are backed by something that doesn't support {{hflush()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread Sankar Hariappan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939247#comment-16939247
 ] 

Sankar Hariappan commented on HIVE-21924:
-

[~mustafaiman] 
Thanks for the patch!
I posted a few comments in the PR. Please take a look.

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> This piece of code makes CSV files (or any text files with a header/footer) 
> not splittable if a header or footer is present. 
> If only a header is present, we can find the offset after the first line break 
> and use that to split. Similarly for the footer, we can read a few KBs of data 
> at the end and find the last line break offset, and use that to determine the 
> data range that can be used for splitting. A few reads during split generation 
> are cheaper than not splitting the file at all.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319437
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328983572
 
 

 ##
 File path: 
ql/src/test/queries/clientpositive/file_with_header_footer_aggregation.q
 ##
 @@ -0,0 +1,94 @@
+set hive.mapred.mode=nonstrict;
+
+dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir};
+dfs -copyFromLocal ../../data/files/header_footer_table_4  
${system:test.tmp.dir}/header_footer_table_4;
+
+CREATE TABLE numbrs (numbr int);
+INSERT INTO numbrs VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), 
(11), (12), (NULL);
+CREATE EXTERNAL TABLE header_footer_table_4 (header_int int, header_name 
string, header_choice varchar(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY 
',' LOCATION '${system:test.tmp.dir}/header_footer_table_4' tblproperties 
("skip.header.line.count"="1", "skip.footer.line.count"="2");
+
+SELECT * FROM header_footer_table_4;
+
+SELECT * FROM header_footer_table_4 ORDER BY header_int LIMIT 8;
+
+-- should return nothing as title is correctly skipped
+SELECT * FROM header_footer_table_4 WHERE header_choice = 'header_choice';
 
 Review comment:
   Add a select query with filters that return valid rows. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319437)
Time Spent: 50m  (was: 40m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> This piece of code makes CSV files (or any text files with a header/footer) 
> not splittable if a header or footer is present. 
> If only a header is present, we can find the offset after the first line break 
> and use that to split. Similarly for the footer, we can read a few KBs of data 
> at the end and find the last line break offset, and use that to determine the 
> data range that can be used for splitting. A few reads during split generation 
> are cheaper than not splitting the file at all.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319436
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328918560
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+fis.seek(bufferSectionStart);
 
 Review comment:
   Redundant statement. It is again done within the loop.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319436)
Time Spent: 40m  (was: 0.5h)

> Split text files even if header/footer exists
> 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319430
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328918215
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
 
 Review comment:
   fis.close() needed in finally block.
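
One way to address this (a sketch, not the committed fix) is to let
try-with-resources close the stream on every path:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

final class StartIndexSketch {
  /** The reviewed loop, with the stream closed on all paths. */
  static long startIndex(Path path, JobConf conf, int headerCount) {
    long startIndexForFile = 0L;
    try (FSDataInputStream fis = path.getFileSystem(conf).open(path)) {
      for (int j = 0; j < headerCount; j++) {
        fis.readLine();
        // back 1 byte because readers skip the entire first row if split start is not 0
        startIndexForFile = fis.getPos() - 1;
      }
    } catch (IOException e) {
      startIndexForFile = 0L;
    }
    return startIndexForFile;
  }
}
{code}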
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319430)
Time Spent: 20m  (was: 10m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319439
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328983296
 
 

 ##
 File path: 
ql/src/test/queries/clientpositive/file_with_header_footer_aggregation.q
 ##
 @@ -0,0 +1,94 @@
+set hive.mapred.mode=nonstrict;
+
+dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir};
+dfs -copyFromLocal ../../data/files/header_footer_table_4  
${system:test.tmp.dir}/header_footer_table_4;
+
+CREATE TABLE numbrs (numbr int);
+INSERT INTO numbrs VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), 
(11), (12), (NULL);
+CREATE EXTERNAL TABLE header_footer_table_4 (header_int int, header_name 
string, header_choice varchar(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY 
',' LOCATION '${system:test.tmp.dir}/header_footer_table_4' tblproperties 
("skip.header.line.count"="1", "skip.footer.line.count"="2");
 
 Review comment:
   Also, add tests with only header, only footer, file having only 
header+footer but no data rows. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319439)
Time Spent: 50m  (was: 40m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> This piece of code makes CSV files (or any text files with a header/footer) 
> not splittable if a header or footer is present. 
> If only a header is present, we can find the offset after the first line break 
> and use that to split. Similarly for the footer, we can read a few KBs of data 
> at the end and find the last line break offset, and use that to determine the 
> data range that can be used for splitting. A few reads during split generation 
> are cheaper than not splitting the file at all.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319432
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328913511
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
 
 Review comment:
   The logic to obtain start and length is duplicated in 2 methods. Can we have a private method for it?
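
For illustration, one possible shape of such a helper (a sketch only: adjustForHeaderFooter is a hypothetical name, and returning null to signal the dummy-split case is an assumption, not the author's patch):

{code}
// Hypothetical helper consolidating the duplicated boundary adjustment.
// Returns null when the whole split falls inside the header or footer,
// otherwise the adjusted {start, length} pair.
private long[] adjustForHeaderFooter(Path file, long start, long length) {
  long cachedStart = getCachedStartIndex(file);
  long cachedEnd = getCachedEndIndex(file);
  if (cachedStart > start + length || (cachedEnd > -1 && cachedEnd < start)) {
    return null; // split is entirely header or footer
  }
  if (cachedStart > start) {
    length = length - (cachedStart - start);
    start = cachedStart;
  }
  if (cachedEnd > -1 && cachedEnd < start + length) {
    length = cachedEnd - start;
  }
  return new long[] {start, length};
}
{code}

Both makeSplit overloads could then delegate to it and return a DummyInputSplit when it yields null.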
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319432)
Time Spent: 0.5h  (was: 20m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> this piece of code makes the CSV (or any text files with header/footer) files 
> not splittable if header or footer is present. 
> If only header is present, we can find the offset after first line break and 
> use that to split. Similarly for footer, may be read few KB's of data at the 
> end and find the last line break offset. Use that to determine the data range 
> which can be used for splitting. Few reads during split generation are 
> cheaper than not splitting the file at all.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319443
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328978995
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+fis.seek(bufferSectionStart);
+while (bufferSectionEnd > bufferSectionStart) {
+  fis.seek(bufferSectionStart);
+  long pos = fis.getPos();
+  while (pos < bufferSectionEnd) {
+fis.readLine();
+pos = fis.getPos();
+if (pos <= bufferSectionEnd) {
+  if (lineEndBuffer.size() > footerCount) {
+lineEndBuffer.poll();
 
 Review comment:
   I think there is a problem in this logic. The queue is expected to have positions added in descending order. But, let's say, if the first batch doesn't make the queue full, then the 2nd batch 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319433
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328916465
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
 
 Review comment:
   It should be footerCount + 1.
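
With the generics restored, the corrected declaration would presumably read:

{code}
Queue<Long> lineEndBuffer = new ArrayDeque<>(footerCount + 1);
{code}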
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319433)
Time Spent: 0.5h  (was: 20m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
>

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319431
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328915560
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
 
 Review comment:
   This IOException handling doesn't seem right. If the file read fails, it is better to throw the exception; and if the file doesn't have enough header rows, we should set the start index to EOF so that we always return a dummy split.
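
A sketch of how the suggested handling could look (assumptions: the signature is widened to throw IOException, and Long.MAX_VALUE serves as the EOF sentinel; this is not the committed fix):

{code}
private long getCachedStartIndex(Path path) throws IOException {
  Long startIndexForFile = startIndexMap.get(path);
  if (startIndexForFile == null) {
    try (FSDataInputStream fis = path.getFileSystem(conf).open(path)) {
      startIndexForFile = 0L;
      for (int j = 0; j < headerCount; j++) {
        if (fis.readLine() == null) {
          // Fewer rows than headerCount: treat the whole file as header,
          // so callers always produce a dummy split.
          startIndexForFile = Long.MAX_VALUE;
          break;
        }
        // back 1 byte because readers skip the entire first row if split start is not 0
        startIndexForFile = fis.getPos() - 1;
      }
    } // a failed read now propagates instead of silently yielding 0
    startIndexMap.put(path, startIndexForFile);
  }
  return startIndexForFile;
}
{code}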
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319431)
Time Spent: 0.5h  (was: 20m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319440
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328964763
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+fis.seek(bufferSectionStart);
+while (bufferSectionEnd > bufferSectionStart) {
+  fis.seek(bufferSectionStart);
+  long pos = fis.getPos();
+  while (pos < bufferSectionEnd) {
+fis.readLine();
+pos = fis.getPos();
+if (pos <= bufferSectionEnd) {
+  if (lineEndBuffer.size() > footerCount) {
+lineEndBuffer.poll();
+  }
+  lineEndBuffer.add(pos);
+}
+  }
+  if (lineEndBuffer.size() > footerCount) {
+break;
+  } else {
+bufferSectionEnd = 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319444
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328983856
 
 

 ##
 File path: 
ql/src/test/queries/clientpositive/file_with_header_footer_aggregation.q
 ##
 @@ -0,0 +1,94 @@
+set hive.mapred.mode=nonstrict;
+
+dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir};
+dfs -copyFromLocal ../../data/files/header_footer_table_4  
${system:test.tmp.dir}/header_footer_table_4;
+
+CREATE TABLE numbrs (numbr int);
+INSERT INTO numbrs VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), 
(11), (12), (NULL);
+CREATE EXTERNAL TABLE header_footer_table_4 (header_int int, header_name 
string, header_choice varchar(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY 
',' LOCATION '${system:test.tmp.dir}/header_footer_table_4' tblproperties 
("skip.header.line.count"="1", "skip.footer.line.count"="2");
+
+SELECT * FROM header_footer_table_4;
+
+SELECT * FROM header_footer_table_4 ORDER BY header_int LIMIT 8;
+
+-- should return nothing as title is correctly skipped
+SELECT * FROM header_footer_table_4 WHERE header_choice = 'header_choice';
 
 Review comment:
   Also, test count(*).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319444)
Time Spent: 1h 20m  (was: 1h 10m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/967a1cc98beede8e6568ce750ebeb6e0d048b8ea/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L494-L503
>  
> {code}
> int headerCount = 0;
> int footerCount = 0;
> if (table != null) {
>   headerCount = Utilities.getHeaderCount(table);
>   footerCount = Utilities.getFooterCount(table, conf);
>   if (headerCount != 0 || footerCount != 0) {
> // Input file has header or footer, cannot be splitted.
> HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, 
> Long.MAX_VALUE);
>   }
> }
> {code}
> this piece of code makes the CSV (or any text files with header/footer) files 
> not splittable if header or footer is present. 
> If only header is present, we can find the offset after first line break and 
> use that to split. Similarly for footer, may be read few KB's of data at the 
> end and find the last line break offset. Use that to determine the data range 
> which can be used for splitting. Few reads during split generation are 
> cheaper than not splitting the file at all.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319438
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328967734
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
 
 Review comment:
   We can have special handling for footerCount = 0 and return the actual length of the file.
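
For example, a guard at the top of getCachedEndIndex, placed inside the existing try block (a sketch of the suggestion, not the committed change):

{code}
if (footerCount == 0) {
  // Nothing to trim at the end: the usable range runs to the file's length.
  long fileLength = path.getFileSystem(conf).getFileStatus(path).getLen();
  endIndexMap.put(path, fileLength);
  return fileLength;
}
{code}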
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319438)
Time Spent: 50m  (was: 40m)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
> 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319441
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328919993
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+fis.seek(bufferSectionStart);
+while (bufferSectionEnd > bufferSectionStart) {
+  fis.seek(bufferSectionStart);
+  long pos = fis.getPos();
+  while (pos < bufferSectionEnd) {
+fis.readLine();
+pos = fis.getPos();
+if (pos <= bufferSectionEnd) {
+  if (lineEndBuffer.size() > footerCount) {
+lineEndBuffer.poll();
+  }
+  lineEndBuffer.add(pos);
+}
+  }
+  if (lineEndBuffer.size() > footerCount) {
+break;
+  } else {
+bufferSectionEnd = 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319435
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328914761
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
 
 Review comment:
   Can we move this outside the loop?
   Also, as per the current logic, if the data has only footer rows and headerCount=0, then startIndexForFile will be null. Having this statement outside the loop would handle it.
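
One possible shape of that reordering (a sketch; the headerCount == 0 guard is an assumption added here to keep the offset non-negative, not part of the reviewer's text):

{code}
FSDataInputStream fis = path.getFileSystem(conf).open(path);
for (int j = 0; j < headerCount; j++) {
  fis.readLine(); // consume the header lines
}
// Assigned once, after the loop, so headerCount == 0 no longer leaves it null.
// Back 1 byte because readers skip the entire first row if split start is not 0.
startIndexForFile = headerCount == 0 ? 0L : fis.getPos() - 1;
{code}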
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319435)
Time Spent: 40m  (was: 0.5h)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: https://issues.apache.org/jira/browse/HIVE-21924
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.4.0, 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21924.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319434
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328918236
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
 
 Review comment:
   fis.close() is needed in a finally block.
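
For instance, with try-with-resources, which is equivalent to closing in an explicit finally block (a sketch):

{code}
try (FSDataInputStream fis = path.getFileSystem(conf).open(path)) {
  fis.seek(bufferSectionStart);
  // ... scan the tail buffer for line ends as above ...
} // fis.close() runs automatically, even when an IOException is thrown
{code}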
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319434)

> Split text files even if header/footer exists
> -
>
> Key: HIVE-21924
> URL: 

[jira] [Work logged] (HIVE-21924) Split text files even if header/footer exists

2019-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21924?focusedWorklogId=319442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319442
 ]

ASF GitHub Bot logged work on HIVE-21924:
-

Author: ASF GitHub Bot
Created on: 27/Sep/19 09:17
Start Date: 27/Sep/19 09:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #791: HIVE-21924
URL: https://github.com/apache/hive/pull/791#discussion_r328981037
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/SkippingTextInputFormat.java
 ##
 @@ -0,0 +1,136 @@
+package org.apache.hadoop.hive.ql.io;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.TextInputFormat;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * SkippingInputFormat is a header/footer aware input format. It truncates
+ * splits identified by TextInputFormat. Header and footers are removed
+ * from the splits.
+ */
+public class SkippingTextInputFormat extends TextInputFormat {
+
+  private final Map<Path, Long> startIndexMap = new ConcurrentHashMap<>();
+  private final Map<Path, Long> endIndexMap = new ConcurrentHashMap<>();
+  private JobConf conf;
+  private int headerCount;
+  private int footerCount;
+
+  @Override
+  public void configure(JobConf conf) {
+this.conf = conf;
+super.configure(conf);
+  }
+
+  public void configure(JobConf conf, int headerCount, int footerCount) {
+configure(conf);
+this.headerCount = headerCount;
+this.footerCount = footerCount;
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > -1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts);
+  }
+
+  @Override
+  protected FileSplit makeSplit(Path file, long start, long length, String[] 
hosts, String[] inMemoryHosts) {
+long cachedStart = getCachedStartIndex(file);
+long cachedEnd = getCachedEndIndex(file);
+if (cachedStart > start + length) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedStart > start) {
+  length = length - (cachedStart - start);
+  start = cachedStart;
+}
+if (cachedEnd > - 1 && cachedEnd < start) {
+  return new NullRowsInputFormat.DummyInputSplit(file);
+} else if (cachedEnd < start + length) {
+  length = cachedEnd - start;
+}
+return super.makeSplit(file, start, length, hosts, inMemoryHosts);
+  }
+
+  private long getCachedStartIndex(Path path) {
+Long startIndexForFile = startIndexMap.get(path);
+if (startIndexForFile == null) {
+  try {
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+for (int j = 0; j < headerCount; j++) {
+  fis.readLine();
+  // back 1 byte because readers skip the entire first row if split 
start is not 0
+  startIndexForFile = fis.getPos() - 1;
+}
+  } catch (IOException e) {
+startIndexForFile = 0L;
+  }
+  startIndexMap.put(path, startIndexForFile);
+}
+return startIndexForFile;
+  }
+
+  private long getCachedEndIndex(Path path) {
+Long endIndexForFile = endIndexMap.get(path);
+if (endIndexForFile == null) {
+  try {
+final long bufferSectionSize = 1024;
+long bufferSectionEnd = 
path.getFileSystem(conf).getFileStatus(path).getLen();
+long bufferSectionStart = Math.max(0, bufferSectionEnd - 
bufferSectionSize);
+Queue<Long> lineEndBuffer = new ArrayDeque<>(headerCount + 1);
+FSDataInputStream fis = path.getFileSystem(conf).open(path);
+fis.seek(bufferSectionStart);
+while (bufferSectionEnd > bufferSectionStart) {
+  fis.seek(bufferSectionStart);
+  long pos = fis.getPos();
+  while (pos < bufferSectionEnd) {
+fis.readLine();
+pos = fis.getPos();
+if (pos <= bufferSectionEnd) {
+  if (lineEndBuffer.size() > footerCount) {
+lineEndBuffer.poll();
+  }
+  lineEndBuffer.add(pos);
+}
+  }
+  if (lineEndBuffer.size() > footerCount) {
+break;
+  } else {
+bufferSectionEnd = 

[jira] [Commented] (HIVE-22191) Simplify SemanticAnalyzer by removing unused code

2019-09-27 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939228#comment-16939228
 ] 

Laszlo Bodor commented on HIVE-22191:
-

+1

> Simplify SemanticAnalyzer by removing unused code
> -
>
> Key: HIVE-22191
> URL: https://issues.apache.org/jira/browse/HIVE-22191
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Akos Dombi
>Assignee: Akos Dombi
>Priority: Major
> Attachments: HIVE-22191.2.patch, HIVE-22191.3.patch, 
> HIVE-22191.4.patch, HIVE-22191.5.patch, HIVE-22191.5.patch, HIVE-22191.patch
>
>
> Simplify {{SemanticAnalyzer}} by:
>  - Remove dead code
>  - Simplify returning statements
>  - Use interfaces types for parameters/fields/variables where it is 
> straightforward to migrate
>  - Make visibility stricter where it is possible
>  - Check logging to use parametrised logging
>  - Remove unnecessary keywords (e.g.: {{static}})
>  - Some code parts could be simplified by using Java 8 features
> I think this is a crucial step, as this class already contains 15000+ lines of 
> code which is screaming to be split into more reasonable classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22236) Fail to create View selecting View containing NOT IN subquery

2019-09-27 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-22236:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the patch [~zmatyus], and [~kgyrtkirk] for the review!

> Fail to create View selecting View containing NOT IN subquery
> -
>
> Key: HIVE-22236
> URL: https://issues.apache.org/jira/browse/HIVE-22236
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Matyus
>Assignee: Zoltan Matyus
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22236.01.patch, HIVE-22236.02.patch, 
> HIVE-22236.03.patch, HIVE-22236.q
>
>
> * Given a complicated view with a select statement that has a subquery 
> containing "{{NOT IN}}"
> * Hive fails to create a simple view as {{SELECT * FROM complicated_view}} 
> * (with CBO disabled).
> The unparse replacements of the complicated view will be applied to the text 
> of the simple view, resulting in {{IllegalArgumentException: replace: range 
> invalid}} exceptions from {{org.antlr.runtime.TokenRewriteStream.replace}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22241) Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar

2019-09-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-22241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939197#comment-16939197
 ] 

Ádám Szita commented on HIVE-22241:
---

Looks good [~jcamachorodriguez], +1 pending tests

> Implement UDF to interpret date/timestamp using its internal representation 
> and Gregorian-Julian hybrid calendar
> 
>
> Key: HIVE-22241
> URL: https://issues.apache.org/jira/browse/HIVE-22241
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22241.01.patch, HIVE-22241.02.patch, 
> HIVE-22241.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UDF that converts a date/timestamp to the new *proleptic Gregorian calendar* (ISO 
> 8601 standard), which is produced by extending the Gregorian calendar 
> backward to dates preceding its official introduction in 1582, assuming that 
> its internal days/milliseconds since epoch is calculated using legacy 
> *Gregorian-Julian hybrid* calendar, i.e., calendar that supports both the 
> Julian and Gregorian calendar systems with the support of a single 
> discontinuity, which corresponds by default to the Gregorian date when the 
> Gregorian calendar was instituted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

