[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-09-29 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934744#comment-14934744
 ] 

Bing Li commented on HIVE-10982:


Hi, [~vgumashta]
Thank you for your comment.

Do you mean to introduce a new property in hive-site.xml, which would also
control the maximum fetch size returned by HS2?

Do you know how HS2 currently controls this?




> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-10982.1.patch
>
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as  http://jira.pentaho.com/browse/PDI-11511, whose 
> status is open.
> It has also been discussed at:
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E
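The performance impact of the hard-coded fetch size can be sketched as a round-trip count (a simplified model; the function name is illustrative, and the real HiveServer2 RPC behavior may differ):

```python
import math

def fetch_round_trips(total_rows: int, fetch_size: int) -> int:
    """Number of client/server round trips needed to stream a result set
    when the driver pulls fetch_size rows per RPC (simplified model)."""
    return math.ceil(total_rows / fetch_size)

# With the hard-coded default of 50 rows per fetch, a 1M-row result
# needs 20,000 round trips; raising it to 10,000 cuts that to 100.
print(fetch_round_trips(1_000_000, 50))      # 20000
print(fetch_round_trips(1_000_000, 10_000))  # 100
```

This is why a user-settable setFetchSize matters mostly for large result sets pulled over high-latency links.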



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9753) Wrong results when using multiple levels of Joins. When table alias of one of the table is null with left outer joins.

2015-09-29 Thread Feng Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934720#comment-14934720
 ] 

Feng Yuan commented on HIVE-9753:
-

[~gopalv]

> Wrong results when using multiple levels of Joins. When table alias of one of 
> the table is null with left outer joins.  
> 
>
> Key: HIVE-9753
> URL: https://issues.apache.org/jira/browse/HIVE-9753
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Pavan Srinivas
>Priority: Critical
> Attachments: HIVE-9753.0-0.14.0.patch, HIVE-9753.0-1.0.0.patch, 
> HIVE-9753.patch, table1.data, table2.data, table3.data
>
>
> Let's take a scenario where the tables are:
> {code}
> drop table table1;
> CREATE TABLE table1(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string
>   )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> drop table table2;
> CREATE  TABLE table2(
>   col1 string,
>   col2 bigint,
>   col3 string,
>   col4 string
>   )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> drop table table3;
> CREATE  TABLE table3(
>   col1 string,
>   col2 int,
>   col3 int,
>   col4 string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> {code}
> Query with wrong results:
> {code}
> SELECT t1.col1 AS dummy,
> t1.expected_column AS expected_column,
> t2.col4
> FROM (
> SELECT col1,
> '23-1',
> '23-13' as three,
> col4 AS expected_column
> FROM table1
> ) t1
> JOIN table2 t2
> ON cast(t2.col1 as string) = cast(t1.col1 as string)
> LEFT OUTER JOIN
> (SELECT col4, col1
> FROM table3
> ) t3
> ON t2.col4 = t3.col1  
> ;
> {code}
> and explain output: 
> {code}
> STAGE DEPENDENCIES:
>   Stage-7 is a root stage
>   Stage-5 depends on stages: Stage-7
>   Stage-0 depends on stages: Stage-5
> STAGE PLANS:
>   Stage: Stage-7
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> t1:table1
>   Fetch Operator
> limit: -1
> t3:table3
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> t1:table1
>   TableScan
> alias: table1
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> Filter Operator
>   predicate: col1 is not null (type: boolean)
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   Select Operator
> expressions: col1 (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> HashTable Sink Operator
>   condition expressions:
> 0
> 1 {col4}
>   keys:
> 0 _col0 (type: string)
> 1 col1 (type: string)
> t3:table3
>   TableScan
> alias: table3
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> Select Operator
>   expressions: col1 (type: string)
>   outputColumnNames: _col1
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   HashTable Sink Operator
> condition expressions:
>   0 {_col0} {_col7} {_col7}
>   1
> keys:
>   0 _col7 (type: string)
>   1 _col1 (type: string)
>   Stage: Stage-5
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t2
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> Filter Operator
>   predicate: col1 is not null (type: boolean)
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {_col0}
>   1 {col4}
> keys:
>   0 _col0 (type: string)
>   1 col1 (type: string)
> outputColumnNames: 

[jira] [Commented] (HIVE-11930) how to prevent ppd the topN(a) udf predication in where clause?

2015-09-29 Thread Feng Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934780#comment-14934780
 ] 

Feng Yuan commented on HIVE-11930:
--

Hi [~ashutoshc], this can't be used in a WHERE clause.

> how to prevent ppd the topN(a) udf predication in where clause?
> ---
>
> Key: HIVE-11930
> URL: https://issues.apache.org/jira/browse/HIVE-11930
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 0.14.0
>Reporter: Feng Yuan
>Priority: Minor
>
> select 
> a.state_date,a.customer,a.taskid,a.step_id,a.exit_title,a.pv,top1000(a.only_id)
>   from
> (  select 
> t1.state_date,t1.customer,t1.taskid,t1.step_id,t1.exit_title,t1.pv,t1.only_id
>   from 
>   ( select t11.state_date,
>t11.customer,
>t11.taskid,
>t11.step_id,
>t11.exit_title,
>t11.pv,
>concat(t11.customer,t11.taskid,t11.step_id) as 
> only_id
>from
>   (  select 
> state_date,customer,taskid,step_id,exit_title,count(*) as pv
>  from bdi_fact2.mid_url_step
>  where exit_url!='-1'
>  and exit_title !='-1'
>  and l_date='2015-08-31'
>  group by 
> state_date,customer,taskid,step_id,exit_title
> )t11
>)t1
>order by t1.only_id,t1.pv desc
>  )a
>   where  a.customer='Cdianyingwang'
>   and a.taskid='33'
>   and a.step_id='0' 
>   and top1000(a.only_id)<=10;
> In the above example, the outer predicate top1000(a.only_id)<=10 will be 
> pushed down (PPD) to stage 1:
> ( select t11.state_date,
>t11.customer,
>t11.taskid,
>t11.step_id,
>t11.exit_title,
>t11.pv,
>concat(t11.customer,t11.taskid,t11.step_id) as 
> only_id
>from
>   (  select 
> state_date,customer,taskid,step_id,exit_title,count(*) as pv
>  from bdi_fact2.mid_url_step
>  where exit_url!='-1'
>  and exit_title !='-1'
>  and l_date='2015-08-31'
>  group by 
> state_date,customer,taskid,step_id,exit_title
> )t11
>)t1
> This stage runs with 2 reducers, so it outputs 20 records, and the outer 
> stage's final result is exactly those 20 records.
> Is there any way to hint that this topN UDF predicate should not be pushed 
> down?
> Thanks
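The doubled output can be modeled outside Hive: a rank-style top-N filter applied independently per reducer keeps the top N of each partition, so with 2 reducers a limit of 10 yields 20 rows instead of 10. A toy sketch under that assumption (names are illustrative, not Hive code):

```python
def topn_filter(rows, n):
    """Keep the first n rows, mimicking a rank-style topN(...) <= n predicate
    evaluated over rows in sorted order."""
    return rows[:n]

rows = list(range(100))  # pretend these are the globally sorted records

# Predicate evaluated once, after the final stage: 10 rows.
global_result = topn_filter(rows, 10)

# Predicate pushed down into a stage with 2 reducers: each reducer keeps
# its own top 10, so the union of their outputs has 20 rows.
reducer_0, reducer_1 = rows[:50], rows[50:]
pushed_result = topn_filter(reducer_0, 10) + topn_filter(reducer_1, 10)

print(len(global_result))  # 10
print(len(pushed_result))  # 20
```

This is the generic hazard of pushing a stateful, per-task predicate below a stage boundary.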





[jira] [Commented] (HIVE-9566) HiveServer2 fails to start with NullPointerException

2015-09-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934684#comment-14934684
 ] 

Lefty Leverenz commented on HIVE-9566:
--

This was also committed to branch-1.0 (for release 1.0.2).  Shouldn't 1.0.2 be 
listed in Fix Version/s so it will get picked up for the release notes?

See commit 37206a49f1f6e12f3ac997bb04d3b383ae7781e1.

> HiveServer2 fails to start with NullPointerException
> 
>
> Key: HIVE-9566
> URL: https://issues.apache.org/jira/browse/HIVE-9566
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1
>Reporter: Na Yang
>Assignee: Na Yang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-9566-branch-0.13.patch, 
> HIVE-9566-branch-0.14.patch, HIVE-9566-trunk.patch, HIVE-9566.patch
>
>
> hiveserver2 uses embedded metastore with default hive-site.xml configuration. 
> I use "hive --stop --service hiveserver2" command to stop the running 
> hiveserver2 process and then use "hive --start --service hiveserver2" command 
> to start the hiveserver2 service. I see the following exception in the 
> hive.log file
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hive.service.server.HiveServer2.stop(HiveServer2.java:104)
> at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:138)
> at 
> org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:171)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
> {noformat}
>  





[jira] [Commented] (HIVE-11972) [Refactor] Improve determination of dynamic partitioning columns in FileSink Operator

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934748#comment-14934748
 ] 

Hive QA commented on HIVE-11972:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12762542/HIVE-11972.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9646 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.ql.exec.TestFileSinkOperator.testDeleteDynamicPartitioning
org.apache.hadoop.hive.ql.exec.TestFileSinkOperator.testInsertDynamicPartitioning
org.apache.hadoop.hive.ql.exec.TestFileSinkOperator.testNonAcidDynamicPartitioning
org.apache.hadoop.hive.ql.exec.TestFileSinkOperator.testUpdateDynamicPartitioning
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5454/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5454/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5454/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12762542 - PreCommit-HIVE-TRUNK-Build

> [Refactor] Improve determination of dynamic partitioning columns in FileSink 
> Operator
> -
>
> Key: HIVE-11972
> URL: https://issues.apache.org/jira/browse/HIVE-11972
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11972.patch
>
>
> Currently it uses column names to locate DP columns, which is brittle since 
> column names may change during planning and optimization phases.
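The brittleness being refactored away can be illustrated generically: tracking a column by name breaks when a planning phase renames it, while tracking it by position survives. A toy sketch (illustrative only, not Hive's planner code):

```python
# Schema before and after an optimizer pass that renames internal columns.
before = ["_col0", "ds", "_col2"]
after  = ["_col0", "_col1", "_col2"]  # "ds" was renamed during planning

dp_name = "ds"                     # locating the DP column by name...
dp_index = before.index(dp_name)   # ...vs. remembering its position

print(dp_name in after)   # False: the name lookup breaks after the rewrite
print(after[dp_index])    # _col1: the positional lookup still resolves
```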





[jira] [Commented] (HIVE-11211) Reset the fields in JoinStatsRule in StatsRulesProcFactory

2015-09-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934679#comment-14934679
 ] 

Lefty Leverenz commented on HIVE-11211:
---

Fix version only shows 2.0.0, although this was also committed to branch-1 (for 
1.3.0) and recently to branch-1.2 (for 1.2.2).

Commit f428af1d2908588dd68eb30cde2f158bf9ef04c0 for branch-1.2 (1.2.2).
Commit 64d8582cb8d357216ef7fa208f68548ceb1ef2d3 for branch-1 (1.3.0).
Commit 42326958148c2558be9c3d4dfe44c9e735704617 for master (2.0.0).

> Reset the fields in JoinStatsRule in StatsRulesProcFactory
> --
>
> Key: HIVE-11211
> URL: https://issues.apache.org/jira/browse/HIVE-11211
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-11211.02.patch, HIVE-11211.03.patch
>
>






[jira] [Updated] (HIVE-11903) Add zookeeper lock metrics to HS2

2015-09-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11903:

Attachment: HIVE-11903.3.patch

Patch 3 removes the zookeeper connection metrics and only records the lock counts.

> Add zookeeper lock metrics to HS2
> -
>
> Key: HIVE-11903
> URL: https://issues.apache.org/jira/browse/HIVE-11903
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Yongzhi Chen
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11903.1.patch, HIVE-11903.2.patch, 
> HIVE-11903.3.patch
>
>
> Potential metrics are active zookeeper locks taken by type.  Can refine as we 
> go along.





[jira] [Commented] (HIVE-11835) Type decimal(1,1) reads 0.0, 0.00, etc from text file as NULL

2015-09-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934692#comment-14934692
 ] 

Xuefu Zhang commented on HIVE-11835:


Decimal(1,1) covers the range [-0.9, 0.9]. Values are rounded (half up) to fit 
the scale; if the rounded value still doesn't fit the range, it becomes NULL. 
As seen in the new test case, 1.0 => NULL, while 0.345 => 0.3.
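The described semantics can be modeled with Python's decimal module (a sketch of the behavior, not Hive's implementation):

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal_1_1(text):
    """Model of Hive decimal(1,1): one total digit, one fractional digit.
    Round half-up to one fractional place; if the result no longer fits
    (|value| >= 1), the cell becomes NULL (None here)."""
    rounded = Decimal(text).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return None if abs(rounded) >= 1 else rounded

print(to_decimal_1_1("0.345"))  # 0.3
print(to_decimal_1_1("1.0"))    # None
print(to_decimal_1_1("0.00"))   # 0.0 (the bug made this NULL before the fix)
```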


> Type decimal(1,1) reads 0.0, 0.00, etc from text file as NULL
> -
>
> Key: HIVE-11835
> URL: https://issues.apache.org/jira/browse/HIVE-11835
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-11835.1.patch, HIVE-11835.2.patch, HIVE-11835.patch
>
>
> Steps to reproduce:
> 1. create a text file with values like 0.0, 0.00, etc.
> 2. create table in hive with type decimal(1,1).
> 3. run "load data local inpath ..." to load data into the table.
> 4. run select * on the table.
> You will see that NULL is displayed for 0.0, 0.00, .0, etc. Instead, these 
> should be read as 0.0.





[jira] [Commented] (HIVE-10598) Vectorization borks when column is added to table.

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935122#comment-14935122
 ] 

Hive QA commented on HIVE-10598:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12762634/HIVE-10598.06.patch

{color:red}ERROR:{color} -1 due to 42 failed/errored test(s), 9633 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-orc_vectorization_ppd.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_join_partition_key
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_streaming
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.TestTxnCommands.testMultipleInserts
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorization
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorizationWithAcid
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorizationWithBuckets
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactAfterAbort
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactWhileStreaming
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.minorCompactAfterAbort
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.minorCompactWhileStreaming
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.majorTableLegacy
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.majorTableNoBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.majorTableWithBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.majorWithAborted
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.majorWithOpenInMiddle
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorTableLegacy
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorTableNoBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorTableWithBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorWithAborted
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorWithOpenInMiddle
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.majorTableLegacy
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.majorTableNoBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.majorTableWithBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.majorWithAborted
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.majorWithOpenInMiddle
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorTableLegacy
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorTableNoBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorTableWithBase
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorWithAborted
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorWithOpenInMiddle
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5456/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5456/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5456/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 42 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12762634 - PreCommit-HIVE-TRUNK-Build

> Vectorization borks when column is added to table.
> --
>
> Key: HIVE-10598
> URL: https://issues.apache.org/jira/browse/HIVE-10598
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Mithun Radhakrishnan
>Assignee: Matt McCline
> Attachments: HIVE-10598.01.patch, HIVE-10598.02.patch, 
> HIVE-10598.03.patch, HIVE-10598.04.patch, HIVE-10598.05.patch, 
> HIVE-10598.06.patch
>
>
> Consider the following table definition:
> {code:sql}

[jira] [Updated] (HIVE-11903) Add zookeeper lock metrics to HS2

2015-09-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11903:

Description: Potential metrics are active zookeeper locks taken by type.  
Can refine as we go along.  (was: Potential metrics are active zookeeper 
connections, locks taken by type, etc.  Can refine as we go along.)

> Add zookeeper lock metrics to HS2
> -
>
> Key: HIVE-11903
> URL: https://issues.apache.org/jira/browse/HIVE-11903
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Yongzhi Chen
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11903.1.patch, HIVE-11903.2.patch
>
>
> Potential metrics are active zookeeper locks taken by type.  Can refine as we 
> go along.





[jira] [Updated] (HIVE-11945) ORC with non-local reads may not be reusing connection to DN

2015-09-29 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-11945:

Attachment: HIVE-11945.3.branch-1.patch

Attaching rebased patch for branch-1.

> ORC with non-local reads may not be reusing connection to DN
> 
>
> Key: HIVE-11945
> URL: https://issues.apache.org/jira/browse/HIVE-11945
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-11945.1.patch, HIVE-11945.2.patch, 
> HIVE-11945.3.branch-1.patch, HIVE-11945.3.patch
>
>
> When “seek + readFully(buffer, offset, length)” is used, DFSInputStream ends 
> up going via “readWithStrategy()”.  This sets up the BlockReader with a 
> length equal to the block size, so until that position is reached, 
> RemoteBlockReader2.peer is not added to the PeerCache (please refer to 
> RemoteBlockReader2.close() in HDFS).  So eventually the next call to the same 
> DN ends up opening a new socket.  In ORC, when a read is not data-local, 
> this has the possibility of opening/closing lots of connections to the DN.  
> For random reads, it would be better to set this length to the amount of data 
> actually to be read (e.g. the pread call in DFSInputStream, which sets up the 
> BlockReader’s length correctly, and whose code path returns the Peer to the 
> peer cache properly).  “readFully(position, buffer, offset, length)” follows 
> that code path and ends up reusing connections properly.  Creating this JIRA 
> to fix this issue.





[jira] [Commented] (HIVE-11880) filter bug of UNION ALL when hive.ppd.remove.duplicatefilters=true and filter condition is type incompatible column

2015-09-29 Thread WangMeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935024#comment-14935024
 ] 

WangMeng commented on HIVE-11880:
-

[~ashutoshc] [~jpullokkaran] I have published this patch on Review Board: 
https://reviews.apache.org/r/38805/
Please help review it. Thanks.

> filter bug  of UNION ALL when hive.ppd.remove.duplicatefilters=true and 
> filter condition is type incompatible column 
> -
>
> Key: HIVE-11880
> URL: https://issues.apache.org/jira/browse/HIVE-11880
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: WangMeng
>Assignee: WangMeng
> Attachments: HIVE-11880.01.patch, HIVE-11880.02.patch, 
> HIVE-11880.03.patch, HIVE-11880.04.patch
>
>
> For UNION ALL, when one branch of the union projects a constant column (such 
> as '0L', BIGINT type) whose corresponding column has an incompatible type 
> (such as INT), a query with a filter condition on that type-incompatible 
> column over the UNION ALL will cause an IndexOutOfBoundsException.
> For example, with the TPC-H table "orders": the type of 'orders'.'o_custkey' 
> is normally INT, while the type of the corresponding constant column "0" is 
> BIGINT (`0L AS `o_custkey`). The following query (with a filter on the 
> type-incompatible column 'o_custkey') will fail with 
> java.lang.IndexOutOfBoundsException:
> {code}
> SELECT Count(1)
> FROM   (
>   SELECT `o_orderkey` ,
>  `o_custkey`
>   FROM   `orders`
>   UNION ALL
>   SELECT `o_orderkey`,
>  0L  AS `o_custkey`
>   FROM   `orders`) `oo`
> WHERE  o_custkey<10 limit 4 ;
> {code}
> When 
> {code}
> set hive.ppd.remove.duplicatefilters=true
> {code}





[jira] [Commented] (HIVE-6090) Audit logs for HiveServer2

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934935#comment-14934935
 ] 

Hive QA commented on HIVE-6090:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12762581/HIVE-6090.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9631 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_join30.q-vector_data_types.q-filter_join_breaktask.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5455/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5455/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5455/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12762581 - PreCommit-HIVE-TRUNK-Build

> Audit logs for HiveServer2
> --
>
> Key: HIVE-6090
> URL: https://issues.apache.org/jira/browse/HIVE-6090
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability, HiveServer2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>  Labels: audit, hiveserver
> Attachments: HIVE-6090.1.WIP.patch, HIVE-6090.1.patch, 
> HIVE-6090.3.patch, HIVE-6090.patch
>
>
> HiveMetastore has audit logs, and we would like to audit all queries and 
> requests to HiveServer2 as well. This will help in understanding how the 
> APIs were used, which queries were submitted, by which users, etc.





[jira] [Updated] (HIVE-11903) Add zookeeper lock metrics to HS2

2015-09-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11903:

Summary: Add zookeeper lock metrics to HS2  (was: Add zookeeper metrics to 
HS2)

> Add zookeeper lock metrics to HS2
> -
>
> Key: HIVE-11903
> URL: https://issues.apache.org/jira/browse/HIVE-11903
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Yongzhi Chen
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11903.1.patch, HIVE-11903.2.patch
>
>
> Potential metrics are active zookeeper connections, locks taken by type, etc. 
>  Can refine as we go along.





[jira] [Commented] (HIVE-7594) Hive JDBC client: "out of sequence response" on large long running query

2015-09-29 Thread Sidi RHIL (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935294#comment-14935294
 ] 

Sidi RHIL commented on HIVE-7594:
-

Hello,

I have the same problem when using Talend to read a Hive table (see the log 
below).

Did anybody find a solution for this issue? 

Exception in component tHiveInput_1
java.sql.SQLException: org.apache.thrift.TApplicationException: CloseOperation 
failed: out of sequence response
at 
org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:172)
at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:191)
at 
mon_projet.agregate_doxtl_0_1.agregate_doxtl.tHiveInput_1Process(agregate_doxtl.java:2540)
at 
mon_projet.agregate_doxtl_0_1.agregate_doxtl.runJobInTOS(agregate_doxtl.java:3447)
at 
mon_projet.agregate_doxtl_0_1.agregate_doxtl.main(agregate_doxtl.java:3304)
Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: out 
of sequence response
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
at 
org.apache.hive.service.cli.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:455)
at 
org.apache.hive.service.cli.thrift.TCLIService$Client.CloseOperation(TCLIService.java:442)
at 
org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:166)
... 4 more
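For context, "out of sequence response" generally means the Thrift client received a reply whose sequence id does not match the request it was waiting for (the check happens in TServiceClient.receiveBase, visible in the trace above). A toy model of that check (illustrative, not Thrift's actual code):

```python
class SeqIdClient:
    """Minimal model of a Thrift-style client that tags each request with
    an incrementing sequence id and rejects mismatched replies."""
    def __init__(self):
        self.seqid = 0

    def send(self):
        self.seqid += 1
        return self.seqid  # id attached to the outgoing request

    def receive(self, reply_seqid):
        if reply_seqid != self.seqid:
            raise RuntimeError("out of sequence response")
        return "ok"

c = SeqIdClient()
req = c.send()
print(c.receive(req))  # ok

c.send()               # a second request goes out...
err = None
try:
    c.receive(req)     # ...but a stale reply for the first one arrives
except RuntimeError as e:
    err = str(e)
print(err)             # out of sequence response
```

This is why sharing one connection across threads, or a server replying late to an earlier request, can surface this error.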





> Hive JDBC client: "out of sequence response" on large long running query
> 
>
> Key: HIVE-7594
> URL: https://issues.apache.org/jira/browse/HIVE-7594
> Project: Hive
>  Issue Type: Bug
>  Components: Clients, HiveServer2
>Affects Versions: 0.13.0
> Environment: HDP2.1
>Reporter: Hari Sekhon
>
> When executing a long-running query from a JDBC client (SQuirreL) against 
> HiveServer2, after several minutes I get this error in the client:
> {code}
> Error: org.apache.thrift.TApplicationException: ExecuteStatement failed: out 
> of sequence response
> SQLState:  08S01
> ErrorCode: 0
> {code}
> I've seen this before, IIRC when running 2 queries in 1 session, but here I 
> closed the client and ran only this single query in a new session each time. 
> I did a search and saw HIVE-6893 referring to a Metastore exception, which I 
> have in some older logs but not in these recent instances; the error seems 
> different in this case but may be related.
> The query to reproduce is "select count(*) from myTable" where myTable is a 
> 1TB table of 620 million rows. This happens in both MR and Tez execution 
> engines running on Yarn.
> Here are all the jars I've added to the classpath (taken from Hortonworks doc 
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_dataintegration/content/ch_using-hive-2.html,
>  plus added hadoop-common, hive-exec and slf4j-api to solve class not found 
> issues on top of that):
> commons-codec-1.4.jar
> commons-logging-1.1.3.jar
> hadoop-common-2.4.0.2.1.3.0-563.jar
> hive-exec-0.13.0.2.1.3.0-563.jar
> hive-jdbc-0.13.0.2.1.3.0-563.jar
> hive-service-0.13.0.2.1.3.0-563.jar
> httpclient-4.2.5.jar
> httpcore-4.2.5.jar
> libthrift-0.9.0.jar
> slf4j-api-1.7.5.jar
> I am seeing errors like this in the hiveserver2.log:
> {code}
> 2014-08-01 15:04:31,358 ERROR [pool-5-thread-3]: server.TThreadPoolServer 
> (TThreadPoolServer.java:run(215)) - Error occurred during processing of 
> message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
> at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> ... 4 more
> ...
> 2014-08-01 15:06:31,520 ERROR [pool-5-thread-3]: 

[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935266#comment-14935266
 ] 

Hive QA commented on HIVE-11954:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12762593/HIVE-11954.01.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9646 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5457/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5457/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5457/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12762593 - PreCommit-HIVE-TRUNK-Build

> Extend logic to choose side table in MapJoin Conversion algorithm
> -
>
> Key: HIVE-11954
> URL: https://issues.apache.org/jira/browse/HIVE-11954
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11954.01.patch, HIVE-11954.patch, HIVE-11954.patch
>
>
> Selection of the side table (the in-memory/hash table) in the MapJoin 
> conversion algorithm needs to be more sophisticated.
> In an N-way map join, Hive should pick as the side table (in-memory table) 
> the input stream that has the least cost to produce its relation (like 
> TS(FIL|Proj)*).
> A cost-based choice needs an extended cost model; without the return path it 
> is going to be hard to do this.
> For the time being we could employ a modified cost-based algorithm for 
> side-table selection.
> The new algorithm is described below:
> 1. Identify the candidate set of inputs for the side table (in-memory/hash 
> table) from the join inputs (based on the conditional task size)
> 2. For each input, identify its cost and memory requirement. Cost is 1 for 
> each heavy-weight relational op (Join, GB, PTF/Windowing, TF, etc.); the 
> cost of an input is the total number of heavy-weight ops in its branch.
> 3. Order the set from #1 by cost and memory requirement (ascending)
> 4. Pick the first element from #3 as the side table.
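The four steps above can be sketched in plain Java. The names below (`JoinInput`, `chooseSideTable`, the budget parameter) are illustrative, not Hive's actual classes:

```java
import java.util.*;

public class SideTableChooser {
    // Minimal model of a join input branch: a name, its estimated in-memory
    // size, and the number of heavy-weight ops (Join, GB, PTF, TF, ...) in it.
    static final class JoinInput {
        final String name;
        final long memBytes;
        final int heavyOps;
        JoinInput(String name, long memBytes, int heavyOps) {
            this.name = name; this.memBytes = memBytes; this.heavyOps = heavyOps;
        }
    }

    // Steps 1-4: keep inputs that fit the hash-table budget, order them by
    // (cost, memory) ascending, and pick the first as the side table.
    static String chooseSideTable(List<JoinInput> inputs, long budgetBytes) {
        return inputs.stream()
                .filter(i -> i.memBytes <= budgetBytes)                  // step 1
                .sorted(Comparator.<JoinInput>comparingInt(i -> i.heavyOps)
                        .thenComparingLong(i -> i.memBytes))             // steps 2-3
                .map(i -> i.name)
                .findFirst()                                             // step 4
                .orElse(null); // no candidate fits: map join is not converted
    }

    public static void main(String[] args) {
        List<JoinInput> inputs = Arrays.asList(
                new JoinInput("scan_only", 40_000_000L, 0),
                new JoinInput("joined_branch", 10_000_000L, 2),
                new JoinInput("too_big", 5_000_000_000L, 0));
        System.out.println(chooseSideTable(inputs, 100_000_000L)); // scan_only
    }
}
```

Note the tie-breaking order: the cheaper-to-produce branch wins even when another candidate would need less memory.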



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11972) [Refactor] Improve determination of dynamic partitioning columns in FileSink Operator

2015-09-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935240#comment-14935240
 ] 

Ashutosh Chauhan commented on HIVE-11972:
-

[~prasanth_j] Would you like to take a look?

> [Refactor] Improve determination of dynamic partitioning columns in FileSink 
> Operator
> -
>
> Key: HIVE-11972
> URL: https://issues.apache.org/jira/browse/HIVE-11972
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11972.patch
>
>
> Currently it uses column names to locate DP columns, which is brittle since 
> column names may change during planning and optimization phases.





[jira] [Commented] (HIVE-11973) IN operator fails when the column type is DATE

2015-09-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935249#comment-14935249
 ] 

Yongzhi Chen commented on HIVE-11973:
-

The IN statement thinks a string cannot be converted to a date because 
FunctionRegistry.getPrimitiveCommonCategory(TypeInfo, TypeInfo) returns null 
for the two types. But a string can be implicitly converted to a Date, so the 
primitive common category for the two should be Date.

{noformat}
FunctionRegistry.getPrimitiveCommonCategory(TypeInfo, TypeInfo) line: 772   
FunctionRegistry.getCommonClass(TypeInfo, TypeInfo) line: 810 
GenericUDFUtils$ReturnObjectInspectorResolver.update(ObjectInspector, boolean) 
line: 165
GenericUDFUtils$ReturnObjectInspectorResolver.update(ObjectInspector) line: 103 

GenericUDFIn.initialize(ObjectInspector[]) line: 89 
GenericUDFIn(GenericUDF).initializeAndFoldConstants(ObjectInspector[]) line: 
139
ExprNodeGenericFuncDesc.newInstance(GenericUDF, String, List) 
line: 234
{noformat}
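The rule this comment argues for can be sketched as a tiny, self-contained model of common-type resolution. `PType` and `commonType` are hypothetical names for illustration, not Hive's actual FunctionRegistry API:

```java
public class CommonTypeSketch {
    enum PType { STRING, DATE, INT }

    // Sketch of the proposed coercion rule: when one side is DATE and the
    // other a string literal, resolve the common category to DATE so that
    // "d_date IN ('2000-03-22', ...)" type-checks, instead of returning null.
    static PType commonType(PType a, PType b) {
        if (a == b) {
            return a;
        }
        if ((a == PType.DATE && b == PType.STRING)
                || (a == PType.STRING && b == PType.DATE)) {
            return PType.DATE; // string is implicitly convertible to DATE
        }
        return null; // no common category (the pre-fix behavior for DATE/STRING)
    }

    public static void main(String[] args) {
        System.out.println(commonType(PType.DATE, PType.STRING)); // DATE
    }
}
```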

> IN operator fails when the column type is DATE 
> ---
>
> Key: HIVE-11973
> URL: https://issues.apache.org/jira/browse/HIVE-11973
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0
>Reporter: sanjiv singh
>Assignee: Yongzhi Chen
>
> Test DDL:
> {code}
> CREATE TABLE `date_dim`(
>   `d_date_sk` int, 
>   `d_date_id` string, 
>   `d_date` date, 
>   `d_current_week` string, 
>   `d_current_month` string, 
>   `d_current_quarter` string, 
>   `d_current_year` string) ;
> {code}
> Hive query :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN ('2000-03-22','2001-03-22')  ;
> {code}
> In 1.0.0, the above query fails with:
> {code}
> FAILED: SemanticException [Error 10014]: Line 1:180 Wrong arguments 
> ''2001-03-22'': The arguments for IN should be the same type! Types are: 
> {date IN (string, string)}
> {code}
> I changed the query as follows to get past the error:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN (CAST('2000-03-22' AS DATE) , CAST('2001-03-22' AS DATE) 
>  )  ;
> {code}
> But it works without casting:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date   = '2000-03-22' ;
> {code}





[jira] [Updated] (HIVE-11980) Follow up on HIVE-11696, exception is thrown from CTAS from the table with table-level serde is Parquet while partition-level serde is JSON

2015-09-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11980:

Description: 
When we create a new table from a table whose table-level serde is Parquet 
and whose partition-level serde is JSON, the following exception is currently 
thrown if there are struct fields.

Apparently, getStructFieldsDataAsList() also needs to handle the case of List 
in addition to ArrayWritable, similar to getStructFieldData.

{noformat}
Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
java.util.ArrayList
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldsDataAsList(ArrayWritableObjectInspector.java:172)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:354)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:257)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:241)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:720)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:813)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:813)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
{noformat}

  was:
Apparently, getStructFieldsDataAsList() also needs to handle the case of List 
in addition to ArrayWritable similar to getStructFieldData.

{noformat}
Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
java.util.ArrayList
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldsDataAsList(ArrayWritableObjectInspector.java:172)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:354)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:257)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:241)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:720)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:813)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:813)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
{noformat}


> Follow up on HIVE-11696, exception is thrown from CTAS from the table with 
> table-level serde is Parquet while partition-level serde is JSON
> ---
>
> Key: HIVE-11980
> URL: https://issues.apache.org/jira/browse/HIVE-11980
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11980.patch
>
>
> When we create a new table from a table whose table-level serde is Parquet 
> and whose partition-level serde is JSON, the following exception is 
> currently thrown if there are struct fields.
> Apparently, getStructFieldsDataAsList() also needs to handle the case of 
> List in addition to ArrayWritable, similar to getStructFieldData.
> {noformat}
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldsDataAsList(ArrayWritableObjectInspector.java:172)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:354)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:257)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:241)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> 

[jira] [Commented] (HIVE-11930) how to prevent ppd the topN(a) udf predication in where clause?

2015-09-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935230#comment-14935230
 ] 

Ashutosh Chauhan commented on HIVE-11930:
-

Not in the where clause, but in the Java source file containing your UDF {{top1000}} (e.g. by declaring the UDF non-deterministic there, so the predicate is not pushed down).
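The mechanism behind this advice can be modeled in plain Java: predicate pushdown only moves a predicate below an operator if every UDF it calls is deterministic, so a ranking UDF like top1000() that is declared non-deterministic stays where the user wrote it. The `isPushable` helper below is an illustrative model under that assumption, not Hive's actual implementation:

```java
import java.util.*;

public class PpdSketch {
    // Minimal model of the pushdown rule: a predicate is eligible for
    // pushdown only if every UDF it references is deterministic. A stateful
    // UDF such as top1000() (which counts rows per reducer) must not move.
    static boolean isPushable(List<String> udfsInPredicate,
                              Map<String, Boolean> deterministic) {
        return udfsInPredicate.stream()
                .allMatch(u -> deterministic.getOrDefault(u, true));
    }

    public static void main(String[] args) {
        Map<String, Boolean> det = new HashMap<>();
        det.put("top1000", false); // declared non-deterministic in its source
        System.out.println(isPushable(Arrays.asList("top1000"), det)); // false
    }
}
```

With the UDF marked non-deterministic, top1000(a.only_id)<=10 would be evaluated only in the outer stage, avoiding the 20-record result described below.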

> how to prevent ppd the topN(a) udf predication in where clause?
> ---
>
> Key: HIVE-11930
> URL: https://issues.apache.org/jira/browse/HIVE-11930
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 0.14.0
>Reporter: Feng Yuan
>Priority: Minor
>
> select 
> a.state_date,a.customer,a.taskid,a.step_id,a.exit_title,a.pv,top1000(a.only_id)
>   from
> (  select 
> t1.state_date,t1.customer,t1.taskid,t1.step_id,t1.exit_title,t1.pv,t1.only_id
>   from 
>   ( select t11.state_date,
>t11.customer,
>t11.taskid,
>t11.step_id,
>t11.exit_title,
>t11.pv,
>concat(t11.customer,t11.taskid,t11.step_id) as 
> only_id
>from
>   (  select 
> state_date,customer,taskid,step_id,exit_title,count(*) as pv
>  from bdi_fact2.mid_url_step
>  where exit_url!='-1'
>  and exit_title !='-1'
>  and l_date='2015-08-31'
>  group by 
> state_date,customer,taskid,step_id,exit_title
> )t11
>)t1
>order by t1.only_id,t1.pv desc
>  )a
>   where  a.customer='Cdianyingwang'
>   and a.taskid='33'
>   and a.step_id='0' 
>   and top1000(a.only_id)<=10;
> in the above example:
> the outer top1000(a.only_id)<=10 will be pushed down (PPD) to:
> stage 1:
> ( select t11.state_date,
>t11.customer,
>t11.taskid,
>t11.step_id,
>t11.exit_title,
>t11.pv,
>concat(t11.customer,t11.taskid,t11.step_id) as 
> only_id
>from
>   (  select 
> state_date,customer,taskid,step_id,exit_title,count(*) as pv
>  from bdi_fact2.mid_url_step
>  where exit_url!='-1'
>  and exit_title !='-1'
>  and l_date='2015-08-31'
>  group by 
> state_date,customer,taskid,step_id,exit_title
> )t11
>)t1
> and this stage has 2 reducers, so you can see it will output 20 records;
> at the outer stage, the final result is exactly these 20 records.
> So I want to know: is there any way to hint that this topN UDF predicate 
> should not be pushed down?
> Thanks





[jira] [Updated] (HIVE-11973) IN operator fails when the column type is DATE

2015-09-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11973:

Attachment: HIVE-11973.1.patch

> IN operator fails when the column type is DATE 
> ---
>
> Key: HIVE-11973
> URL: https://issues.apache.org/jira/browse/HIVE-11973
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0
>Reporter: sanjiv singh
>Assignee: Yongzhi Chen
> Attachments: HIVE-11973.1.patch
>
>
> Test DDL:
> {code}
> CREATE TABLE `date_dim`(
>   `d_date_sk` int, 
>   `d_date_id` string, 
>   `d_date` date, 
>   `d_current_week` string, 
>   `d_current_month` string, 
>   `d_current_quarter` string, 
>   `d_current_year` string) ;
> {code}
> Hive query :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN ('2000-03-22','2001-03-22')  ;
> {code}
> In 1.0.0, the above query fails with:
> {code}
> FAILED: SemanticException [Error 10014]: Line 1:180 Wrong arguments 
> ''2001-03-22'': The arguments for IN should be the same type! Types are: 
> {date IN (string, string)}
> {code}
> I changed the query as follows to get past the error:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN (CAST('2000-03-22' AS DATE) , CAST('2001-03-22' AS DATE) 
>  )  ;
> {code}
> But it works without casting:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date   = '2000-03-22' ;
> {code}





[jira] [Updated] (HIVE-11980) Follow up on HIVE-11696, exception is thrown from CTAS from the table with table-level serde is Parquet while partition-level serde is JSON

2015-09-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11980:

Attachment: HIVE-11980.patch

Update getStructFieldsDataAsList() to also handle {{List}} as a parameter. 
Added a unit test for coverage.
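A minimal sketch of that dispatch, assuming the struct data can arrive either as an object array (as from an ArrayWritable) or as a java.util.List. The class and method below are illustrative stand-ins, not the actual ArrayWritableObjectInspector code:

```java
import java.util.*;

public class StructDataSketch {
    // Sketch of the fix: accept either an array-backed struct value or a
    // java.util.List, and return the struct fields as a List either way,
    // instead of throwing UnsupportedOperationException for lists.
    @SuppressWarnings("unchecked")
    static List<Object> getStructFieldsDataAsList(Object data) {
        if (data == null) {
            return null;
        }
        if (data instanceof Object[]) {      // array-backed case
            return Arrays.asList((Object[]) data);
        }
        if (data instanceof List) {          // the newly handled List case
            return (List<Object>) data;
        }
        throw new UnsupportedOperationException(
                "Cannot inspect " + data.getClass().getName());
    }

    public static void main(String[] args) {
        // Before the fix, the List branch was missing, so an ArrayList here
        // triggered "Cannot inspect java.util.ArrayList".
        System.out.println(getStructFieldsDataAsList(
                new ArrayList<>(Arrays.asList("a", "b")))); // [a, b]
    }
}
```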

> Follow up on HIVE-11696, exception is thrown from CTAS from the table with 
> table-level serde is Parquet while partition-level serde is JSON
> ---
>
> Key: HIVE-11980
> URL: https://issues.apache.org/jira/browse/HIVE-11980
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11980.patch
>
>
> Apparently, getStructFieldsDataAsList() also needs to handle the case of List 
> in addition to ArrayWritable similar to getStructFieldData.
> {noformat}
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldsDataAsList(ArrayWritableObjectInspector.java:172)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:354)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:257)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:241)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:720)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:813)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:813)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> {noformat}





[jira] [Commented] (HIVE-11928) ORC footer section can also exceed protobuf message limit

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935267#comment-14935267
 ] 

Hive QA commented on HIVE-11928:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12762650/HIVE-11928.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5458/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5458/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5458/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5458/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at a4c43f0 HIVE-11945: ORC with non-local reads may not be reusing 
connection to DN (Rajesh Balamohan reviewed by Sergey Shelukhin, Prasanth 
Jayachandran)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at a4c43f0 HIVE-11945: ORC with non-local reads may not be reusing 
connection to DN (Rajesh Balamohan reviewed by Sergey Shelukhin, Prasanth 
Jayachandran)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12762650 - PreCommit-HIVE-TRUNK-Build

> ORC footer section can also exceed protobuf message limit
> -
>
> Key: HIVE-11928
> URL: https://issues.apache.org/jira/browse/HIVE-11928
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jagruti Varia
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11928-branch-1.patch, HIVE-11928.1.patch, 
> HIVE-11928.1.patch, HIVE-11928.2.patch, HIVE-11928.2.patch
>
>
> Similar to HIVE-11592 but for orc footer.





[jira] [Commented] (HIVE-11985) handle long typenames from Avro schema in metastore

2015-09-29 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935413#comment-14935413
 ] 

Jimmy Xiang commented on HIVE-11985:


I am not familiar with Avro serde either :(

> handle long typenames from Avro schema in metastore
> ---
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.patch
>
>






[jira] [Commented] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935442#comment-14935442
 ] 

Josh Elser commented on HIVE-11755:
---

Thanks, [~brocknoland]. Much appreciated!

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch, 
> HIVE-11755.003.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>   Job Submission failed with exception 
> 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can 
> only be set once per job)'
> {noformat}
> The OutputFormat implementation already had a method in place to account for 
> this exception but the method accidentally 

[jira] [Commented] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935439#comment-14935439
 ] 

Brock Noland commented on HIVE-11755:
-

Looks reasonable.

+1

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch, 
> HIVE-11755.003.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>   Job Submission failed with exception 
> 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can 
> only be set once per job)'
> {noformat}
> The OutputFormat implementation already had a method in place to account for 
> this exception but the method accidentally wasn't getting 

[jira] [Updated] (HIVE-11988) [hive] security issue with hive & ranger for import table command

2015-09-29 Thread Deepak Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Sharma updated HIVE-11988:
-
Assignee: Sushanth Sowmyan

> [hive] security issue with hive & ranger for import table command
> -
>
> Key: HIVE-11988
> URL: https://issues.apache.org/jira/browse/HIVE-11988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Deepak Sharma
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Fix For: 0.14.1, 1.2.2
>
>
> If a user does not have permission to create a table in Hive, then when the 
> same user imports data for a table using the following command, the import 
> creates the table as well and succeeds; ideally it should not work.
> Steps to reproduce:
> 1. Put some raw data in the HDFS path /user/user1/tempdata
> 2. Check the Ranger policy: user1 should not have any permission on any table
> 3. Log in as user1 through Beeline and try to create a table (as expected, 
> it fails since the user does not have permission to create tables):
> create table tt1(id INT,ff String);
> FAILED: HiveAccessControlException Permission denied: user user1 does not 
> have CREATE privilege on default/tt1 (state=42000,code=4)
> 4. Now use the following command to import data into a table (the table 
> should not already exist):
> import table tt1 from '/user/user1/tempdata';
> Expected result:
> Since user1 does not have permission to create tables, this operation 
> should fail.
> Actual result:
> The table is created successfully and the data is imported as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11880) filter bug of UNION ALL when hive.ppd.remove.duplicatefilters=true and filter condition is type incompatible column

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935812#comment-14935812
 ] 

Hive QA commented on HIVE-11880:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12764007/HIVE-11880.04.patch

{color:red}ERROR:{color} -1 due to 57 failed/errored test(s), 9631 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vector_distinct_2.q-vector_interval_2.q-load_dyn_part2.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_alt_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonblock_op_deduplicate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_boolexpr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_test_boolean_whereclause
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_count
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_percentile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mr_diff_schema_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join_nonexistent_part
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_alt_syntax
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_cond_pushdown_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_cond_pushdown_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_udf_percentile
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementParallel
{noformat}

Test results: 

[jira] [Commented] (HIVE-11964) RelOptHiveTable.hiveColStatsMap might contain mismatched column stats

2015-09-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935844#comment-14935844
 ] 

Laljo John Pullokkaran commented on HIVE-11964:
---

+1 Thanks [~ctang.ma]

> RelOptHiveTable.hiveColStatsMap might contain mismatched column stats
> -
>
> Key: HIVE-11964
> URL: https://issues.apache.org/jira/browse/HIVE-11964
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Statistics
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-11964.patch
>
>
> RelOptHiveTable.hiveColStatsMap might contain mismatched stats since it was 
> built by assuming that the stats returned from
> ==
> hiveColStats = StatsUtils.getTableColumnStats(hiveTblMetadata, 
> hiveNonPartitionCols, nonPartColNamesThatRqrStats);
> or 
> HiveMetaStoreClient.getTableColumnStatistics(dbName, tableName, colNames)
> ==
> are in the same order as the requested columns. In fact the order is 
> non-deterministic; therefore the returned stats should be re-ordered before 
> they are put in hiveColStatsMap.
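
To illustrate the proposed fix, here is a minimal, hypothetical sketch (not the actual patch; ColStat is a stand-in for Hive's ColStatistics class) of re-ordering the returned stats to match the requested column order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatsReorder {
    // Stand-in for Hive's ColStatistics; only the column name matters here.
    static final class ColStat {
        final String colName;
        final long numDistinct;
        ColStat(String colName, long numDistinct) {
            this.colName = colName;
            this.numDistinct = numDistinct;
        }
    }

    // Index the (arbitrarily ordered) returned stats by column name, then
    // rebuild the list so position i corresponds to requestedCols.get(i).
    static List<ColStat> reorder(List<String> requestedCols, List<ColStat> returned) {
        Map<String, ColStat> byName = new HashMap<>();
        for (ColStat cs : returned) {
            byName.put(cs.colName, cs);
        }
        List<ColStat> ordered = new ArrayList<>(requestedCols.size());
        for (String col : requestedCols) {
            ordered.add(byName.get(col)); // null if no stats came back for col
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<ColStat> returned = Arrays.asList(new ColStat("b", 20), new ColStat("a", 10));
        List<ColStat> ordered = reorder(Arrays.asList("a", "b"), returned);
        System.out.println(ordered.get(0).colName); // a
        System.out.println(ordered.get(1).colName); // b
    }
}
```

Keying by column name rather than by position makes the map immune to whatever order the metastore happens to return.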





[jira] [Commented] (HIVE-11969) start Tez session in background when starting CLI

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936046#comment-14936046
 ] 

Sergey Shelukhin commented on HIVE-11969:
-

Seems to work as intended on cluster. [~sseth] can you take a look? Esp. wrt 
the right part of init being async.

> start Tez session in background when starting CLI
> -
>
> Key: HIVE-11969
> URL: https://issues.apache.org/jira/browse/HIVE-11969
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11969.01.patch, HIVE-11969.patch
>
>
> A Tez session spins up an AM, which can cause delays, esp. if the cluster 
> is very busy.
> This can be done in the background, so the AM might get started while the 
> user is running local commands and doing other things.





[jira] [Commented] (HIVE-11807) Set ORC buffer size in relation to set stripe size

2015-09-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936123#comment-14936123
 ] 

Gopal V commented on HIVE-11807:


[~owen.omalley]: LGTM - +1.

> Set ORC buffer size in relation to set stripe size
> --
>
> Key: HIVE-11807
> URL: https://issues.apache.org/jira/browse/HIVE-11807
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11807.patch, HIVE-11807.patch
>
>
> A customer produced ORC files with very small stripe sizes (10k rows/stripe) 
> by setting a small 64MB stripe size and 256K buffer size for a 54 column 
> table. At that size, each of the streams only get a buffer or two before the 
> stripe size is reached. The current code uses the available memory instead of 
> the stripe size and thus doesn't shrink the buffer size if the JVM has much 
> more memory than the stripe size.
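
A back-of-the-envelope sketch of the arithmetic (the 64MB stripe, 256K buffer, and 54-column numbers come from the report above; the per-column estimate itself is a simplifying assumption for illustration):

```java
public class OrcBufferSizing {
    // Rough estimate of how many compression buffers each column gets
    // before a stripe fills up (ignores per-column stream counts and
    // compression ratios, which make the real numbers even smaller).
    static long buffersPerColumn(long stripeSize, long bufferSize, int numColumns) {
        return stripeSize / (bufferSize * numColumns);
    }

    public static void main(String[] args) {
        long stripe = 64L * 1024 * 1024; // 64MB stripe from the report
        int columns = 54;
        System.out.println(buffersPerColumn(stripe, 256L * 1024, columns)); // 4
        System.out.println(buffersPerColumn(stripe, 64L * 1024, columns));  // 18
    }
}
```

Deriving the buffer size from the stripe size, rather than from available JVM memory, is what gives each stream enough buffers before the stripe fills.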





[jira] [Commented] (HIVE-11898) support default partition in metastoredirectsql

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936125#comment-14936125
 ] 

Sergey Shelukhin commented on HIVE-11898:
-

These tests pass for me. [~sushanth] can you take a look?

> support default partition in metastoredirectsql
> ---
>
> Key: HIVE-11898
> URL: https://issues.apache.org/jira/browse/HIVE-11898
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11898.01.patch, HIVE-11898.02.patch, 
> HIVE-11898.patch
>
>






[jira] [Updated] (HIVE-11990) Loading data inpath from a temporary table dir fails on Windows

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11990:
-
Attachment: HIVE-11990.2.patch

[~jdere] Patch #2 with test case.

Thanks
Hari

> Loading data inpath from a temporary table dir fails on Windows
> ---
>
> Key: HIVE-11990
> URL: https://issues.apache.org/jira/browse/HIVE-11990
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11990.1.patch, HIVE-11990.2.patch
>
>
> The query runs:
> {noformat}
> load data inpath 'wasb:///tmp/testtemptable/temptablemisc_5/data' overwrite 
> into table temp2;
> {noformat}
> It fails with:
> {noformat}
> FAILED: SemanticException [Error 10028]: Line 2:37 Path is not legal 
> ''wasb:///tmp/testtemptable/temptablemisc_5/data'': Move from: 
> wasb://humb23-hi...@humboldttesting3.blob.core.windows.net/tmp/testtemptable/temptablemisc_5/data
>  to: 
> hdfs://headnode0.humb23-hive1-ssh.h2.internal.cloudapp.net:8020/tmp/hive/hrt_qa/0d5f8b31-5908-44bf-ae4c-9eee956da066/_tmp_space.db/75b44252-42a7-4d28-baf8-4977daa5d49c
>  is not valid. Please check that values for params "default.fs.name" and 
> "hive.metastore.warehouse.dir" do not conflict.
> {noformat}





[jira] [Updated] (HIVE-11992) speed up and re-enable slow q test files in Hive

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11992:

Description: 
Due to perceived lack of importance and long runtimes, we have disabled the 
following q files:
CliDriver:
  rcfile_merge1.q,\

MinimrCliDriver:
  ql_rewrite_gbtoidx.q,\
  ql_rewrite_gbtoidx_cbo_1.q,\
  ql_rewrite_gbtoidx_cbo_2.q,\
  smb_mapjoin_8.q,\

If someone thinks any of these are important, they should be re-enabled, 
however, their runtime should be made acceptable first (they each take 10-30 
minutes right now, and should take 3 minutes at most, ideally 0-2).

Please feel free to look at all of these, or file sub-tasks to look at a subset 
of the list.

  was:
Due to perceived lack of importance and long runtimes, we have disabled the 
following q files:
CliDriver:
  rcfile_merge1.q,\

MinimrCliDriver:
  ql_rewrite_gbtoidx.q,\
  ql_rewrite_gbtoidx_cbo_1.q,\
  ql_rewrite_gbtoidx_cbo_2.q,\
  smb_mapjoin_8.q,\

If someone thinks any of these are important, they should be re-enabled, 
however, their runtime should be made acceptable first (they take 10-30 minutes 
right now, and should take 3 minutes at most, ideally 0-2).

Please feel free to look at all of these, or file sub-tasks to look at a subset 
of the list.


> speed up and re-enable slow q test files in Hive
> 
>
> Key: HIVE-11992
> URL: https://issues.apache.org/jira/browse/HIVE-11992
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Priority: Minor
>
> Due to perceived lack of importance and long runtimes, we have disabled the 
> following q files:
> CliDriver:
>   rcfile_merge1.q,\
> MinimrCliDriver:
>   ql_rewrite_gbtoidx.q,\
>   ql_rewrite_gbtoidx_cbo_1.q,\
>   ql_rewrite_gbtoidx_cbo_2.q,\
>   smb_mapjoin_8.q,\
> If someone thinks any of these are important, they should be re-enabled, 
> however, their runtime should be made acceptable first (they each take 10-30 
> minutes right now, and should take 3 minutes at most, ideally 0-2).
> Please feel free to look at all of these, or file sub-tasks to look at a 
> subset of the list.





[jira] [Updated] (HIVE-11894) CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table column name in CTAS queries

2015-09-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11894:
---
Attachment: HIVE-11894.03.patch

A combined patch of HIVE-11894 and HIVE-11907, depending on the check-in of HIVE-11971.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table 
> column name in CTAS queries
> ---
>
> Key: HIVE-11894
> URL: https://issues.apache.org/jira/browse/HIVE-11894
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11894.01.patch, HIVE-11894.02.patch, 
> HIVE-11894.03.patch
>
>
> To repro, run lineage2.q with return path turned on.





[jira] [Commented] (HIVE-11920) ADD JAR failing with URL schemes other than file/ivy/hdfs

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935828#comment-14935828
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11920:
--

lgtm +1

> ADD JAR failing with URL schemes other than file/ivy/hdfs
> -
>
> Key: HIVE-11920
> URL: https://issues.apache.org/jira/browse/HIVE-11920
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11920.1.patch
>
>
> Example stack trace below. It looks like this was introduced by HIVE-9664.
> {noformat}
> 015-09-16 19:53:16,502 ERROR [main]: SessionState 
> (SessionState.java:printError(960)) - invalid url: 
> wasb:///tmp/hive-udfs-0.1.jar, expecting ( file | hdfs | ivy)  as url scheme.
> java.lang.RuntimeException: invalid url: wasb:///tmp/hive-udfs-0.1.jar, 
> expecting ( file | hdfs | ivy)  as url scheme.
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getURLType(SessionState.java:1230)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1237)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:301)
> at 
> org.apache.hadoop.hive.ql.exec.Registry.registerToSessionRegistry(Registry.java:453)
> at 
> org.apache.hadoop.hive.ql.exec.Registry.registerPermanentFunction(Registry.java:200)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerPermanentFunction(FunctionRegistry.java:1495)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.createPermanentFunction(FunctionTask.java:136)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:75)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1655)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1414)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
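
For illustration, here is a minimal, hypothetical sketch of the kind of hard-coded scheme whitelist that produces this error (the class and method names are made up; the real check lives in SessionState.getURLType):

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UrlSchemeCheck {
    // Hard-coded whitelist: rejects wasb, s3a, etc., even though Hadoop's
    // FileSystem registry could resolve them. A fix would consult that
    // registry instead of this fixed set.
    private static final Set<String> ACCEPTED =
        new HashSet<>(Arrays.asList("file", "hdfs", "ivy"));

    static boolean isAcceptedScheme(String url) {
        String scheme = URI.create(url).getScheme();
        return scheme != null && ACCEPTED.contains(scheme.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedScheme("hdfs:///tmp/foo.jar"));           // true
        System.out.println(isAcceptedScheme("wasb:///tmp/hive-udfs-0.1.jar")); // false: the failure above
    }
}
```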





[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11983:
--
Component/s: Transactions

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: streaming, streaming_api
> Attachments: HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> Currently, all records in every Transaction that is part of a 
> TransactionBatch go to the same bucket, and a new bucket number is chosen 
> for each TransactionBatch.
> Fix: the API needs to hash each record to determine which bucket it belongs to. 
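
A minimal sketch of the proposed fix (not the actual patch; it assumes Hive's usual (hashCode & Integer.MAX_VALUE) % numBuckets convention for mapping a bucketing key to a bucket):

```java
public class BucketAssigner {
    private final int numBuckets;

    public BucketAssigner(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    // Hash the record's bucketing-key value so every record lands in the
    // bucket implied by its key, instead of one bucket per TransactionBatch.
    public int bucketFor(Object bucketKey) {
        int hash = (bucketKey == null) ? 0 : bucketKey.hashCode();
        return (hash & Integer.MAX_VALUE) % numBuckets; // always in [0, numBuckets)
    }

    public static void main(String[] args) {
        BucketAssigner assigner = new BucketAssigner(4);
        // Same key, same bucket; and the bucket is always within range.
        System.out.println(assigner.bucketFor("key-1") == assigner.bucketFor("key-1")); // true
        System.out.println(assigner.bucketFor("key-2") >= 0 && assigner.bucketFor("key-2") < 4); // true
    }
}
```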





[jira] [Updated] (HIVE-11960) braces in join conditions are not supported

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11960:

Attachment: HIVE-11960.02.patch

It actually appears to be supported through the magic of recursion. Updated 
the test (I just added 2 sets of braces to the original no-braces query that 
I used to compare).

> braces in join conditions are not supported
> ---
>
> Key: HIVE-11960
> URL: https://issues.apache.org/jira/browse/HIVE-11960
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11960.01.patch, HIVE-11960.02.patch, 
> HIVE-11960.patch
>
>
> These should be supported; they are ANSI





[jira] [Updated] (HIVE-11903) Add lock metrics to HS2

2015-09-29 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-11903:
-
Summary: Add lock metrics to HS2  (was: Add zookeeper lock metrics to HS2)

> Add lock metrics to HS2
> ---
>
> Key: HIVE-11903
> URL: https://issues.apache.org/jira/browse/HIVE-11903
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Yongzhi Chen
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11903.1.patch, HIVE-11903.2.patch, 
> HIVE-11903.3.patch
>
>
> Potential metrics are active zookeeper locks taken by type.  Can refine as we 
> go along.





[jira] [Commented] (HIVE-11976) Extend CBO rules to being able to apply rules only once on a given operator

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936098#comment-14936098
 ] 

Hive QA commented on HIVE-11976:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12764060/HIVE-11976.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9631 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-update_orig_table.q-vectorization_13.q-mapreduce2.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_deep_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_flatten_and_or
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5461/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5461/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5461/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12764060 - PreCommit-HIVE-TRUNK-Build

> Extend CBO rules to being able to apply rules only once on a given operator
> ---
>
> Key: HIVE-11976
> URL: https://issues.apache.org/jira/browse/HIVE-11976
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11976.patch
>
>
> Create a way to bail out quickly from HepPlanner if the rule has already 
> been applied on a certain operator.
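
One way to sketch the idea (hypothetical names; the real change would hook into HepPlanner's rule matching): keep a set of (rule, operator) pairs that have already fired and bail out on a repeat.

```java
import java.util.HashSet;
import java.util.Set;

public class RuleOnceRegistry {
    private final Set<String> applied = new HashSet<>();

    // Returns true the first time a rule fires on an operator and false on
    // every later attempt, letting the planner skip re-matching that pair.
    public boolean markApplied(String ruleName, int operatorId) {
        return applied.add(ruleName + "#" + operatorId);
    }

    public static void main(String[] args) {
        RuleOnceRegistry registry = new RuleOnceRegistry();
        System.out.println(registry.markApplied("ProjectMergeRule", 1)); // true: first application
        System.out.println(registry.markApplied("ProjectMergeRule", 1)); // false: bail out
        System.out.println(registry.markApplied("ProjectMergeRule", 2)); // true: different operator
    }
}
```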





[jira] [Updated] (HIVE-11993) It is not necessary to start tez session when cli with parameter "-e" or "-f"

2015-09-29 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated HIVE-11993:
--
Description: 
With "-e" or "-f", Hive executes in batch mode, so I don't think it is 
necessary to start a Tez session when the Hive session is started. 
Especially when I only want to execute DDL. 

  was:Especially when I only want to execute DDL. 


> It is not necessary to start tez session when cli with parameter "-e" or "-f"
> -
>
> Key: HIVE-11993
> URL: https://issues.apache.org/jira/browse/HIVE-11993
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Jeff Zhang
>
> With "-e" or "-f", Hive executes in batch mode, so I don't think it is 
> necessary to start a Tez session when the Hive session is started. 
> Especially when I only want to execute DDL. 





[jira] [Commented] (HIVE-11915) BoneCP returns closed connections from the pool

2015-09-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935926#comment-14935926
 ] 

Thejas M Nair commented on HIVE-11915:
--

Thanks for the update. Can you also update the log message to indicate that 
there is going to be a retry? Otherwise, users might not realize that the 
issue was not fatal. Also, please update it to include the full exception 
stack trace; that can be very useful for debugging.


> BoneCP returns closed connections from the pool
> ---
>
> Key: HIVE-11915
> URL: https://issues.apache.org/jira/browse/HIVE-11915
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11915.01.patch, HIVE-11915.02.patch, 
> HIVE-11915.WIP.patch, HIVE-11915.patch
>
>
> It's a very old bug in BoneCP and it will never be fixed... There are 
> multiple workarounds on the internet but according to responses they are all 
> unreliable. We should upgrade to HikariCP (which in turn is only supported by 
> DN 4), meanwhile try some shamanic rituals. In this JIRA we will try a 
> relatively weak drum.





[jira] [Commented] (HIVE-11985) handle long typenames from Avro schema in metastore

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935949#comment-14935949
 ] 

Sergey Shelukhin commented on HIVE-11985:
-

https://reviews.apache.org/r/38862/

> handle long typenames from Avro schema in metastore
> ---
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.patch
>
>






[jira] [Commented] (HIVE-11903) Add zookeeper lock metrics to HS2

2015-09-29 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936037#comment-14936037
 ] 

Szehon Ho commented on HIVE-11903:
--

Thanks, looks good to me, we can remove the extra imports on 
CuratorFrameworkSingleton for now, but I can do it on commit.  +1

> Add zookeeper lock metrics to HS2
> -
>
> Key: HIVE-11903
> URL: https://issues.apache.org/jira/browse/HIVE-11903
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Yongzhi Chen
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11903.1.patch, HIVE-11903.2.patch, 
> HIVE-11903.3.patch
>
>
> Potential metrics are active zookeeper locks taken by type.  Can refine as we 
> go along.





[jira] [Commented] (HIVE-11992) speed up and re-enable slow q test files in Hive

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936079#comment-14936079
 ] 

Sergey Shelukhin commented on HIVE-11992:
-

CliDriver tests are already parallelized, but total capacity of test runners is 
still limited...

> speed up and re-enable slow q test files in Hive
> 
>
> Key: HIVE-11992
> URL: https://issues.apache.org/jira/browse/HIVE-11992
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Priority: Minor
>
> Due to perceived lack of importance and long runtimes, we have disabled the 
> following q files:
> CliDriver:
>   rcfile_merge1.q,\
> MinimrCliDriver:
>   ql_rewrite_gbtoidx.q,\
>   ql_rewrite_gbtoidx_cbo_1.q,\
>   ql_rewrite_gbtoidx_cbo_2.q,\
>   smb_mapjoin_8.q,\
> If someone thinks any of these are important, they should be re-enabled, 
> however, their runtime should be made acceptable first (they each take 10-30 
> minutes right now, and should take 3 minutes at most, ideally 0-2).
> Please feel free to look at all of these, or file sub-tasks to look at a 
> subset of the list.





[jira] [Updated] (HIVE-11823) create a self-contained translation for SARG to be used by metastore

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11823:

Description: 
See HIVE-11705. This just contains the hbase-metastore-specific methods from 
that patch

NO PRECOMMIT TESTS



  was:
See HIVE-11705. This just contains the hbase-metastore-specific methods from 
that patch




> create a self-contained translation for SARG to be used by metastore
> 
>
> Key: HIVE-11823
> URL: https://issues.apache.org/jira/browse/HIVE-11823
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11823.01.patch, HIVE-11823.02.patch, 
> HIVE-11823.patch
>
>
> See HIVE-11705. This just contains the hbase-metastore-specific methods from 
> that patch
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-11836) ORC SARG creation throws NPE for null constants with void type

2015-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11836:
--
Component/s: Transactions

> ORC SARG creation throws NPE for null constants with void type
> --
>
> Key: HIVE-11836
> URL: https://issues.apache.org/jira/browse/HIVE-11836
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0
>
> Attachments: HIVE-11836.1.patch
>
>
> Queries like
> {code}
> select * from table where col = null
> {code}
> will throw the following exception
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.boxLiteral(SearchArgumentImpl.java:446)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.getLiteral(SearchArgumentImpl.java:476)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.createLeaf(SearchArgumentImpl.java:524)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.createLeaf(SearchArgumentImpl.java:584)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.parse(SearchArgumentImpl.java:629)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.addChildren(SearchArgumentImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.parse(SearchArgumentImpl.java:621)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.addChildren(SearchArgumentImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.parse(SearchArgumentImpl.java:621)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.expression(SearchArgumentImpl.java:916)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl.(SearchArgumentImpl.java:953)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.create(SearchArgumentFactory.java:36)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.createFromConf(SearchArgumentFactory.java:50)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:312)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1224)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> {code}
> This issue does not happen when CBO is enabled.
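A defensive fix would guard before the boxing step that NPEs: a typed-null constant can be folded to an "unknown" result instead of being dereferenced. A self-contained sketch of that idea - the class and enum names are illustrative, not the actual SearchArgumentImpl API:

```java
public class NullSafeLeafBuilder {
    /** Outcomes for a predicate leaf, loosely mirroring SARG semantics. */
    public enum Truth { LEAF, NULL_UNKNOWN }

    /**
     * Classify a leaf for "column = literal". In SQL, "col = NULL" never
     * evaluates to true, so a null literal folds to an "unknown" constant
     * up front instead of reaching boxing code that dereferences it.
     */
    public static Truth classifyLeaf(String column, Object literal) {
        if (literal == null) {
            return Truth.NULL_UNKNOWN; // guard: never box a null constant
        }
        return Truth.LEAF;
    }

    public static void main(String[] args) {
        System.out.println(classifyLeaf("col", null)); // NULL_UNKNOWN
        System.out.println(classifyLeaf("col", 42));   // LEAF
    }
}
```

With such a guard in place, `select * from table where col = null` would simply produce a leaf that filters nothing out at the SARG level rather than throwing.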



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11672) Hive Streaming API handles bucketing incorrectly

2015-09-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935893#comment-14935893
 ] 

Eugene Koifman commented on HIVE-11672:
---

[~roshan_naik] is this a dup of HIVE-11983?

> Hive Streaming API handles bucketing incorrectly
> 
>
> Key: HIVE-11672
> URL: https://issues.apache.org/jira/browse/HIVE-11672
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Hive, Transactions
>Affects Versions: 1.2.1
>Reporter: Raj Bains
>Assignee: Roshan Naik
>Priority: Critical
>
> Hive Streaming API allows the clients to get a random bucket and then insert 
> data into it. However, this leads to incorrect bucketing as Hive expects data 
> to be distributed into buckets based on a hash function applied to the bucket 
> key. The data is inserted randomly by the clients right now. They have no way 
> of
> # Knowing what bucket a row (tuple) belongs to
> # Asking for a specific bucket
> There are optimization such as Sort Merge Join and Bucket Map Join that rely 
> on the data being correctly distributed across buckets and these will cause 
> incorrect read results if the data is not distributed correctly.
> There are two obvious design choices
> # Hive Streaming API should fix this internally by distributing the data 
> correctly
> # Hive Streaming API should expose data distribution scheme to the clients 
> and allow them to distribute the data correctly
> The first option will mean every client thread will write to many buckets, 
> causing many small files in each bucket and too many connections open. This 
> does not seem feasible. The second option pushes more functionality into the 
> client of the Hive Streaming API, but can maintain high throughput and write 
> good sized ORC files. This option seems preferable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11952) disable q tests that are both slow and less relevant

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11952:

Fix Version/s: 2.0.0

> disable q tests that are both slow and less relevant
> 
>
> Key: HIVE-11952
> URL: https://issues.apache.org/jira/browse/HIVE-11952
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-11952.01.patch, HIVE-11952.patch
>
>
> We will disable several tests that test obscure and old features and take 
> inordinate amount of time, and file JIRAs to look at their perf if someone 
> still cares about them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11444) ACID Compactor should generate stats/alerts

2015-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11444:
--
Description: 
Compaction should generate stats about number of files it reads, min/max/avg 
size etc.  It should also generate alerts if it looks like the system is not 
configured correctly.

For example, if there are lots of delta files with very small files, it's a 
good sign that Streaming API is configured with batches that are too small.

Simplest idea is to add another periodic task to AcidHouseKeeperService to
//periodically do select count(*), min(txnid),max(txnid), type from 
txns group by type.
//1. dump that to log file at info
//2. could also keep counts for last 10min, hour, 6 hours, 24 hours, etc
//2.2 if a large increase is detected - issue alert (at least to the 
log for now) at warn/error


  was:
Compaction should generate stats about number of files it reads, min/max/avg 
size etc.  It should also generate alerts if it looks like the system is not 
configured correctly.

For example, if there are lots of delta files with very small files, it's a 
good sign that Streaming API is configured with batches that are too small.


> ACID Compactor should generate stats/alerts
> ---
>
> Key: HIVE-11444
> URL: https://issues.apache.org/jira/browse/HIVE-11444
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Compaction should generate stats about number of files it reads, min/max/avg 
> size etc.  It should also generate alerts if it looks like the system is not 
> configured correctly.
> For example, if there are lots of delta files with very small files, it's a 
> good sign that Streaming API is configured with batches that are too small.
> Simplest idea is to add another periodic task to AcidHouseKeeperService to
> //periodically do select count(*), min(txnid),max(txnid), type from 
> txns group by type.
> //1. dump that to log file at info
> //2. could also keep counts for last 10min, hour, 6 hours, 24 hours, 
> etc
> //2.2 if a large increase is detected - issue alert (at least to the 
> log for now) at warn/error
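The alert-on-large-increase step above reduces to comparing each periodic sample against the previous one; a minimal sketch, assuming an illustrative 10x growth factor and hypothetical method names:

```java
public class TxnStatsAlerter {
    /**
     * Decide whether a per-type transaction count has grown enough since
     * the last periodic sample to warrant a warn/error log entry.
     * A zero baseline never alerts, so the first sample only seeds state.
     */
    public static boolean shouldAlert(long previousCount, long currentCount, double growthFactor) {
        return previousCount > 0 && currentCount > previousCount * growthFactor;
    }

    public static void main(String[] args) {
        // e.g. open txns went from 100 to 2500 between samples -> alert
        System.out.println(shouldAlert(100, 2500, 10.0)); // true
        System.out.println(shouldAlert(100, 500, 10.0));  // false: within normal growth
        System.out.println(shouldAlert(0, 5000, 10.0));   // false: first sample seeds state
    }
}
```

A periodic task in AcidHouseKeeperService would feed the per-type counts from the group-by query into such a check and log at warn/error when it fires.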



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11915) BoneCP returns closed connections from the pool

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11915:

Attachment: HIVE-11915.02.patch

Updated the patch; fixed the log, moved the retry logic closer to the code.
I disagree about extra retries just in case - that's what the retrying object 
store and metastore client do, and in 90% of the cases where I see these (and 
100% for the former), something blindly retries a non-recoverable exception.
DBCP has been around for a while and doesn't merit doing things just in case, 
esp. if there's no way to tell what is recoverable... at least with BoneCP we 
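The narrow, known-failure-only retry argued for here can be sketched as a once-only wrapper; `SqlOp`, `withOneRetry`, and the message-based check are illustrative assumptions, not the patch's actual code:

```java
import java.sql.SQLException;

public class ClosedConnectionRetry {
    @FunctionalInterface
    public interface SqlOp<T> {
        T run() throws SQLException;
    }

    /**
     * Retry an operation exactly once, and only when the failure looks like
     * the known BoneCP bug (a connection handed out after it was closed).
     * Anything else is treated as non-recoverable and rethrown, matching
     * the "don't blindly retry" stance above.
     */
    public static <T> T withOneRetry(SqlOp<T> op) throws SQLException {
        try {
            return op.run();
        } catch (SQLException e) {
            if (looksLikeClosedConnection(e)) {
                return op.run(); // a fresh call checks out a new connection from the pool
            }
            throw e;
        }
    }

    static boolean looksLikeClosedConnection(SQLException e) {
        String msg = e.getMessage();
        return msg != null && msg.toLowerCase().contains("closed");
    }
}
```

The point of keeping the predicate this narrow is that a retry loop around arbitrary SQLExceptions would mask real failures, which is exactly the behavior the comment objects to.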

> BoneCP returns closed connections from the pool
> ---
>
> Key: HIVE-11915
> URL: https://issues.apache.org/jira/browse/HIVE-11915
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11915.01.patch, HIVE-11915.02.patch, 
> HIVE-11915.WIP.patch, HIVE-11915.patch
>
>
> It's a very old bug in BoneCP and it will never be fixed... There are 
> multiple workarounds on the internet but according to responses they are all 
> unreliable. We should upgrade to HikariCP (which in turn is only supported by 
> DN 4), meanwhile try some shamanic rituals. In this JIRA we will try a 
> relatively weak drum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11960) braces in join conditions are not supported

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935880#comment-14935880
 ] 

Sergey Shelukhin commented on HIVE-11960:
-

It's not redundant - there's ambiguity between this and virtual tables 
otherwise.
I will look at the latter. The ((join) join) case is supported and tested, but 
I suspect that ((join)) won't work. Need to see if the "obvious" way to add it 
is good enough.

> braces in join conditions are not supported
> ---
>
> Key: HIVE-11960
> URL: https://issues.apache.org/jira/browse/HIVE-11960
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11960.01.patch, HIVE-11960.patch
>
>
> These should be supported; they are ANSI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-09-29 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935951#comment-14935951
 ] 

Swarnim Kulkarni commented on HIVE-11609:
-

[~ashutoshc] So I looked into this a little bit and looks like the fix with 
HIVE-10940 isn't really going to work, mostly because it seems to be pretty 
tailored to Tez. The SerializeFilter is only called for TezCompiler[1]. Any 
reason why this is not called for MapReduceCompiler or SparkCompiler too? As a 
quick hack, I tried calling it within the MapReduceCompiler in 
"optimizeTaskPlan", but that doesn't seem to work very well either. Might need to 
dig a little bit into what's going on there. If it's ok with you, I would 
potentially like to log a separate bug though and tackle it there just to keep 
it separate from what we are trying to do here. If that works, we can re-add 
the "transient" and only go the SerializeFilter route.

[1] 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java#L486-L490
  

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt
>
>
> It seems like the capability to add filter to an hbase scan which was added 
> as part of HIVE-6411 doesn't work. This is primarily because in the 
> HiveHBaseInputFormat, the filter is added in the getsplits instead of 
> getrecordreader. This works fine for start and stop keys but not for filter 
> because a filter is respected only when an actual scan is performed. This is 
> also related to the initial refactoring that was done as part of HIVE-3420.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11952) disable q tests that are both slow and less relevant

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11952:

Attachment: HIVE-11952.01.patch

The patch that takes care of the other variable

> disable q tests that are both slow and less relevant
> 
>
> Key: HIVE-11952
> URL: https://issues.apache.org/jira/browse/HIVE-11952
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11952.01.patch, HIVE-11952.patch
>
>
> We will disable several tests that test obscure and old features and take 
> inordinate amount of time, and file JIRAs to look at their perf if someone 
> still cares about them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11819) HiveServer2 catches OOMs on request threads

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936051#comment-14936051
 ] 

Sergey Shelukhin commented on HIVE-11819:
-

[~sushanth] should this be ported to branch-1?

> HiveServer2 catches OOMs on request threads
> ---
>
> Key: HIVE-11819
> URL: https://issues.apache.org/jira/browse/HIVE-11819
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-11819.01.patch, HIVE-11819.02.patch, 
> HIVE-11819.patch
>
>
> ThriftCLIService methods such as ExecuteStatement are apparently capable of 
> catching OOMs because they get wrapped in RTE by HiveSessionProxy. 
> This shouldn't happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11823) create a self-contained translation for SARG to be used by metastore

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11823:

Attachment: HIVE-11823.02.patch

Rebased the patch. [~prasanth_j] ping?

> create a self-contained translation for SARG to be used by metastore
> 
>
> Key: HIVE-11823
> URL: https://issues.apache.org/jira/browse/HIVE-11823
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11823.01.patch, HIVE-11823.02.patch, 
> HIVE-11823.patch
>
>
> See HIVE-11705. This just contains the hbase-metastore-specific methods from 
> that patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11925) Hive file format checking breaks load from named pipes

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11925:

Attachment: HIVE-11925.01.patch

Fix the check to not be done for HDFS, and to handle unknown fs-es.

> Hive file format checking breaks load from named pipes
> --
>
> Key: HIVE-11925
> URL: https://issues.apache.org/jira/browse/HIVE-11925
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11925.01.patch, HIVE-11925.patch
>
>
> Opening the file and mucking with it when hive.fileformat.check is true (the 
> default) breaks the LOAD command from a named pipe. Right now, it's done for 
> all the text files blindly to see if they might be in some other format. 
> Files.getAttribute can be used to figure out if the input is a named pipe (or 
> a socket) and skip the format check.
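The `Files.getAttribute` idea mentioned above can be sketched with the closely related `Files.readAttributes` call, whose `isOther()` flag is true for inputs that are neither regular files, directories, nor symlinks - which covers FIFOs and sockets; `isPipeOrSocket` is a hypothetical helper name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class FormatCheckGuard {
    /**
     * Returns true when the path is neither a regular file, a directory,
     * nor a symlink - i.e. a named pipe (FIFO), socket, or device.
     * A loader could skip format sniffing for such inputs, since reading
     * them consumes data the LOAD itself still needs.
     */
    public static boolean isPipeOrSocket(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        return attrs.isOther();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("loadcheck", ".txt");
        try {
            System.out.println(isPipeOrSocket(tmp)); // false for a regular file
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This check only applies to local filesystems; for HDFS paths (where pipes can't occur) the format check can simply proceed as before.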



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-29 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: HIVE-11983.patch

Uploading patch

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 
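The fix described above - hash each record to pick its bucket - reduces to the usual mask-and-mod computation; a minimal sketch, assuming a hypothetical `bucketFor` helper (not the actual Streaming API surface):

```java
public class BucketAssigner {
    /**
     * Map a record's hash code to a bucket id in [0, numBuckets).
     * Masking with Integer.MAX_VALUE clears the sign bit so negative
     * hash codes still produce a non-negative bucket number.
     */
    public static int bucketFor(int recordHashCode, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return (recordHashCode & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Records with the same bucket-key hash always land in the same bucket.
        System.out.println(bucketFor("key1".hashCode(), 8));
        System.out.println(bucketFor(-1, 4)); // negative hash still maps cleanly
    }
}
```

For the assignment to be correct, the hash must be computed over the table's bucketing columns (not the whole row), so that it matches the distribution Hive's readers and join optimizations expect.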



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: HIVE-11642.14.patch

A more recent diff

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.12.patch, HIVE-11642.13.patch, HIVE-11642.14.patch, 
> HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10595) Dropping a table can cause NPEs in the compactor

2015-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10595:
--
Component/s: Transactions

> Dropping a table can cause NPEs in the compactor
> 
>
> Key: HIVE-10595
> URL: https://issues.apache.org/jira/browse/HIVE-10595
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 0.14.0, 1.0.0, 1.1.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 1.2.0
>
> Attachments: HIVE-10595.1.patch, HIVE-10595.patch
>
>
> Reproduction:
> # start metastore with compactor off
> # insert enough entries in a table to trigger a compaction
> # drop the table
> # stop metastore
> # restart metastore with compactor on
> Result:  NPE in the compactor threads.  I suspect this would also happen if 
> the inserts and drops were done in between a run of the compactor, but I 
> haven't proven it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-09-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936035#comment-14936035
 ] 

Gopal V commented on HIVE-11355:


[~vikram.dixit]: the feature seems to over-estimate sorter sizes beyond the 
oldgen size in the JVM; the Xmx is 80% of the container size & the goal of this 
is to only scale the buffers down from their configured size.

I noticed that occasionally the decider decides to scale it upwards, with bad 
results.

{code}
], TaskAttempt 3 failed, info=[Error: Failure while running task: 
attempt_1442254312093_1019_1_00_16_3:java.lang.IllegalArgumentException: 
tez.runtime.io.sort.mb 8187 should be larger than 0 and should be less than the 
available task memory (MB):6311
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at 
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.getInitialMemoryRequirement(ExternalSorter.java:338)
at 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.initialize(OrderedPartitionedKVOutput.java:92)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:477)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:455)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

To repro, run query28 on 30Tb scale (planner on cn105).
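Given that the goal is to only scale buffers down, the failing precondition above suggests clamping rather than re-estimating; a minimal sketch under that assumption (`clampSortMb` and the 1 MB floor are illustrative, not Tez API):

```java
public class SortBufferSizer {
    /**
     * Clamp the configured io.sort.mb so it stays strictly below the
     * available task memory (the Tez precondition that failed above)
     * and never scales a buffer *up* past its configured size.
     */
    public static int clampSortMb(int configuredMb, int availableTaskMemMb) {
        // Strictly less than available memory, and at least 1 MB.
        int upperBound = Math.max(1, availableTaskMemMb - 1);
        return Math.min(configuredMb, upperBound);
    }

    public static void main(String[] args) {
        // The failing case from the stack trace: 8187 MB requested, 6311 MB available.
        System.out.println(clampSortMb(8187, 6311)); // scaled down below the limit
        System.out.println(clampSortMb(512, 6311));  // already fits; left unchanged
    }
}
```

Taking the minimum of the configured and computed sizes guarantees the memory manager can never produce the upward-scaling failure shown in the stack trace.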

> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch, HIVE-11355.4.patch, HIVE-11355.5.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11992) speed up and re-enable slow q test files in Hive

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11992:

Priority: Minor  (was: Major)

> speed up and re-enable slow q test files in Hive
> 
>
> Key: HIVE-11992
> URL: https://issues.apache.org/jira/browse/HIVE-11992
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Priority: Minor
>
> Due to perceived lack of importance and long runtimes, we have disabled the 
> following q files:
> CliDriver:
>   rcfile_merge1.q,\
> MinimrCliDriver:
>   ql_rewrite_gbtoidx.q,\
>   ql_rewrite_gbtoidx_cbo_1.q,\
>   ql_rewrite_gbtoidx_cbo_2.q,\
>   smb_mapjoin_8.q,\
> If someone thinks any of these are important, they should be re-enabled, 
> however, their runtime should be made acceptable first (they take 10-30 
> minutes right now, and should take 3 minutes at most, ideally 0-2).
> Please feel free to look at all of these, or file sub-tasks to look at a 
> subset of the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11960) braces in join conditions are not supported

2015-09-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936034#comment-14936034
 ] 

Pengcheng Xiong commented on HIVE-11960:


LGTM +1 pending QA run.

> braces in join conditions are not supported
> ---
>
> Key: HIVE-11960
> URL: https://issues.apache.org/jira/browse/HIVE-11960
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11960.01.patch, HIVE-11960.02.patch, 
> HIVE-11960.patch
>
>
> These should be supported; they are ANSI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11993) It is not necessary to start tez session when cli with parameter "-e" or "-f"

2015-09-29 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated HIVE-11993:
--
Description: 
With "-e" or "-f", hive executes in batch mode, so I don't think it is 
necessary to start a Tez session when the Hive session is started, especially 
when I only want to execute DDL. The Tez session can be started when it is 
needed in this mode. 

  was:
With "-e" or "-f", hive execute under batch mode, so I don't think it is 
necessary to start tez session When hive session is started. 
Especially when I only want to execute DDL. 


> It is not necessary to start tez session when cli with parameter "-e" or "-f"
> -
>
> Key: HIVE-11993
> URL: https://issues.apache.org/jira/browse/HIVE-11993
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Jeff Zhang
>
> With "-e" or "-f", hive executes in batch mode, so I don't think it is 
> necessary to start a Tez session when the Hive session is started, especially 
> when I only want to execute DDL. The Tez session can be started when it is 
> needed in this mode. 
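Deferring the Tez session is plain lazy initialization: start it the first time a query actually needs it. A sketch under the assumption that startup can be modeled as a factory call (`LazyTezSession` is illustrative, not Hive's actual class):

```java
import java.util.function.Supplier;

public class LazyTezSession<T> {
    private final Supplier<T> starter;
    private T session; // null until a query actually needs Tez

    public LazyTezSession(Supplier<T> starter) {
        this.starter = starter;
    }

    /** Start the session only on first use; DDL-only runs never pay the startup cost. */
    public synchronized T get() {
        if (session == null) {
            session = starter.get();
        }
        return session;
    }

    public synchronized boolean isStarted() {
        return session != null;
    }
}
```

With this shape, `hive -e "create table t (i int)"` would finish without ever touching YARN, while the first query that compiles to a Tez DAG triggers the startup transparently.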



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11985) handle long typenames from Avro schema in metastore

2015-09-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11985:

Attachment: HIVE-11985.01.patch

Need to change the type name to keep the metastore happy. Tested this on a 
cluster with a giant Avro schema; I can create the table, query it (it's empty 
though) and describe it correctly. At any rate, it's an improvement over the 
existing truncated type name. [~ashutoshc] do you want to review or suggest a 
reviewer? :)

Btw, this case will also fail on Oracle (before the patch), as it doesn't allow 
the data to be truncated on insert.

> handle long typenames from Avro schema in metastore
> ---
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11952) disable q tests that are both slow and less relevant

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936025#comment-14936025
 ] 

Sergey Shelukhin commented on HIVE-11952:
-

Tests didn't execute, as planned. Will remove from that variable and commit

> disable q tests that are both slow and less relevant
> 
>
> Key: HIVE-11952
> URL: https://issues.apache.org/jira/browse/HIVE-11952
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11952.patch
>
>
> We will disable several tests that test obscure and old features and take 
> inordinate amount of time, and file JIRAs to look at their perf if someone 
> still cares about them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11989) vector_groupby_reduce.q is failing on CLI and MiniTez drivers on master

2015-09-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935966#comment-14935966
 ] 

Matt McCline commented on HIVE-11989:
-

+1 lgtm

> vector_groupby_reduce.q is failing on CLI and MiniTez drivers on master
> ---
>
> Key: HIVE-11989
> URL: https://issues.apache.org/jira/browse/HIVE-11989
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11989.01.patch
>
>
> need to update the golden files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11970) COLUMNS_V2 table in metastore should have a longer name field

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936023#comment-14936023
 ] 

Sergey Shelukhin commented on HIVE-11970:
-

[~sushanth] [~thejas] ping?

> COLUMNS_V2 table in metastore should have a longer name field
> -
>
> Key: HIVE-11970
> URL: https://issues.apache.org/jira/browse/HIVE-11970
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11970.patch
>
>
> In some cases, esp. with derived names, e.g. from Avro schemas, the column 
> names can be pretty long. COLUMNS_V2 name field has a very short length.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11992) speed up and re-enable slow q test files in Hive

2015-09-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936076#comment-14936076
 ] 

Gopal V commented on HIVE-11992:


If these need to be brought back, moving these tests to a separate test-shard 
would do the trick - since they'll run in parallel instead of blocking other 
tests.

> speed up and re-enable slow q test files in Hive
> 
>
> Key: HIVE-11992
> URL: https://issues.apache.org/jira/browse/HIVE-11992
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Priority: Minor
>
> Due to perceived lack of importance and long runtimes, we have disabled the 
> following q files:
> CliDriver:
>   rcfile_merge1.q,\
> MinimrCliDriver:
>   ql_rewrite_gbtoidx.q,\
>   ql_rewrite_gbtoidx_cbo_1.q,\
>   ql_rewrite_gbtoidx_cbo_2.q,\
>   smb_mapjoin_8.q,\
> If someone thinks any of these are important, they should be re-enabled, 
> however, their runtime should be made acceptable first (they each take 10-30 
> minutes right now, and should take 3 minutes at most, ideally 0-2).
> Please feel free to look at all of these, or file sub-tasks to look at a 
> subset of the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11977) Hive should handle an external avro table with zero length files present

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936101#comment-14936101
 ] 

Hive QA commented on HIVE-11977:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12764065/HIVE-11977.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5462/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5462/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5462/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5462/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   dc130f0..1636292  branch-1   -> origin/branch-1
   a5ffa71..6a8d7e4  master -> origin/master
+ git reset --hard HEAD
HEAD is now at a5ffa71 HIVE-11724 : WebHcat get jobs to order jobs on time 
order with latest at top (Kiran Kumar Kolli, reviewed by Hari Subramaniyan)
+ git clean -f -d
Removing 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveHepPlannerContext.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveVolcanoPlannerContext.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRulesRegistry.java
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 6a8d7e4 HIVE-11819 : HiveServer2 catches OOMs on request threads 
(Sergey Shelukhin, reviewed by Vaibhav Gumashta)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
patch:  malformed patch at line 34: @@ -146,7 +156,7 @@ private boolean 
pathIsInPartition(Path split, String partitionPath) {

patch:  malformed patch at line 34: @@ -146,7 +156,7 @@ private boolean 
pathIsInPartition(Path split, String partitionPath) {

patch:  malformed patch at line 34: @@ -146,7 +156,7 @@ private boolean 
pathIsInPartition(Path split, String partitionPath) {

The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12764065 - PreCommit-HIVE-TRUNK-Build

> Hive should handle an external avro table with zero length files present
> 
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
>  Issue Type: Bug
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
> Attachments: HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro 
> table,  all hive queries on the table fail.
> This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as avro).  
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.

[jira] [Commented] (HIVE-11962) Improve windowing_windowspec2.q tests to return consistent results

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936218#comment-14936218
 ] 

Hive QA commented on HIVE-11962:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12764073/HIVE-11962.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9638 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5463/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5463/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5463/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12764073 - PreCommit-HIVE-TRUNK-Build

> Improve windowing_windowspec2.q tests to return consistent results
> --
>
> Key: HIVE-11962
> URL: https://issues.apache.org/jira/browse/HIVE-11962
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-11962.patch
>
>
> The upstream test result for windowing_windowspec2.q seems consistent, but we 
> have observed that in a different test environment the result can be slightly 
> different. 
> E.g., for the following query, the value of t can be the same within each 
> partition of ts, so the row order for those tied rows is not deterministic. I 
> haven't looked further into why the difference occurs yet.
> {noformat} 
> select ts, f, max(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11642) LLAP: make sure tests pass #3

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936304#comment-14936304
 ] 

Hive QA commented on HIVE-11642:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12764316/HIVE-11642.14.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9986 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver-orc_ppd_decimal.q-vector_decimal_round.q-metadata_export_drop.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5464/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5464/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5464/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12764316 - PreCommit-HIVE-TRUNK-Build

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.12.patch, HIVE-11642.13.patch, HIVE-11642.14.patch, 
> HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11699) Support special characters in quoted table names

2015-09-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11699:
---
Attachment: HIVE-11699.06.patch

> Support special characters in quoted table names
> 
>
> Key: HIVE-11699
> URL: https://issues.apache.org/jira/browse/HIVE-11699
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11699.01.patch, HIVE-11699.02.patch, 
> HIVE-11699.03.patch, HIVE-11699.04.patch, HIVE-11699.05.patch, 
> HIVE-11699.06.patch
>
>
> Right now table names can only match "[a-zA-Z_0-9]+". This patch tries to 
> investigate how much change would be needed if we wanted to support 
> special characters, e.g., "/", in table names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11937) Improve StatsOptimizer to deal with query with additional constant columns

2015-09-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11937:
---
Fix Version/s: 2.0.0

> Improve StatsOptimizer to deal with query with additional constant columns
> --
>
> Key: HIVE-11937
> URL: https://issues.apache.org/jira/browse/HIVE-11937
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-11937.01.patch, HIVE-11937.02.patch
>
>
> Right now StatsOptimizer can deal with query such as "select count(1) from 
> src" by directly looking into the metastore. However, it can not deal with 
> "select '1' as one, count(1) from src" which has an additional constant 
> column. We may improve it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we can reduce transaction isolation level

2015-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11948:
--
Description: 
at least some operations (or parts of operations) can run at READ_COMMITTED.
CompactionTxnHandler.setRunAs()

CompactionTxnHandler.findNextToCompact()
if update stmt includes cq_state = '" + INITIATED_STATE + "'" in WHERE clause 
and logic to look for "next" candidate

CompactionTxnHandler.markCompacted()
perhaps add cq_state=WORKING_STATE in Where clause (mostly as an extra 
consistency check)


  was:at least some operations (or parts of operations) can run at 
READ_COMMITTED.


> Investigate TxnHandler and CompactionTxnHandler to see where we can reduce 
> transaction isolation level
> --
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> at least some operations (or parts of operations) can run at READ_COMMITTED.
> CompactionTxnHandler.setRunAs()
> CompactionTxnHandler.findNextToCompact()
> if update stmt includes cq_state = '" + INITIATED_STATE + "'" in WHERE clause 
> and logic to look for "next" candidate
> CompactionTxnHandler.markCompacted()
> perhaps add cq_state=WORKING_STATE in Where clause (mostly as an extra 
> consistency check)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11400) insert overwrite task always stuck at latest job

2015-09-29 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan resolved HIVE-11400.
--
Resolution: Cannot Reproduce

> insert overwrite task always stuck at latest job
> ---
>
> Key: HIVE-11400
> URL: https://issues.apache.org/jira/browse/HIVE-11400
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor
>Affects Versions: 0.14.0
> Environment: hadoop 2.6.0,centos 6.5
>Reporter: Feng Yuan
> Attachments: failed_logs, success_logs, task_explain
>
>
> When I run a task like "insert overwrite table a (select * from b join 
> select * from c on b.id=c.id) tmp;", it gets stuck on the last job (e.g. the 
> planner explains the task as having 3 jobs, but the third job (or stage) 
> never gets executed).
> Two files are attached:
> 1. the HQL explain output.
> 2. the running logs.
> You will see that stage-0 in the explain output is a Move Operator, but it 
> never appears in the running logs. In fact 16 of the 17 jobs completed 
> (the 13th job actually seems lost; I don't see it anywhere in the logs), but 
> the 17th job hangs forever; it is never even assigned a job id or launched!
> Can anyone help with this?
> Thanks very much!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11937) Improve StatsOptimizer to deal with query with additional constant columns

2015-09-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936185#comment-14936185
 ] 

Pengcheng Xiong commented on HIVE-11937:


The failed tests are unrelated. Pushed to master. Thanks [~ashutoshc] for the 
review!

> Improve StatsOptimizer to deal with query with additional constant columns
> --
>
> Key: HIVE-11937
> URL: https://issues.apache.org/jira/browse/HIVE-11937
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11937.01.patch, HIVE-11937.02.patch
>
>
> Right now StatsOptimizer can deal with query such as "select count(1) from 
> src" by directly looking into the metastore. However, it can not deal with 
> "select '1' as one, count(1) from src" which has an additional constant 
> column. We may improve it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-09-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936395#comment-14936395
 ] 

Lefty Leverenz commented on HIVE-10965:
---

Thanks, I wasn't sure if I should change the fix version myself, but this is 
better.

> direct SQL for stats fails in 0-column case
> ---
>
> Key: HIVE-10965
> URL: https://issues.apache.org/jira/browse/HIVE-10965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.2.1, 1.0.2
>
> Attachments: HIVE-10965.01.patch, HIVE-10965.02.patch, 
> HIVE-10965.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11930) how to prevent PPD of the topN(a) UDF predicate in the WHERE clause?

2015-09-29 Thread Feng Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936180#comment-14936180
 ] 

Feng Yuan commented on HIVE-11930:
--

Do you mean this:

@UDFType(stateful=true)
public class Top1000 extends UDF {}

I tried that, but my SQL is:
...
  where  a.customer='Cdianyingwang'
  and a.taskid='33'
  and a.step_id='0' 
  and top1000(a.only_id)<=10;

and the compiler says top1000 should not be placed in a WHERE clause.
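
For illustration, here is a minimal plain-Java sketch (not a Hive UDF and not the actual top1000 implementation; class and method names are hypothetical) of the per-key counter such a stateful top-N function maintains. Each call increments a counter for its key, which is exactly why the result changes depending on where the predicate is evaluated:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the state a stateful "top1000"-style function keeps:
// a counter per key, incremented on every call within one task. The predicate
// topN(key) <= N therefore keeps only the first N rows seen for each key.
public class TopNCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Returns the 1-based rank of this row within its key's group so far.
    public int evaluate(String key) {
        return counts.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        TopNCounter c = new TopNCounter();
        System.out.println(c.evaluate("a")); // 1
        System.out.println(c.evaluate("a")); // 2
        System.out.println(c.evaluate("b")); // 1
    }
}
```

Because the counter lives inside a single task, evaluating the predicate in two reducers (as happens after pushdown) produces N rows per reducer rather than N rows overall, matching the 20-records symptom described in the issue.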

> how to prevent PPD of the topN(a) UDF predicate in the WHERE clause?
> ---
>
> Key: HIVE-11930
> URL: https://issues.apache.org/jira/browse/HIVE-11930
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 0.14.0
>Reporter: Feng Yuan
>Priority: Minor
>
> select 
> a.state_date,a.customer,a.taskid,a.step_id,a.exit_title,a.pv,top1000(a.only_id)
>   from
> (  select 
> t1.state_date,t1.customer,t1.taskid,t1.step_id,t1.exit_title,t1.pv,t1.only_id
>   from 
>   ( select t11.state_date,
>t11.customer,
>t11.taskid,
>t11.step_id,
>t11.exit_title,
>t11.pv,
>concat(t11.customer,t11.taskid,t11.step_id) as 
> only_id
>from
>   (  select 
> state_date,customer,taskid,step_id,exit_title,count(*) as pv
>  from bdi_fact2.mid_url_step
>  where exit_url!='-1'
>  and exit_title !='-1'
>  and l_date='2015-08-31'
>  group by 
> state_date,customer,taskid,step_id,exit_title
> )t11
>)t1
>order by t1.only_id,t1.pv desc
>  )a
>   where  a.customer='Cdianyingwang'
>   and a.taskid='33'
>   and a.step_id='0' 
>   and top1000(a.only_id)<=10;
> In the above example, the outer predicate top1000(a.only_id)<=10 will be 
> pushed down (PPD) to:
> stage 1:
> ( select t11.state_date,
>t11.customer,
>t11.taskid,
>t11.step_id,
>t11.exit_title,
>t11.pv,
>concat(t11.customer,t11.taskid,t11.step_id) as 
> only_id
>from
>   (  select 
> state_date,customer,taskid,step_id,exit_title,count(*) as pv
>  from bdi_fact2.mid_url_step
>  where exit_url!='-1'
>  and exit_title !='-1'
>  and l_date='2015-08-31'
>  group by 
> state_date,customer,taskid,step_id,exit_title
> )t11
>)t1
> This stage has 2 reducers, so it outputs 20 records, and the final result of 
> the outer stage is exactly those 20 records.
> So I want to know: is there any way to hint that this topN UDF predicate 
> should not be pushed down?
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11972) [Refactor] Improve determination of dynamic partitioning columns in FileSink Operator

2015-09-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11972:

Attachment: HIVE-11972.3.patch

> [Refactor] Improve determination of dynamic partitioning columns in FileSink 
> Operator
> -
>
> Key: HIVE-11972
> URL: https://issues.apache.org/jira/browse/HIVE-11972
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11972.2.patch, HIVE-11972.3.patch, HIVE-11972.patch
>
>
> Currently it uses column names to locate DP columns, which is brittle since 
> column names may change during planning and optimization phases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11977) Hive should handle an external avro table with zero length files present

2015-09-29 Thread Aaron Dossett (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Dossett updated HIVE-11977:
-
Attachment: HIVE-11977-002.patch

> Hive should handle an external avro table with zero length files present
> 
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
>  Issue Type: Bug
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
> Attachments: HIVE-11977-002.patch, HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro 
> table,  all hive queries on the table fail.
> This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as avro).  
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.(DataFileReader.java:97)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.(AvroGenericRecordReader.java:81)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more
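
For illustration, a minimal self-contained sketch of the kind of guard the issue proposes (class and method names are hypothetical, not Hive's actual code): check the file length and the 4-byte Avro magic ("Obj" followed by byte 0x01) before handing the file to a DataFileReader, so a zero-length file can be skipped instead of throwing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical pre-check: a valid Avro object container file starts with the
// magic bytes 'O','b','j',1. An empty (or truncated) file cannot match, so the
// caller can skip it rather than let DataFileReader throw "Not a data file."
public class AvroFileGuard {
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    public static boolean looksLikeAvro(Path file) throws IOException {
        if (Files.size(file) < AVRO_MAGIC.length) {
            return false; // zero-length file: nothing to read, skip it
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[AVRO_MAGIC.length];
            int read = in.read(header);
            return read == AVRO_MAGIC.length && Arrays.equals(header, AVRO_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("empty", ".avro");
        System.out.println(looksLikeAvro(empty)); // false

        Path fake = Files.createTempFile("fake", ".avro");
        Files.write(fake, new byte[]{'O', 'b', 'j', 1, 0});
        System.out.println(looksLikeAvro(fake)); // true
    }
}
```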



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11977) Hive should handle an external avro table with zero length files present

2015-09-29 Thread Aaron Dossett (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936249#comment-14936249
 ] 

Aaron Dossett commented on HIVE-11977:
--

Attached a second patch that includes a unit test and better patch formatting

> Hive should handle an external avro table with zero length files present
> 
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
>  Issue Type: Bug
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
> Attachments: HIVE-11977-2.patch, HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro 
> table,  all hive queries on the table fail.
> This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as avro).  
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.(DataFileReader.java:97)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.(AvroGenericRecordReader.java:81)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11977) Hive should handle an external avro table with zero length files present

2015-09-29 Thread Aaron Dossett (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Dossett updated HIVE-11977:
-
Attachment: HIVE-11977-2.patch

> Hive should handle an external avro table with zero length files present
> 
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
>  Issue Type: Bug
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
> Attachments: HIVE-11977-2.patch, HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro 
> table,  all hive queries on the table fail.
> This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as avro).  
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.(DataFileReader.java:97)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.(AvroGenericRecordReader.java:81)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11977) Hive should handle an external avro table with zero length files present

2015-09-29 Thread Aaron Dossett (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Dossett updated HIVE-11977:
-
Attachment: (was: HIVE-11977-002.patch)

> Hive should handle an external avro table with zero length files present
> 
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
>  Issue Type: Bug
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
> Attachments: HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro 
> table,  all hive queries on the table fail.
> This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as avro).  
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.(DataFileReader.java:97)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.(AvroGenericRecordReader.java:81)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11445) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby distinct does not work

2015-09-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11445:
---
Attachment: HIVE-11445.02.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby 
> distinct does not work
> -
>
> Key: HIVE-11445
> URL: https://issues.apache.org/jira/browse/HIVE-11445
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11445.01.patch, HIVE-11445.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-09-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935505#comment-14935505
 ] 

Hive QA commented on HIVE-11684:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12764000/HIVE-11684.12.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9633 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_join30.q-vector_data_types.q-filter_join_breaktask.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5459/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5459/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5459/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12764000 - PreCommit-HIVE-TRUNK-Build

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.03.patch, HIVE-11684.04.patch, HIVE-11684.05.patch, 
> HIVE-11684.07.patch, HIVE-11684.08.patch, HIVE-11684.09.patch, 
> HIVE-11684.10.patch, HIVE-11684.11.patch, HIVE-11684.12.patch, 
> HIVE-11684.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11971) testResultSetMetaData() in TestJdbcDriver2.java is failing on CBO AST path

2015-09-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11971:
---
Summary: testResultSetMetaData() in TestJdbcDriver2.java is failing on CBO 
AST path  (was: testResultSetMetaData() in TestJdbc2.java is failing on CBO AST 
path)

> testResultSetMetaData() in TestJdbcDriver2.java is failing on CBO AST path
> --
>
> Key: HIVE-11971
> URL: https://issues.apache.org/jira/browse/HIVE-11971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11971.01.patch
>
>
> test is passing because wrong golden file is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11634:
-
Attachment: HIVE-11634.96.patch

[~jcamachorodriguez] Can you please look at the latest patch? I made the 
required changes.

Thanks
Hari

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> --
>
> Key: HIVE-11634
> URL: https://issues.apache.org/jira/browse/HIVE-11634
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate where as we can prune  partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09'))  
> is used by partition pruner to prune the columns which otherwise will not be 
> pruned.
> This is an extension of the idea presented in HIVE-11573.
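
As a plain-Java illustration of the rewrite described above (hypothetical names, not Hive's planner code), the extra partition-pruner-friendly conjunct is obtained by projecting the partition column out of the struct IN tuples and deduplicating:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: given the value tuples of a
// "struct(ds, key) IN (...)" predicate where ds is the partition column
// (first struct field), derive the distinct ds values so a
// "struct(ds) IN (...)" conjunct can be added for the partition pruner.
public class StructInRewrite {
    public static List<String> partitionValues(List<String[]> tuples) {
        Set<String> ds = new LinkedHashSet<>(); // preserve order, drop duplicates
        for (String[] t : tuples) {
            ds.add(t[0]); // first struct field is the partition column
        }
        return new ArrayList<>(ds);
    }

    public static void main(String[] args) {
        List<String[]> in = List.of(
            new String[]{"2000-04-08", "1"},
            new String[]{"2000-04-09", "2"});
        System.out.println(partitionValues(in)); // [2000-04-08, 2000-04-09]
    }
}
```

With the derived list, partition ds='2000-04-10' falls outside the IN set and can be pruned, while the original struct predicate still filters rows within the surviving partitions.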



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11976) Extend CBO rules to being able to apply rules only once on a given operator

2015-09-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935574#comment-14935574
 ] 

Laljo John Pullokkaran commented on HIVE-11976:
---

The patch looks good. Maybe we should address the following:

1. In the HivePreFilter rule, the bail-out condition should be modified (the 
pullup predicate should use the child's real node).

2. Should we register the child filter as well, so that the rule doesn't fire 
on the child?

> Extend CBO rules to being able to apply rules only once on a given operator
> ---
>
> Key: HIVE-11976
> URL: https://issues.apache.org/jira/browse/HIVE-11976
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11976.patch
>
>
> Create a way to bail out quickly from HepPlanner if the rule has been already 
> applied on a certain operator.
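The bail-out idea can be sketched as a registry of (rule, operator) pairs that 
the planner consults before firing a rule; the class and method names below are 
invented for illustration and are not the Calcite/HepPlanner API.

```python
# Sketch of "fire each rule at most once per operator": remember which
# (rule, operator) pairs have already fired and bail out quickly on a
# repeat match. Illustrative names only, not Calcite's HepPlanner.

class OnceOnlyPlanner:
    def __init__(self):
        self.applied = set()  # (rule_name, operator_id) pairs already fired

    def apply(self, rule_name, op_id, transform, node):
        key = (rule_name, op_id)
        if key in self.applied:
            return node            # bail out: rule already fired on this operator
        self.applied.add(key)
        return transform(node)

planner = OnceOnlyPlanner()
push_down = lambda n: n + "+filter"
first = planner.apply("HivePreFilter", 1, push_down, "scan")
second = planner.apply("HivePreFilter", 1, push_down, first)  # no-op repeat
```

The second call returns its input unchanged, which is the quick-exit behavior 
the feature asks for.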



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11971) testResultSetMetaData() in TestJdbc2.java is failing on CBO AST path

2015-09-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935467#comment-14935467
 ] 

Pengcheng Xiong commented on HIVE-11971:


The failed tests are unrelated. [~ashutoshc] or [~jpullokkaran], could you 
please take a look? Thanks.

> testResultSetMetaData() in TestJdbc2.java is failing on CBO AST path
> 
>
> Key: HIVE-11971
> URL: https://issues.apache.org/jira/browse/HIVE-11971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11971.01.patch
>
>
> The test is passing because the wrong golden file is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-09-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935570#comment-14935570
 ] 

Thejas M Nair commented on HIVE-10965:
--

Thanks for catching that, [~leftylev]!
[~pxiong] was backporting some critical fixes to the 1.0 line. I just had an 
offline discussion with him clarifying the process; he is going to update the 
fix version for a couple of other jiras that were backported.


> direct SQL for stats fails in 0-column case
> ---
>
> Key: HIVE-10965
> URL: https://issues.apache.org/jira/browse/HIVE-10965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.2.1, 1.0.2
>
> Attachments: HIVE-10965.01.patch, HIVE-10965.02.patch, 
> HIVE-10965.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11445) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby distinct does not work

2015-09-29 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935479#comment-14935479
 ] 

Jesus Camacho Rodriguez commented on HIVE-11445:


The problem was that distinct nodes that are part of the key were being added 
to distExprNodes; the patch solves that issue.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby 
> distinct does not work
> -
>
> Key: HIVE-11445
> URL: https://issues.apache.org/jira/browse/HIVE-11445
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11445.01.patch, HIVE-11445.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11835) Type decimal(1,1) reads 0.0, 0.00, etc from text file as NULL

2015-09-29 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1493#comment-1493
 ] 

Szehon Ho commented on HIVE-11835:
--

Thanks for the clarification.

> Type decimal(1,1) reads 0.0, 0.00, etc from text file as NULL
> -
>
> Key: HIVE-11835
> URL: https://issues.apache.org/jira/browse/HIVE-11835
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-11835.1.patch, HIVE-11835.2.patch, HIVE-11835.patch
>
>
> Steps to reproduce:
> 1. create a text file with values like 0.0, 0.00, etc.
> 2. create table in hive with type decimal(1,1).
> 3. run "load data local inpath ..." to load data into the table.
> 4. run select * on the table.
> You will see that NULL is displayed for 0.0, 0.00, .0, etc. Instead, these 
> should be read as 0.0.
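The intended semantics can be sketched as: parse the text, round to the 
declared scale, then check the precision budget, so that inputs like "0.00" or 
".0" normalize into decimal(1,1) instead of becoming NULL. This is a minimal 
illustration in Python's decimal module, not Hive's decimal reader.

```python
from decimal import Decimal

# Sketch of the expected read behavior for decimal(precision, scale).
# "0.00" has more fractional digits than scale 1, but its value fits
# decimal(1,1) after rounding, so it should read as 0.0, not NULL.

def read_decimal(text, precision, scale):
    try:
        v = Decimal(text)
    except ArithmeticError:
        return None                      # genuinely unparseable -> NULL
    # Round to the declared scale first, then check the precision budget:
    # the absolute value must fit in (precision - scale) integer digits.
    q = v.quantize(Decimal(1).scaleb(-scale))
    if abs(q) >= Decimal(10) ** (precision - scale):
        return None                      # does not fit the declared type -> NULL
    return q
```

With this rule, read_decimal("0.00", 1, 1) and read_decimal(".0", 1, 1) both 
yield 0.0, while a value like 1.5 still correctly maps to NULL for 
decimal(1,1).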



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8527) Incorrect TIMESTAMP result on JDBC direct read when next row has no (null) value for the TIMESTAMP

2015-09-29 Thread David Zanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Zanter resolved HIVE-8527.

   Resolution: Fixed
Fix Version/s: 1.1.0

I can verify that this is fixed in Hive version 1.1.0. (It may have been fixed 
earlier than that as well.)

Seems to have been fixed by the same thing that fixed HIVE-8297.

> Incorrect TIMESTAMP result on JDBC direct read when next row has no (null) 
> value for the TIMESTAMP
> --
>
> Key: HIVE-8527
> URL: https://issues.apache.org/jira/browse/HIVE-8527
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.13.0
> Environment: Linux
>Reporter: Doug Sedlak
> Fix For: 1.1.0
>
>
> For the case:
>  SELECT * FROM [table]
> JDBC direct reads the table backing data, versus cranking up a MR and 
> creating a result set.  This report is another direct read JDBC issue with 
> TIMESTAMPS, see HIVE-8297 also.
> As in title, a succeeding row with no value corrupts the value read for the 
> current row.  To reproduce using beeline:
> 1) Create this file as follows in HDFS.
> $ cat > /tmp/ts2.txt
> 2014-09-28 00:00:00,2014-09-28 00:00:00,
> ,,
>  
> $ hadoop fs -copyFromLocal /tmp/ts2.txt /tmp/ts2.txt
> 2) In beeline load above HDFS data to a TEXTFILE table:
>  $ beeline
>  > !connect jdbc:hive2://:/ hive pass 
> org.apache.hive.jdbc.HiveDriver
>  > drop table `TIMESTAMP_TEXT2`;
>  > CREATE TABLE `TIMESTAMP_TEXT2` (`ts1` TIMESTAMP, `ts2` TIMESTAMP) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY '\054' LINES TERMINATED BY '\012' 
> STORED AS TEXTFILE;
>  > LOAD DATA INPATH '/tmp/ts2.txt' OVERWRITE INTO TABLE
>  `TIMESTAMP_TEXT2`;
> 3) To demonstrate the corrupt data read, in beeline: 
> > select * from `TIMESTAMP_TEXT2`;
> Note 1: The incorrect conduct demonstrated above replicates with a standalone 
> Java/JDBC program.
> Note 2: Don't know if this is an issue with any other data types, also don't 
> know what releases affected, however this occurs in Hive 13. Hive CLI works 
> fine. Also works fine if you force a MR:
>  select * from `TIMESTAMP_TEXT2` where 1=1;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11724) WebHcat get jobs to order jobs on time order with latest at top

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11724:
-
Labels: TODOC1.3  (was: )

> WebHcat get jobs to order jobs on time order with latest at top
> ---
>
> Key: HIVE-11724
> URL: https://issues.apache.org/jira/browse/HIVE-11724
> Project: Hive
>  Issue Type: Improvement
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Kiran Kumar Kolli
>Assignee: Kiran Kumar Kolli
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11724.1.patch, HIVE-11724.2.patch, 
> HIVE-11724.3.patch, HIVE-11724.4.patch, HIVE-11724.5.patch, HIVE-11724.6.patch
>
>
> HIVE-5519 added pagination feature support to WebHcat. This implementation 
> returns the jobs lexicographically resulting in older jobs showing at the 
> top. 
> Improvement is to order them on time with latest at top. Typically latest 
> jobs (or running) ones are more relevant to the user. Time based ordering 
> with pagination makes more sense. 
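The proposed ordering can be sketched as a sort on job start time, newest 
first, applied before pagination; the job records and field names below are 
invented for illustration and do not match WebHCat's actual job objects.

```python
# Sketch of time-based ordering with pagination: sort newest-first by
# start time, then slice the requested page. Illustrative data model only.

def list_jobs(jobs, page, page_size):
    newest_first = sorted(jobs, key=lambda j: j["startTime"], reverse=True)
    start = page * page_size
    return newest_first[start:start + page_size]

jobs = [
    {"id": "job_10", "startTime": 100},
    {"id": "job_2",  "startTime": 300},  # lexicographically early, but newest
    {"id": "job_9",  "startTime": 200},
]
first_page = list_jobs(jobs, 0, 2)
```

Note that lexicographic ordering would have put "job_10" first even though it 
is the oldest job, which is exactly the behavior this improvement replaces.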



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) handle long typenames from Avro schema in metastore

2015-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935699#comment-14935699
 ] 

Sergey Shelukhin commented on HIVE-11985:
-

[~xuefuz] [~sachingoyal] are you familiar with it? I wonder who is. most 
commits on these files are pretty old, you have one in 2014 :)

> handle long typenames from Avro schema in metastore
> ---
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11850) On Windows, creating udf function using wasb fail throwing java.lang.RuntimeException: invalid url: wasb:///... expecting ( file | hdfs | ivy) as url scheme.

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11850:
-
Summary: On Windows, creating udf function using wasb fail throwing 
java.lang.RuntimeException: invalid url: wasb:///...  expecting ( file | hdfs | 
ivy)  as url scheme.  (was: On Humboldt, creating udf function using wasb fail 
throwing java.lang.RuntimeException: invalid url: wasb:///...  expecting ( file 
| hdfs | ivy)  as url scheme.)

> On Windows, creating udf function using wasb fail throwing 
> java.lang.RuntimeException: invalid url: wasb:///...  expecting ( file | hdfs 
> | ivy)  as url scheme.
> ---
>
> Key: HIVE-11850
> URL: https://issues.apache.org/jira/browse/HIVE-11850
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.1
> Environment: Humboldt
>Reporter: Takahiko Saito
> Fix For: 1.2.1
>
>
> {noformat}
> hive> drop function if exists gencounter;
> OK
> Time taken: 2.614 seconds
> On Humboldt, creating UDF function fail as follows:
> hive> create function gencounter as 
> 'org.apache.hive.udf.generic.GenericUDFGenCounter' using jar 
> 'wasb:///tmp/hive-udfs-0.1.jar';
> invalid url: wasb:///tmp/hive-udfs-0.1.jar, expecting ( file | hdfs | ivy)  
> as url scheme.
> Failed to register default.gencounter using class 
> org.apache.hive.udf.generic.GenericUDFGenCounter
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> {noformat}
> The jar exists in wasb dir:
> {noformat}
> hrt_qa@headnode0:~$ hadoop fs -ls wasb:///tmp/
> Found 2 items
> -rw-r--r--   1 hrt_qa supergroup   4472 2015-09-16 11:50 
> wasb:///tmp/hive-udfs-0.1.jar
> drwxrwxrwx   - hdfs   supergroup  0 2015-09-16 12:00 
> wasb:///tmp/阿䶵aa阿䶵
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11990) Loading data inpath from a temporary table dir fails on Humboldt

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11990:
-
Attachment: HIVE-11990.1.patch

[~jdere] Can you please review the change?
Now that we support moving files from one file system to another, we can remove 
the following code in LoadSemanticAnalyzer.java:

{code}
// only in 'local' mode do we copy stuff from one place to another.
// reject different scheme/authority in other cases.
if (!isLocal
    && (!StringUtils.equals(fromURI.getScheme(), toURI.getScheme())
        || !StringUtils.equals(fromURI.getAuthority(), toURI.getAuthority()))) {
  String reason = "Move from: " + fromURI.toString() + " to: "
  + toURI.toString() + " is not valid. "
  + "Please check that values for params \"default.fs.name\" and "
  + "\"hive.metastore.warehouse.dir\" do not conflict.";
  throw new SemanticException(ErrorMsg.ILLEGAL_PATH.getMsg(ast, reason));
}

{code}

Thanks
Hari

> Loading data inpath from a temporary table dir fails on Humboldt
> 
>
> Key: HIVE-11990
> URL: https://issues.apache.org/jira/browse/HIVE-11990
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11990.1.patch
>
>
> The query runs:
> {noformat}
> load data inpath 'wasb:///tmp/testtemptable/temptablemisc_5/data' overwrite 
> into table temp2;
> {noformat}
> It fails with:
> {noformat}
> FAILED: SemanticException [Error 10028]: Line 2:37 Path is not legal 
> ''wasb:///tmp/testtemptable/temptablemisc_5/data'': Move from: 
> wasb://humb23-hi...@humboldttesting3.blob.core.windows.net/tmp/testtemptable/temptablemisc_5/data
>  to: 
> hdfs://headnode0.humb23-hive1-ssh.h2.internal.cloudapp.net:8020/tmp/hive/hrt_qa/0d5f8b31-5908-44bf-ae4c-9eee956da066/_tmp_space.db/75b44252-42a7-4d28-baf8-4977daa5d49c
>  is not valid. Please check that values for params "default.fs.name" and 
> "hive.metastore.warehouse.dir" do not conflict.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11990) Loading data inpath from a temporary table dir fails on Windows

2015-09-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11990:
-
Summary: Loading data inpath from a temporary table dir fails on Windows  
(was: Loading data inpath from a temporary table dir fails on Humboldt)

> Loading data inpath from a temporary table dir fails on Windows
> ---
>
> Key: HIVE-11990
> URL: https://issues.apache.org/jira/browse/HIVE-11990
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11990.1.patch
>
>
> The query runs:
> {noformat}
> load data inpath 'wasb:///tmp/testtemptable/temptablemisc_5/data' overwrite 
> into table temp2;
> {noformat}
> It fails with:
> {noformat}
> FAILED: SemanticException [Error 10028]: Line 2:37 Path is not legal 
> ''wasb:///tmp/testtemptable/temptablemisc_5/data'': Move from: 
> wasb://humb23-hi...@humboldttesting3.blob.core.windows.net/tmp/testtemptable/temptablemisc_5/data
>  to: 
> hdfs://headnode0.humb23-hive1-ssh.h2.internal.cloudapp.net:8020/tmp/hive/hrt_qa/0d5f8b31-5908-44bf-ae4c-9eee956da066/_tmp_space.db/75b44252-42a7-4d28-baf8-4977daa5d49c
>  is not valid. Please check that values for params "default.fs.name" and 
> "hive.metastore.warehouse.dir" do not conflict.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11915) BoneCP returns closed connections from the pool

2015-09-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935599#comment-14935599
 ] 

Thejas M Nair commented on HIVE-11915:
--

The retries are being set only for bonecp, but basing the log message on that 
seems very brittle. Other developers might add retries for other connection 
pooling types by setting getConnAttemptCount, and easily overlook updating the 
log message.

Even in the case of bonecp exceptions, the error can sometimes be 
non-recoverable. This is a fatal error and should be rare. The delay due to 
retries is likely to be very small (not easily noticeable to the user). I think 
that delay would be acceptable for the circumstance.
This looks like a tradeoff between easier-to-maintain code and a delay that 
users are unlikely to notice.


> BoneCP returns closed connections from the pool
> ---
>
> Key: HIVE-11915
> URL: https://issues.apache.org/jira/browse/HIVE-11915
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11915.01.patch, HIVE-11915.WIP.patch, 
> HIVE-11915.patch
>
>
> It's a very old bug in BoneCP and it will never be fixed... There are 
> multiple workarounds on the internet but according to responses they are all 
> unreliable. We should upgrade to HikariCP (which in turn is only supported by 
> DN 4), meanwhile try some shamanic rituals. In this JIRA we will try a 
> relatively weak drum.
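The workaround hinted at here ("a relatively weak drum") amounts to validating 
connections handed out by the pool and retrying a bounded number of times; the 
sketch below uses stand-in pool and connection objects, not the BoneCP API.

```python
# Sketch of a get-with-validation retry loop around a connection pool
# that occasionally hands back closed connections. FlakyPool and the
# dict-based "connections" are stand-ins for illustration only.

class FlakyPool:
    """Hands out a few closed connections before a good one."""
    def __init__(self, bad_before_good):
        self.remaining_bad = bad_before_good

    def get(self):
        if self.remaining_bad > 0:
            self.remaining_bad -= 1
            return {"closed": True}
        return {"closed": False}

def get_connection(pool, attempts=10):
    for _ in range(attempts):
        conn = pool.get()
        if not conn["closed"]:          # validate before handing out
            return conn
    raise RuntimeError("pool kept returning closed connections")

conn = get_connection(FlakyPool(bad_before_good=3))
```

Bounding the attempts keeps the worst-case delay small while still surfacing a 
hard failure if the pool never recovers, which matches the tradeoff discussed 
in the comments above.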



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

