[jira] [Commented] (HIVE-6113) Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-11-17 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008513#comment-15008513
 ] 

Oleksiy Sayankin commented on HIVE-6113:


>  Have you verified the elements in 
> http://www.datanucleus.org/products/accessplatform_4_2/migration.html to see 
> if we won't be affected adversely?

Yes. Added HIVE-6113-2.patch, which applies all the changes required for the DataNucleus (DN) 
migration from 3.x to 4.x. See https://reviews.apache.org/r/40344/

Code change summary. 
Renamed:
datanucleus.validateTables ---> datanucleus.schema.validateTables
datanucleus.validateColumns ---> datanucleus.schema.validateColumns
datanucleus.validateConstraints ---> datanucleus.schema.validateConstraints
datanucleus.autoCreateSchema ---> datanucleus.schema.autoCreateAll

Deleted:
datanucleus.fixedDatastore
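
For anyone applying the same renames to an existing configuration, the mapping above can be expressed as a small lookup table over a java.util.Properties object. This is a minimal sketch under assumptions: DnPropertyMigration and migrate() are hypothetical names and are not part of HIVE-6113-2.patch, the sketch covers only the five keys listed above, and when both an old and a new key are present it assumes the new-style key should win.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: rewrites DataNucleus 3.x property keys to their 4.x names.
final class DnPropertyMigration {
    // Old key -> new key, as listed in the HIVE-6113-2.patch change summary.
    static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("datanucleus.validateTables", "datanucleus.schema.validateTables");
        RENAMED.put("datanucleus.validateColumns", "datanucleus.schema.validateColumns");
        RENAMED.put("datanucleus.validateConstraints", "datanucleus.schema.validateConstraints");
        RENAMED.put("datanucleus.autoCreateSchema", "datanucleus.schema.autoCreateAll");
    }

    // Key deleted in the migration (no replacement in the summary above).
    static final String DELETED = "datanucleus.fixedDatastore";

    private DnPropertyMigration() {}

    /** Returns a copy of props with 3.x keys renamed and the deleted key dropped. */
    static Properties migrate(Properties props) {
        Properties out = new Properties();
        for (String key : props.stringPropertyNames()) {
            if (DELETED.equals(key)) {
                continue;
            }
            String newKey = RENAMED.getOrDefault(key, key);
            // Keep unrenamed keys as-is; migrate an old key only if the new key
            // was not already set explicitly (assumed precedence rule).
            if (newKey.equals(key) || !props.containsKey(newKey)) {
                out.setProperty(newKey, props.getProperty(key));
            }
        }
        return out;
    }
}
```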

> Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> --
>
> Key: HIVE-6113
> URL: https://issues.apache.org/jira/browse/HIVE-6113
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.1
> Environment: hadoop-0.20.2-cdh3u3,hive-0.12.0
>Reporter: William Stone
>Assignee: Oleksiy Sayankin
>Priority: Critical
>  Labels: HiveMetaStoreClient, metastore, unable_instantiate
> Attachments: HIVE-6113-2.patch, HIVE-6113.patch
>
>
> When I execute the SQL "use fdm; desc formatted fdm.tableName;" from Python, it throws 
> the error below.
> But when I try it again, it succeeds.
> 2013-12-25 03:01:32,290 ERROR exec.DDLTask (DDLTask.java:execute(435)) - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1143)
>   at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:507)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:875)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:769)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:708)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1217)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2372)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2383)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1210)
>   ... 25 more
> Caused by: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
> NestedThrowables:
> 

[jira] [Commented] (HIVE-6113) Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-11-17 Thread Eli Acherkan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008558#comment-15008558
 ] 

Eli Acherkan commented on HIVE-6113:


> @Eli Acherkan : Very interesting analysis. Could you point me to where you 
> see the following:
>> If a table is deleted from the DB during this operation, 
>> DatabaseMetaData.getColumns will throw an exception.
>> This exception is interpreted by Hive to mean that the "default" Hive 
>> database doesn't exist.

Sure, let me try to explain.

#When DatabaseMetaData.getColumns throws an SQLException, it's caught by 
RDBMSSchemaHandler.refreshTableData and rethrown as a NucleusDataStoreException.
#This one is then caught by JDOQLQuery.compileQueryFull, which doesn't rethrow 
an exception - it simply returns an empty result.
#ObjectStore.getMDatabase then receives the empty result and throws a 
NoSuchObjectException.
#This exception is caught by HiveMetaStore.createDefaultDB_core, and taken to 
mean that the default DB doesn't exist.
#The createDefaultDB_core method then proceeds to try to create the DB, which 
fails because the DB actually _does_ exist already.
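
The chain above can be modeled in a few lines to show how a swallowed datastore error turns into a duplicate-key failure. This is a toy sketch, not Hive or DataNucleus code: every class and method name in it is hypothetical, and each method only stands in for the real method named in its comment.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the exception chain; all names are hypothetical.
final class SwallowedExceptionDemo {
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String message) { super(message); }
    }

    final Set<String> databases = new HashSet<>(Collections.singleton("default"));
    boolean metadataReadFails; // simulates DatabaseMetaData.getColumns throwing

    // Stands in for JDOQLQuery.compileQueryFull: a datastore error becomes an empty result.
    List<String> queryDatabase(String name) {
        try {
            if (metadataReadFails) {
                throw new RuntimeException("simulated NucleusDataStoreException");
            }
            return databases.contains(name)
                    ? Collections.singletonList(name)
                    : Collections.<String>emptyList();
        } catch (RuntimeException e) {
            return Collections.emptyList(); // the error is swallowed here
        }
    }

    // Stands in for ObjectStore.getMDatabase: an empty result means "does not exist".
    String getDatabase(String name) throws NoSuchObjectException {
        List<String> rows = queryDatabase(name);
        if (rows.isEmpty()) {
            throw new NoSuchObjectException(name + " not found");
        }
        return rows.get(0);
    }

    // Stands in for HiveMetaStore.createDefaultDB_core.
    void createDefaultDb() {
        try {
            getDatabase("default");
        } catch (NoSuchObjectException e) {
            // Misread as "missing", so the insert collides with the existing row.
            if (!databases.add("default")) {
                throw new IllegalStateException("Duplicate entry 'default' for key 'UNIQUE_DATABASE'");
            }
        }
    }
}
```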

Please let me know if the above is unclear.


[jira] [Commented] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008469#comment-15008469
 ] 

Hive QA commented on HIVE-12384:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772568/HIVE-12384.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9784 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6054/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6054/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6054/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772568 - PreCommit-HIVE-TRUNK-Build

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch
>
>
> Union queries may produce incorrect results on TEZ.
> TEZ removes the union operator and may thus lose the implicit cast that the union operator performs.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over())  as y from(select cast(1 as decimal(10,0))  as x 
> from (select * from src limit 2)s1 union all select cast(1 as decimal(10,0)) 
> x from (select * from src limit 2) s2 union all select '1' x from 
> (select * from src limit 2) s3)u order by y;
> select (x/sum(x) over()) as y from(select cast(1 as decimal(10,0))  as x from 
> (select * from src limit 2)s1 union all select cast(1 as decimal(10,0)) x 
> from (select * from src limit 2) s2 union all select cast (null as string) x 
> from (select * from src limit 2) s3)u order by y;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-17 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11981:

Attachment: HIVE-11981.0992.patch

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High-priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other schema evolution 
> operations were not pursued due to lack of importance and lack of time. Also, it 
> appears that much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID table 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).
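
As an illustration of the limited scope described above (appended columns plus int-to-bigint widening), a reader can adapt each file row to the table schema. This is a hypothetical sketch, not ORC or Hive reader code; it assumes columns are matched by position and that new columns are only ever appended at the end.

```java
// Hypothetical sketch; matches columns by position (an assumption).
final class SchemaEvolutionSketch {
    enum Type { INT, BIGINT, STRING }

    /** Converts one file row to the table schema: widen INT to BIGINT, null-fill added columns. */
    static Object[] adapt(Object[] fileRow, Type[] fileSchema, Type[] tableSchema) {
        if (fileSchema.length > tableSchema.length) {
            throw new IllegalArgumentException("dropping columns is not supported");
        }
        Object[] out = new Object[tableSchema.length]; // trailing added columns stay null
        for (int i = 0; i < fileSchema.length; i++) {
            Type from = fileSchema[i];
            Type to = tableSchema[i];
            Object v = fileRow[i];
            if (from == to) {
                out[i] = v;
            } else if (from == Type.INT && to == Type.BIGINT) {
                out[i] = (v == null) ? null : Long.valueOf(((Integer) v).longValue());
            } else {
                throw new IllegalArgumentException("unsupported change: " + from + " -> " + to);
            }
        }
        return out;
    }
}
```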





[jira] [Updated] (HIVE-6113) Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-11-17 Thread Oleksiy Sayankin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin updated HIVE-6113:
---
Attachment: HIVE-6113-2.patch

> Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> --
>
> Key: HIVE-6113
> URL: https://issues.apache.org/jira/browse/HIVE-6113
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.1
> Environment: hadoop-0.20.2-cdh3u3,hive-0.12.0
>Reporter: William Stone
>Assignee: Oleksiy Sayankin
>Priority: Critical
>  Labels: HiveMetaStoreClient, metastore, unable_instantiate
> Attachments: HIVE-6113-2.patch, HIVE-6113.patch
>
>
> When I execute the SQL "use fdm; desc formatted fdm.tableName;" from Python, it throws 
> the error below.
> But when I try it again, it succeeds.
> 2013-12-25 03:01:32,290 ERROR exec.DDLTask (DDLTask.java:execute(435)) - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1143)
>   at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:507)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:875)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:769)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:708)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1217)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2372)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2383)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1210)
>   ... 25 more
> Caused by: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
> NestedThrowables:
> java.sql.BatchUpdateException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
>   at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
>   at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:165)
>   at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:358)
>   at org.apache.hadoop.hive.metastore.ObjectStore.createDatabase(ObjectStore.java:404)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> 

[jira] [Updated] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Summary: Merge master into spark 11/17/2015 [Spark Branch]  (was: Merge 
trunk into spark 11/17/2015 [Spark Branch])

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
>






[jira] [Commented] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we improve concurrency

2015-11-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008937#comment-15008937
 ] 

Eugene Koifman commented on HIVE-11948:
---

There is a ReviewBoard (RB) link for this patch; it may be easier to add comments there.

I'll take care of the separate bugs.

The change in TxnHandler.checkLock() regarding heartbeat is intentional. When 
there is a txn, only the txn needs to have its heartbeat timestamp updated, since 
that is the only timestamp checked when aborting an expired txn. There is no 
way/reason to expire a single lock in a txn. This simplifies both the heartbeat 
and expiration operations (at least when there is a txn).


TxnHandler.unlock(), around line 581. The statement to remove the lock is 
"delete from HIVE_LOCKS where hl_lock_ext_id = " + extLockId + " AND hl_txnid = 0";
The "hl_txnid = 0" condition ensures that this lock doesn't belong to a transaction. So 
the unlock() operation runs in a single SQL statement under most 
circumstances; only if the above SQL "misses" are some additional 
operations performed to produce a meaningful (best-effort) message. Previously 
this operation required 3 SQL statements in all cases.
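
The fast-path/slow-path shape of unlock() described above can be sketched against an in-memory list standing in for HIVE_LOCKS. All names here are hypothetical, and the exception type and messages are placeholders rather than what TxnHandler actually throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the HIVE_LOCKS table.
final class UnlockSketch {
    static final class LockRow {
        final long extLockId;
        final long txnId; // 0 means "not part of a transaction"

        LockRow(long extLockId, long txnId) {
            this.extLockId = extLockId;
            this.txnId = txnId;
        }
    }

    final List<LockRow> hiveLocks = new ArrayList<>();

    // Mirrors: delete from HIVE_LOCKS where hl_lock_ext_id = ? AND hl_txnid = 0
    int deleteNonTxnLock(long extLockId) {
        int before = hiveLocks.size();
        hiveLocks.removeIf(r -> r.extLockId == extLockId && r.txnId == 0);
        return before - hiveLocks.size(); // analogous to a JDBC update count
    }

    /** Common case: a single "statement". Extra lookups run only when nothing was deleted. */
    void unlock(long extLockId) {
        if (deleteNonTxnLock(extLockId) > 0) {
            return;
        }
        // Best-effort diagnosis on the miss path only.
        for (LockRow r : hiveLocks) {
            if (r.extLockId == extLockId) {
                throw new IllegalStateException("Lock " + extLockId + " belongs to txn "
                        + r.txnId + "; unlocking it directly is not permitted");
            }
        }
        throw new IllegalStateException("No such lock: " + extLockId);
    }
}
```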

TxnHandler.getRequiredIsolationLevel(), line 2270. Since "dbProduct" is only 
set once per metastore (MS) start, I don't think this causes any additional connections 
to be taken out...







> Investigate TxnHandler and CompactionTxnHandler to see where we improve 
> concurrency
> ---
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11948.3.patch, HIVE-11948.4.patch, 
> HIVE-11948.5.patch, HIVE-11948.6.patch, HIVE-11948.7.patch, HIVE-11948.patch
>
>
> at least some operations (or parts of operations) can run at READ_COMMITTED.
> CompactionTxnHandler.setRunAs()
> CompactionTxnHandler.findNextToCompact()
> if update stmt includes cq_state = '" + INITIATED_STATE + "'" in WHERE clause 
> and logic to look for "next" candidate
> CompactionTxnHandler.markCompacted()
> perhaps add cq_state=WORKING_STATE in Where clause (mostly as an extra 
> consistency check)
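
The WHERE-clause guards suggested above amount to a compare-and-set on cq_state: the update only takes effect if the row is still in the expected state, so two workers cannot both claim the same entry. Below is a minimal in-memory sketch that uses ConcurrentMap.replace(key, oldValue, newValue) in place of the SQL UPDATE. The class name is hypothetical, and the characters 'i', 'w' and 'r' are assumptions about the values behind INITIATED_STATE, WORKING_STATE and a ready-for-cleaning state.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical in-memory stand-in for COMPACTION_QUEUE (cq_id -> cq_state).
final class CompactionClaimSketch {
    static final char INITIATED_STATE = 'i';    // assumed value
    static final char WORKING_STATE = 'w';      // assumed value
    static final char READY_FOR_CLEANING = 'r'; // assumed value

    final ConcurrentSkipListMap<Long, Character> queue = new ConcurrentSkipListMap<>();

    /**
     * Like "update ... set cq_state = 'w' where cq_id = ? and cq_state = 'i'":
     * replace() succeeds only if the entry is still INITIATED. If another worker
     * won the race, move on to the next candidate instead of failing.
     */
    Optional<Long> findNextToCompact() {
        for (Map.Entry<Long, Character> e : queue.entrySet()) {
            if (e.getValue() == INITIATED_STATE
                    && queue.replace(e.getKey(), INITIATED_STATE, WORKING_STATE)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /** Extra consistency check: only a WORKING entry can be marked compacted. */
    boolean markCompacted(long id) {
        return queue.replace(id, WORKING_STATE, READY_FOR_CLEANING);
    }
}
```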





[jira] [Updated] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-11-17 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-11878:
---
Attachment: HIVE-11878 ClassLoader Issues when Registering Jars.pptx

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering 
> Jars.pptx, HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar in the Hive console, Hive creates a fresh URL 
> classloader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, multiple URLClassLoaders will be 
> created, each classloader including the jars from its parent 
> plus the one extra jar being registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Here is an example in which this strategy can lead to a ClassNotFoundException (CNF).
> We register 2 jars *j1* and *j2* in the Hive console. *j1* contains the UDF class 
> *c1*, which internally relies on class *c2* in jar *j2*. We register *j1* first; 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next; the new URLClassLoader 
> created is *u2*, with *u1* as its parent, and *u2* becomes the new 
> ThreadContextClassLoader. Note that *u2* includes paths to both jars *j1* and *j2*, 
> whereas *u1* only has the path to *j1* (for details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code}. The 
> current thread context class loader is *u2*, and it has the path to class 
> *c1*, but note that class loaders work by delegating to the parent class loader 
> first. In this case class *c1* will be found and *defined* by class loader 
> *u1*.
> Now *c1* from jar *j1* has *u1* as its class loader. If a method (say 
> initialize) is called in *c1* that references class *c2*, *c2* will not 
> be found, since the class loader used to search for *c2* will be *u1* (the 
> caller's class loader is used to load a class).
> I've added a qtest to explain the problem. Please see the attached patch.
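
One way to avoid the parent-first trap described above is to keep a single class loader per session and append each newly registered jar to it, so that *c1* and *c2* are defined by the same loader. The sketch below is hypothetical (SessionClassLoader and addJar are illustrative names) and is not the code in HIVE-11878_approach3_per_session_clasloader.patch; it relies only on the fact that URLClassLoader.addURL is protected and can be exposed by a subclass.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical per-session loader: every registered jar is appended to ONE loader,
// instead of creating a new child loader per jar, so classes from j1 can resolve j2.
final class SessionClassLoader extends URLClassLoader {
    SessionClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    // URLClassLoader.addURL is protected; expose it for jar registration.
    void addJar(String path) throws MalformedURLException {
        URL url = Paths.get(path).toUri().toURL();
        for (URL existing : getURLs()) {
            // Compare string forms to avoid URL.equals host-resolution behavior.
            if (existing.toString().equals(url.toString())) {
                return; // already registered
            }
        }
        addURL(url);
    }
}
```

The design choice is that later jars never get their own child loader, so there is no parent whose view of the classpath is narrower than the session's.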





[jira] [Commented] (HIVE-12418) HiveHBaseTableInputFormat.getRecordReader() causes Zookeeper connection leak.

2015-11-17 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008875#comment-15008875
 ] 

Aihua Xu commented on HIVE-12418:
-

+1. Looks good to me.

> HiveHBaseTableInputFormat.getRecordReader() causes Zookeeper connection leak.
> -
>
> Key: HIVE-12418
> URL: https://issues.apache.org/jira/browse/HIVE-12418
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12418.patch
>
>
>   @Override
>   public RecordReader getRecordReader(
> ...
> ...
>  setHTable(HiveHBaseInputFormatUtil.getTable(jobConf));
> ...
> HiveHBaseInputFormatUtil.getTable() creates new ZooKeeper 
> connections (when an HTable instance is created) which are never closed.





[jira] [Resolved] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-12433.

   Resolution: Fixed
Fix Version/s: spark-branch

Clean merge. Pushed to Spark branch.

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
>






[jira] [Updated] (HIVE-12433) Merge trunk into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Fix Version/s: (was: 1.1.0)

> Merge trunk into spark 11/17/2015 [Spark Branch]
> 
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
>






[jira] [Assigned] (HIVE-12433) Merge trunk into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-12433:
--

Assignee: Xuefu Zhang  (was: Brock Noland)

> Merge trunk into spark 11/17/2015 [Spark Branch]
> 
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
>






[jira] [Commented] (HIVE-12418) HiveHBaseTableInputFormat.getRecordReader() causes Zookeeper connection leak.

2015-11-17 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008830#comment-15008830
 ] 

Naveen Gangam commented on HIVE-12418:
--

The proposed fix closes the HTable resources when RecordReader.close() is 
called. It also overrides Object.finalize() to close these resources in 
cases where RecordReader.close() is never called.
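
The close() half of that fix can be sketched as an idempotent wrapper around the table handle, so that a second call (for example from a finalize() fallback) does not release the resources twice. TableBackedReader is a hypothetical name and a plain Closeable stands in for the HTable; this is not the code in HIVE-12418.patch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical reader that owns a table handle and releases it exactly once.
final class TableBackedReader implements Closeable {
    private final Closeable table; // stands in for the HTable and its ZooKeeper connection
    private final AtomicBoolean closed = new AtomicBoolean(false);

    TableBackedReader(Closeable table) {
        this.table = table;
    }

    @Override
    public void close() throws IOException {
        // compareAndSet makes close() idempotent and safe if called from two places.
        if (closed.compareAndSet(false, true)) {
            table.close();
        }
    }
}
```

Object.finalize() has been deprecated since Java 9, and java.lang.ref.Cleaner is the usual replacement for a last-resort hook; in either case, callers closing the reader (for example with try-with-resources) remain the reliable path.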



[jira] [Commented] (HIVE-12418) HiveHBaseTableInputFormat.getRecordReader() causes Zookeeper connection leak.

2015-11-17 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008848#comment-15008848
 ] 

Naveen Gangam commented on HIVE-12418:
--

Code has been posted to reviewboard at https://reviews.apache.org/r/40390/. 



[jira] [Commented] (HIVE-8396) Hive CliDriver command splitting can be broken when comments are present

2015-11-17 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008819#comment-15008819
 ] 

Elliot West commented on HIVE-8396:
---

'does this' -> 'parses full line comments consistently in both script and 
interactive shell modes'

> Hive CliDriver command splitting can be broken when comments are present
> 
>
> Key: HIVE-8396
> URL: https://issues.apache.org/jira/browse/HIVE-8396
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Query Processor
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>
> {noformat}
> -- SORT_QUERY_RESULTS
> set hive.cbo.enable=true;
> ... commands ...
> {noformat}
> causes
> {noformat}
> 2014-10-07 18:55:57,193 ERROR ql.Driver (SessionState.java:printError(825)) - 
> FAILED: ParseException line 2:4 missing KW_ROLE at 'hive' near 'hive'
> {noformat}
> If the comment is moved after the command it works.
> I noticed this earlier when commenting out parts of some random q file for 
> debugging purposes: it starts failing. This is annoying.
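
A minimal sketch of the behavior asked for in the comment above (parsing full-line comments the same way in script and interactive modes): drop lines whose first non-blank characters are `--`, then split the remainder on `;`. ScriptSplitter is a hypothetical name, not CliDriver's code, and the sketch deliberately ignores `;` or `--` appearing inside quoted strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, naive splitter: strips full-line "--" comments, then splits on ';'.
// Does NOT handle ';' or "--" inside string literals.
final class ScriptSplitter {
    static List<String> split(String script) {
        StringBuilder kept = new StringBuilder();
        for (String line : script.split("\n", -1)) {
            if (line.trim().startsWith("--")) {
                continue; // full-line comment, dropped before any command splitting
            }
            kept.append(line).append('\n');
        }
        List<String> commands = new ArrayList<>();
        for (String part : kept.toString().split(";")) {
            String cmd = part.trim();
            if (!cmd.isEmpty()) {
                commands.add(cmd);
            }
        }
        return commands;
    }
}
```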





[jira] [Updated] (HIVE-12418) HiveHBaseTableInputFormat.getRecordReader() causes Zookeeper connection leak.

2015-11-17 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-12418:
-
Attachment: HIVE-12418.patch

> HiveHBaseTableInputFormat.getRecordReader() causes Zookeeper connection leak.
> -
>
> Key: HIVE-12418
> URL: https://issues.apache.org/jira/browse/HIVE-12418
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12418.patch
>
>
>   @Override
>   public RecordReader getRecordReader(
> ...
> ...
>  setHTable(HiveHBaseInputFormatUtil.getTable(jobConf));
> ...
> The HiveHBaseInputFormatUtil.getTable() call creates new ZooKeeper 
> connections (when the HTable instance is created) which are never closed.





[jira] [Commented] (HIVE-12341) LLAP: add security to daemon protocol endpoint (excluding shuffle)

2015-11-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008668#comment-15008668
 ] 

Hive QA commented on HIVE-12341:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772567/HIVE-12341.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9783 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6055/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6055/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6055/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772567 - PreCommit-HIVE-TRUNK-Build

> LLAP: add security to daemon protocol endpoint (excluding shuffle)
> --
>
> Key: HIVE-12341
> URL: https://issues.apache.org/jira/browse/HIVE-12341
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12341.01.patch, HIVE-12341.patch
>
>






[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-11-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12435:
---
Component/s: (was: ORC)
 Vectorization

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.
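To make the expected output concrete: COUNT(expr) counts only rows where expr is non-NULL, and the CASE expression is NULL exactly when bool is NULL, so key3 and key5 should count 0. The plain-Java model below only restates those SQL semantics for reference (class and method names are made up for illustration); it is not Hive's vectorized implementation, and it assumes each key appears once, as in the sample table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reference model of:
//   SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END)
//   FROM count_case_groupby GROUP BY key;
// Java null stands in for SQL NULL.
final class CountCaseReference {
  private CountCaseReference() {}

  // Models the CASE expression.
  static Integer caseExpr(Boolean bool) {
    if (bool == null) {
      return null;          // neither WHEN branch matches: ELSE NULL
    }
    return bool ? 1 : 0;    // WHEN bool THEN 1 / WHEN NOT bool THEN 0
  }

  // GROUP BY key with COUNT over the CASE expression: NULLs are not counted.
  static Map<String, Long> countByKey(Map<String, Boolean> rows) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (Map.Entry<String, Boolean> row : rows.entrySet()) {
      long add = caseExpr(row.getValue()) == null ? 0 : 1;
      counts.merge(row.getKey(), add, Long::sum);
    }
    return counts;
  }
}
```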





[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-11-17 Thread Takahiko Saito (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takahiko Saito updated HIVE-12435:
--
Summary: SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case 
of ORC and vectorization is enabled.  (was: SELECT COUNT(CASE WHEN...) GROUPBY 
returns 1 for 'NULL' in a case of ORC)

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.





[jira] [Assigned] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-11-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-12435:
--

Assignee: Gopal V

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Gopal V
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.





[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC

2015-11-17 Thread Takahiko Saito (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takahiko Saito updated HIVE-12435:
--
Summary: SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case 
of ORC  (was: SELECT COUNT(CASE WHEN...) GROUPBY returns wrong results in a 
case of ORC)

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC
> 
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.





[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-11-17 Thread Takahiko Saito (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takahiko Saito updated HIVE-12435:
--
Description: 
Run the following query:
{noformat}
create table count_case_groupby (key string, bool boolean) STORED AS orc;
insert into table count_case_groupby values ('key1', true),('key2', 
false),('key3', NULL),('key4', false),('key5',NULL);
{noformat}
The table contains the following:
{noformat}
key1  true
key2  false
key3  NULL
key4  false
key5  NULL
{noformat}
The below query returns:
{noformat}
SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS 
cnt_bool0_ok FROM count_case_groupby GROUP BY key;
key1  1
key2  1
key3  1
key4  1
key5  1
{noformat}

while it expects the following results:
{noformat}
key1  1
key2  1
key3  0
key4  1
key5  0
{noformat}

The query works with hive ver 1.2. Also it works when a table is not orc format.
Also even if it's an orc table, when vectorization is disabled, the query works.

  was:
Run the following query:
{noformat}
create table count_case_groupby (key string, bool boolean) STORED AS orc;
insert into table count_case_groupby values ('key1', true),('key2', 
false),('key3', NULL),('key4', false),('key5',NULL);
{noformat}
The table contains the following:
{noformat}
key1  true
key2  false
key3  NULL
key4  false
key5  NULL
{noformat}
The below query returns:
{noformat}
SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS 
cnt_bool0_ok FROM count_case_groupby GROUP BY key;
key1  1
key2  1
key3  1
key4  1
key5  1
{noformat}

while it expects the following results:
{noformat}
key1  1
key2  1
key3  0
key4  1
key5  0
{noformat}

The query works with hive ver 1.2. Also it works when a table is not orc format.


> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.





[jira] [Commented] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf and document the settings

2015-11-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009022#comment-15009022
 ] 

Hive QA commented on HIVE-11358:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772583/HIVE-11358.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9746 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6056/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6056/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6056/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772583 - PreCommit-HIVE-TRUNK-Build

> LLAP: move LlapConfiguration into HiveConf and document the settings
> 
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11358.01.patch, HIVE-11358.patch
>
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf





[jira] [Updated] (HIVE-12432) Hive on Spark Counter "RECORDS_OUT" always be zero

2015-11-17 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-12432:
-
Description: 
A simple way to reproduce :
set hive.execution.engine=spark;
CREATE TABLE  test(id INT);
insert into test values (1) ,(2);

  was:
A simple way to reproduce :
set hive.execution.engine=spark;
CREATE TABLE  test(id INT);
insert into test values (1) (2);


> Hive on Spark Counter "RECORDS_OUT" always  be zero
> ---
>
> Key: HIVE-12432
> URL: https://issues.apache.org/jira/browse/HIVE-12432
> Project: Hive
>  Issue Type: Bug
>  Components: Spark, Statistics
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>
> A simple way to reproduce :
> set hive.execution.engine=spark;
> CREATE TABLE  test(id INT);
> insert into test values (1) ,(2);





[jira] [Commented] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-11-17 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009197#comment-15009197
 ] 

Gopal V commented on HIVE-12435:


[~taksaito]: this query isn't vectorized in branch-1 (only in 2.0)

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Gopal V
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.





[jira] [Updated] (HIVE-12434) Merge spark into master 11/17/1015

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12434:
---
Attachment: HIVE-12434.patch

> Merge spark into master 11/17/1015
> --
>
> Key: HIVE-12434
> URL: https://issues.apache.org/jira/browse/HIVE-12434
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Affects Versions: 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12434.patch
>
>
> There are still a few patches that are in Spark branch only. We need to merge 
> them to master.





[jira] [Commented] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks

2015-11-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009159#comment-15009159
 ] 

Alan Gates commented on HIVE-12421:
---

[~roshan_naik] can comment further, but I think the thinking was some writers 
have no ability to buffer their inbound data, so blocking on a lock could force 
them to drop data.  We should change this to have the writer pass a maximum 
wait time.  If this code fails to obtain a lock in the allowed time then it can 
throw an exception that makes clear what happened and let the writer decide 
what to do from there.

> Streaming API TransactionBatch.beginNextTransaction() does not wait for locks
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an exception to the client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to 
> give the client more control.
> cc [~alangates]  [~sriharsha]
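One way to realize Alan's suggestion (and the beginNextTransaction(long timeoutMs) idea in the description) is a bounded wait: keep checking the lock state until it is acquired or the writer's deadline passes, then fail with an exception that says what happened. The sketch assumes the lock check can be modeled as a BooleanSupplier; BoundedLockWait and awaitLock are hypothetical names, not existing HCatalog streaming API.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper: retry a lock check until it succeeds or the
// caller-supplied timeout expires.
final class BoundedLockWait {
  private BoundedLockWait() {}

  static void awaitLock(BooleanSupplier lockAcquired, long timeoutMs,
                        long pollIntervalMs, String endPoint)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (lockAcquired.getAsBoolean()) {
        return; // lock granted
      }
      long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMs <= 0) {
        // Let the writer decide what to do: buffer, retry later, or drop data.
        throw new TimeoutException("Unable to acquire lock on " + endPoint
            + " within " + timeoutMs + " ms");
      }
      // Wait before re-checking, never past the deadline.
      Thread.sleep(Math.min(pollIntervalMs, remainingMs));
    }
  }
}
```

A writer that cannot buffer would pass a small timeout; one that can afford to block would pass a large one.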





[jira] [Commented] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we improve concurrency

2015-11-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009126#comment-15009126
 ] 

Alan Gates commented on HIVE-11948:
---

bq. There is a RB link for this patch, it may be easier to add comments there.
I know, I just don't like Review Board.  It's harder to keep track of comments 
and feedback because they're not in the JIRA.  I look forward to moving to pull 
requests someday so feedback and changes can be tracked in one place 
conveniently.

bq. The change in TxnHandler.checkLock() regarding heartbeat is intentional...
Makes sense.

bq. TxnHandler unlock(), around line 581...
My mistake, I was thinking that commit() called unlock(), but looking at the 
code I see it does the unlocking itself, so only worrying about locks that are 
not associated with a txn here makes sense.

bq. Since "dbProduct" is only set once per MS start I don't think this causes 
any more connections to be taken out...
That's not what I was saying.  I was saying for programmer convenience it would 
make sense to change this to be:
{code}
determineDatabaseProduct(null);
{code}
and change determineDatabaseProduct to be:
{code}
  protected DatabaseProduct determineDatabaseProduct(Connection conn) throws MetaException {
    if (dbProduct == null) {
      if (conn == null) {
        // get connection...
      }
      try {
        ...
{code}
This is purely convenience and not necessary.
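The suggested convenience (accept null and obtain a connection internally, computing the product only once) can be sketched as follows. This is an assumption-laden illustration, not TxnHandler code: the connection type is a generic AutoCloseable, the product is a plain String rather than Hive's DatabaseProduct, and DbProductCache, connectionSource and detector are hypothetical names. The design point is ownership: the method closes only a connection it opened itself, never the caller's.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration of a lazily computed, cached database product.
final class DbProductCache<C extends AutoCloseable> {
  private final Supplier<C> connectionSource; // stands in for "get connection..."
  private final Function<C, String> detector; // stands in for the real product lookup
  private String dbProduct;                   // cached after the first lookup

  DbProductCache(Supplier<C> connectionSource, Function<C, String> detector) {
    this.connectionSource = connectionSource;
    this.detector = detector;
  }

  synchronized String determineDatabaseProduct(C conn) throws Exception {
    if (dbProduct == null) {
      if (conn == null) {
        // No connection supplied: open one just for this lookup and close it.
        try (C own = connectionSource.get()) {
          dbProduct = detector.apply(own);
        }
      } else {
        // Caller keeps ownership of conn; it is not closed here.
        dbProduct = detector.apply(conn);
      }
    }
    return dbProduct;
  }
}
```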

+1 for the patch.

> Investigate TxnHandler and CompactionTxnHandler to see where we improve 
> concurrency
> ---
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11948.3.patch, HIVE-11948.4.patch, 
> HIVE-11948.5.patch, HIVE-11948.6.patch, HIVE-11948.7.patch, HIVE-11948.patch
>
>
> at least some operations (or parts of operations) can run at READ_COMMITTED.
> CompactionTxnHandler.setRunAs()
> CompactionTxnHandler.findNextToCompact()
> if update stmt includes cq_state = '" + INITIATED_STATE + "'" in WHERE clause 
> and logic to look for "next" candidate
> CompactionTxnHandler.markCompacted()
> perhaps add cq_state=WORKING_STATE in Where clause (mostly as an extra 
> consistency check)
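Putting the expected cq_state into the WHERE clause turns each state change into a compare-and-set: the UPDATE changes the row only if it is still in the expected state, and the affected-row count tells the caller whether its transition won. The sketch models that with ConcurrentMap.replace(key, expected, newValue) purely as an in-memory analogy; the class, method and state names are hypothetical, not the actual CompactionTxnHandler code or its state codes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory analogue (not SQL, not Hive code) of
//   UPDATE ... SET cq_state = <working> WHERE cq_id = ? AND cq_state = <initiated>
// where the transition succeeds only if the entry is still in the expected state.
final class CompactionQueueSketch {
  // Hypothetical state labels.
  static final String INITIATED = "initiated";
  static final String WORKING = "working";
  static final String COMPACTED = "compacted";

  private final ConcurrentMap<Long, String> stateById = new ConcurrentHashMap<>();

  void enqueue(long id) {
    stateById.put(id, INITIATED);
  }

  // Analogue of findNextToCompact(): only one worker can claim an entry.
  boolean claim(long id) {
    return stateById.replace(id, INITIATED, WORKING);
  }

  // Analogue of markCompacted() with the WORKING state as a consistency check.
  boolean markCompacted(long id) {
    return stateById.replace(id, WORKING, COMPACTED);
  }
}
```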





[jira] [Updated] (HIVE-12436) Default hive.metastore.schema.verification to true

2015-11-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12436:

Attachment: HIVE-12436.patch

> Default hive.metastore.schema.verification to true
> --
>
> Key: HIVE-12436
> URL: https://issues.apache.org/jira/browse/HIVE-12436
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12436.patch
>
>
> It enforces metastore schema version consistency





[jira] [Commented] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009803#comment-15009803
 ] 

Laljo John Pullokkaran commented on HIVE-12384:
---

Committed to Master

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch
>
>
> Union queries may produce incorrect results on TEZ.
> TEZ removes the union op, so it might lose the implicit cast in the union op.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over())  as y from(select cast(1 as decimal(10,0))  as x 
> from (select * from src limit 2)s1 union all select cast(1 as decimal(10,0)) 
> x from (select * from src limit 2) s2 union all select '1' x from 
> (select * from src limit 2) s3)u order by y;
> select (x/sum(x) over()) as y from(select cast(1 as decimal(10,0))  as x from 
> (select * from src limit 2)s1 union all select cast(1 as decimal(10,0)) x 
> from (select * from src limit 2) s2 union all select cast (null as string) x 
> from (select * from src limit 2) s3)u order by y;





[jira] [Commented] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009809#comment-15009809
 ] 

Sergey Shelukhin commented on HIVE-12384:
-

Does this need to be committed to branch-1 too?

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch
>
>
> Union queries may produce incorrect results on TEZ.
> TEZ removes the union op, so it might lose the implicit cast in the union op.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over())  as y from(select cast(1 as decimal(10,0))  as x 
> from (select * from src limit 2)s1 union all select cast(1 as decimal(10,0)) 
> x from (select * from src limit 2) s2 union all select '1' x from 
> (select * from src limit 2) s3)u order by y;
> select (x/sum(x) over()) as y from(select cast(1 as decimal(10,0))  as x from 
> (select * from src limit 2)s1 union all select cast(1 as decimal(10,0)) x 
> from (select * from src limit 2) s2 union all select cast (null as string) x 
> from (select * from src limit 2) s3)u order by y;





[jira] [Updated] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false

2015-11-17 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-12008:

Attachment: HIVE-12008.3.patch

> Make last two tests added by HIVE-11384 pass when hive.in.test is false
> ---
>
> Key: HIVE-12008
> URL: https://issues.apache.org/jira/browse/HIVE-12008
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12008.1.patch, HIVE-12008.2.patch, 
> HIVE-12008.3.patch
>
>
> The last two qfile unit tests fail when hive.in.test is false. It may be related 
> to how we handle the prunelist for select. When a select includes every column in a 
> table, the prunelist for the select is empty. This may cause issues when 
> calculating its parent's prunelist.





[jira] [Commented] (HIVE-12437) SMB join in tez fails when one of the tables is empty

2015-11-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009921#comment-15009921
 ] 

Siddharth Seth commented on HIVE-12437:
---

+1. Looks good. If this is not expected to be hit very often, a log line 
indicating that there are no readers could be useful.

> SMB join in tez fails when one of the tables is empty
> -
>
> Key: HIVE-12437
> URL: https://issues.apache.org/jira/browse/HIVE-12437
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-12437.1.patch
>
>
> It looks like a better check for empty tables is to depend on the existence 
> of the record reader for the input from tez. 
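A minimal sketch of the check described here, combined with the review's suggestion to log when no reader is present. It assumes the join operator can look up a reader per input; SmbInputCheck, isEmptyInput and the map of readers are hypothetical, and Object stands in for the Tez reader type.

```java
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch: treat an input as empty when no record reader exists
// for it, and log that case so it is visible when it does happen.
final class SmbInputCheck {
  private static final Logger LOG = Logger.getLogger(SmbInputCheck.class.getName());

  private SmbInputCheck() {}

  // readersByInput maps an input name to its reader; a missing entry or a
  // null value means no reader was provided for that input.
  static boolean isEmptyInput(Map<String, ?> readersByInput, String inputName) {
    Object reader = readersByInput.get(inputName);
    if (reader == null) {
      LOG.info("No record reader for input " + inputName + "; treating it as empty");
      return true;
    }
    return false;
  }
}
```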





[jira] [Commented] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009820#comment-15009820
 ] 

Hive QA commented on HIVE-12433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772818/HIVE-12433.1-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9786 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1001/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1001/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1001/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772818 - PreCommit-HIVE-SPARK-Build

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.patch
>
>






[jira] [Updated] (HIVE-12437) SMB join in tez fails when one of the tables is empty

2015-11-17 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-12437:
--
Attachment: HIVE-12437.1.patch

> SMB join in tez fails when one of the tables is empty
> -
>
> Key: HIVE-12437
> URL: https://issues.apache.org/jira/browse/HIVE-12437
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-12437.1.patch
>
>
> It looks like a better check for empty tables is to depend on the existence 
> of the record reader for the input from tez. 





[jira] [Commented] (HIVE-12437) SMB join in tez fails when one of the tables is empty

2015-11-17 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009869#comment-15009869
 ] 

Vikram Dixit K commented on HIVE-12437:
---

Ping [~sseth]. Please review.

> SMB join in tez fails when one of the tables is empty
> -
>
> Key: HIVE-12437
> URL: https://issues.apache.org/jira/browse/HIVE-12437
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-12437.1.patch
>
>
> It looks like a better check for empty tables is to depend on the existence 
> of the record reader for the input from tez. 





[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11675:

Attachment: HIVE-11675.02.patch

Rebased the patch again.

> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, 
> HIVE-11675.patch
>
>
> Need to take a look at the best flow. It won't be much different if we make a 
> filtering metastore call for each partition, so perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless they can be 
> pushed down to the metastore or fetched from the local cache; that way the only 
> slow threaded op is directory listing.
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10937:

Component/s: llap

> LLAP: make ObjectCache for plans work properly in the daemon
> 
>
> Key: HIVE-10937
> URL: https://issues.apache.org/jira/browse/HIVE-10937
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10937.01.patch, HIVE-10937.02.patch, 
> HIVE-10937.03.patch, HIVE-10937.04.patch, HIVE-10937.patch
>
>
> There's a perf hit otherwise, esp. when the stupid planner creates 1009 reducers 
> of 4MB each.
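A minimal sketch of the idea, assuming plans can be keyed by a query identifier: one daemon-wide cache lets every task of a query reuse a single deserialized plan, so 1009 reducers pay for one deserialization instead of 1009. `PlanCache`, `get` and `release` are hypothetical names for illustration, not Hive's actual ObjectCache API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical daemon-wide cache: tasks of the same query share one plan object.
final class PlanCache<P> {
  private final ConcurrentHashMap<String, P> plans = new ConcurrentHashMap<>();

  // Returns the cached plan; computeIfAbsent runs the loader at most once per
  // key, even when many tasks of the query ask concurrently.
  P get(String queryId, Supplier<P> loader) {
    return plans.computeIfAbsent(queryId, k -> loader.get());
  }

  // Drop the plan when the query finishes so the cache does not grow unbounded.
  void release(String queryId) {
    plans.remove(queryId);
  }
}

class PlanCacheDemo {
  public static void main(String[] args) {
    PlanCache<String> cache = new PlanCache<>();
    AtomicInteger deserializations = new AtomicInteger();
    // Simulate 1009 reducers of one query requesting the same plan.
    for (int i = 0; i < 1009; i++) {
      cache.get("query_1", () -> {
        deserializations.incrementAndGet();
        return "deserialized plan";
      });
    }
    System.out.println(deserializations.get()); // prints 1
  }
}
```

Note that `computeIfAbsent` blocks other writers to the same map bin while the loader runs, which is usually acceptable for a once-per-query load.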





[jira] [Commented] (HIVE-12443) Hive Streaming should expose encoding and serdes for testing

2015-11-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009794#comment-15009794
 ] 

Eugene Koifman commented on HIVE-12443:
---

If the idea is to intercept encode(), how does making the method public help?

Wouldn't you want to register some Callback object with the RecordWriter, whose 
callback(Object encoded) method would tell you what the writer is writing?

Making encode() public seems to mean you'd have to call it twice: once to write 
and once for the test framework.

Perhaps I'm not understanding the requirement.
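The callback alternative can be sketched as follows. This is an illustration only: `EncodedRecordListener`, `ObservableRecordWriter` and `setListener` are hypothetical names, and the encoder is passed in as a function instead of being a real SerDe. The point it shows is that `write()` encodes each record exactly once and hands the same encoded object both to the listener and to the sink, so `encode()` never needs to be public.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical hook: a test tool registers one to observe each encoded record.
interface EncodedRecordListener {
  void callback(Object encoded);
}

// Hypothetical writer: encoding stays private; write() encodes once and shares
// the result with the listener and with the real sink.
class ObservableRecordWriter {
  private final Function<byte[], Object> encoder;
  private final List<Object> sink = new ArrayList<>(); // stand-in for the Hive-side writer
  private EncodedRecordListener listener;

  ObservableRecordWriter(Function<byte[], Object> encoder) {
    this.encoder = encoder;
  }

  void setListener(EncodedRecordListener listener) {
    this.listener = listener;
  }

  void write(byte[] record) {
    Object encoded = encoder.apply(record); // encoded exactly once
    if (listener != null) {
      listener.callback(encoded);
    }
    sink.add(encoded);
  }

  int written() {
    return sink.size();
  }
}
```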

> Hive Streaming should expose encoding and serdes for testing
> 
>
> Key: HIVE-12443
> URL: https://issues.apache.org/jira/browse/HIVE-12443
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure, Transactions
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12443.patch
>
>
> Currently, how records passed into the Hive streaming RecordWriter are 
> converted from the inbound format to Hive format is opaque.  The encoding and 
> writing are done in a single call to RecordWriter.write().  This is 
> problematic for test tools that want to intercept the record stream and write 
> it to a benchmark in addition to Hive.
> All existing RecordWriters have encode and getSerDe methods.  I propose to 
> expose these by making them public in AbstractRecordWriter, and making 
> AbstractRecordWriter a public class (it is currently package private).  This 
> keeps the RecordWriter interface clean (stream writers will not need to 
> directly call these methods) and avoids any backwards incompatible changes.  
> Having AbstractRecordWriter public is also desirable for anyone who wants to 
> write their own RecordWriter.





[jira] [Commented] (HIVE-12175) Upgrade Kryo version to 3.0.x

2015-11-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009808#comment-15009808
 ] 

Hive QA commented on HIVE-12175:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772591/HIVE-12175.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 9782 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explode_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_lazyserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_explode2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_onview
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_write_correct_definition_levels
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_inline
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_percentile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sentences
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_distinct_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6058/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6058/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6058/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 38 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772591 - PreCommit-HIVE-TRUNK-Build

> Upgrade Kryo version to 3.0.x
> -
>
> Key: HIVE-12175
> URL: https://issues.apache.org/jira/browse/HIVE-12175
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12175.1.patch, HIVE-12175.2.patch, 
> HIVE-12175.3.patch, HIVE-12175.3.patch, HIVE-12175.4.patch
>
>
> The current version of Kryo (2.22) has an issue (see the exception below and 
> in HIVE-12174) with serializing ArrayLists generated using Arrays.asList(). We 
> need to either replace all occurrences of Arrays.asList() or change the 
> current StdInstantiatorStrategy. This issue is fixed in later versions, and the 
> Kryo community recommends using DefaultInstantiatorStrategy with

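The description does not say why lists from Arrays.asList() trip up Kryo. The JDK-only sketch below shows properties of such a list that a reflection-based serializer can plausibly stumble on. The link to the Kryo failure is our assumption, not something stated in the issue. The returned list is the private class java.util.Arrays$ArrayList rather than java.util.ArrayList, it has no no-arg constructor, and it is fixed-size. Copying into a new ArrayList, which is the first workaround the description mentions, removes all three differences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListDemo {
  // True if the class declares a no-arg constructor, the usual thing a
  // reflection-based deserializer tries first when re-creating an object.
  static boolean hasNoArgConstructor(Class<?> cls) {
    try {
      cls.getDeclaredConstructor();
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  // True if add() works; the Arrays.asList() view is fixed-size.
  static boolean supportsAdd(List<Integer> list) {
    try {
      list.add(4);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<Integer> view = Arrays.asList(1, 2, 3);
    List<Integer> copy = new ArrayList<>(view); // the "replace Arrays.asList()" workaround

    System.out.println(view.getClass().getName());            // java.util.Arrays$ArrayList
    System.out.println(hasNoArgConstructor(view.getClass())); // false
    System.out.println(supportsAdd(view));                    // false
    System.out.println(hasNoArgConstructor(copy.getClass())); // true
    System.out.println(supportsAdd(copy));                    // true
  }
}
```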
[jira] [Updated] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12384:

Target Version/s: 1.3.0, 2.0.0  (was: 2.0.0)

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch
>
>
> Union queries may produce incorrect results on Tez.
> Tez removes the union operator and thus may lose the implicit cast it performs.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all
>   select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all
>   select '1' x from (select * from src limit 2) s3
> ) u order by y;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all
>   select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all
>   select cast(null as string) x from (select * from src limit 2) s3
> ) u order by y;





[jira] [Updated] (HIVE-12446) Tracking jira for changes required for move to Tez 0.8.2

2015-11-17 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12446:
--
Component/s: llap

> Tracking jira for changes required for move to Tez 0.8.2
> 
>
> Key: HIVE-12446
> URL: https://issues.apache.org/jira/browse/HIVE-12446
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>






[jira] [Updated] (HIVE-12447) Fix LlapTaskReporter post TEZ-808 changes

2015-11-17 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12447:
--
Component/s: llap

> Fix LlapTaskReporter post TEZ-808 changes
> -
>
> Key: HIVE-12447
> URL: https://issues.apache.org/jira/browse/HIVE-12447
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>






[jira] [Updated] (HIVE-12449) Report progress information from the Tez processor

2015-11-17 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12449:
--
Assignee: (was: Vikram Dixit K)

> Report progress information from the Tez processor
> --
>
> Key: HIVE-12449
> URL: https://issues.apache.org/jira/browse/HIVE-12449
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Reporter: Siddharth Seth
>
> After TEZ-808, Tez tracks processor progress and can kill the tasks if they 
> don't make progress fast enough (disabled by default). Hive needs to start 
> reporting progress while processing records.
> Also, progress will eventually help with better speculation decisions.
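A minimal sketch of per-record progress reporting, assuming the processor knows (or can estimate) its total input record count. The `Consumer<Float>` stands in for whatever progress setter the Tez processor context exposes after TEZ-808; that call is deliberately not shown. `ProgressReporter` and its parameters are hypothetical. Reporting is throttled to every `reportEvery` records so that the per-record cost stays small.

```java
import java.util.function.Consumer;

// Hypothetical helper: reports the fraction of input records processed.
class ProgressReporter {
  private final long totalRecords;    // assumed known or estimated up front
  private final long reportEvery;     // report once per this many records
  private final Consumer<Float> sink; // stand-in for the framework's progress setter
  private long processed;

  ProgressReporter(long totalRecords, long reportEvery, Consumer<Float> sink) {
    this.totalRecords = totalRecords;
    this.reportEvery = reportEvery;
    this.sink = sink;
  }

  // Call once per processed record; reports on every N-th record and at the end.
  void recordProcessed() {
    processed++;
    if (processed % reportEvery == 0 || processed == totalRecords) {
      sink.accept(current());
    }
  }

  float current() {
    if (totalRecords <= 0) {
      return 0f;
    }
    // Clamp to [0, 1] in case the total was an underestimate.
    return Math.min(1f, (float) processed / totalRecords);
  }
}
```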





[jira] [Commented] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009817#comment-15009817
 ] 

Laljo John Pullokkaran commented on HIVE-12384:
---

Yes, this needs to be backported to branch-1.

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch
>
>
> Union queries may produce incorrect results on Tez.
> Tez removes the union operator and thus may lose the implicit cast it performs.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all
>   select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all
>   select '1' x from (select * from src limit 2) s3
> ) u order by y;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all
>   select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all
>   select cast(null as string) x from (select * from src limit 2) s3
> ) u order by y;





[jira] [Updated] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12384:

Fix Version/s: 2.0.0

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Fix For: 2.0.0
>
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch
>
>
> Union queries may produce incorrect results on Tez.
> Tez removes the union operator and thus may lose the implicit cast it performs.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all
>   select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all
>   select '1' x from (select * from src limit 2) s3
> ) u order by y;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all
>   select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all
>   select cast(null as string) x from (select * from src limit 2) s3
> ) u order by y;





[jira] [Commented] (HIVE-9028) Enhance the hive parser to accept tuples in where in clause filter

2015-11-17 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009897#comment-15009897
 ] 

Carter Shanklin commented on HIVE-9028:
---

Same as HIVE-11600?

> Enhance the hive parser to accept tuples in where in clause filter
> --
>
> Key: HIVE-9028
> URL: https://issues.apache.org/jira/browse/HIVE-9028
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 0.13.1
>Reporter: Yash Datta
>
> Currently, the Hive parser only accepts a list of values in the WHERE IN 
> clause, and the filter is applied to a single column only. Enhanced it to 
> accept a filter on multiple columns.
> So current support is for queries like:
> Select * from table where c1 in (value1, value2, ... value n);
> Added support in the parser for queries like:
> Select * from table where (c1, c2, ... cn) in ((value1, value2, ... value n), 
> (value1', value2', ... value n'));
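The meaning of the multi-column form can be illustrated with the standard equivalence: a row-value IN list is true when some tuple matches every column. The sketch below rewrites `(c1, c2) in ((v1, v2), (v1', v2'))` into an OR of AND-ed equalities. It only illustrates the semantics; it is not how the patch changes the Hive parser, and `TupleInRewrite` is a hypothetical name.

```java
import java.util.List;
import java.util.StringJoiner;

// Illustration only: (c1, ..., cn) IN (t1, ..., tk) == OR over tuples of AND over columns.
class TupleInRewrite {
  static String rewrite(List<String> columns, List<List<String>> tuples) {
    StringJoiner anyTuple = new StringJoiner(" OR ");
    for (List<String> tuple : tuples) {
      if (tuple.size() != columns.size()) {
        throw new IllegalArgumentException("tuple arity must match column count");
      }
      // Every column must equal the corresponding value of this tuple.
      StringJoiner allColumns = new StringJoiner(" AND ", "(", ")");
      for (int i = 0; i < columns.size(); i++) {
        allColumns.add(columns.get(i) + " = " + tuple.get(i));
      }
      anyTuple.add(allColumns.toString());
    }
    return anyTuple.toString();
  }
}
```

An empty tuple list yields an empty string here; a real rewrite would have to treat that case as FALSE.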





[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11675:

Description: 
Need to take a look at the best flow. It won't be much different if we make a 
filtering metastore call for each partition, so perhaps we'd need the custom 
sync point/batching after all.
Or we can make it opportunistic and not fetch any footers unless they can be 
pushed down to the metastore or fetched from the local cache; that way the only 
slow threaded op is directory listing.

  was:
Need to take a look at the best flow. It won't be much different if we make a 
filtering metastore call for each partition, so perhaps we'd need the custom 
sync point/batching after all.
Or we can make it opportunistic and not fetch any footers unless they can be 
pushed down to the metastore or fetched from the local cache; that way the only 
slow threaded op is directory listing.

NO PRECOMMIT TESTS


> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, 
> HIVE-11675.patch
>
>
> Need to take a look at the best flow. It won't be much different if we make a 
> filtering metastore call for each partition, so perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless they can be 
> pushed down to the metastore or fetched from the local cache; that way the only 
> slow threaded op is directory listing.





[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11675:

Attachment: (was: HIVE-11675.01.patch)

> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11675.01.patch, HIVE-11675.patch
>
>
> Need to take a look at the best flow. It won't be much different if we make a 
> filtering metastore call for each partition, so perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless they can be 
> pushed down to the metastore or fetched from the local cache; that way the only 
> slow threaded op is directory listing.
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf and document the settings

2015-11-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009330#comment-15009330
 ] 

Siddharth Seth commented on HIVE-11358:
---

bq. One question is, do we need a separate daemon config, or should we just use 
hive-site.xml for everything?
Separate is better IMO; something other than hive-site.xml is less ambiguous. 
Also, leaving this as hive-site likely means a large (100+ k-v) config file for 
LLAP, of which only about 10 are required.

Given LLAP is a new component, we could get started with separating client and 
server configs, along with the files.
e.g.
llap.daemon.rpc.num.handlers (and a lot of other configs) <- useless in client 
configs (hive-site on the client / hive-server)
llap.daemon.num.executors (and a few other configs) <- #threads for a daemon. 
Used by the client to determine available slots when not using Slider. The 
client config could be renamed.

IAC, that's beyond the scope of what this jira is doing. Using annotations to 
separate config parameters to generate the public list of configurable fields 
would work.
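The annotation approach mentioned above could look roughly like this. The sketch makes several assumptions: the `ConfigScope` annotation, the `Side` enum and the `LlapKeys` holder are hypothetical, and only the two key strings come from the comment. Each key is tagged with the side(s) that read it, and reflection then collects the list of keys that a client-side or daemon-side config file needs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

enum Side { CLIENT, DAEMON, BOTH }

// Hypothetical marker recording which side of LLAP reads a config key.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ConfigScope {
  Side value();
}

// Hypothetical key holder; the key strings are the ones named in the comment.
class LlapKeys {
  @ConfigScope(Side.DAEMON)
  static final String RPC_NUM_HANDLERS = "llap.daemon.rpc.num.handlers";

  @ConfigScope(Side.BOTH) // daemon thread count, also used by the client for slots
  static final String NUM_EXECUTORS = "llap.daemon.num.executors";

  // Collects every annotated key that the given side needs to see.
  // Field order from getDeclaredFields() is unspecified, so treat the result as a set.
  static List<String> keysFor(Side side) {
    List<String> keys = new ArrayList<>();
    for (Field f : LlapKeys.class.getDeclaredFields()) {
      ConfigScope scope = f.getAnnotation(ConfigScope.class);
      if (scope != null && (scope.value() == side || scope.value() == Side.BOTH)) {
        try {
          keys.add((String) f.get(null));
        } catch (IllegalAccessException e) {
          throw new IllegalStateException(e);
        }
      }
    }
    return keys;
  }
}
```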

I'll create a separate jira to separate llap server-side and client-side 
configs.

+1 for the patch.




> LLAP: move LlapConfiguration into HiveConf and document the settings
> 
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11358.01.patch, HIVE-11358.patch
>
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf





[jira] [Updated] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12439:
--
Summary: CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc 
improvements  (was: CompactionTxnHandler.markCleaned() add safety guards)

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> --
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> # add distinct to s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = 
> tc_txnid and txn_state = '" +
>TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler.openTxns() - use 1 insert with many rows in the values() clause, 
> rather than 1 DB round trip per row
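The IN-clause safeguard from the second item can be sketched as splitting the txn ids into batches and emitting one DELETE per batch. The table and column names come from the description above. `InClauseBatcher` and the `maxInListSize` limit are hypothetical, and the right limit depends on the backing database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Sketch: no single "tc_txnid in (...)" list exceeds maxInListSize entries.
class InClauseBatcher {
  static List<String> deleteStatements(List<Long> txnIds, int maxInListSize) {
    if (maxInListSize <= 0) {
      throw new IllegalArgumentException("maxInListSize must be positive");
    }
    List<String> statements = new ArrayList<>();
    for (int start = 0; start < txnIds.size(); start += maxInListSize) {
      int end = Math.min(start + maxInListSize, txnIds.size());
      StringJoiner ids = new StringJoiner(",", "(", ")");
      for (Long id : txnIds.subList(start, end)) {
        ids.add(id.toString()); // numeric ids only, so no quoting is needed
      }
      statements.add("delete from TXN_COMPONENTS where tc_txnid in " + ids);
    }
    return statements;
  }
}
```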





[jira] [Updated] (HIVE-12439) CompactionTxnHandler.markCleaned() add safety guards

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12439:
--
Description: 
# add distinct to s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = 
tc_txnid and txn_state = '" +
   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
tc_table = '" +

# add a safeguard to make sure IN clause is not too large; break up by txn id 
to delete from TXN_COMPONENTS where tc_txnid in ...

# TxnHandler.openTxns() - use 1 insert with many rows in the values() clause, 
rather than 1 DB round trip per row

  was:
# add distinct to s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = 
tc_txnid and txn_state = '" +
   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
tc_table = '" +

# add a safeguard to make sure IN clause is not too large; break up by txn id 
to delete from TXN_COMPONENTS where tc_txnid in ...


> CompactionTxnHandler.markCleaned() add safety guards
> 
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> # add distinct to s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = 
> tc_txnid and txn_state = '" +
>TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler.openTxns() - use 1 insert with many rows in the values() clause, 
> rather than 1 DB round trip per row
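The openTxns() item can be sketched as building one multi-row INSERT for n consecutive txn ids instead of issuing n single-row INSERTs. The TXNS column list and the 'o' (open) state value are assumptions about the metastore schema, and `OpenTxnsInsert` is a hypothetical name. User and host are left as JDBC placeholders that the caller binds for every row. Some databases (older Oracle releases, for example) do not accept a multi-row VALUES list, so a portable version needs a per-database form.

```java
import java.util.StringJoiner;

// Sketch: one round trip for numTxns new transactions. Schema details are assumed.
class OpenTxnsInsert {
  static String build(long firstTxnId, int numTxns, long nowMillis) {
    if (numTxns <= 0) {
      throw new IllegalArgumentException("numTxns must be positive");
    }
    StringJoiner rows = new StringJoiner(", ");
    for (int i = 0; i < numTxns; i++) {
      // txn_user and txn_host stay as "?" placeholders, bound per row by the caller.
      rows.add("(" + (firstTxnId + i) + ", 'o', " + nowMillis + ", " + nowMillis + ", ?, ?)");
    }
    return "insert into TXNS (txn_id, txn_state, txn_started, txn_last_heartbeat, "
        + "txn_user, txn_host) values " + rows;
  }
}
```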





[jira] [Updated] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Attachment: HIVE-9202.1-spark.patch

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.branch, HIVE-9202.1-spark.patch
>
>






[jira] [Commented] (HIVE-12406) HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface

2015-11-17 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009426#comment-15009426
 ] 

Aihua Xu commented on HIVE-12406:
-

Attached a patch that adds the deprecated inner class and initSerDeParams() 
back.

> HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface
> 
>
> Key: HIVE-12406
> URL: https://issues.apache.org/jira/browse/HIVE-12406
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0
>Reporter: Lenni Kuff
>Assignee: Aihua Xu
>Priority: Blocker
> Attachments: HIVE-12406.patch
>
>
> In the process of fixing HIVE-9500, an incompatibility was introduced that 
> will break 3rd party code that relies on LazySimpleSerDe. In HIVE-9500, the 
> nested class SerDeParameters was removed and the method 
> LazySimpleSerDe.initSerdeParams was also removed. They were replaced by a 
> standalone class LazySerDeParameters.
> Since this has already been released, I don't think we should revert the 
> change since that would mean breaking compatibility again. Instead, the best 
> approach would be to support both interfaces, if possible. 
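One way to "support both interfaces" is sketched below with simplified, hypothetical stand-ins. `LazyParams` plays the role of the new standalone class, and `SimpleSerDe` plays the role of the SerDe that used to expose the nested class. These are not the real Hive signatures. The logic lives in the new class, and the old nested class and factory method return as deprecated shims, so third-party code written against them keeps compiling and gets objects of the new type.

```java
// Stand-in for the new standalone parameters class that now holds the logic.
class LazyParams {
  private final char separator;

  LazyParams(char separator) {
    this.separator = separator;
  }

  char getSeparator() {
    return separator;
  }
}

// Stand-in for the SerDe whose old public surface is being restored.
class SimpleSerDe {
  /** @deprecated use {@link LazyParams} directly. */
  @Deprecated
  static class SerDeParameters extends LazyParams {
    SerDeParameters(char separator) {
      super(separator);
    }
  }

  /** @deprecated construct a {@link LazyParams} instead. */
  @Deprecated
  static SerDeParameters initSerdeParams(char separator) {
    // The old entry point delegates to the new implementation.
    return new SerDeParameters(separator);
  }
}
```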





[jira] [Resolved] (HIVE-11144) Remove row by row reader.

2015-11-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-11144.
--
Resolution: Won't Fix

> Remove row by row reader.
> -
>
> Key: HIVE-11144
> URL: https://issues.apache.org/jira/browse/HIVE-11144
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The core ORC reader will be better served if the vectorized read path is the 
> primary API and the row by row reader becomes a Hive-specific shim.
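The layering described above (vectorized reads as the primary API, with row-by-row reading as a thin shim on top) can be sketched as follows. This is a deliberate simplification: a "batch" here is just an array of rows, whereas ORC's vectorized batches are columnar. `RowShim` and the `Supplier` batch source, which returns null when input is exhausted, are hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Row-at-a-time iterator layered over a batch-at-a-time source.
class RowShim<R> implements Iterator<R> {
  private final Supplier<R[]> nextBatch; // returns null when the input is exhausted
  private R[] batch;
  private int pos;
  private boolean done;

  RowShim(Supplier<R[]> nextBatch) {
    this.nextBatch = nextBatch;
  }

  @Override
  public boolean hasNext() {
    // Pull batches until one has an unread row; empty batches are skipped.
    while (!done && (batch == null || pos >= batch.length)) {
      batch = nextBatch.get();
      pos = 0;
      if (batch == null) {
        done = true;
      }
    }
    return !done;
  }

  @Override
  public R next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return batch[pos++];
  }
}
```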





[jira] [Updated] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Attachment: HIVE-12433.1-spark.patch

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.patch
>
>






[jira] [Commented] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009402#comment-15009402
 ] 

Xuefu Zhang commented on HIVE-12433:


Thanks! Corrected.

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.branch, HIVE-9202.1-spark.patch
>
>






[jira] [Commented] (HIVE-12438) Separate LLAP client side and server side config parameters

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009346#comment-15009346
 ] 

Sergey Shelukhin commented on HIVE-12438:
-

What is the goal of this separation? That will determine what we do. Might as 
well split HS2-only and MS-only stuff too.

> Separate LLAP client side and server side config parameters
> ---
>
> Key: HIVE-12438
> URL: https://issues.apache.org/jira/browse/HIVE-12438
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>
> Potentially separate out the files used as well. llap-daemon-site vs 
> llap-client-site.
> Most llap parameters are server side only. For ones which are required in 
> clients / AM - add an equivalent client side parameter for these.
> Also - parameters which enable the llap cache could be renamed.
> cc [~sershe]





[jira] [Updated] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Attachment: HIVE-12433.1-spark.branch

Attached a dummy patch to make sure tests are okay after the merge.

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.branch
>
>






[jira] [Commented] (HIVE-12431) Cancel queries after configurable timeout waiting on compilation

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009408#comment-15009408
 ] 

Sergey Shelukhin commented on HIVE-12431:
-

Note that there's a setting now that disables the compile lock.

> Cancel queries after configurable timeout waiting on compilation
> 
>
> Key: HIVE-12431
> URL: https://issues.apache.org/jira/browse/HIVE-12431
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Query Processor
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>
> To help with HiveServer2 scalability, it would be useful to allow users to 
> configure a timeout value for queries waiting to be compiled. If the timeout 
> value is reached then the query would abort. One option to achieve this would 
> be to update the compile lock to use a try-lock with the timeout value.
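A minimal sketch of the try-lock idea, assuming the compile lock can be modeled as a `java.util.concurrent.locks.ReentrantLock`. The class `TimedCompileLock` and its method are hypothetical and not part of HiveServer2. `tryLock(timeout, unit)` waits up to the timeout and returns `false` if the lock is still held, which is the point where the query would be aborted.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HiveServer2 code: a compile lock that gives up after a
// configurable timeout instead of waiting indefinitely.
class TimedCompileLock {
    // Fair lock, so waiting queries acquire it roughly in arrival order.
    private final ReentrantLock lock = new ReentrantLock(true);
    private final long timeoutMs;

    TimedCompileLock(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /**
     * Runs the compile step if the lock is acquired within the timeout.
     * Returns false if the caller should abort the query instead.
     */
    boolean compileWithTimeout(Runnable compile) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        if (!acquired) {
            return false; // timed out waiting for the compile lock
        }
        try {
            compile.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

With a fair lock, the timed `tryLock(timeout, unit)` honors the queue of waiters, whereas the untimed `tryLock()` would barge ahead of them.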





[jira] [Updated] (HIVE-11833) TxnHandler heartbeat txn doesn't need serializable DB txn level

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11833:
--
Fix Version/s: 1.3.0

> TxnHandler heartbeat txn doesn't need serializable DB txn level
> ---
>
> Key: HIVE-11833
> URL: https://issues.apache.org/jira/browse/HIVE-11833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11833.patch
>
>
> What it does is:
> 1) Update the lock heartbeat time; fails if not found.
> 2) Get the txn state.
> 3) If not found, look for the txn in completed; fails regardless of the result.
> 4) Update the txn heartbeat time if not (3) and not aborted.
> All this can run the same under repeatable-read.
> Now if it runs under read-committed, someone could:
> 1) update the txn state after we read it
> 2) delete the txn state (moving it to completed) after we read it
> 3) do the same for the completed state
> In case 1, we will update the heartbeat for, e.g., an aborted txn without 
> detecting it. UPD: we can change the queries to detect it.
> In case 2, the update will produce 0 rows, so we will detect that and can 
> check completed as we already do.
> Case 3 seems like it doesn't matter.
> I don't know if (1) matters. These heartbeats happen often and can cause 
> contention on the DB.





[jira] [Updated] (HIVE-11831) TXN tables in Oracle should be created with ROWDEPENDENCIES

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11831:
--
Fix Version/s: 1.3.0

> TXN tables in Oracle should be created with ROWDEPENDENCIES
> ---
>
> Key: HIVE-11831
> URL: https://issues.apache.org/jira/browse/HIVE-11831
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11831.01.patch, HIVE-11831.patch
>
>
> These frequently-updated tables may otherwise suffer from spurious deadlocks.





[jira] [Commented] (HIVE-11833) TxnHandler heartbeat txn doesn't need serializable DB txn level

2015-11-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009976#comment-15009976
 ] 

Eugene Koifman commented on HIVE-11833:
---

committed to branch-1 
https://github.com/apache/hive/commit/b1c1bf2afa24b1cd10e74f2d3ea3a6377f214c26

> TxnHandler heartbeat txn doesn't need serializable DB txn level
> ---
>
> Key: HIVE-11833
> URL: https://issues.apache.org/jira/browse/HIVE-11833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11833.patch
>
>
> What it does is:
> 1) Update the lock heartbeat time; fails if not found.
> 2) Get the txn state.
> 3) If not found, look for the txn in completed; fails regardless of the result.
> 4) Update the txn heartbeat time if not (3) and not aborted.
> All this can run the same under repeatable-read.
> Now if it runs under read-committed, someone could:
> 1) update the txn state after we read it
> 2) delete the txn state (moving it to completed) after we read it
> 3) do the same for the completed state
> In case 1, we will update the heartbeat for, e.g., an aborted txn without 
> detecting it. UPD: we can change the queries to detect it.
> In case 2, the update will produce 0 rows, so we will detect that and can 
> check completed as we already do.
> Case 3 seems like it doesn't matter.
> I don't know if (1) matters. These heartbeats happen often and can cause 
> contention on the DB.





[jira] [Updated] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7926:
---
Target Version/s: 2.0.0

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries, 
> nor is it a separate execution engine like MR or Tez. It can be used by 
> any Hive execution engine, if support is added; in the future, even external 
> products (e.g. Pig) could use it.
> The document with high-level design we are proposing will be attached shortly.





[jira] [Commented] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks

2015-11-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010036#comment-15010036
 ] 

Eugene Koifman commented on HIVE-12421:
---

You are right: lock() is a blocking call, so this isn't a problem.

> Streaming API TransactionBatch.beginNextTransaction() does not wait for locks
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an exception to the client. This doesn't seem like the right behavior; 
> it should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to 
> give the client more control.
> cc [~alangates]  [~sriharsha]





[jira] [Updated] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12045:
---
Attachment: HIVE-12045.2-spark.patch

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> HIVE-12045.2-spark.patch, HIVE-12045.2-spark.patch, example.jar, 
> genUDF.patch, hive.log.gz
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> 

[jira] [Resolved] (HIVE-10185) LLAP: LLAP IO doesn't get invoked inside MiniTezCluster q tests

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10185.
-
Resolution: Not A Problem

There's a MiniLlap test now.

> LLAP: LLAP IO doesn't get invoked inside MiniTezCluster q tests
> ---
>
> Key: HIVE-10185
> URL: https://issues.apache.org/jira/browse/HIVE-10185
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Siddharth Seth
>
> It took me a while to understand that it's not working. It might not be 
> getting initialized inside the container processes.





[jira] [Commented] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010055#comment-15010055
 ] 

Xuefu Zhang commented on HIVE-12433:


The following failures are seen in master as well:
{code}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
{code}
The others cannot be reproduced locally. 


> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.patch
>
>






[jira] [Updated] (HIVE-12445) Tracking of completed dags is a slow memory leak

2015-11-17 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12445:
--
Component/s: llap

> Tracking of completed dags is a slow memory leak
> 
>
> Key: HIVE-12445
> URL: https://issues.apache.org/jira/browse/HIVE-12445
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>
> LLAP daemons track completed DAGs, but never clean up these structures. This 
> is primarily to disallow out-of-order executions. Evaluate whether that can 
> be avoided; otherwise, this structure needs to be cleaned up with a delay.
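One way to implement "cleaned up with a delay" is sketched below, assuming each completed DAG is remembered together with its completion time and evicted once a retention window has passed. `CompletedDagTracker` and its methods are hypothetical, not LLAP classes, and the caller supplies timestamps (assumed non-decreasing) so the logic stays deterministic.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not LLAP code: track completed DAGs, but evict entries
// older than a retention window so the structure does not grow forever.
class CompletedDagTracker {
    private final long retentionMs;
    // Insertion order equals completion order, so the oldest entries come first.
    private final LinkedHashMap<String, Long> completedAt = new LinkedHashMap<>();

    CompletedDagTracker(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    synchronized void markCompleted(String dagId, long nowMs) {
        completedAt.remove(dagId); // re-insert so ordering reflects the latest completion
        completedAt.put(dagId, nowMs);
        purge(nowMs);
    }

    /** True if work for this DAG arrives after the DAG already finished (out-of-order). */
    synchronized boolean isCompleted(String dagId, long nowMs) {
        purge(nowMs);
        return completedAt.containsKey(dagId);
    }

    synchronized int size() {
        return completedAt.size();
    }

    // Drop entries whose retention window has expired, stopping at the first newer one.
    private void purge(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = completedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() < retentionMs) {
                break; // all remaining entries completed more recently
            }
            it.remove();
        }
    }
}
```

The trade-off is that a fragment arriving after its DAG has been evicted is no longer recognized as out-of-order, so the retention window has to exceed the longest expected delay.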





[jira] [Updated] (HIVE-11890) Create ORC module

2015-11-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11890:
-
Attachment: HIVE-11890.patch

I made some more compatibility tweaks so that clients don't need to update the 
packages for ColumnStatistics.

> Create ORC module
> -
>
> Key: HIVE-11890
> URL: https://issues.apache.org/jira/browse/HIVE-11890
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch
>
>
> Start moving classes over to the ORC module.





[jira] [Commented] (HIVE-12443) Hive Streaming should expose encoding and serdes for testing

2015-11-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009957#comment-15009957
 ] 

Alan Gates commented on HIVE-12443:
---

Sorry, I wasn't clear. I don't want to intercept encode; I want to encode 
without writing. In both RecordWriters today, the write method contains:
{code}
  Object encodedRow = encode(record);
  int bucket = getBucket(encodedRow);
  updaters.get(bucket).insert(transactionId, encodedRow);
{code}
Both encoding and writing the record are done in this one method. I want to be 
able to encode the record without writing it. Thus I want to take encode, which 
is private in both, elevate it to AbstractRecordWriter, and make it public so I 
can call it from capybara without writing data into an updater. This will allow 
capybara to split the stream coming into HiveEndPoint, sending one copy to Hive 
as normal and using the other to load data into a benchmark.
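The encode/write split described above can be sketched under simplified assumptions: `AbstractRecordWriterSketch` and `DelimitedWriterSketch` are hypothetical stand-ins, not Hive's streaming classes, and the comma-splitting "encoding" replaces the real SerDe conversion. `write()` keeps its encode-then-insert behavior, while the public `encode()` lets a test tool obtain the row without writing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the proposal; these are not Hive's streaming classes.
abstract class AbstractRecordWriterSketch {
    /** Public so a test tool can encode a record without writing it. */
    public abstract Object encode(byte[] record);

    /** Stores an already-encoded row (a bucket's updater in the real writer). */
    protected abstract void insert(Object encodedRow);

    /** Unchanged behavior for normal clients: encode, then write. */
    public final void write(byte[] record) {
        insert(encode(record));
    }
}

class DelimitedWriterSketch extends AbstractRecordWriterSketch {
    final List<Object> written = new ArrayList<>();

    @Override
    public Object encode(byte[] record) {
        // Stand-in for SerDe conversion: split a comma-delimited record into fields.
        return Arrays.asList(new String(record, StandardCharsets.UTF_8).split(","));
    }

    @Override
    protected void insert(Object encodedRow) {
        written.add(encodedRow);
    }
}
```

Keeping `write()` final means existing callers see no change; only tools that need the intermediate row call `encode()` directly.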

> Hive Streaming should expose encoding and serdes for testing
> 
>
> Key: HIVE-12443
> URL: https://issues.apache.org/jira/browse/HIVE-12443
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure, Transactions
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12443.patch
>
>
> Currently, how records passed into the Hive streaming RecordWriter are 
> converted from the inbound format to Hive format is opaque. The encoding and 
> writing are done in a single call to RecordWriter.write(). This is 
> problematic for test tools that want to intercept the record stream and write 
> it to a benchmark in addition to Hive.
> All existing RecordWriters have encode and getSerDe methods. I propose to 
> expose these by making them public in AbstractRecordWriter, and making 
> AbstractRecordWriter a public class (it is currently package private).  This 
> keeps the RecordWriter interface clean (stream writers will not need to 
> directly call these methods) and avoids any backwards incompatible changes.  
> Having AbstractRecordWriter public is also desirable for anyone who wants to 
> write their own RecordWriter.





[jira] [Commented] (HIVE-11831) TXN tables in Oracle should be created with ROWDEPENDENCIES

2015-11-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009983#comment-15009983
 ] 

Eugene Koifman commented on HIVE-11831:
---

committed to branch-1 
https://github.com/apache/hive/commit/236cc2ac9d4b27c6536b1d132b2e7706cb2109f3

> TXN tables in Oracle should be created with ROWDEPENDENCIES
> ---
>
> Key: HIVE-11831
> URL: https://issues.apache.org/jira/browse/HIVE-11831
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11831.01.patch, HIVE-11831.patch
>
>
> These frequently-updated tables may otherwise suffer from spurious deadlocks.





[jira] [Commented] (HIVE-12443) Hive Streaming should expose encoding and serdes for testing

2015-11-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010031#comment-15010031
 ] 

Eugene Koifman commented on HIVE-12443:
---

+1

> Hive Streaming should expose encoding and serdes for testing
> 
>
> Key: HIVE-12443
> URL: https://issues.apache.org/jira/browse/HIVE-12443
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure, Transactions
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12443.patch
>
>
> Currently, how records passed into the Hive streaming RecordWriter are 
> converted from the inbound format to Hive format is opaque. The encoding and 
> writing are done in a single call to RecordWriter.write(). This is 
> problematic for test tools that want to intercept the record stream and write 
> it to a benchmark in addition to Hive.
> All existing RecordWriters have encode and getSerDe methods. I propose to 
> expose these by making them public in AbstractRecordWriter, and making 
> AbstractRecordWriter a public class (it is currently package private).  This 
> keeps the RecordWriter interface clean (stream writers will not need to 
> directly call these methods) and avoids any backwards incompatible changes.  
> Having AbstractRecordWriter public is also desirable for anyone who wants to 
> write their own RecordWriter.





[jira] [Commented] (HIVE-10648) LLAP: registry; Tez attempted to schedule to daemon that didn't exist

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010040#comment-15010040
 ] 

Sergey Shelukhin commented on HIVE-10648:
-

[~gopalv] any update on this?

> LLAP: registry; Tez attempted to schedule to daemon that didn't exist
> -
>
> Key: HIVE-10648
> URL: https://issues.apache.org/jira/browse/HIVE-10648
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Gopal V
>
> I can post logs externally; for now app IDs on test cluster are 
> application_1429683757595_0784 and application_1429683757595_0783, I also 
> have logs copied over.
> AM found the node (same logs for other nodes):
> {noformat}
> 2015-05-07 12:13:28,074 INFO 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] 
> impl.LlapYarnRegistryImpl: Adding new worker 
> 342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance 
> [alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with 
> resources=]
> 
> 2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker: 
> Num cluster nodes = 19
> {noformat}
> Trouble is, this node never actually existed... The cluster only had 15 
> nodes. 
> As the job was progressing, AM repeatedly tried to schedule to this node and 
> failed. There was no other LLAP cluster running at the same time.
> In fact, given that I always start a 15-node cluster, I am not sure where 
> 19-node data could conceivably come from...





[jira] [Resolved] (HIVE-10947) LLAP: preemption appears to count against failure count for the task

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10947.
-
Resolution: Cannot Reproduce

> LLAP: preemption appears to count against failure count for the task
> 
>
> Key: HIVE-10947
> URL: https://issues.apache.org/jira/browse/HIVE-10947
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Siddharth Seth
>
> It looks like the following stack, seen under a highly parallel workload, 
> counts as a task error, and the DAG fails:
> {noformat}
> : Error while processing statement: FAILED: Execution Error, return code 2 
> from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1433459966952_0482_4_03, diagnostics=[Task 
> failed, taskId=task_1433459966952_0482_4_03_22, diagnostics=[TaskAttempt 
> 0 killed, TaskAttempt 1 killed, TaskAttempt 2 killed, TaskAttempt 3 killed, 
> TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 
> 7 killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, 
> TaskAttempt 11 killed, TaskAttempt 12 killed, TaskAttempt 13 killed, 
> TaskAttempt 14 killed, TaskAttempt 15 killed, TaskAttempt 16 killed, 
> TaskAttempt 17 killed, TaskAttempt 18 killed, TaskAttempt 19 failed, 
> info=[Error: Failure while running task: 
> attempt_1433459966952_0482_4_03_22_19:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1654)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:256)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157)
>   ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async 
> initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:416)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:388)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:511)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:464)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:378)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:241)
>   ... 15 more
> Caused by: java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:408)
>   ... 20 more
> ], TaskAttempt 20 failed, info=[Error: Failure while running task: 
> attempt_1433459966952_0482_4_03_22_20:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>   

[jira] [Updated] (HIVE-12313) Turn hive.optimize.union.remove on by default

2015-11-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12313:

Attachment: HIVE-12313.2.patch

> Turn hive.optimize.union.remove on by default
> -
>
> Key: HIVE-12313
> URL: https://issues.apache.org/jira/browse/HIVE-12313
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12313.1.patch, HIVE-12313.2.patch, HIVE-12313.patch
>
>
> This optimization always helps. It should be on by default.





[jira] [Updated] (HIVE-12451) Orc fast file merging/concatenation should be disabled for ACID tables

2015-11-17 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12451:
-
Assignee: Alan Gates  (was: Prasanth Jayachandran)

> Orc fast file merging/concatenation should be disabled for ACID tables
> --
>
> Key: HIVE-12451
> URL: https://issues.apache.org/jira/browse/HIVE-12451
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Alan Gates
>
> For ACID tables, merging of small files should happen only through compaction. 
> We should disable "alter table .. concatenate" for ACID tables. We should 
> also disable ConditionalMergeFileTask if the destination is an ACID table.
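A minimal sketch of the proposed guard, assuming ACID tables are recognized by the table property `transactional=true`. `ConcatenateGuard` and its methods are hypothetical and not Hive's DDL code; the same check could in principle gate the conditional merge task as well.

```java
import java.util.Map;

// Hypothetical guard, not Hive's DDL code: refuse fast file concatenation for
// ACID tables so that small files are merged only through compaction.
final class ConcatenateGuard {
    private ConcatenateGuard() {}

    /** Assumes ACID tables carry the table property transactional=true. */
    static boolean isAcidTable(Map<String, String> tableParams) {
        String value = tableParams.get("transactional");
        return value != null && value.equalsIgnoreCase("true");
    }

    static void checkConcatenateAllowed(String tableName, Map<String, String> tableParams) {
        if (isAcidTable(tableParams)) {
            throw new UnsupportedOperationException(
                "CONCATENATE is not supported for ACID table " + tableName + "; use compaction instead");
        }
    }
}
```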





[jira] [Commented] (HIVE-12450) OrcFileMergeOperator does not use correct compression buffer size

2015-11-17 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010089#comment-15010089
 ] 

Prasanth Jayachandran commented on HIVE-12450:
--

[~sershe] Can you take a look at this patch? Most changes are tests and golden 
output files.

> OrcFileMergeOperator does not use correct compression buffer size
> -
>
> Key: HIVE-12450
> URL: https://issues.apache.org/jira/browse/HIVE-12450
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.2.1, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12450.1.patch
>
>
> OrcFileMergeOperator checks for compatibility before merging ORC files. This 
> compatibility check includes checking the compression buffer size. But the 
> output file that is created does not honor the compression buffer size and 
> always defaults to 256KB. This is not a problem when reading the ORC file but 
> can create unwanted memory pressure because of wasted space within the 
> compression buffer.
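Because the merge already requires the inputs to agree on compression settings, the output can reuse their buffer size instead of the 256KB default. The sketch below models only that decision; `FileMeta` and `MergeSettings` are hypothetical stand-ins, not ORC or Hive classes, and the real compatibility check compares more than these two fields.

```java
import java.util.List;

// Hypothetical stand-in for the per-file metadata the merge compares.
final class FileMeta {
    final String compression; // e.g. "ZLIB"
    final int bufferSize;     // compression buffer size in bytes

    FileMeta(String compression, int bufferSize) {
        this.compression = compression;
        this.bufferSize = bufferSize;
    }
}

final class MergeSettings {
    // The default the issue says the merged output currently always gets.
    static final int DEFAULT_BUFFER_SIZE = 256 * 1024;

    private MergeSettings() {}

    /** Buffer size for the merged file; throws if the inputs cannot be fast-merged. */
    static int outputBufferSize(List<FileMeta> inputs) {
        if (inputs.isEmpty()) {
            return DEFAULT_BUFFER_SIZE;
        }
        FileMeta first = inputs.get(0);
        for (FileMeta m : inputs) {
            if (!m.compression.equals(first.compression) || m.bufferSize != first.bufferSize) {
                throw new IllegalArgumentException("incompatible ORC files, cannot fast-merge");
            }
        }
        return first.bufferSize; // honor the inputs' buffer size, not the default
    }
}
```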





[jira] [Updated] (HIVE-12450) OrcFileMergeOperator does not use correct compression buffer size

2015-11-17 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12450:
-
Attachment: HIVE-12450.1.patch

> OrcFileMergeOperator does not use correct compression buffer size
> -
>
> Key: HIVE-12450
> URL: https://issues.apache.org/jira/browse/HIVE-12450
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.2.1, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12450.1.patch
>
>
> OrcFileMergeOperator checks for compatibility before merging ORC files. This 
> compatibility check includes checking the compression buffer size. But the 
> output file that is created does not honor the compression buffer size and 
> always defaults to 256KB. This is not a problem when reading the ORC file but 
> can create unwanted memory pressure because of wasted space within the 
> compression buffer.





[jira] [Commented] (HIVE-12388) GetTables cannot get external tables when TABLE type argument is given

2015-11-17 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009944#comment-15009944
 ] 

Aihua Xu commented on HIVE-12388:
-

[~szehon] The patch looks good. 

Minor issues:
1.  {{LOG.info("Invalid hive table type " + hiveTypeName);}} does it make sense 
to log as a warning rather than info?
2. On Line 1181, the assert signature should be  assertEquals(expected, actual).

> GetTables cannot get external tables when TABLE type argument is given
> --
>
> Key: HIVE-12388
> URL: https://issues.apache.org/jira/browse/HIVE-12388
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-12388.1.patch.txt, HIVE-12388.2.patch
>
>
> By regression of HIVE-7575, external tables are not shown when "TABLE" type 
> is specified as argument. I'm working on this. Sorry.
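The sketch below illustrates the mapping GetTables has to respect, assuming managed and external Hive tables are both reported to JDBC clients as `TABLE` and views as `VIEW`: a client filter of `TABLE` must expand to every matching Hive type, not just one. `TableTypeMapping` is a hypothetical helper, not the actual HiveServer2 mapping class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not HiveServer2's mapping class: maps Hive table types
// to the types reported to JDBC clients.
final class TableTypeMapping {
    private static final Map<String, String> HIVE_TO_CLIENT;

    static {
        Map<String, String> m = new HashMap<>();
        m.put("MANAGED_TABLE", "TABLE");
        m.put("EXTERNAL_TABLE", "TABLE"); // external tables must also match "TABLE"
        m.put("VIRTUAL_VIEW", "VIEW");
        HIVE_TO_CLIENT = Collections.unmodifiableMap(m);
    }

    private TableTypeMapping() {}

    /** Every Hive table type that a client-side type such as "TABLE" should match. */
    static Set<String> hiveTypesFor(String clientType) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : HIVE_TO_CLIENT.entrySet()) {
            if (e.getValue().equalsIgnoreCase(clientType)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```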





[jira] [Updated] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-17 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11981:

Attachment: (was: HIVE-11981.0992.patch)

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other kinds of schema evolution 
> were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns to ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).
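The two supported kinds of evolution can be illustrated with a small sketch: a row read from a file written before a column was appended is padded with nulls, and a narrower integer type may be read as a wider one. The type names, the widening table, and the `SchemaEvolutionSketch` class are assumptions for illustration only, not Hive's actual conversion rules or reader classes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers illustrating the two supported kinds of schema evolution. */
final class SchemaEvolutionSketch {
    // Integer types from narrowest to widest (illustrative subset).
    private static final List<String> INTEGER_TYPES =
            Arrays.asList("tinyint", "smallint", "int", "bigint");

    /** True if a column stored as fileType may be read as tableType by widening. */
    static boolean canWiden(String fileType, String tableType) {
        if (fileType.equals(tableType)) {
            return true;
        }
        int from = INTEGER_TYPES.indexOf(fileType);
        int to = INTEGER_TYPES.indexOf(tableType);
        if (from >= 0 && to >= 0) {
            return from < to;
        }
        return fileType.equals("float") && tableType.equals("double");
    }

    /** Pads a row from a file written before columns were appended; new columns read as null. */
    static Object[] padRow(Object[] fileRow, int tableColumnCount) {
        if (fileRow.length > tableColumnCount) {
            throw new IllegalArgumentException("file has more columns than the table schema");
        }
        return Arrays.copyOf(fileRow, tableColumnCount);
    }
}
```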





[jira] [Updated] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-17 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11981:

Attachment: HIVE-11981.0992.patch

Rebased during a very long Hive QA queue.

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other kinds of schema evolution 
> were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns to ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).





[jira] [Commented] (HIVE-12451) Orc fast file merging/concatenation should be disabled for ACID tables

2015-11-17 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009989#comment-15009989
 ] 

Prasanth Jayachandran commented on HIVE-12451:
--

[~ekoifman] fyi..

> Orc fast file merging/concatenation should be disabled for ACID tables
> --
>
> Key: HIVE-12451
> URL: https://issues.apache.org/jira/browse/HIVE-12451
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> For ACID tables, merging of small files should happen only through compaction. 
> We should disable "alter table .. concatenate" for ACID tables. We should 
> also disable ConditionalMergeFileTask if destination is an ACID table.
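A minimal sketch of the proposed guard, assuming the table's ACID status is read from the "transactional" table property. `ConcatenateGuard` and its methods are hypothetical; the real checks would live in Hive's DDL handling and merge-task planning.

```java
import java.util.Map;

/** Hypothetical guard for the two merge paths that should skip ACID tables. */
final class ConcatenateGuard {
    /** ACID tables carry transactional=true in their table properties. */
    static boolean isAcid(Map<String, String> tableProps) {
        return "true".equalsIgnoreCase(tableProps.get("transactional"));
    }

    /** Rejects ALTER TABLE .. CONCATENATE on ACID tables; compaction must be used instead. */
    static void checkConcatenateAllowed(String table, Map<String, String> tableProps) {
        if (isAcid(tableProps)) {
            throw new UnsupportedOperationException(
                    "Concatenate is not supported on ACID table " + table
                    + "; use compaction instead");
        }
    }

    /** Whether a conditional merge-file task may be scheduled for the destination table. */
    static boolean mergeTaskAllowed(Map<String, String> destProps) {
        return !isAcid(destProps);
    }
}
```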





[jira] [Updated] (HIVE-12388) GetTables cannot get external tables when TABLE type argument is given

2015-11-17 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-12388:
-
Attachment: HIVE-12388.3.patch

Thanks Aihua for catching those.

> GetTables cannot get external tables when TABLE type argument is given
> --
>
> Key: HIVE-12388
> URL: https://issues.apache.org/jira/browse/HIVE-12388
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-12388.1.patch.txt, HIVE-12388.2.patch, 
> HIVE-12388.3.patch
>
>
> Due to a regression from HIVE-7575, external tables are not shown when the "TABLE" type 
> is specified as an argument. I'm working on this. Sorry.





[jira] [Updated] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10937:

Attachment: HIVE-10937.05.patch

Hmm. I could swear I removed the globals, but apparently I didn't. Here we go.

> LLAP: make ObjectCache for plans work properly in the daemon
> 
>
> Key: HIVE-10937
> URL: https://issues.apache.org/jira/browse/HIVE-10937
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10937.01.patch, HIVE-10937.02.patch, 
> HIVE-10937.03.patch, HIVE-10937.04.patch, HIVE-10937.05.patch, 
> HIVE-10937.patch
>
>
> There's a perf hit otherwise, esp. when the stupid planner creates 1009 reducers of 
> 4MB each.
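The idea behind a daemon-wide plan cache can be sketched as below: all task fragments of one query share a single deserialized plan keyed by query id, instead of each fragment deserializing its own copy. `QueryScopedCache`, its methods, and the release-on-query-completion hook are hypothetical, not LLAP's actual ObjectCache API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-daemon cache whose entries are scoped to a query. */
final class QueryScopedCache {
    // queryId -> (key -> cached object, e.g. a deserialized plan)
    private final ConcurrentMap<String, ConcurrentMap<String, Object>> byQuery =
            new ConcurrentHashMap<>();

    /** Returns the cached value, invoking loader at most once per (queryId, key). */
    @SuppressWarnings("unchecked")
    <T> T retrieve(String queryId, String key, Callable<T> loader) {
        ConcurrentMap<String, Object> perQuery =
                byQuery.computeIfAbsent(queryId, q -> new ConcurrentHashMap<>());
        return (T) perQuery.computeIfAbsent(key, k -> {
            try {
                return loader.call();
            } catch (Exception e) {
                throw new RuntimeException("failed to load " + k, e);
            }
        });
    }

    /** Drops everything cached for a query once all its fragments have finished. */
    void release(String queryId) {
        byQuery.remove(queryId);
    }
}
```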





[jira] [Updated] (HIVE-12384) Union Operator may produce incorrect result on TEZ

2015-11-17 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-12384:
--
Attachment: HIVE-12384.4.patch

Added a modified patch (.4) to include missed test config changes.

> Union Operator may produce incorrect result on TEZ
> --
>
> Key: HIVE-12384
> URL: https://issues.apache.org/jira/browse/HIVE-12384
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.1.0, 1.0.1, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Fix For: 2.0.0
>
> Attachments: HIVE-12384.1.patch, HIVE-12384.2.patch, 
> HIVE-12384.3.patch, HIVE-12384.4.patch
>
>
> Union queries may produce incorrect results on TEZ.
> TEZ removes the union operator and thus might lose the implicit cast it performs.
> Reproduction test case:
> set hive.cbo.enable=false;
> set hive.execution.engine=tez;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all select '1' x from (select * from src limit 2) s3) u
> order by y;
> select (x/sum(x) over()) as y from (
>   select cast(1 as decimal(10,0)) as x from (select * from src limit 2) s1
>   union all select cast(1 as decimal(10,0)) x from (select * from src limit 2) s2
>   union all select cast(null as string) x from (select * from src limit 2) s3) u
> order by y;
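Conceptually, the union operator converts every branch's rows to the union's output type before passing them on. If that step is dropped, rows from different branches reach the windowed aggregate with different runtime types. The sketch below assumes, purely for illustration, a DECIMAL output type represented by `java.math.BigDecimal`. Hive's real common-type rules for mixing decimal and string branches may differ, and `UnionCastSketch` is hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the cast a union operator applies to each branch. */
final class UnionCastSketch {
    /** Converts a value from any branch to the (assumed) DECIMAL output type. */
    static BigDecimal toOutputType(Object v) {
        if (v == null) {
            return null;
        }
        if (v instanceof BigDecimal) {
            return (BigDecimal) v;
        }
        if (v instanceof String) {
            return new BigDecimal((String) v); // the implicit cast that must not be lost
        }
        if (v instanceof Number) {
            return new BigDecimal(v.toString());
        }
        throw new IllegalArgumentException("unsupported type " + v.getClass());
    }

    /** Concatenates branches, casting each value so downstream operators see one type. */
    static List<BigDecimal> unionAll(List<List<Object>> branches) {
        List<BigDecimal> out = new ArrayList<>();
        for (List<Object> branch : branches) {
            for (Object v : branch) {
                out.add(toOutputType(v));
            }
        }
        return out;
    }
}
```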





[jira] [Updated] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12421:
--
Issue Type: Improvement  (was: Bug)

> Streaming API TransactionBatch.beginNextTransaction() does not wait for locks
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if any competing locks are already taken, this will 
> throw an Exception to the client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]
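The proposed beginNextTransaction(long timeoutMs) could wait for competing locks by polling the lock state until it is acquired or a deadline passes, roughly as sketched below. `LockClient`, `LockState`, and `LockWaiter` are hypothetical stand-ins loosely modelled on the metastore client's lock/checkLock pair; the real signatures and states are not reproduced here.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical lock states. */
enum LockState { ACQUIRED, WAITING, ABORT }

/** Hypothetical stand-in for the metastore lock calls. */
interface LockClient {
    /** Requests a lock and returns its id; the state is obtained via checkLock. */
    long lock(String endPoint);

    LockState checkLock(long lockId);
}

final class LockWaiter {
    /** Blocks until the lock is ACQUIRED, polling with capped exponential backoff. */
    static long acquire(LockClient client, String endPoint, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        long lockId = client.lock(endPoint);
        long sleepMs = 10;
        while (true) {
            LockState state = client.checkLock(lockId);
            if (state == LockState.ACQUIRED) {
                return lockId;
            }
            if (state == LockState.ABORT) {
                throw new IllegalStateException("lock request aborted for " + endPoint);
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                // A real implementation should also release the pending lock request here.
                throw new TimeoutException("Unable to acquire lock on " + endPoint
                        + " within " + timeoutMs + " ms");
            }
            Thread.sleep(Math.min(sleepMs, remainingMs));
            sleepMs = Math.min(sleepMs * 2, 1000); // back off, capped at 1s
        }
    }
}
```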





[jira] [Updated] (HIVE-11766) LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11766:

Assignee: Prasanth Jayachandran

> LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal
> ---
>
> Key: HIVE-11766
> URL: https://issues.apache.org/jira/browse/HIVE-11766
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Remove HIVE-11732 changes after HIVE-11378 goes in.





[jira] [Updated] (HIVE-11766) LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11766:

Assignee: Sergey Shelukhin  (was: Prasanth Jayachandran)

> LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal
> ---
>
> Key: HIVE-11766
> URL: https://issues.apache.org/jira/browse/HIVE-11766
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>
> Remove HIVE-11732 changes after HIVE-11378 goes in.





[jira] [Updated] (HIVE-12421) Streaming API add TransactionBatch.beginNextTransaction(long timeout)

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12421:
--
Summary: Streaming API add TransactionBatch.beginNextTransaction(long 
timeout)  (was: Streaming API TransactionBatch.beginNextTransaction() does not 
wait for locks)

> Streaming API add TransactionBatch.beginNextTransaction(long timeout)
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if any competing locks are already taken, this will 
> throw an Exception to the client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]





[jira] [Updated] (HIVE-12451) Orc fast file merging/concatenation should be disabled for ACID tables

2015-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12451:
--
Component/s: Transactions
 ORC

> Orc fast file merging/concatenation should be disabled for ACID tables
> --
>
> Key: HIVE-12451
> URL: https://issues.apache.org/jira/browse/HIVE-12451
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> For ACID tables, merging of small files should happen only through compaction. 
> We should disable "alter table .. concatenate" for ACID tables. We should 
> also disable ConditionalMergeFileTask if destination is an ACID table.





[jira] [Updated] (HIVE-10155) LLAP: switch to sensible cache policy

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10155:

Target Version/s: 2.0.0

> LLAP: switch to sensible cache policy
> -
>
> Key: HIVE-10155
> URL: https://issues.apache.org/jira/browse/HIVE-10155
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> FIFO policy is currently the default. We can test the LRFU one, but there were 
> concerns that it won't scale. One option is to implement a two-tier policy, 
> FIFO/LRU for blocks referenced once, and something more complex (LFU, LRFU) 
> for blocks referenced more than once. That should be friendly to large scans 
> of a fact table in terms of behavior and overhead.
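The two-tier idea resembles the 2Q family of cache policies. A minimal sketch, assuming capacities counted in entries rather than bytes and using LRU for the second tier: blocks referenced once sit in a FIFO probation queue, and a second reference promotes them to the LRU tier, so a single large scan only churns the FIFO tier. `TwoTierPolicy` is hypothetical and not LLAP's cache policy interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Set;

/** Hypothetical two-tier eviction policy: FIFO for one-time blocks, LRU for re-referenced ones. */
final class TwoTierPolicy<K> {
    private final int fifoCapacity;
    private final int lruCapacity;
    private final Deque<K> fifo = new ArrayDeque<>();     // referenced once, oldest first
    private final Set<K> fifoMembers = new HashSet<>();
    // Access-ordered: iteration starts at the least recently used key.
    private final LinkedHashMap<K, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true);

    TwoTierPolicy(int fifoCapacity, int lruCapacity) {
        if (fifoCapacity < 1 || lruCapacity < 1) {
            throw new IllegalArgumentException("capacities must be positive");
        }
        this.fifoCapacity = fifoCapacity;
        this.lruCapacity = lruCapacity;
    }

    /** Records a reference to key; returns the evicted key, or null if nothing was evicted. */
    K access(K key) {
        if (lru.containsKey(key)) {
            lru.get(key); // refreshes recency in the access-ordered map
            return null;
        }
        if (fifoMembers.remove(key)) {
            fifo.remove(key); // O(n), acceptable for a sketch
            lru.put(key, Boolean.TRUE);
            if (lru.size() > lruCapacity) {
                Iterator<K> it = lru.keySet().iterator();
                K victim = it.next();
                it.remove();
                return victim;
            }
            return null;
        }
        fifo.addLast(key);
        fifoMembers.add(key);
        if (fifo.size() > fifoCapacity) {
            K victim = fifo.removeFirst();
            fifoMembers.remove(victim);
            return victim;
        }
        return null;
    }

    boolean contains(K key) {
        return fifoMembers.contains(key) || lru.containsKey(key);
    }
}
```

For example, a block accessed twice before a 100-block scan stays cached, while the scan blocks cycle through the FIFO tier only.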





[jira] [Resolved] (HIVE-11264) LLAP: PipelineSorter got stuck

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11264.
-
Resolution: Cannot Reproduce

> LLAP: PipelineSorter got stuck
> --
>
> Key: HIVE-11264
> URL: https://issues.apache.org/jira/browse/HIVE-11264
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Gopal V
>
> Saving here for now, in case someone sees something similar or has a sudden 
> insight.
> On a parallel query workload, at some point, one query became stuck while 
> everything else finished. The query ended with 3 reducer stages, 1 vertex 
> each. The only things running were Reducer 3 and Reducer 2 (10-machine 
> cluster), 3 waiting for 2, and 2 stuck for 20 minutes.
> Unfortunately the log level was WARN, so there's not much in terms of logs.
> When the LLAP cluster was stopped, the thread running reducer 2 died like so: 
> {noformat}
> 2015-07-14 19:02:20,136 
> [TezTaskRunner_attempt_1435700346116_1889_1_06_00_104(attempt_1435700346116_1889_1_06_00_104)]
>  ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unexpected exception 
> from MapJoinOperator : java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.FutureTask@580b9a95 rejected from 
> java.util.concurrent.ThreadPoolExecutor@61d1d95b[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 753]
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.FutureTask@580b9a95 rejected from 
> java.util.concurrent.ThreadPoolExecutor@61d1d95b[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 753]
>    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
>    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:872)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:656)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:659)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:755)
>    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:415)
>    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:872)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:656)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:659)
>    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:755)
>    at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:309)
>    at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:272)
>    at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:265)
>    at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:462)
>    at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:383)
>    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:651)
>    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:317)
>    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
>    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146)
>    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
>    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:415)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1654)
>    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at 
> 

[jira] [Updated] (HIVE-11447) LLAP: make sure JMX view is correct

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11447:

Target Version/s: 2.0.0

> LLAP: make sure JMX view is correct
> ---
>
> Key: HIVE-11447
> URL: https://issues.apache.org/jira/browse/HIVE-11447
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> There were several issues where counters were incorrect, the queue view had stale 
> tasks, and so on. We need to check that all counters we currently output are 
> actually correct.





[jira] [Commented] (HIVE-10132) LLAP: Tez heartbeats are delayed by ~500+ ms due to Hadoop IPC client

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010044#comment-15010044
 ] 

Sergey Shelukhin commented on HIVE-10132:
-

[~gopalv] do we still need this now that HADOOP-11772 is resolved?

> LLAP: Tez heartbeats are delayed by ~500+ ms due to Hadoop IPC client
> -
>
> Key: HIVE-10132
> URL: https://issues.apache.org/jira/browse/HIVE-10132
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Siddharth Seth
> Attachments: HIVE-10132.NOT.A.patch
>
>
> HADOOP-11772 has a clearer bug report of the core issue inside hadoop-common.
> Due to the delayed heartbeats reaching the AM, the reducers are losing up to 
> a couple of seconds for a 60ms (x10 parallel) mapper + 300ms reducer instead 
> of finishing the query in under a second.





[jira] [Updated] (HIVE-9756) LLAP: use log4j 2 for llap

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9756:
---
Target Version/s: 2.0.0

> LLAP: use log4j 2 for llap
> --
>
> Key: HIVE-9756
> URL: https://issues.apache.org/jira/browse/HIVE-9756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-9756.1.patch, HIVE-9756.2.patch
>
>
> For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade path to get 
> throughput-friendly logging.
> http://logging.apache.org/log4j/2.0/manual/async.html#Performance





[jira] [Commented] (HIVE-12450) OrcFileMergeOperator does not use correct compression buffer size

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010092#comment-15010092
 ] 

Sergey Shelukhin commented on HIVE-12450:
-

Java changes look good +1

> OrcFileMergeOperator does not use correct compression buffer size
> -
>
> Key: HIVE-12450
> URL: https://issues.apache.org/jira/browse/HIVE-12450
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.2.1, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12450.1.patch
>
>
> OrcFileMergeOperator checks for compatibility before merging ORC files. This 
> compatibility check includes checking the compression buffer size. But the output 
> file that is created does not honor the compression buffer size and always 
> defaults to 256KB. This will not be a problem when reading the ORC file but 
> can create unwanted memory pressure because of wasted space within the 
> compression buffer.





[jira] [Commented] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf and document the settings

2015-11-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010257#comment-15010257
 ] 

Lefty Leverenz commented on HIVE-11358:
---

General questions:

1.  Why don't the parameters that take a time value use a TimeValidator?  For 
examples, see hive.spark.client.future.timeout in the patch (seconds) and 
hive.stats.retries.wait in HiveConf.java (milliseconds).

2.  Why is milliseconds "ms" in some parameter names and "millis" in others?

3.  Why do several parameter names use hyphens where normally there would be 
dots?

Hyphens in parameter names:

* hive.llap.daemon.shuffle.dir-watcher.enabled
* hive.llap.daemon.am.liveness.heartbeat.interval-ms
* hive.llap.am.liveness.connection.timeout-millis
* hive.llap.am.liveness.connection.sleep-between-retries-millis
* hive.llap.file.cleanup.delay-seconds
* hive.llap.task.communicator.connection.timeout-millis
* hive.llap.task.communicator.connection.sleep-between-retries-millis

Also hive.llap.task.scheduler.node.re-enable.min.timeout.ms and 
hive.llap.task.scheduler.node.re-enable.max.timeout.ms, but those are genuine 
hyphens.
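Question 1 is about letting the value, rather than the parameter name, carry the time unit. A sketch of what a unit-aware time parameter buys, assuming a value syntax of a number followed by an optional unit suffix and a per-parameter default unit. `TimeValueSketch` and its accepted suffixes are hypothetical; this is not HiveConf's actual TimeValidator implementation.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for time-valued parameters such as "500ms" or "30s". */
final class TimeValueSketch {
    private static final Pattern VALUE = Pattern.compile("\\s*(\\d+)\\s*([a-z]*)\\s*");

    /** Parses e.g. "500ms", "30s", "5m", or a bare number in defaultUnit; returns milliseconds. */
    static long toMillis(String value, TimeUnit defaultUnit) {
        Matcher m = VALUE.matcher(value.toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid time value: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        return unitOf(m.group(2), defaultUnit).toMillis(amount);
    }

    private static TimeUnit unitOf(String suffix, TimeUnit defaultUnit) {
        switch (suffix) {
            case "": return defaultUnit;
            case "ms": case "msec": case "millis": return TimeUnit.MILLISECONDS;
            case "s": case "sec": case "seconds": return TimeUnit.SECONDS;
            case "m": case "min": case "minutes": return TimeUnit.MINUTES;
            case "h": case "hour": case "hours": return TimeUnit.HOURS;
            case "d": case "day": case "days": return TimeUnit.DAYS;
            default: throw new IllegalArgumentException("Unknown time unit: " + suffix);
        }
    }
}
```

With values parsed this way, parameter names would not need -ms, -millis, or -seconds suffixes, since a value such as "300s" states its own unit.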

> LLAP: move LlapConfiguration into HiveConf and document the settings
> 
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11358.01.patch, HIVE-11358.patch
>
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf.





[jira] [Updated] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-11-17 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-11878:
---
Attachment: HIVE-11878.patch

Using the SystemClassLoader as the parent of the per-session classloader.

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering 
> Jars.pptx, HIVE-11878.patch, HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URL 
> classloader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a CNF exception.
> We register 2 jars *j1* and *j2* in the Hive console. *j1* contains the UDF class 
> *c1*, which internally relies on class *c2* in jar *j2*. We register *j1* first; 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next; the new URLClassLoader 
> created will be *u2* with *u1* as parent, and *u2* becomes the new 
> ThreadContextClassLoader. Note that *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has the path to *j1* (for details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code} . The current thread 
> context classloader is *u2*, and it has the path to the class 
> *c1*, but note that classloaders delegate to their parent classloader 
> first. In this case class *c1* will be found and *defined* by classloader 
> *u1*.
> Now *c1* from jar *j1* has *u1* as its classloader. If a method (say 
> initialize) is called in *c1* which references the class *c2*, *c2* will not 
> be found, since the classloader used to search for *c2* will be *u1* (the 
> defining classloader of the referencing class is used to load the classes it references).
> I've added a qtest to explain the problem. Please see the attached patch.
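The approach in the update above (the system classloader as the parent of a per-session classloader) can be sketched by keeping one URLClassLoader per session and adding every registered jar to it. Then *c1* and *c2* are both visible to the same defining loader. `SessionClassLoader` and `addJar` are hypothetical names; the sketch relies only on URLClassLoader's protected addURL, and the attached patch may differ.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical per-session loader: one loader holds all jars registered in the session. */
final class SessionClassLoader extends URLClassLoader {
    SessionClassLoader() {
        // Parent is the system classloader, not the previous session loader, so classes
        // from registered jars are never defined by an older loader with fewer jars.
        super(new URL[0], ClassLoader.getSystemClassLoader());
    }

    /** Registers a jar; classes defined by this loader can resolve classes in any added jar. */
    void addJar(URL jar) {
        addURL(jar); // protected in URLClassLoader, exposed here for the session
    }
}
```

A UDF would then be loaded with `Class.forName("c1", true, sessionLoader)`, and a later reference from *c1* to *c2* resolves through the same loader.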





[jira] [Updated] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-11-17 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-11878:
---
Attachment: (was: HIVE-11878.patch)

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering 
> Jars.pptx, HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, 
> HIVE-11878_approach3_with_review_comments.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URL 
> classloader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a CNF exception.
> We register 2 jars *j1* and *j2* in the Hive console. *j1* contains the UDF class 
> *c1*, which internally relies on class *c2* in jar *j2*. We register *j1* first; 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next; the new URLClassLoader 
> created will be *u2* with *u1* as parent, and *u2* becomes the new 
> ThreadContextClassLoader. Note that *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has the path to *j1* (for details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code} . The current thread 
> context classloader is *u2*, and it has the path to the class 
> *c1*, but note that classloaders delegate to their parent classloader 
> first. In this case class *c1* will be found and *defined* by classloader 
> *u1*.
> Now *c1* from jar *j1* has *u1* as its classloader. If a method (say 
> initialize) is called in *c1* which references the class *c2*, *c2* will not 
> be found, since the classloader used to search for *c2* will be *u1* (the 
> defining classloader of the referencing class is used to load the classes it references).
> I've added a qtest to explain the problem. Please see the attached patch.




