[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696862#comment-14696862
 ] 

Hive QA commented on HIVE-11502:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750428/HIVE-11502.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9357 tests executed
*Failed tests:*
{noformat}
TestDummy - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4962/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4962/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4962/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750428 - PreCommit-HIVE-TRUNK-Build

 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11502.1.patch, HIVE-11502.2.patch


 For the following query:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column for group by has many different values (for example 40) and 
 it is of type double, the map-side aggregation is very slow. I ran the query 
 for more than 3 hours, after which I had to kill it.
 The same query can finish in 7 seconds if I turn off map-side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10276) Implement date_format(timestamp, fmt) UDF

2015-08-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696883#comment-14696883
 ] 

Amareshwari Sriramadasu commented on HIVE-10276:


The implementation done here does not look SQL-compliant. For example, D is day of 
year in SimpleDateFormat 
(https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html), 
but is month of the year in SQL: 
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format
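
For reference, a minimal Java sketch of the SimpleDateFormat side of the 
discrepancy (the date value is illustrative):

{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class DayOfYearDemo {
  public static void main(String[] args) throws Exception {
    Date d = new SimpleDateFormat("yyyy-MM-dd").parse("2015-02-01");
    // In java.text.SimpleDateFormat, 'D' is day-of-year, so this prints 32
    // for February 1st, not a month number.
    System.out.println(new SimpleDateFormat("D").format(d));
  }
}
{code}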

 Implement date_format(timestamp, fmt) UDF
 -

 Key: HIVE-10276
 URL: https://issues.apache.org/jira/browse/HIVE-10276
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Fix For: 1.2.0

 Attachments: HIVE-10276.01.patch


 date_format(date/timestamp/string, fmt) converts a date/timestamp/string to a 
 value of String in the format specified by the Java date format fmt.
 Supported formats listed here:
 https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11536) %TYPE and %ROWTYPE attributes in data type declaration

2015-08-14 Thread Dmitry Tolpeko (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696887#comment-14696887
 ] 

Dmitry Tolpeko commented on HIVE-11536:
---

An example:

{code}
DECLARE
   v src.key%TYPE;
BEGIN
   SELECT key INTO v FROM src LIMIT 1;
   PRINT v;
END
{code}

 %TYPE and %ROWTYPE attributes in data type declaration
 --

 Key: HIVE-11536
 URL: https://issues.apache.org/jira/browse/HIVE-11536
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Dmitry Tolpeko
Assignee: Dmitry Tolpeko

 %TYPE and %ROWTYPE attributes allow you to derive the data type from the 
 corresponding table column. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-14 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696920#comment-14696920
 ] 

Yongzhi Chen commented on HIVE-11502:
-

The TestDummy failure is not related:

It failed because of FileNotFoundException:
[exec] + javac -cp 
/home/hiveptest/54.147.251.176-hiveptest-2/maven/org/apache/hive/hive-exec/2.0.0-SNAPSHOT/hive-exec-2.0.0-SNAPSHOT.jar
 /tmp/UDFExampleAdd.java -d /tmp
 [exec] + jar -cf /tmp/udfexampleadd-1.0.jar -C /tmp UDFExampleAdd.class
 [exec] java.io.FileNotFoundException: /tmp/UDFExampleAdd.class (No such 
file or directory)
 [exec] at java.io.FileInputStream.open(Native Method)
 [exec] at java.io.FileInputStream.<init>(FileInputStream.java:146)
 [exec] at sun.tools.jar.Main.copy(Main.java:791)
 [exec] at sun.tools.jar.Main.addFile(Main.java:740)
 [exec] at sun.tools.jar.Main.create(Main.java:491)
 [exec] at sun.tools.jar.Main.run(Main.java:201)
 [exec] at sun.tools.jar.Main.main(Main.java:1177)


 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11502.1.patch, HIVE-11502.2.patch


 For the following query:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column for group by has many different values (for example 40) and 
 it is of type double, the map-side aggregation is very slow. I ran the query 
 for more than 3 hours, after which I had to kill it.
 The same query can finish in 7 seconds if I turn off map-side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10435) Make HiveSession implementation pluggable through configuration

2015-08-14 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-10435:
--
Labels: TODOC2.0  (was: )

 Make HiveSession implementation pluggable through configuration
 ---

 Key: HIVE-10435
 URL: https://issues.apache.org/jira/browse/HIVE-10435
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu
Assignee: Akshay Goyal
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-10435.1.patch, HIVE-10435.2.patch


 SessionManager in CLIService creates and keeps track of HiveSession. 
 Right now, it creates HiveSessionImpl which is one implementation of 
 HiveSession. This improvement request is to make it pluggable through a 
 configuration so that other implementations can be passed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696536#comment-14696536
 ] 

Hive QA commented on HIVE-11304:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750362/HIVE-11304.9.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9361 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4959/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4959/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4959/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750362 - PreCommit-HIVE-TRUNK-Build

 Migrate to Log4j2 from Log4j 1.x
 

 Key: HIVE-11304
 URL: https://issues.apache.org/jira/browse/HIVE-11304
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11304.10.patch, HIVE-11304.2.patch, 
 HIVE-11304.3.patch, HIVE-11304.4.patch, HIVE-11304.5.patch, 
 HIVE-11304.6.patch, HIVE-11304.7.patch, HIVE-11304.8.patch, 
 HIVE-11304.9.patch, HIVE-11304.patch


 Log4J2 has some great benefits and can benefit Hive significantly. Some 
 notable features include:
 1) Performance (parametrized logging, illustrated in the sketch below; 
 performance when logging is disabled, etc.). More details can be found here: 
 https://logging.apache.org/log4j/2.x/performance.html
 2) RoutingAppender - Route logs to different log files based on MDC context 
 (useful for HS2, LLAP etc.)
 3) Asynchronous logging
 This is an umbrella jira to track changes related to Log4j2 migration.
 Log4J1 EOL - 
 https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
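
 As a hedged illustration of benefit 1) above (plain Log4j2 API usage, not 
 Hive's actual logging code):
 {code}
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;

 public class ParametrizedLoggingDemo {
   private static final Logger LOG = LogManager.getLogger(ParametrizedLoggingDemo.class);

   public static void main(String[] args) {
     String table = "tbl1";
     long rows = 42L;
     // With log4j 1.x-style concatenation ("Scanned " + table + ...), the
     // message string is built even when DEBUG is disabled. The {} placeholders
     // are substituted only if the level is enabled, avoiding that cost.
     LOG.debug("Scanned table {} with {} rows", table, rows);
   }
 }
 {code}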



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3983) Select on table with hbase storage handler fails with an SASL error

2015-08-14 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696575#comment-14696575
 ] 

Arup Malakar commented on HIVE-3983:


I don't have the setup to test this, will reopen if I see it again. Thanks! 

 Select on table with hbase storage handler fails with an SASL error
 ---

 Key: HIVE-3983
 URL: https://issues.apache.org/jira/browse/HIVE-3983
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler, Security
 Environment: hive-0.10
 hbase-0.94.5.5
 hadoop-0.23.3.1
 hcatalog-0.5
Reporter: Arup Malakar
Assignee: Swarnim Kulkarni

 The table is created using the following query:
 {code}
 CREATE TABLE hbase_table_1(key int, value string) 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
 TBLPROPERTIES ("hbase.table.name" = "xyz"); 
 {code}
 Doing a select on the table launches a map-reduce job. But the job fails with 
 the following error:
 {code}
 2013-02-02 01:31:07,500 FATAL [IPC Server handler 3 on 40118] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
 attempt_1348093718159_1501_m_00_0 - exited : java.io.IOException: 
 java.lang.RuntimeException: SASL authentication failed. The most likely cause 
 is missing or invalid credentials. Consider 'kinit'.
   at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
   at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
   at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:160)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
 Caused by: java.lang.RuntimeException: SASL authentication failed. The most 
 likely cause is missing or invalid credentials. Consider 'kinit'.
   at 
 org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$1.run(SecureClient.java:242)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
   at org.apache.hadoop.hbase.security.User.call(User.java:590)
   at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
   at 
 org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:444)
   at 
 org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.handleSaslConnectionFailure(SecureClient.java:203)
   at 
 org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:291)
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
   at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
   at $Proxy12.getProtocolVersion(Unknown Source)
   at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:146)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1278)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
   at 
 

[jira] [Commented] (HIVE-10435) Make HiveSession implementation pluggable through configuration

2015-08-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696527#comment-14696527
 ] 

Lefty Leverenz commented on HIVE-10435:
---

Doc note:  This adds two configuration parameters 
(*hive.session.impl.classname* and *hive.session.impl.withugi.classname*) to 
HiveConf.java, so they need to be documented in the wiki.  Added a TODOC2.0 
label.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

 Make HiveSession implementation pluggable through configuration
 ---

 Key: HIVE-10435
 URL: https://issues.apache.org/jira/browse/HIVE-10435
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu
Assignee: Akshay Goyal
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-10435.1.patch, HIVE-10435.2.patch


 SessionManager in CLIService creates and keeps track of HiveSession. 
 Right now, it creates HiveSessionImpl which is one implementation of 
 HiveSession. This improvement request is to make it pluggable through a 
 configuration so that other implementations can be passed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11548) HCatLoader should support predicate pushdown.

2015-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696656#comment-14696656
 ] 

Hive QA commented on HIVE-11548:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750373/HIVE-11548.1.patch

{color:red}ERROR:{color} -1 due to 133 failed/errored test(s), 9356 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.mapreduce.TestHCatHiveCompatibility.testPartedRead
org.apache.hive.hcatalog.mapreduce.TestHCatHiveCompatibility.testUnpartedReadWrite
org.apache.hive.hcatalog.mapreduce.TestHCatHiveThriftCompatibility.testDynamicCols
org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteRead
org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteReadMR
org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteRead
org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteReadMR
org.apache.hive.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[0]
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[1]
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[2]
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[3]
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[4]
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[5]
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[0]
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[1]
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[2]
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[3]
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[5]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[0]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[1]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[2]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[3]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[5]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[0]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[1]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[2]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[3]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[4]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[5]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[0]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[1]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[2]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[3]
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[5]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[0]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[1]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[2]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[3]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[0]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[1]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[2]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[3]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[5]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[0]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[1]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[2]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[3]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[5]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[0]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[1]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[2]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[3]
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[5]
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByPig[0]
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByPig[1]

[jira] [Commented] (HIVE-11556) HiveFilter.copy should take the condition given as a parameter

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697073#comment-14697073
 ] 

Ashutosh Chauhan commented on HIVE-11556:
-

+1 pending tests

 HiveFilter.copy should take the condition given as a parameter
 --

 Key: HIVE-11556
 URL: https://issues.apache.org/jira/browse/HIVE-11556
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11556.patch


 Currently the condition is taken from the original Filter. However, a new 
 condition is given as an input parameter; the new Filter should use that 
 condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697016#comment-14697016
 ] 

Ashutosh Chauhan commented on HIVE-10631:
-

Can you create a review board for this?

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast 
 call scans the warehouse dir and doesn't seem to use the result. 
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From create_table_core method
 {code}
 if (HiveConf.getBoolVar(hiveConf, 
 HiveConf.ConfVars.HIVESTATSAUTOGATHER) && 
 !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 madeDir);
   } else { // Partitioned table with no partitions.
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method 
 because the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse 
 location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse 
 location, especially for large existing partitions.
 Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: 
 basically it could scan the wrong S3 directory recursively and do nothing with 
 it. I will add more detail of cases in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-08-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696671#comment-14696671
 ] 

Gopal V commented on HIVE-11306:


Patch .3 does not give the performance boost observed in patch .2.

The crucial difference is that patch .3 does not really consider the bloom 
filter to be valid for spilled partitions.

{code}
+  if (!bloom1.testLong(keyHash) && !isOnDisk(partitionId)) {
{code}

The isOnDisk check negates all the performance benefits of checking the bloom 
filter to avoid spilling.
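
A simplified, self-contained model of the probe order argued for here (all 
names are stand-ins, not the actual HybridHashTableContainer code): the bloom 
test short-circuits first, so keys that cannot be in the small table skip both 
the in-memory probe and the spill path.

{code}
public class BloomProbeOrderSketch {
  // Stubs standing in for the real bloom filter and partition state.
  static boolean bloomMightContain(long keyHash) { return (keyHash & 1L) == 0; }
  static boolean isOnDisk(int partitionId) { return partitionId % 4 == 0; }

  static String probe(long keyHash, int partitionId) {
    if (!bloomMightContain(keyHash)) {
      return "SKIP";            // definitely absent: no spill I/O at all
    }
    return isOnDisk(partitionId)
        ? "SPILL_BIG_TABLE_ROW" // possible match in a spilled partition
        : "PROBE_IN_MEMORY";
  }

  public static void main(String[] args) {
    System.out.println(probe(2L, 4)); // bloom hit in a spilled partition
    System.out.println(probe(3L, 4)); // bloom miss: dropped before the disk check
  }
}
{code}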

 Add a bloom-1 filter for Hybrid MapJoin spills
 --

 Key: HIVE-11306
 URL: https://issues.apache.org/jira/browse/HIVE-11306
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
 HIVE-11306.3.patch


 HIVE-9277 implemented Spillable joins for Tez, which suffer from a 
 corner-case performance issue when joining wide small tables against a narrow 
 big table (like a user info table joined against an events stream).
 The fact that the wide table is spilled causes extra IO, even though the nDV 
 of the join key might be in the thousands.
 A cheap bloom-1 filter would add a massive performance gain for such queries, 
 cutting down on the spill IO costs for the big-table spills.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11472) ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row

2015-08-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11472:
---
Attachment: HIVE-11472.2.patch

 ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per 
 row
 ---

 Key: HIVE-11472
 URL: https://issues.apache.org/jira/browse/HIVE-11472
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: Performance
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11472.1.patch, HIVE-11472.2.patch


 For every row x column
 {code}
 int len = (int) lengths.next();
 int offset = 0;
 byte[] bytes = new byte[len];
 while (len > 0) {
   int written = stream.read(bytes, offset, len);
   if (written < 0) {
 throw new EOFException("Can't finish byte read from " + stream);
   }
 {code}
 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/TreeReaderFactory.java#L1552
 This is not a big issue until it misses the GC TLAB.
 From hadoop-2.6.x (HADOOP-10855) you can read into a Text directly. 
 Possibly we can create a different TreeReader from the factory for 2.6.x and use a 
 DataInputStream per stream to prevent an allocation in the inner loop.
 {code}
 int len = (int) lengths.next();
 result.readWithKnownLength(datastream, len);
 {code}
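
 A minimal, runnable sketch of the proposed pattern, assuming the 
 Text.readWithKnownLength(DataInput, int) API added by HADOOP-10855 (the 
 stream setup here is illustrative):
 {code}
 import java.io.ByteArrayInputStream;
 import java.io.DataInputStream;
 import java.nio.charset.StandardCharsets;
 import org.apache.hadoop.io.Text;

 public class ReadWithKnownLengthDemo {
   public static void main(String[] args) throws Exception {
     byte[] encoded = "hello".getBytes(StandardCharsets.UTF_8);
     DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
     Text result = new Text();
     // Reuses the Text's internal byte[] across values instead of allocating
     // a fresh byte[len] per row, which is the allocation described above.
     result.readWithKnownLength(in, encoded.length);
     System.out.println(result); // hello
   }
 }
 {code}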



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696754#comment-14696754
 ] 

Hive QA commented on HIVE-11317:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750421/HIVE-11317.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9360 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testExceptions
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4961/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4961/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4961/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750421 - PreCommit-HIVE-TRUNK-Build

 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: triage
 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.patch


 The logic to abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 Need to move this logic into Initiator (or some other metastore-side thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move timeOutLocks() there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10602) optimize PTF for GC

2015-08-14 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696708#comment-14696708
 ] 

Takanobu Asanuma commented on HIVE-10602:
-

Hi, [~sershe].
I'd like to try this jira. Could you assign it to me? Thanks.

 optimize PTF for GC
 ---

 Key: HIVE-10602
 URL: https://issues.apache.org/jira/browse/HIVE-10602
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin

 see HIVE-10600



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11329) Column prefix in key of hbase column prefix map

2015-08-14 Thread Wojciech Indyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wojciech Indyk updated HIVE-11329:
--
Issue Type: Improvement  (was: Bug)

 Column prefix in key of hbase column prefix map
 ---

 Key: HIVE-11329
 URL: https://issues.apache.org/jira/browse/HIVE-11329
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Wojciech Indyk
Assignee: Wojciech Indyk
Priority: Minor
 Attachments: HIVE-11329.1.patch, HIVE-11329.2.patch


 When I create a table with an hbase column prefix 
 (https://issues.apache.org/jira/browse/HIVE-3725), I have the prefix in the result 
 map in hive. 
 E.g. record in HBase:
 rowkey: 123
 column: tag_one, value: 0.5
 column: tag_two, value: 0.5
 representation in Hive via column prefix mapping tag_.*:
 column: tag map<string,string>
 key: tag_one, value: 0.5
 key: tag_two, value: 0.5
 should be:
 key: one, value: 0.5
 key: two, value: 0.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11472) ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row

2015-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697079#comment-14697079
 ] 

Hive QA commented on HIVE-11472:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750476/HIVE-11472.2.patch

{color:green}SUCCESS:{color} +1 9358 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4965/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4965/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4965/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750476 - PreCommit-HIVE-TRUNK-Build

 ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per 
 row
 ---

 Key: HIVE-11472
 URL: https://issues.apache.org/jira/browse/HIVE-11472
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: Performance
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11472.1.patch, HIVE-11472.2.patch


 For every row x column
 {code}
 int len = (int) lengths.next();
 int offset = 0;
 byte[] bytes = new byte[len];
 while (len > 0) {
   int written = stream.read(bytes, offset, len);
   if (written < 0) {
 throw new EOFException("Can't finish byte read from " + stream);
   }
 {code}
 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/TreeReaderFactory.java#L1552
 This is not a big issue until it misses the GC TLAB.
 From hadoop-2.6.x (HADOOP-10855) you can read into a Text directly. 
 Possibly we can create a different TreeReader from the factory for 2.6.x and use a 
 DataInputStream per stream to prevent an allocation in the inner loop.
 {code}
 int len = (int) lengths.next();
 result.readWithKnownLength(datastream, len);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10602) optimize PTF for GC

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10602:

Assignee: Takanobu Asanuma

 optimize PTF for GC
 ---

 Key: HIVE-10602
 URL: https://issues.apache.org/jira/browse/HIVE-10602
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Takanobu Asanuma

 see HIVE-10600



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11525) Bucket pruning

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697580#comment-14697580
 ] 

Sergey Shelukhin commented on HIVE-11525:
-

Sure! Thanks!

 Bucket pruning
 --

 Key: HIVE-11525
 URL: https://issues.apache.org/jira/browse/HIVE-11525
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0
Reporter: Maciek Kocon
Assignee: Takuya Fukudome
  Labels: gsoc2015

 Logically and functionally bucketing and partitioning are quite similar - 
 both provide a mechanism to segregate and separate the table's data based on 
 its content. Thanks to that, significant further optimisations like 
 [partition] PRUNING or [bucket] MAP JOIN are possible.
 The difference seems to be imposed by design, where the PARTITIONing is 
 open/explicit while BUCKETing is discrete/implicit.
 Partitioning seems to be very common, if not a standard feature, in all current 
 RDBMSs, while BUCKETING seems to be HIVE-specific only.
 In a way, BUCKETING could also be called partitioning by hashing, or simply 
 IMPLICIT PARTITIONING.
 Regardless of the fact that these two are recognised as two separate features 
 available in Hive, there should be nothing to prevent leveraging the same 
 existing query/join optimisations across the two.
 BUCKET pruning:
 Enable the partition-PRUNING-equivalent optimisation for queries on BUCKETED 
 tables.
 The simplest example is for queries like:
 SELECT … FROM x WHERE colA=123123
 to read only the relevant bucket file rather than all file-buckets that 
 belong to the table.
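
 As a toy illustration, assuming Hive's usual bucket assignment of 
 (hashCode & Integer.MAX_VALUE) % numBuckets (an assumption here; for int 
 columns the hash is the value itself), a point predicate pins down a single 
 bucket file:
 {code}
 public class BucketPruningSketch {
   static int bucketFor(int colA, int numBuckets) {
     // Mask keeps the hash non-negative before taking the modulus.
     return (colA & Integer.MAX_VALUE) % numBuckets;
   }

   public static void main(String[] args) {
     int numBuckets = 32;
     // For "WHERE colA = 123123" only one of the 32 bucket files can contain
     // matching rows, so the other 31 need not be read at all.
     System.out.println("read only bucket file #" + bucketFor(123123, numBuckets));
   }
 }
 {code}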



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10602) optimize PTF for GC

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697579#comment-14697579
 ] 

Sergey Shelukhin commented on HIVE-10602:
-

Sure! Thanks for working on this.

 optimize PTF for GC
 ---

 Key: HIVE-10602
 URL: https://issues.apache.org/jira/browse/HIVE-10602
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin

 see HIVE-10600



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-14 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697589#comment-14697589
 ] 

Aaron Tokhy commented on HIVE-10631:


Sorry, fixed it just now.

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast 
 call scans the warehouse dir and doesn't seem to use the result. 
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From create_table_core method
 {code}
 if (HiveConf.getBoolVar(hiveConf, 
 HiveConf.ConfVars.HIVESTATSAUTOGATHER) && 
 !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 madeDir);
   } else { // Partitioned table with no partitions.
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method 
 because the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse 
 location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse 
 location, especially for large existing partitions.
 Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: 
 basically it could scan the wrong S3 directory recursively and do nothing with 
 it. I will add more detail of cases in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11272) LLAP: Execution order within LLAP daemons should consider query-specific priority assigned to fragments

2015-08-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-11272.
---
Resolution: Fixed

 LLAP: Execution order within LLAP daemons should consider query-specific 
 priority assigned to fragments
 ---

 Key: HIVE-11272
 URL: https://issues.apache.org/jira/browse/HIVE-11272
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical
 Fix For: llap

 Attachments: HIVE-11272.1.txt, HIVE-11272.2.txt


 It's currently looking at finishable state, start time and vertex 
 parallelism. Vertex parallelism can be replaced by upstream parallelism as 
 well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10600) optimize group by for GC

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10600:

Assignee: Matt McCline

 optimize group by for GC
 

 Key: HIVE-10600
 URL: https://issues.apache.org/jira/browse/HIVE-10600
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Matt McCline

 Quoting [~gopalv]:
 {noformat}
 So, something like a sum() GROUP BY will create a few hundred thousand
 AbstractAggregationBuffer objects all of which will suddenly go out of
 scope when the map.aggr flushes it down to the sort buffer.
 That particular GC collection takes forever because the tiny buffers take
 a lot of time to walk over and then they leave the memory space
 fragmented, which requires a compaction pass (which btw, writes to a
 page-interleaved NUMA zone).
 And to make things worse, the pre-allocated sort buffers with absolutely
 zero data in them take up most of the tenured regions causing these chunks
 of memory to be visited more and more often as they are part of the Eden
 space.
 {noformat}
 We need flat data structures to be GC friendly.
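
 An illustrative sketch (not Hive code) of what "flat" means here: one 
 primitive array indexed by hash-table slot instead of one aggregation-buffer 
 object per key, so a flush releases a single large array rather than hundreds 
 of thousands of tiny objects.
 {code}
 public class FlatAggBufferSketch {
   // One slot per hash-table entry; no per-key AbstractAggregationBuffer object.
   private final double[] sums;

   FlatAggBufferSketch(int slots) { this.sums = new double[slots]; }

   void add(int slot, double value) { sums[slot] += value; }

   double get(int slot) { return sums[slot]; }

   public static void main(String[] args) {
     FlatAggBufferSketch agg = new FlatAggBufferSketch(1 << 20);
     agg.add(42, 3.5);
     agg.add(42, 1.5);
     System.out.println(agg.get(42)); // 5.0
   }
 }
 {code}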



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-14 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697371#comment-14697371
 ] 

Aaron Tokhy commented on HIVE-10631:


Sure, here it is:

https://reviews.apache.org/r/37484/

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast 
 call scans the warehouse dir and doesn't seem to use the result. 
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From create_table_core method
 {code}
 if (HiveConf.getBoolVar(hiveConf, 
 HiveConf.ConfVars.HIVESTATSAUTOGATHER) && 
 !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 madeDir);
   } else { // Partitioned table with no partitions.
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method 
 because the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse 
 location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse 
 location, especially for large existing partitions.
 Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: 
 basically it could scan the wrong S3 directory recursively and do nothing with 
 it. I will add more detail of cases in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697368#comment-14697368
 ] 

Alan Gates commented on HIVE-11500:
---

bq. I think we should use YAGNI principle...
That's fine, but it means on the next interface you want to add you have to 
convince me that you should add another set of calls rather than refactor this 
one to be generic.

bq.  Having many methods on metastore is not really that big of a deal, since 
they do different things.
I disagree.  Having just implemented a new version of RawStore I can tell you 
that it took me a long time to understand the nuances of why there's five 
different ways to fetch partitions.  I'm still not sure I have it all straight. 
 We should not just add a new call each time because it's the shortest path to 
get the new thing working.  We need to think about code maintenance and 
understandability for future developers.

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check FS for new/changed files anyway), and the 
 difficulty of passing of data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits.
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11547) beeline does not continue running the script after an error occurs while beeline --force=true is already set.

2015-08-14 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697535#comment-14697535
 ] 

Yongzhi Chen commented on HIVE-11547:
-

HIVE-11203 has fixed the issue. 

 beeline does not continue running the script after an error occurs while 
 beeline --force=true is already set.
 ---

 Key: HIVE-11547
 URL: https://issues.apache.org/jira/browse/HIVE-11547
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 1.2.0
 Environment: HDP 2.3 on Virtual box 
Reporter: Wei Huang

 If you execute beeline to run a SQL script file using the following command:
  beeline -f <query file name>
 beeline exits after the first error, i.e. when a query fails 
 beeline quits to the CLI.
 The beeline "--force=true" option seems to have a bug: it does not continue 
 running the script after an error occurs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11565) LLAP: Tez counters for LLAP

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697674#comment-14697674
 ] 

Sergey Shelukhin commented on HIVE-11565:
-

[~gopalv] [~sseth] fyi

 LLAP: Tez counters for LLAP
 ---

 Key: HIVE-11565
 URL: https://issues.apache.org/jira/browse/HIVE-11565
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin

 1) Tez counters for LLAP are incorrect.
 2) Some counters, such as cache hit ratio for a fragment, are not propagated.
 We need to make sure that Tez counters for LLAP are usable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9756) LLAP: use log4j 2 for llap

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9756:
---
Assignee: Prasanth Jayachandran  (was: Gopal V)

 LLAP: use log4j 2 for llap
 --

 Key: HIVE-9756
 URL: https://issues.apache.org/jira/browse/HIVE-9756
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Prasanth Jayachandran

 For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade path to get 
 throughput-friendly logging.
 http://logging.apache.org/log4j/2.0/manual/async.html#Performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697664#comment-14697664
 ] 

Sergey Shelukhin edited comment on HIVE-11500 at 8/14/15 8:09 PM:
--

Actually the main reason all these calls exist for partitions is because they 
use args instead of request-response pattern, which makes it impossible to 
change the signature in a backward-compatible manner. I will happily refactor 
the newly added calls to be generic (req/resp should allow for that), or 
deprecate them in favor of generic calls and remove later, if the need arises. 


was (Author: sershe):
Actually the main reason all these calls exist for partitions is because they 
use args instead of request-response pattern, which makes it impossible to 
change the signature in a backward-compatible manner. I will happily refactor 
these calls to be generic, or deprecate them in favor of generic calls and 
remove later, if the need arises. 

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check FS for new/changed files anyway), and the 
 difficulty of passing of data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits.
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11535) LLAP: move EncodedTreeReaderFactory, TreeReaderFactory bits that rely on orc.encoded, and StreamUtils if needed, to orc.encoded package

2015-08-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697686#comment-14697686
 ] 

Prasanth Jayachandran commented on HIVE-11535:
--

LGTM, +1

 LLAP: move EncodedTreeReaderFactory, TreeReaderFactory bits that rely on 
 orc.encoded, and StreamUtils if needed, to orc.encoded package
 ---

 Key: HIVE-11535
 URL: https://issues.apache.org/jira/browse/HIVE-11535
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11535.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens

2015-08-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11566:
-
Description: 
Currently it's allocating one write buffer for a number of hash partitions up 
front, which can cause GC pause.

It's better to do the write buffer allocation on demand.

  was:
Currently it's allocating one write buffer for a number of hash partitions up 
front, which causes GC pause.

It's better to do the write buffer allocation on demand.


 Hybrid grace hash join should only allocate write buffer for a hash partition 
 when first write happens
 --

 Key: HIVE-11566
 URL: https://issues.apache.org/jira/browse/HIVE-11566
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Wei Zheng
Assignee: Wei Zheng

 Currently it's allocating one write buffer for a number of hash partitions up 
 front, which can cause GC pause.
 It's better to do the write buffer allocation on demand.
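
 A minimal sketch of the on-demand allocation being proposed (field and method 
 names are assumptions, not the actual hybrid grace hash join code):
 {code}
 public class LazyWriteBufferSketch {
   private final byte[][] writeBuffers; // one slot per hash partition, all null at first
   private final int bufferSize;

   LazyWriteBufferSketch(int numPartitions, int bufferSize) {
     this.writeBuffers = new byte[numPartitions][];
     this.bufferSize = bufferSize;
   }

   byte[] bufferFor(int partition) {
     if (writeBuffers[partition] == null) {
       // Allocated only when this partition first receives a row, so empty
       // partitions never contribute to the up-front allocation (and GC pause).
       writeBuffers[partition] = new byte[bufferSize];
     }
     return writeBuffers[partition];
   }
 }
 {code}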



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens

2015-08-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11566:
-
Affects Version/s: 1.2.0

 Hybrid grace hash join should only allocate write buffer for a hash partition 
 when first write happens
 --

 Key: HIVE-11566
 URL: https://issues.apache.org/jira/browse/HIVE-11566
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng

 Currently it's allocating one write buffer for a number of hash partitions up 
 front, which can cause GC pause.
 It's better to do the write buffer allocation on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11524) LLAP: tez.runtime.compress doesn't appear to be honored for LLAP

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11524:

Fix Version/s: llap

 LLAP: tez.runtime.compress doesn't appear to be honored for LLAP
 

 Key: HIVE-11524
 URL: https://issues.apache.org/jira/browse/HIVE-11524
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Fix For: llap


 When running llap on an openstack cluster without snappy installed, with 
 tez.runtime.compress set to false and the codec set to snappy, one still gets 
 exceptions due to the snappy codec being absent:
 {noformat}
 2015-08-10 11:14:30,440 
 [TezTaskRunner_attempt_1438943112941_0015_2_00_00_0(attempt_1438943112941_0015_2_00_00_0)]
  ERROR org.apache.hadoop.io.compress.snappy.SnappyCompressor: failed to load 
 SnappyCompressor
 java.lang.NoSuchFieldError: clazz
   at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native 
 Method)
   at 
 org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
   at 
 org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:153)
   at 
 org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:138)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:406)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:367)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.finalSpill(UnorderedPartitionedKVWriter.java:612)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.close(UnorderedPartitionedKVWriter.java:521)
   at 
 org.apache.tez.runtime.library.output.UnorderedKVOutput.close(UnorderedKVOutput.java:128)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:376)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:79)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
 When it's set to true, the client complains about snappy. When it's set to 
 false, the client doesn't complain, but it still tries to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11524) LLAP: tez.runtime.compress doesn't appear to be honored for LLAP

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11524.
-
Resolution: Cannot Reproduce

user error/misunderstanding

 LLAP: tez.runtime.compress doesn't appear to be honored for LLAP
 

 Key: HIVE-11524
 URL: https://issues.apache.org/jira/browse/HIVE-11524
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

 When running llap on an openstack cluster without snappy installed, with 
 tez.runtime.compress set to false and codec set to snappy, one still gets the 
 exceptions due to snappy codec being absent:
 {noformat}
 2015-08-10 11:14:30,440 
 [TezTaskRunner_attempt_1438943112941_0015_2_00_00_0(attempt_1438943112941_0015_2_00_00_0)]
  ERROR org.apache.hadoop.io.compress.snappy.SnappyCompressor: failed to load 
 SnappyCompressor
 java.lang.NoSuchFieldError: clazz
   at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native 
 Method)
   at 
 org.apache.hadoop.io.compress.snappy.SnappyCompressor.clinit(SnappyCompressor.java:57)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
   at 
 org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.init(IFile.java:153)
   at 
 org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.init(IFile.java:138)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:406)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:367)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.finalSpill(UnorderedPartitionedKVWriter.java:612)
   at 
 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.close(UnorderedPartitionedKVWriter.java:521)
   at 
 org.apache.tez.runtime.library.output.UnorderedKVOutput.close(UnorderedKVOutput.java:128)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:376)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:79)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
 When it's set to true, the client complains about snappy. When it's set to 
 false, the client doesn't complain, but it still tries to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11525) Bucket pruning

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11525:

Assignee: Takuya Fukudome

 Bucket pruning
 --

 Key: HIVE-11525
 URL: https://issues.apache.org/jira/browse/HIVE-11525
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0
Reporter: Maciek Kocon
Assignee: Takuya Fukudome
  Labels: gsoc2015

 Logically and functionally, bucketing and partitioning are quite similar - 
 both provide a mechanism to segregate and separate the table's data based on 
 its content. Thanks to that, significant further optimisations like 
 [partition] PRUNING or [bucket] MAP JOIN are possible.
 The difference seems to be imposed by design: PARTITIONing is 
 open/explicit while BUCKETing is discrete/implicit.
 Partitioning seems to be very common, if not a standard feature, in all current 
 RDBMS, while BUCKETING seems to be HIVE specific only.
 In a way, BUCKETING could also be called hashing or simply IMPLICIT 
 PARTITIONING.
 Regardless of the fact that these two are recognised as two separate features 
 available in Hive, there should be nothing to prevent leveraging the same 
 existing query/join optimisations across the two.
 BUCKET pruning:
 Enable partition-PRUNING-equivalent optimisation for queries on BUCKETED 
 tables.
 The simplest example is for queries like:
 SELECT … FROM x WHERE colA=123123
 to read only the relevant bucket file rather than all file-buckets that 
 belong to a table, as sketched below.
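 For illustration, a HiveQL sketch of the intended effect (table and values 
 made up for this example):
 {code}
 -- Bucketed table: rows are assigned to one of 32 files by hash(colA).
 CREATE TABLE x (colA INT, colB STRING)
 CLUSTERED BY (colA) INTO 32 BUCKETS;

 -- colA = 123123 hashes to exactly one bucket, so with bucket pruning
 -- only that one bucket file would need to be read.
 SELECT colB FROM x WHERE colA = 123123;
 {code}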



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10144) [LLAP] merge brought in file blocking github sync

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697643#comment-14697643
 ] 

Sergey Shelukhin commented on HIVE-10144:
-

[~hagleitn] [~gopalv] [~vikram.dixit] [~sseth] [~prasanth_j] maybe now is a 
good time to destroy the history of the llap branch? :)
We can rebase to exclude the large file from history, and also rename all the 
commits that are not attached to JIRAs. 

 [LLAP] merge brought in file blocking github sync
 -

 Key: HIVE-10144
 URL: https://issues.apache.org/jira/browse/HIVE-10144
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Szehon Ho
Assignee: Gunther Hagleitner

 r1669718 brought in a file that is not in source control on llap branch:
 [http://svn.apache.org/repos/asf/hive/branches/llap/itests/thirdparty/|http://svn.apache.org/repos/asf/hive/branches/llap/itests/thirdparty/]
 It is a file downloaded during the test build and should not be in source 
 control.  It is actually blocking the github sync as it's too large. See 
 INFRA-9360.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697664#comment-14697664
 ] 

Sergey Shelukhin commented on HIVE-11500:
-

Actually the main reason all these calls exist for partitions is because they 
use args instead of a request-response pattern, which makes it impossible to 
change the signature in a backward-compatible manner. I will happily refactor 
these calls to be generic, or deprecate them in favor of generic calls and 
remove them later, if the need arises. 
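
A hedged sketch of the request/response pattern being referred to (illustrative 
names, not the actual metastore Thrift API):

{code}
import java.util.List;

// Illustrative only; not the real metastore interface.
interface MetastoreClientSketch {
  // Positional args: adding a parameter later breaks every existing caller.
  List<String> getPartitionNames(String dbName, String tableName, short maxParts);

  // Request/response: optional fields can be added to the request struct
  // without changing this signature, so old callers keep working.
  GetPartitionsResponse getPartitions(GetPartitionsRequest request);
}

class GetPartitionsRequest {
  String dbName;
  String tableName;
  Integer maxParts;   // optional; null means "no limit"
}

class GetPartitionsResponse {
  java.util.List<String> partitionNames;
}
{code}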

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check the FS for new/changed files anyway), and the 
 difficulty of passing data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits.
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697666#comment-14697666
 ] 

Sergey Shelukhin commented on HIVE-11500:
-

Btw, you should review the API patch in HIVE-11552 ;)

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check the FS for new/changed files anyway), and the 
 difficulty of passing data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits.
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens

2015-08-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11566:
-
Description: 
Currently it's allocating one write buffer for a number of hash partitions up 
front, which causes GC pauses.

It's better to do the write buffer allocation on demand.

  was:
Currently it's allocating a write buffer for a fixed number of hash partitions 
up front, which causes GC pauses.

It's better to do the write buffer allocation on demand.


 Hybrid grace hash join should only allocate write buffer for a hash partition 
 when first write happens
 --

 Key: HIVE-11566
 URL: https://issues.apache.org/jira/browse/HIVE-11566
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Wei Zheng
Assignee: Wei Zheng

 Currently it's allocating one write buffer for a number of hash partitions up 
 front, which causes GC pauses.
 It's better to do the write buffer allocation on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8898) Remove HIVE-8874 once HBASE-12493 is fixed

2015-08-14 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697428#comment-14697428
 ] 

Swarnim Kulkarni commented on HIVE-8898:


I logged a JIRA here[1] to revert the work done.

[1] https://issues.apache.org/jira/browse/HIVE-11559

 Remove HIVE-8874 once HBASE-12493 is fixed
 --

 Key: HIVE-8898
 URL: https://issues.apache.org/jira/browse/HIVE-8898
 Project: Hive
  Issue Type: Task
  Components: HBase Handler
Reporter: Brock Noland
Assignee: Yongzhi Chen
Priority: Blocker
 Fix For: 1.2.0

 Attachments: HIVE-8898.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697453#comment-14697453
 ] 

Alan Gates commented on HIVE-11317:
---

+1

 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: triage
 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, 
 HIVE-11317.4.patch, HIVE-11317.patch


 the logic to Abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 Need to move this logic into Initiator (or some other metastore-side thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move the timeOutLocks() logic there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more Open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-08-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11304:
-
Attachment: HIVE-11304.11.patch

For some reason the templeton.cmd file did not apply cleanly on master, but the 
precommit test did not have that problem. Uploading the clean patch for future 
reference.

 Migrate to Log4j2 from Log4j 1.x
 

 Key: HIVE-11304
 URL: https://issues.apache.org/jira/browse/HIVE-11304
 Project: Hive
  Issue Type: Improvement
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11304.10.patch, HIVE-11304.11.patch, 
 HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, 
 HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, 
 HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch


 Log4J2 has some great benefits and can benefit Hive significantly. Some 
 notable features include:
 1) Performance (parametrized logging, performance when logging is disabled, 
 etc.; see the sketch after this description). More details can be found here: 
 https://logging.apache.org/log4j/2.x/performance.html
 2) RoutingAppender - route logs to different log files based on MDC context 
 (useful for HS2, LLAP etc.)
 3) Asynchronous logging
 This is an umbrella jira to track changes related to the Log4j2 migration.
 Log4J1 EOL - 
 https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
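 For example, the parametrized logging mentioned in (1) defers message 
 formatting until the level is known to be enabled (standard Log4j2 API; the 
 class and values below are made up):
 {code}
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;

 public class ParamLoggingExample {
   private static final Logger LOG = LogManager.getLogger(ParamLoggingExample.class);

   public static void main(String[] args) {
     String table = "orc_ppd";
     long rows = 42L;
     // The {} placeholders are only substituted if DEBUG is enabled,
     // so there is no string-building cost when the level is off.
     LOG.debug("Scanned table {} with {} rows", table, rows);
   }
 }
 {code}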



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11558) Hive generates Parquet files with broken footers, causes NullPointerException in Spark / Drill / Parquet tools

2015-08-14 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HIVE-11558:
---
Description: 
When creating a Parquet table in Hive from a table in another format (in this 
case JSON) using CTAS, the generated parquet files are created with broken 
footers and cause NullPointerExceptions in both Parquet tools and Spark when 
reading the files directly.

Here is the error from parquet tools:

{code}Could not read footer: java.lang.NullPointerException{code}

Here is the error from Spark reading the parquet file back:
{code}java.lang.NullPointerException
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
at 
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at 
scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at 
scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

What's interesting is that the table works fine in Hive when selecting out of 
it, even when doing select * on the whole table and letting it run to the end 
(it's a sample data set); it's only other tools it causes problems for.

All fields are string except for the first one, which is timestamp, but this is 
not that known issue, since if I create another table with 3 fields including 
the timestamp and two string fields, it works fine in other tools.

The only thing I can see which appears to cause this is that the other fields 
have lots of NULLs in them, as those json fields may or may not be present.

I've converted this exact same json data set to parquet using Apache Drill and 
also using Apache Spark SQL, and both of those tools create parquet files from 
this data set as a straight conversion that are fine when accessed via Parquet 
tools or Drill or Spark or Hive (using an external Hive table definition 
layered over the generated parquet files).

This implies that it's Hive's generation of Parquet that is broken, since both 
Drill and Spark can convert the dataset from JSON to Parquet without any issues 
on reading the files back in any of the other tools.

  was:
When creating a Parquet table in Hive from a table in another format (in this 
case JSON) using CTAS, the generated parquet files are created with broken 
footers and cause NullPointerExceptions in both Parquet tools and Spark when 
reading the files directly.

Here is the error from parquet tools:

{code}Could not read footer: java.lang.NullPointerException{code}

Here is the error from Spark reading the parquet file back:
{code}java.lang.NullPointerException
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
at 
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at 

[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697438#comment-14697438
 ] 

Eugene Koifman commented on HIVE-11317:
---

patch 4 includes changes to tests so that they don't rely on timing, and 
better comments.

The reason for a separate thread is modularity and testing.  For example, if 
the timed-out transaction reaper is not keeping up, it won't interfere with 
compaction scheduling, and vice versa.  It can also be configured separately, 
which makes testing easier.  I think HouseKeeperService is a nice abstraction 
for later, when we add alerting capability and perhaps an isAlive service for 
compaction processes.  

performTimeouts(): it's more efficient to read 2500 entries from TXNS than 
sending 25 queries, and we can easily cache the result since it's just a list of 
longs.  The rest of the logic runs each batch in a separate transaction to keep 
lock duration shorter - hopefully reducing the number of retries due to 
deadlocks. 
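
A rough sketch of the batching idea described above (hypothetical names, not 
the actual TxnHandler code): read the candidate txn ids once, then abort them 
in small batches, each in its own database transaction.

{code}
import java.util.List;

// Hypothetical sketch; not the actual TxnHandler implementation.
class TxnReaperSketch {
  void performTimeOuts(List<Long> timedOutTxnIds, int batchSize) {
    for (int i = 0; i < timedOutTxnIds.size(); i += batchSize) {
      List<Long> batch =
          timedOutTxnIds.subList(i, Math.min(i + batchSize, timedOutTxnIds.size()));
      // Each batch is aborted in its own short DB transaction, so locks are
      // held briefly and a deadlock-induced retry only redoes one batch.
      abortBatchInOneDbTransaction(batch);
    }
  }

  void abortBatchInOneDbTransaction(List<Long> batch) {
    // Stand-in for: BEGIN; UPDATE TXNS SET state = aborted
    //               WHERE txn_id IN (...); COMMIT;
  }
}
{code}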



 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: triage
 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, 
 HIVE-11317.4.patch, HIVE-11317.patch


 the logic to Abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 Need to move this logic into Initiator (or some other metastore-side thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move the timeOutLocks() logic there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more Open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11557) CBO (Calcite Return Path): Convert to flat AND/OR

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697437#comment-14697437
 ] 

Ashutosh Chauhan commented on HIVE-11557:
-

+1

 CBO (Calcite Return Path): Convert to flat AND/OR
 -

 Key: HIVE-11557
 URL: https://issues.apache.org/jira/browse/HIVE-11557
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11557.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11560) Patch for branch-1

2015-08-14 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-11560:

Attachment: HIVE-11560.1.patch.txt

Patch attached.

 Patch for branch-1
 --

 Key: HIVE-11560
 URL: https://issues.apache.org/jira/browse/HIVE-11560
 Project: Hive
  Issue Type: Sub-task
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-11560.1.patch.txt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697497#comment-14697497
 ] 

Daniel Dai commented on HIVE-10289:
---

I am doing the comparison in the actual datatype. HBase bytes are converted to 
the actual datatype using BinarySortableSerDe. Do you mean Operator.val? That is 
a string and means different things depending on the operator that handles it. 
For LIKE, it is the regex string. For NOTEQUALS, it is the value to compare 
against, and yes, this can be optimized by converting to the actual type at init 
time rather than in compareTo(). But that's operator dependent.

 Support filter on non-first partition key and non-string partition key
 --

 Key: HIVE-10289
 URL: https://issues.apache.org/jira/browse/HIVE-10289
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore, Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: HIVE-10289.1.patch, HIVE-10289.2.patch


 Currently, partition filtering only handles the first partition key, and the 
 type of this partition key must be string. In order to break this 
 limitation, several improvements are required:
 1. Change the serialization format for partition keys. Currently partition keys 
 are serialized into a delimited string, which sorts in string order, not with 
 regard to the actual type of the partition key. We use BinarySortableSerDe 
 for this purpose (see the illustration after this list).
 2. For filter conditions not on the initial partition keys, push them into an 
 HBase RowFilter. The RowFilter will deserialize the partition key and evaluate 
 the filter condition.
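 A tiny Java illustration of why point 1 matters (standard JDK behavior): 
 string order and numeric order disagree, so string-serialized numeric keys 
 sort and scan incorrectly.
 {code}
 public class StringVsNumericOrder {
   public static void main(String[] args) {
     String a = "9", b = "10";
     // In string order "9" sorts after "10" (since '9' > '1'), which is
     // wrong for integer keys; a sort-preserving binary encoding fixes this.
     System.out.println(a.compareTo(b) > 0);                        // true
     System.out.println(Integer.parseInt(a) > Integer.parseInt(b)); // false
   }
 }
 {code}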



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11562) Typo in hive-log4j2.xml throws unknown level exception

2015-08-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11562:
-
Attachment: HIVE-11562.patch

 Typo in hive-log4j2.xml throws unknown level exception
 --

 Key: HIVE-11562
 URL: https://issues.apache.org/jira/browse/HIVE-11562
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11562.patch


 Noticed a typo in the default hive-log4j2.xml used for tests, causing the 
 following exception (the property reference is likely missing its leading $, 
 i.e. {sys:hive.log.level} should be ${sys:hive.log.level}):
 {code}
 2015-08-14 11:26:35,965 WARN Error while converting string 
 [{sys:hive.log.level}] to type [class org.apache.logging.log4j.Level]. Using 
 default value [null]. java.lang.IllegalArgumentException: Unknown level 
 constant [{SYS:HIVE.LOG.LEVEL}].
   at org.apache.logging.log4j.Level.valueOf(Level.java:286)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:230)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:226)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:336)
   at 
 org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:130)
   at 
 org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
   at 
 org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:247)
   at 
 org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:766)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:706)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:358)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:161)
   at 
 org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:361)
   at 
 org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:426)
   at 
 org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:442)
   at 
 org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:138)
   at 
 org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:147)
   at 
 org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
   at org.apache.logging.log4j.LogManager.getContext(LogManager.java:175)
   at 
 org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:102)
   at 
 org.apache.logging.log4j.jcl.LogAdapter.getContext(LogAdapter.java:39)
   at 
 org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
   at 
 org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:40)
   at 
 org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:55)
   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
   at 
 org.apache.hadoop.util.ShutdownHookManager.clinit(ShutdownHookManager.java:44)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11493) Predicate with integer column equals double evaluates to false

2015-08-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11493:

Component/s: Query Planning

 Predicate with integer column equals double evaluates to false
 --

 Key: HIVE-11493
 URL: https://issues.apache.org/jira/browse/HIVE-11493
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Blocker
 Fix For: 2.0.0

 Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch, 
 HIVE-11493.03.patch, HIVE-11493.04.patch


 Filters comparing an integer column to a double constant evaluate to false 
 every time. A negative double constant works fine.
 {code:title=explain select * from orc_ppd where t = 10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:false (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}
 {code:title=explain select * from orc_ppd where t = -10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:(t = (- 10.0)) (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance

2015-08-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Attachment: (was: HIVE-11424.01.patch)

 Improve HivePreFilteringRule performance
 

 Key: HIVE-11424
 URL: https://issues.apache.org/jira/browse/HIVE-11424
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11424.patch


 We create a rule that will transform OR clauses into IN clauses (when 
 possible).
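 For illustration, the kind of rewrite meant here (made-up table and values):
 {code}
 -- Before: a flat OR of equality predicates on the same column
 SELECT * FROM t WHERE c = 1 OR c = 2 OR c = 3;
 -- After: the rule produces the equivalent IN clause
 SELECT * FROM t WHERE c IN (1, 2, 3);
 {code}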



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11558) Hive generates Parquet files with broken footers, causes NullPointerException in Spark / Drill / Parquet tools

2015-08-14 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HIVE-11558:
---
Description: 
When creating a Parquet table in Hive from a table in another format (in this 
case JSON) using CTAS, the generated parquet files are created with broken 
footers and cause NullPointerExceptions in both Parquet tools and Spark when 
reading the files directly.

Here is the error from parquet tools:

{code}Could not read footer: java.lang.NullPointerException{code}

Here is the error from Spark reading the parquet file back:
{code}java.lang.NullPointerException
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
at 
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at 
scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at 
scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

What's interesting is that the table works fine in Hive when selecting out of 
it, even when doing select * on the whole table and letting it run to the end 
(it's a sample data set); it's only other tools it causes problems for.

All fields are string except for the first one, which is timestamp, but this is 
not that known issue, since if I create another parquet table with 3 fields 
including the timestamp and two string fields using CTAS, those Hive-generated 
parquet files work fine in the other tools.

The only thing I can see which appears to cause this is that the other fields 
have lots of NULLs in them, as those json fields may or may not be present.

I've converted this exact same json data set to parquet using Apache Drill and 
also using Apache Spark SQL, and both of those tools create parquet files from 
this data set as a straight conversion that are fine when accessed via Parquet 
tools or Drill or Spark or Hive (using an external Hive table definition 
layered over the generated parquet files).

This implies that it's Hive's generation of Parquet that is broken, since both 
Drill and Spark can convert the dataset from JSON to Parquet without any issues 
on reading the files back in any of the other tools.

  was:
When creating a Parquet table in Hive from a table in another format (in this 
case JSON) using CTAS, the generated parquet files are created with broken 
footers and cause NullPointerExceptions in both Parquet tools and Spark when 
reading the files directly.

Here is the error from parquet tools:

{code}Could not read footer: java.lang.NullPointerException{code}

Here is the error from Spark reading the parquet file back:
{code}java.lang.NullPointerException
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
at 
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at 

[jira] [Commented] (HIVE-10276) Implement date_format(timestamp, fmt) UDF

2015-08-14 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697363#comment-14697363
 ] 

Alexander Pivovarov commented on HIVE-10276:


Hive documentation for the date_format UDF clearly says - Supported formats are 
Java SimpleDateFormat formats:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

Feel free to submit a patch for a date_format_mysql UDF.

 Implement date_format(timestamp, fmt) UDF
 -

 Key: HIVE-10276
 URL: https://issues.apache.org/jira/browse/HIVE-10276
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Fix For: 1.2.0

 Attachments: HIVE-10276.01.patch


 date_format(date/timestamp/string, fmt) converts a date/timestamp/string to a 
 value of String in the format specified by the java date format fmt.
 Supported formats listed here:
 https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11493) Predicate with integer column equals double evaluates to false

2015-08-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11493:

Fix Version/s: 2.0.0

 Predicate with integer column equals double evaluates to false
 --

 Key: HIVE-11493
 URL: https://issues.apache.org/jira/browse/HIVE-11493
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Blocker
 Fix For: 2.0.0

 Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch, 
 HIVE-11493.03.patch, HIVE-11493.04.patch


 Filters comparing an integer column to a double constant evaluate to false 
 every time. A negative double constant works fine.
 {code:title=explain select * from orc_ppd where t = 10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:false (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}
 {code:title=explain select * from orc_ppd where t = -10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:(t = (- 10.0)) (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance

2015-08-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Attachment: (was: HIVE-11424.01.patch)

 Improve HivePreFilteringRule performance
 

 Key: HIVE-11424
 URL: https://issues.apache.org/jira/browse/HIVE-11424
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11424.patch


 We create a rule that will transform OR clauses into IN clauses (when 
 possible).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-08-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11304:
-
Component/s: Logging

 Migrate to Log4j2 from Log4j 1.x
 

 Key: HIVE-11304
 URL: https://issues.apache.org/jira/browse/HIVE-11304
 Project: Hive
  Issue Type: Improvement
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11304.10.patch, HIVE-11304.2.patch, 
 HIVE-11304.3.patch, HIVE-11304.4.patch, HIVE-11304.5.patch, 
 HIVE-11304.6.patch, HIVE-11304.7.patch, HIVE-11304.8.patch, 
 HIVE-11304.9.patch, HIVE-11304.patch


 Log4J2 has some great benefits and can benefit Hive significantly. Some 
 notable features include:
 1) Performance (parametrized logging, performance when logging is disabled, 
 etc.). More details can be found here: 
 https://logging.apache.org/log4j/2.x/performance.html
 2) RoutingAppender - route logs to different log files based on MDC context 
 (useful for HS2, LLAP etc.)
 3) Asynchronous logging
 This is an umbrella jira to track changes related to the Log4j2 migration.
 Log4J1 EOL - 
 https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11534) Improve validateTableCols error message

2015-08-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-11534:
---
Fix Version/s: 1.3.0

 Improve validateTableCols error message
 ---

 Key: HIVE-11534
 URL: https://issues.apache.org/jira/browse/HIVE-11534
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11534.patch


 For tables created without column definitions in the DDL (but referencing the 
 schema in the underlying file format, like Avro), 
 ObjectStore.validateTableCols throws an exception that doesn't include the 
 table and db name.  This makes it tedious to look up the table name in schema 
 files.
 Example:
 {code}
 ERROR org.apache.hadoop.hive.metastore.ObjectStore: Error retrieving 
 statistics via jdo
 MetaException(message:Column wpp_mbrshp_hix_ik doesn't exist.)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:6061)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6012)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.access$1000(ObjectStore.java:160)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6084)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6076)
 {code}
 We should add the database and table name to the error message, as sketched 
 below.
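 A hedged sketch of the improved message (illustrative only, not the committed 
 patch; MetaException is the existing metastore exception type):
 {code}
 import org.apache.hadoop.hive.metastore.api.MetaException;

 class ValidateColsSketch {
   // Illustrative only: include db and table so the schema is easy to locate.
   static void checkColumn(String dbName, String tableName, String colName,
                           boolean exists) throws MetaException {
     if (!exists) {
       throw new MetaException("Column " + colName + " doesn't exist in table "
           + dbName + "." + tableName);
     }
   }
 }
 {code}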



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11561) Patch for master

2015-08-14 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697482#comment-14697482
 ] 

Swarnim Kulkarni commented on HIVE-11561:
-

I actually logged this one, but it seems it's not really needed, as we decided 
to keep moving master forward to 1.x. So we would keep this NP change on 
master and then upgrade it to HBase 1.x as part of [1]. 

Marking as won't fix.

[1] https://issues.apache.org/jira/browse/HIVE-10491

 Patch for master
 

 Key: HIVE-11561
 URL: https://issues.apache.org/jira/browse/HIVE-11561
 Project: Hive
  Issue Type: Sub-task
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11561) Patch for master

2015-08-14 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni resolved HIVE-11561.
-
Resolution: Won't Fix

 Patch for master
 

 Key: HIVE-11561
 URL: https://issues.apache.org/jira/browse/HIVE-11561
 Project: Hive
  Issue Type: Sub-task
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11558) Hive generates Parquet files with broken footers, causes NullPointerException in Spark / Drill / Parquet tools

2015-08-14 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HIVE-11558:
---
Description: 
When creating a Parquet table in Hive from a table in another format (in this 
case JSON) using CTAS, the generated parquet files are created with broken 
footers and cause NullPointerExceptions in both Parquet tools and Spark when 
reading the files directly.

Here is the error from parquet tools:

{code}Could not read footer: java.lang.NullPointerException{code}

Here is the error from Spark reading the parquet file back:
{code}java.lang.NullPointerException
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
at 
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at 
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at 
scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at 
scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

What's interesting is that the table works fine in Hive when selecting out of 
it, even when doing select * on the whole table and letting it run to the end 
(it's a sample data set); it's only other tools it causes problems for.

All fields are string except for the first one, which is timestamp, but this is 
not that known issue, since if I create another parquet table with 3 fields 
including the timestamp and two string fields using CTAS, those Hive-generated 
parquet files work fine in the other tools.

The only thing I can see which appears to cause this is that the other fields 
have lots of NULLs in them, as those json fields may or may not be present.

I've converted this exact same json data set to parquet using Apache Drill and 
also using Apache Spark SQL, and both of those tools create parquet files from 
this data set as a straight conversion that are fine when accessed via Parquet 
tools or Drill or Spark or Hive (using an external Hive table definition 
layered over the generated parquet files).

This implies that it's Hive's generation of Parquet that is broken, since both 
Drill and Spark can convert the dataset from JSON to Parquet without any issues 
on reading the files back in any of the other mentioned tools.

  was:
When creating a Parquet table in Hive from a table in another format (in this 
case JSON) using CTAS, the generated parquet files are created with broken 
footers and cause NullPointerExceptions in both Parquet tools and Spark when 
reading the files directly.

Here is the error from parquet tools:

{code}Could not read footer: java.lang.NullPointerException{code}

Here is the error from Spark reading the parquet file back:
{code}java.lang.NullPointerException
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249)
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520)
at 
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at 
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at 

[jira] [Updated] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11317:
--
Attachment: HIVE-11317.4.patch

 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: triage
 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, 
 HIVE-11317.4.patch, HIVE-11317.patch


 the logic to Abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 Need to move this logic into Initiator (or some other metastore-side thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move the timeOutLocks() logic there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more Open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11559) Revert work done in HIVE-8898

2015-08-14 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni reassigned HIVE-11559:
---

Assignee: Swarnim Kulkarni

 Revert work done in HIVE-8898
 -

 Key: HIVE-11559
 URL: https://issues.apache.org/jira/browse/HIVE-11559
 Project: Hive
  Issue Type: Bug
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni

 We unfortunately need to revert the work done in HIVE-8898, as it is 
 non-passive with older HBase versions. We need to revert this from 
 branch-1 and commit this onto master to maintain passivity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11560) Patch for branch-1

2015-08-14 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni reassigned HIVE-11560:
---

Assignee: Swarnim Kulkarni

 Patch for branch-1
 --

 Key: HIVE-11560
 URL: https://issues.apache.org/jira/browse/HIVE-11560
 Project: Hive
  Issue Type: Sub-task
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11561) Patch for master

2015-08-14 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni reassigned HIVE-11561:
---

Assignee: Swarnim Kulkarni

 Patch for master
 

 Key: HIVE-11561
 URL: https://issues.apache.org/jira/browse/HIVE-11561
 Project: Hive
  Issue Type: Sub-task
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11560) Patch for branch-1

2015-08-14 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697466#comment-14697466
 ] 

Swarnim Kulkarni commented on HIVE-11560:
-

[~sershe] Mind reviewing this for me?

 Patch for branch-1
 --

 Key: HIVE-11560
 URL: https://issues.apache.org/jira/browse/HIVE-11560
 Project: Hive
  Issue Type: Sub-task
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-11560.1.patch.txt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance

2015-08-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Attachment: HIVE-11424.01.patch

 Improve HivePreFilteringRule performance
 

 Key: HIVE-11424
 URL: https://issues.apache.org/jira/browse/HIVE-11424
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11424.01.patch, HIVE-11424.patch


 We create a rule that will transform OR clauses into IN clauses (when 
 possible).
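 
 For illustration only — the real rule works on Calcite RexNode trees; this toy 
 Java sketch just shows the shape of the rewrite on (column, constant) equality 
 disjuncts:
 {code}
 import java.util.*;
 
 public class OrToInSketch {
   public static void main(String[] args) {
     // represents: c = 1 OR c = 2 OR c = 3 OR d = 5
     List<String[]> disjuncts = Arrays.asList(
         new String[]{"c", "1"}, new String[]{"c", "2"},
         new String[]{"c", "3"}, new String[]{"d", "5"});
 
     // group the constants by column
     Map<String, List<String>> byColumn = new LinkedHashMap<>();
     for (String[] eq : disjuncts) {
       byColumn.computeIfAbsent(eq[0], k -> new ArrayList<>()).add(eq[1]);
     }
 
     // columns with more than one constant collapse to IN
     StringJoiner out = new StringJoiner(" OR ");
     for (Map.Entry<String, List<String>> e : byColumn.entrySet()) {
       out.add(e.getValue().size() > 1
           ? e.getKey() + " IN (" + String.join(", ", e.getValue()) + ")"
           : e.getKey() + " = " + e.getValue().get(0));
     }
     System.out.println(out); // c IN (1, 2, 3) OR d = 5
   }
 }
 {code}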



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11278) Partition.setOutputFormatClass should not do toString for Class object

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697512#comment-14697512
 ] 

Ashutosh Chauhan commented on HIVE-11278:
-

[~prongs] Will it be possible to add a test case which fails in the absence of 
this patch?

 Partition.setOutputFormatClass should not do toString for Class object 
 ---

 Key: HIVE-11278
 URL: https://issues.apache.org/jira/browse/HIVE-11278
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
 Fix For: 2.0.0

 Attachments: HIVE-11278.01.patch


 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java#L286
 Inside setInputFormatClass, we're doing:
 {noformat}
   public void setInputFormatClass(Class<? extends InputFormat> inputFormatClass) {
     this.inputFormatClass = inputFormatClass;
     tPartition.getSd().setInputFormat(inputFormatClass.getName());
   }
 {noformat}
 But inside setOutputFormatClass, we're doing toString for the class, instead of 
 getName():
 {noformat}
   public void setOutputFormatClass(Class<? extends HiveOutputFormat> outputFormatClass) {
     this.outputFormatClass = outputFormatClass;
     tPartition.getSd().setOutputFormat(HiveFileFormatUtils
         .getOutputFormatSubstitute(outputFormatClass).toString());
   }
 {noformat}
 The difference is that, for a Class object, toString() returns "class a.b.c.ClassName" 
 while getName() returns "a.b.c.ClassName". So Class.forName(cls.getName()) succeeds, 
 but Class.forName(cls.toString()) is not valid.
 So if you get a partition, set the output format, and make an alter call, then get 
 the partition again and call getOutputFormatClass on that object, it 
 throws a ClassNotFoundException on 
 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java#L316,
 because it's basically calling Class.forName("class a.b.c.ClassName"), 
 which is wrong!
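 
 A standalone demonstration of the round-trip failure (plain JDK behavior, not 
 Hive code):
 {code}
 public class ClassNameRoundTrip {
   public static void main(String[] args) throws Exception {
     Class<?> cls = java.util.ArrayList.class;
 
     System.out.println(cls.getName());   // java.util.ArrayList
     System.out.println(cls.toString());  // class java.util.ArrayList
 
     Class.forName(cls.getName());        // round-trips fine
 
     try {
       Class.forName(cls.toString());     // "class java.util.ArrayList" is not a class name
     } catch (ClassNotFoundException e) {
       System.out.println("forName(toString()) failed as expected: " + e.getMessage());
     }
   }
 }
 {code}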



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697356#comment-14697356
 ] 

Alan Gates commented on HIVE-10289:
---

I'm not done reviewing the patch yet, but I have one bigger question:  it looks 
like we're doing the comparison in String format (in 
PartitionKeyComparator.compareTo we convert the byte[] value passed in into a 
String and the values passed in the protobuf are string).  Why pay the cost of 
the string conversion?  Why not leave it in byte[] and use bytes in the 
protobuf?  This seems like it would be faster since this filter will be applied 
to every row in the scan range.

[~sershe], I think the value of ObjectInspectors over OrderedBytes is that 
there's guaranteed to be an ObjectInspector for every Hive type, whereas there 
are some Hive types not covered by OrderedBytes (e.g. Date, Timestamp). 

 Support filter on non-first partition key and non-string partition key
 --

 Key: HIVE-10289
 URL: https://issues.apache.org/jira/browse/HIVE-10289
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore, Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: HIVE-10289.1.patch, HIVE-10289.2.patch


 Currently, partition filtering only handles the first partition key, and the 
 type of this partition key must be string. In order to break this 
 limitation, several improvements are required:
 1. Change the serialization format for partition keys. Currently partition keys 
 are serialized into a delimited string, which sorts in string order, not with 
 regard to the actual type of the partition key. We use BinarySortableSerDe 
 for this purpose.
 2. For filter conditions not on the initial partition keys, push them into an 
 HBase RowFilter. The RowFilter will deserialize the partition key and evaluate 
 the filter condition.
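 
 A quick illustration of why a delimited-string encoding breaks ordered scans 
 for typed keys (plain Java, not Hive's serde code):
 {code}
 import java.util.Arrays;
 
 public class StringOrderPitfall {
   public static void main(String[] args) {
     // integer partition keys serialized as strings sort lexicographically...
     String[] asStrings = {"2", "10", "9", "100"};
     Arrays.sort(asStrings);
     System.out.println(Arrays.toString(asStrings)); // [10, 100, 2, 9] -- wrong for a range scan
 
     // ...while the actual values sort numerically
     int[] asInts = {2, 10, 9, 100};
     Arrays.sort(asInts);
     System.out.println(Arrays.toString(asInts));    // [2, 9, 10, 100]
   }
 }
 {code}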



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697375#comment-14697375
 ] 

Ashutosh Chauhan commented on HIVE-10631:
-

Getting this:

You don't have access to this review request.
This review request is private. You must be a requested reviewer, either 
directly or on a requested group, and have permission to access the repository 
in order to view this review request.

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 HiveMetaStore.create_table_core calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast 
 call scans the warehouse dir and then doesn't use the result. 
 Fast Stats was implemented by HIVE-3959.
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse 
 location (hive.metastore.warehouse.dir); it could be big with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 basically it could scan the wrong S3 directory recursively and do nothing with 
 the result. I will add more detail of cases in comments.
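 
 A hedged sketch of the control flow being described — not the actual 
 MetaStoreUtils code, just the reported shape: an expensive listing whose 
 result is discarded whenever newDir is true, which is what this call site 
 always passes:
 {code}
 public class FastStatsSketch {
   // listFiles() stands in for Warehouse.getFileStatusesForUnpartitionedTable;
   // the guard on newDir is an assumption based on the description above.
   static void updateTableStatsFast(java.io.File tableDir, boolean newDir) {
     java.io.File[] files = tableDir.listFiles(); // the expensive scan (worst on S3-like stores)
     if (newDir || files == null) {
       return; // listing result thrown away: pure wasted I/O
     }
     long totalSize = 0;
     for (java.io.File f : files) totalSize += f.length();
     System.out.println("numFiles=" + files.length + " totalSize=" + totalSize);
   }
 
   public static void main(String[] args) {
     updateTableStatsFast(new java.io.File("."), true); // mirrors the buggy call site
   }
 }
 {code}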



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance

2015-08-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Description: We create a rule that will transform OR clauses into IN 
clauses (when possible).  (was: 1) Remove early bail out condition.
2) Create IN clause instead of OR tree (when possible).)

 Improve HivePreFilteringRule performance
 

 Key: HIVE-11424
 URL: https://issues.apache.org/jira/browse/HIVE-11424
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11424.patch


 We create a rule that will transform OR clauses into IN clauses (when 
 possible).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11534) Improve validateTableCols error message

2015-08-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697442#comment-14697442
 ] 

Xuefu Zhang commented on HIVE-11534:


Also pushed to branch-1.0.

 Improve validateTableCols error message
 ---

 Key: HIVE-11534
 URL: https://issues.apache.org/jira/browse/HIVE-11534
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11534.patch


 For tables created without a column definition in the DDL (but referencing the 
 schema in the underlying file format, like Avro), 
 ObjectStore.validateTableCols throws an exception that doesn't include the 
 table and db name. This makes it tedious to look up the table name in schema 
 files.
 Example:
 {code}
 ERROR org.apache.hadoop.hive.metastore.ObjectStore: Error retrieving 
 statistics via jdo
 MetaException(message:Column wpp_mbrshp_hix_ik doesn't exist.)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:6061)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6012)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.access$1000(ObjectStore.java:160)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6084)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6076)
 {code}
 We should add database and the table name to the error message.
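 
 A sketch of the requested message format (names are illustrative, not the 
 actual ObjectStore code):
 {code}
 public class ValidateColsMessage {
   // hypothetical helper showing the improved message with db.table included
   static String message(String dbName, String tableName, String colName) {
     return "Column " + colName + " doesn't exist in table " + dbName + "." + tableName;
   }
 
   public static void main(String[] args) {
     System.out.println(message("default", "claims", "wpp_mbrshp_hix_ik"));
   }
 }
 {code}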



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11557) CBO (Calcite Return Path): Convert to flat AND/OR

2015-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697424#comment-14697424
 ] 

Hive QA commented on HIVE-11557:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750503/HIVE-11557.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9320 tests executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4967/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4967/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4967/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750503 - PreCommit-HIVE-TRUNK-Build

 CBO (Calcite Return Path): Convert to flat AND/OR
 -

 Key: HIVE-11557
 URL: https://issues.apache.org/jira/browse/HIVE-11557
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11557.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11490) Lazily call ASTNode::toStringTree() after tree modification

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697767#comment-14697767
 ] 

Ashutosh Chauhan commented on HIVE-11490:
-

+1

 Lazily call ASTNode::toStringTree() after tree modification
 ---

 Key: HIVE-11490
 URL: https://issues.apache.org/jira/browse/HIVE-11490
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11490.1.patch, HIVE-11490.2.patch, 
 HIVE-11490.3.patch


 Currently, we call toStringTree() as part of HIVE-11316 everytime the tree is 
 modified. This is a bad approach as we can lazily delay this to the point 
 when toStringTree() is called again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7152) OutputJobInfo.setPosOfPartCols() Comparator bug

2015-08-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7152:
-
Assignee: (was: Eugene Koifman)

 OutputJobInfo.setPosOfPartCols() Comparator bug
 ---

 Key: HIVE-7152
 URL: https://issues.apache.org/jira/browse/HIVE-7152
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Eugene Koifman

 this method compares Integer objects using '=='.  This may break for wide 
 tables that have more than 127 columns.
 http://stackoverflow.com/questions/2602636/why-cant-the-compiler-jvm-just-make-autoboxing-just-work
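 
 A minimal demonstration of the autoboxing pitfall (the 127 boundary comes from 
 the JVM's default Integer cache of -128..127):
 {code}
 public class IntegerEqualityPitfall {
   public static void main(String[] args) {
     Integer a = 127, b = 127;
     System.out.println(a == b);        // true: both refer to the same cached Integer
 
     Integer c = 128, d = 128;
     System.out.println(c == d);        // false: distinct boxed objects outside the cache
     System.out.println(c.equals(d));   // true: value comparison is what was intended
   }
 }
 {code}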



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11570) Fix PTest2 log4j2.version

2015-08-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved HIVE-11570.

Resolution: Fixed

 Fix PTest2 log4j2.version 
 --

 Key: HIVE-11570
 URL: https://issues.apache.org/jira/browse/HIVE-11570
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 2.0.0

 Attachments: HIVE-11570.1.patch


 {code}
 + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2
 [INFO] Scanning for projects...
 [ERROR] The build could not read 1 project - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hive:hive-ptest:1.0 
 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml)
  has 4 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is 
 '${log4j2.version}'. @ line 69, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-web:jar must be a valid version but is 
 '${log4j2.version}'. @ line 74, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 79, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 84, column 16
 [ERROR] 
 {code}
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10497) Upgrade hive branch to latest Tez

2015-08-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10497:
---
Attachment: (was: HIVE-10497.3.patch)

 Upgrade hive branch to latest Tez
 -

 Key: HIVE-10497
 URL: https://issues.apache.org/jira/browse/HIVE-10497
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-10497.1.patch, HIVE-10497.1.patch, 
 HIVE-10497.2.patch


 Upgrade hive to the upcoming tez-0.7 release 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11563) Perflogger loglines are repeated

2015-08-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-11563.
--
Resolution: Fixed

 Perflogger loglines are repeated
 

 Key: HIVE-11563
 URL: https://issues.apache.org/jira/browse/HIVE-11563
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11563.patch


 After HIVE-11304, the perflogger log lines in qtests are repeated.
 {code}
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch

2015-08-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697762#comment-14697762
 ] 

Prasanth Jayachandran commented on HIVE-11542:
--

I don't see the HdfsUtils class being used anywhere. Remove it? Otherwise looks 
good to me. +1

 port fileId support on shims and splits from llap branch
 

 Key: HIVE-11542
 URL: https://issues.apache.org/jira/browse/HIVE-11542
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch, 2.0.0

 Attachments: HIVE-11542.patch


 This is helpful for any kind of file-based cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11552:

Attachment: HIVE-11552.nogen.patch

Updated nogen patch to remove some spurious changes

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11570) Fix PTest2 log4j2.version

2015-08-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11570:
---
Description: 
{code}
+ mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
-Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2
[INFO] Scanning for projects...
[ERROR] The build could not read 1 project - [Help 1]
[ERROR]   
[ERROR]   The project org.apache.hive:hive-ptest:1.0 
(/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml)
 has 4 errors
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is 
'${log4j2.version}'. @ line 69, column 16
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-web:jar must be a valid version but is 
'${log4j2.version}'. @ line 74, column 16
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is 
'${log4j2.version}'. @ line 79, column 16
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is 
'${log4j2.version}'. @ line 84, column 16
[ERROR] 
{code}

NO PRECOMMIT TESTS

  was:
{code}
+ mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
-Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2
[INFO] Scanning for projects...
[ERROR] The build could not read 1 project - [Help 1]
[ERROR]   
[ERROR]   The project org.apache.hive:hive-ptest:1.0 
(/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml)
 has 4 errors
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is 
'${log4j2.version}'. @ line 69, column 16
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-web:jar must be a valid version but is 
'${log4j2.version}'. @ line 74, column 16
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is 
'${log4j2.version}'. @ line 79, column 16
[ERROR] 'dependencies.dependency.version' for 
org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is 
'${log4j2.version}'. @ line 84, column 16
[ERROR] 
{code}


 Fix PTest2 log4j2.version 
 --

 Key: HIVE-11570
 URL: https://issues.apache.org/jira/browse/HIVE-11570
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 2.0.0

 Attachments: HIVE-11570.1.patch


 {code}
 + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2
 [INFO] Scanning for projects...
 [ERROR] The build could not read 1 project - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hive:hive-ptest:1.0 
 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml)
  has 4 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is 
 '${log4j2.version}'. @ line 69, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-web:jar must be a valid version but is 
 '${log4j2.version}'. @ line 74, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 79, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 84, column 16
 [ERROR] 
 {code}
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-14 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697900#comment-14697900
 ] 

Sushanth Sowmyan commented on HIVE-11552:
-

+cc [~thejas]

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11567) Some trace logs seeped through with new log4j2 changes

2015-08-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697746#comment-14697746
 ] 

Prasanth Jayachandran commented on HIVE-11567:
--

I don't think this needs a precommit test run as it just reduces the log lines 
in hive.log when running tests.

 Some trace logs seeped through with new log4j2 changes
 --

 Key: HIVE-11567
 URL: https://issues.apache.org/jira/browse/HIVE-11567
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11567.patch


 Observed a hive.log file size difference when running with the new log4j2 changes 
 (HIVE-11304). It looks like the default threshold was DEBUG in log4j 1.x (as 
 log4j.threshold was misspelt). In log4j2 the default threshold was set to ALL, 
 which emitted some trace logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11562) Typo in hive-log4j2.xml throws unknown level exception

2015-08-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697744#comment-14697744
 ] 

Prasanth Jayachandran commented on HIVE-11562:
--

I don't think this needs a precommit test run as it just avoids errors written 
to console wrt initialization.

 Typo in hive-log4j2.xml throws unknown level exception
 --

 Key: HIVE-11562
 URL: https://issues.apache.org/jira/browse/HIVE-11562
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11562.patch


 Noticed a typo in the default hive-log4j2.xml used for tests, causing the 
 following exception:
 {code}
 2015-08-14 11:26:35,965 WARN Error while converting string 
 [{sys:hive.log.level}] to type [class org.apache.logging.log4j.Level]. Using 
 default value [null]. java.lang.IllegalArgumentException: Unknown level 
 constant [{SYS:HIVE.LOG.LEVEL}].
   at org.apache.logging.log4j.Level.valueOf(Level.java:286)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:230)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:226)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:336)
   at 
 org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:130)
   at 
 org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
   at 
 org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:247)
   at 
 org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:766)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:706)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:358)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:161)
   at 
 org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:361)
   at 
 org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:426)
   at 
 org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:442)
   at 
 org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:138)
   at 
 org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:147)
   at 
 org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
   at org.apache.logging.log4j.LogManager.getContext(LogManager.java:175)
   at 
 org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:102)
   at 
 org.apache.logging.log4j.jcl.LogAdapter.getContext(LogAdapter.java:39)
   at 
 org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
   at 
 org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:40)
   at 
 org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:55)
   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
   at 
 org.apache.hadoop.util.ShutdownHookManager.<clinit>(ShutdownHookManager.java:44)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11563) Perflogger loglines are repeated

2015-08-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697745#comment-14697745
 ] 

Prasanth Jayachandran commented on HIVE-11563:
--

I don't think this needs a precommit test run as it just reduces the log lines 
in hive.log when running tests.

 Perflogger loglines are repeated
 

 Key: HIVE-11563
 URL: https://issues.apache.org/jira/browse/HIVE-11563
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11563.patch


 After HIVE-11304, the perflogger log lines in qtests are repeated.
 {code}
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree

2015-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697770#comment-14697770
 ] 

Ashutosh Chauhan commented on HIVE-11341:
-

[~hsubramaniyan] Are the above failures unrelated to the patch, or are you 
working on fixing them?

 Avoid expensive resizing of ASTNode tree 
 -

 Key: HIVE-11341
 URL: https://issues.apache.org/jira/browse/HIVE-11341
 Project: Hive
  Issue Type: Bug
  Components: Hive, Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, 
 HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, 
 HIVE-11341.6.patch, HIVE-11341.7.patch


 {code}
 Stack Trace                                                                Sample Count   Percentage(%)
 parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                              1,605   90
   parse.CalcitePlanner.analyzeInternal(ASTNode)                                   1,605   90
     parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)   1,605   90
       parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)         1,604   90
         parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)     1,604   90
           parse.SemanticAnalyzer.genPlan(QB)                                           1,604   90
             parse.SemanticAnalyzer.genPlan(QB, boolean)                                1,604   90
               parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)                    1,604   90
                 parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)   1,603   90
                   parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)     1,603   90
                     parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)  1,603   90
                       parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)   1,603   90
                         parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)   1,603   90
                           parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)   1,603   90
                             parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)   1,603   90
                               lib.DefaultGraphWalker.startWalking(Collection, HashMap)   1,579   89
                                 lib.DefaultGraphWalker.walk(Node)                        1,571   89
                                   java.util.ArrayList.removeAll(Collection)              1,433   81
                                     java.util.ArrayList.batchRemove(Collection, boolean)   1,433   81
                                       java.util.ArrayList.contains(Object)               1,228   69
                                         java.util.ArrayList.indexOf(Object)              1,228   69
 {code}
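 
 The hotspot above is ArrayList.removeAll, which is O(n*m) because contains() is 
 a linear scan; a hash-based collection makes the same bookkeeping near-linear. 
 A self-contained comparison (plain Java, not the actual DefaultGraphWalker code):
 {code}
 import java.util.*;
 
 public class RemoveAllCost {
   public static void main(String[] args) {
     int n = 50_000;
     List<Integer> nodes = new ArrayList<>();
     for (int i = 0; i < n; i++) nodes.add(i);
     List<Integer> dispatched = new ArrayList<>(nodes.subList(0, n / 2));
 
     long t0 = System.nanoTime();
     List<Integer> a = new ArrayList<>(nodes);
     a.removeAll(dispatched);                  // O(n*m): each contains() scans the list
     long t1 = System.nanoTime();
 
     Set<Integer> b = new LinkedHashSet<>(nodes);
     b.removeAll(new HashSet<>(dispatched));   // ~O(n): hash lookups
     long t2 = System.nanoTime();
 
     System.out.printf("ArrayList.removeAll: %d ms, set-based: %d ms%n",
         (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
   }
 }
 {code}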



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11570) Fix PTest2 log4j2.version

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697857#comment-14697857
 ] 

Sergey Shelukhin commented on HIVE-11570:
-

I wonder if anything needs to be updated for this [~spena]


 Fix PTest2 log4j2.version 
 --

 Key: HIVE-11570
 URL: https://issues.apache.org/jira/browse/HIVE-11570
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 2.0.0

 Attachments: HIVE-11570.1.patch


 {code}
 + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2
 [INFO] Scanning for projects...
 [ERROR] The build could not read 1 project - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hive:hive-ptest:1.0 
 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml)
  has 4 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is 
 '${log4j2.version}'. @ line 69, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-web:jar must be a valid version but is 
 '${log4j2.version}'. @ line 74, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 79, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 84, column 16
 [ERROR] 
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11570) Fix PTest2 log4j2.version

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697856#comment-14697856
 ] 

Sergey Shelukhin commented on HIVE-11570:
-

+1

 Fix PTest2 log4j2.version 
 --

 Key: HIVE-11570
 URL: https://issues.apache.org/jira/browse/HIVE-11570
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 2.0.0

 Attachments: HIVE-11570.1.patch


 {code}
 + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2
 [INFO] Scanning for projects...
 [ERROR] The build could not read 1 project - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hive:hive-ptest:1.0 
 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml)
  has 4 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is 
 '${log4j2.version}'. @ line 69, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-web:jar must be a valid version but is 
 '${log4j2.version}'. @ line 74, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 79, column 16
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is 
 '${log4j2.version}'. @ line 84, column 16
 [ERROR] 
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697934#comment-14697934
 ] 

Alan Gates commented on HIVE-11552:
---

I can review it.

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11563) Perflogger loglines are repeated

2015-08-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11563:
-
Attachment: HIVE-11563.patch

 Perflogger loglines are repeated
 

 Key: HIVE-11563
 URL: https://issues.apache.org/jira/browse/HIVE-11563
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11563.patch


 After HIVE-11304, the perflogger log lines in qtests are repeated.
 {code}
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree

2015-08-14 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697773#comment-14697773
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11341:
--

[~ashutoshc] I am working on fixing them as I can reproduce them locally.

Thanks
Hari

 Avoid expensive resizing of ASTNode tree 
 -

 Key: HIVE-11341
 URL: https://issues.apache.org/jira/browse/HIVE-11341
 Project: Hive
  Issue Type: Bug
  Components: Hive, Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, 
 HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, 
 HIVE-11341.6.patch, HIVE-11341.7.patch


 {code}
 Stack Trace                                                                Sample Count   Percentage(%)
 parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                              1,605   90
   parse.CalcitePlanner.analyzeInternal(ASTNode)                                   1,605   90
     parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)   1,605   90
       parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)         1,604   90
         parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)     1,604   90
           parse.SemanticAnalyzer.genPlan(QB)                                           1,604   90
             parse.SemanticAnalyzer.genPlan(QB, boolean)                                1,604   90
               parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)                    1,604   90
                 parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)   1,603   90
                   parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)     1,603   90
                     parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)  1,603   90
                       parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)   1,603   90
                         parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)   1,603   90
                           parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)   1,603   90
                             parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)   1,603   90
                               lib.DefaultGraphWalker.startWalking(Collection, HashMap)   1,579   89
                                 lib.DefaultGraphWalker.walk(Node)                        1,571   89
                                   java.util.ArrayList.removeAll(Collection)              1,433   81
                                     java.util.ArrayList.batchRemove(Collection, boolean)   1,433   81
                                       java.util.ArrayList.contains(Object)               1,228   69
                                         java.util.ArrayList.indexOf(Object)              1,228   69
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11568) merge master into branch

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11568:

Description: 
NO PRECOMMIT TESTS


 merge master into branch
 

 Key: HIVE-11568
 URL: https://issues.apache.org/jira/browse/HIVE-11568
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11568.nogen.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11571) Fix Hive PTest2 logging configuration

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697879#comment-14697879
 ] 

Sergey Shelukhin commented on HIVE-11571:
-

+1

 Fix Hive PTest2 logging configuration
 -

 Key: HIVE-11571
 URL: https://issues.apache.org/jira/browse/HIVE-11571
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Trivial
 Fix For: 2.0.0

 Attachments: HIVE-11571.patch


 {code}
 [Fatal Error] log4j2.xml:79:3: The element type "Loggers" must be terminated 
 by the matching end-tag "</Loggers>".
 ERROR StatusLogger Error parsing 
 jar:file:/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar!/log4j2.xml
  org.xml.sax.SAXParseException; systemId: 
 jar:file:/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar!/log4j2.xml;
  lineNumber: 79; columnNumber: 3; The element type "Loggers" must be 
 terminated by the matching end-tag "</Loggers>".
   at 
 com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
 {code}
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697905#comment-14697905
 ] 

Sergey Shelukhin commented on HIVE-11552:
-

is cc equivalent to 1 in some encoding? :)

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11568) merge master into branch

2015-08-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697911#comment-14697911
 ] 

Alan Gates commented on HIVE-11568:
---

+1, looks like all the relevant changes are outside of hbase metastore code.

 merge master into branch
 

 Key: HIVE-11568
 URL: https://issues.apache.org/jira/browse/HIVE-11568
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11568.nogen.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11563) Perflogger loglines are repeated

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11563:


+1

 Perflogger loglines are repeated
 

 Key: HIVE-11563
 URL: https://issues.apache.org/jira/browse/HIVE-11563
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11563.patch


 After HIVE-11304, the perflogger log lines in qtests are repeated.
 {code}
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,765 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-08-14T12:02:05,766 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11562) Typo in hive-log4j2.xml throws unknown level exception

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11562:


+1

 Typo in hive-log4j2.xml throws unknown level exception
 --

 Key: HIVE-11562
 URL: https://issues.apache.org/jira/browse/HIVE-11562
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11562.patch


 Noticed a typo in the default hive-log4j2.xml used for tests, causing the 
 following exception:
 {code}
 2015-08-14 11:26:35,965 WARN Error while converting string 
 [{sys:hive.log.level}] to type [class org.apache.logging.log4j.Level]. Using 
 default value [null]. java.lang.IllegalArgumentException: Unknown level 
 constant [{SYS:HIVE.LOG.LEVEL}].
   at org.apache.logging.log4j.Level.valueOf(Level.java:286)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:230)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:226)
   at 
 org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:336)
   at 
 org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:130)
   at 
 org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
   at 
 org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:247)
   at 
 org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:766)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:706)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:358)
   at 
 org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:161)
   at 
 org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:361)
   at 
 org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:426)
   at 
 org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:442)
   at 
 org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:138)
   at 
 org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:147)
   at 
 org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
   at org.apache.logging.log4j.LogManager.getContext(LogManager.java:175)
   at 
 org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:102)
   at 
 org.apache.logging.log4j.jcl.LogAdapter.getContext(LogAdapter.java:39)
   at 
 org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
   at 
 org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:40)
   at 
 org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:55)
   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
   at 
 org.apache.hadoop.util.ShutdownHookManager.<clinit>(ShutdownHookManager.java:44)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11567) Some trace logs seeped through with new log4j2 changes

2015-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11567:


+1

 Some trace logs seeped through with new log4j2 changes
 --

 Key: HIVE-11567
 URL: https://issues.apache.org/jira/browse/HIVE-11567
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11567.patch


 Observed a hive.log file size difference when running with the new log4j2 changes 
 (HIVE-11304). It looks like the default threshold was DEBUG in log4j 1.x (as 
 log4j.threshold was misspelt). In log4j2 the default threshold was set to ALL, 
 which emitted some trace logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11317:
--
Attachment: HIVE-11317.5.patch

One final tweak: don't start the housekeeper unless hive.compactor.initiator.on=true.

 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: triage
 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, 
 HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.patch


 The logic to abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort the dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 Need to move this logic into the Initiator (or some other metastore-side thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move the timeOutLocks() logic there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >