[jira] [Commented] (HIVE-8300) Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch]

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152843#comment-14152843
 ] 

Hive QA commented on HIVE-8300:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671962/HIVE-8300.1-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6508 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/181/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/181/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-181/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671962

 Missing guava lib causes IllegalStateException when deserializing a task 
 [Spark Branch]
 ---

 Key: HIVE-8300
 URL: https://issues.apache.org/jira/browse/HIVE-8300
 Project: Hive
  Issue Type: Bug
  Components: Spark
 Environment: Spark-1.2.0-SNAPSHOT
Reporter: Rui Li
 Attachments: HIVE-8300.1-spark.patch


 In spark-1.2, guava is shaded in spark-assembly, and we only ship hive-exec 
 to the spark cluster, so spark executors won't have the original guava on 
 their class path.
 This can cause problems when TaskRunner deserializes a task, which throws 
 something like this:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
 stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
 (TID 3, node13-1): java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}
 We may have to verify this issue and ship guava to the spark cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4224) Upgrade to Thrift 1.0 when available

2014-09-30 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152849#comment-14152849
 ] 

Nemon Lou commented on HIVE-4224:
-

THRIFT-1869 has been fixed in Thrift 0.9.1, which was released on 21/Aug/13.
Is there any plan to upgrade Thrift to 0.9.1?

 Upgrade to Thrift 1.0 when available
 

 Key: HIVE-4224
 URL: https://issues.apache.org/jira/browse/HIVE-4224
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, Metastore, Server Infrastructure
Affects Versions: 0.11.0
Reporter: Brock Noland
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

2014-09-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8151:
-
Attachment: HIVE-8151.7.patch

Fixes yet another failure.

 Dynamic partition sort optimization inserts record wrongly to partition when 
 used with GroupBy
 --

 Key: HIVE-8151
 URL: https://issues.apache.org/jira/browse/HIVE-8151
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
 HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch


 HIVE-6455 added dynamic partition sort optimization. It added a startGroup() 
 method to the FileSink operator to look for changes in the reduce key when 
 creating partition directories. This method, however, is not reliable, because 
 the key passed to startGroup() differs from the key passed to processOp(): 
 startGroup() is called with the newly changed key, whereas processOp() is 
 called with the previously aggregated key. As a result, processOp() writes the 
 last row of the previous group as the first row of the next group. This 
 happens only when used with the group-by operator.
 The fix is to stop relying on startGroup() and create the partition directory 
 in processOp() itself.
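
 The fix described above — detecting the key change inside processOp() itself — 
 can be illustrated with a small self-contained sketch (this is not Hive's 
 actual FileSinkOperator; the class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;

// Rows arrive sorted by partition key. A new partition "directory" is opened
// whenever the key observed in processOp() itself changes, instead of
// trusting a separate startGroup() callback that may carry a different key.
class KeyChangeSink {
    private String prevKey;
    private final List<String> dirsCreated = new ArrayList<>();

    void processOp(String partKey, String row) {
        if (!partKey.equals(prevKey)) {      // inline key-change detection
            dirsCreated.add("part=" + partKey);
            prevKey = partKey;
        }
        // ... write row into the current partition (omitted) ...
    }

    List<String> dirs() {
        return dirsCreated;
    }
}
```

 Because the check runs on the same row that is about to be written, the last 
 row of one group can never leak into the next group's directory.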



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8287) Metadata action errors don't have information about cause

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152893#comment-14152893
 ] 

Hive QA commented on HIVE-8287:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671866/HIVE-8287.3.patch

{color:green}SUCCESS:{color} +1 6372 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1047/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1047/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1047/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671866

 Metadata action errors don't have information about cause
 -

 Key: HIVE-8287
 URL: https://issues.apache.org/jira/browse/HIVE-8287
 Project: Hive
  Issue Type: Bug
  Components: Authorization, Logging
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch, HIVE-8287.3.patch


 Example of error message that doesn't given enough useful information -
 {noformat}
 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition 
 (p1='def');
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check 
 logs. (state=08S01,code=1)
 {noformat}
 Some calls to get database and get table from metastore also treat all 
 exceptions as an 'object does not exist' exception, and end up ignoring the 
 errors. 
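
 A generic sketch of the kind of fix this calls for — surfacing the root cause 
 instead of "Unknown error. Please check logs." (names here are illustrative, 
 not Hive's actual DDLTask code):

```java
// Walks the cause chain and embeds the root cause in the user-facing message,
// so metadata action errors carry information about what actually failed.
class ErrorReporter {
    static String describeFailure(Throwable t) {
        Throwable root = t;
        while (root.getCause() != null) {
            root = root.getCause();          // descend to the root cause
        }
        return "FAILED: Execution Error. Cause: "
                + root.getClass().getSimpleName() + ": " + root.getMessage();
    }
}
```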



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152951#comment-14152951
 ] 

Damien Carol commented on HIVE-8231:


Restarting HDFS, YARN, the remote metastore, and HiveServer2 didn't help.
I think the bug occurs because there is no base dir.
When I run a major compaction, the base is created and I can see the new rows.



 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to reproduce the bug:
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
   
 +++--+--+
 |  col_name  | data_type  | comment  |
 +++--+--+
 | id | int|  |
 | idmagasin  | int|  |
 | zibzin | string |  |
 | cheque | int|  |
 | montant| double |  |
 | date   | timestamp  |  |
 | col_6  | string |  |
 | col_7  | string |  |
 | col_8  | string |  |
 +++--+--+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS Output  |
 +-+--+
 +-+--+
 No rows selected (0.01 seconds)
 {noformat}
 3. Insert values into the new table
 {noformat}
 insert into table encaissement_1b_64m VALUES (1, 1, 
 '8909', 1, 12.5, '12/05/2014', '','','');
 {noformat}
 4. Check
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
 +-+--+
 | id  |
 +-+--+
 +-+--+
 No rows selected (0.091 seconds)
 {noformat}
 There is already a problem: I don't see the inserted row.
 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS Output  |
 +-+--+
 | Found 1 items   |
 | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
 +-+--+
 2 rows selected (0.014 seconds)
 {noformat}
 6. Doing a major compaction solves the bug
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 
 'major';
 No rows affected (0.046 seconds)
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 ++--+
 | DFS Output  |
 ++--+
 | Found 1 items  |
 | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  |
 ++--+
 2 rows selected (0.02 seconds)
 {noformat}
  



--
This message was sent by Atlassian JIRA

[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()

2014-09-30 Thread Sanghyun Yun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanghyun Yun updated HIVE-8285:
---
Attachment: HIVE-8285.patch

I changed it to use the equals() method. Please review, [~tedyu] :)

 Reference equality is used on boolean values in 
 PartitionPruner#removeTruePredciates()
 --

 Key: HIVE-8285
 URL: https://issues.apache.org/jira/browse/HIVE-8285
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8285.patch


 {code}
   if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
       && eC.getValue() == Boolean.TRUE) {
 {code}
 equals() should be used in the above comparison.
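
 Why the reference comparison is fragile: if eC.getValue() returns an Object, 
 the Boolean it holds need not be the canonical Boolean.TRUE instance (for 
 example after deserialization), so == can report false for a value that is 
 logically true. A minimal self-contained demonstration:

```java
// == compares object identity; equals() compares the boolean value.
class BooleanCompare {
    static boolean byReference(Object v) {
        return v == Boolean.TRUE;
    }

    static boolean byEquals(Object v) {
        return Boolean.TRUE.equals(v);
    }
}
```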



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()

2014-09-30 Thread Sanghyun Yun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanghyun Yun updated HIVE-8285:
---
Affects Version/s: 0.14.0
   Status: Patch Available  (was: Open)

 Reference equality is used on boolean values in 
 PartitionPruner#removeTruePredciates()
 --

 Key: HIVE-8285
 URL: https://issues.apache.org/jira/browse/HIVE-8285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8285.patch


 {code}
   if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
       && eC.getValue() == Boolean.TRUE) {
 {code}
 equals() should be used in the above comparison.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152972#comment-14152972
 ] 

Hive QA commented on HIVE-8295:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671876/HIVE-8295.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6370 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1048/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671876

 Add batch retrieve partition objects for metastore direct sql 
 --

 Key: HIVE-8295
 URL: https://issues.apache.org/jira/browse/HIVE-8295
 Project: Hive
  Issue Type: Bug
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-8295.1.patch


 Currently, MetaStoreDirectSql constructs partition objects by first fetching 
 the partition ids. However, if more than 1000 partition ids match the filter, 
 direct SQL will fail with the following stack trace:
 {code}
 2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
 (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms 
 + 13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
 inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
 TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID
   and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
 FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
 where ((FILTER2.PART_KEY_VAL = ?))]
 2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
 PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
 SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, 
 SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, 
 SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS  
 left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID   left outer 
 join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
 (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
 ) order by PART_NAME asc.
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 
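 The batching the issue title asks for could be sketched like this — splitting 
 the matched ids into chunks below the IN-list limit and issuing one query per 
 chunk (illustrative code, not the actual patch; 1000 is the limit mentioned in 
 the description):

```java
import java.util.ArrayList;
import java.util.List;

// Splits a list of partition ids into batches of at most `max` elements, so
// each generated "PART_ID in (...)" clause stays under the database limit.
class IdBatcher {
    static List<List<Long>> batches(List<Long> ids, int max) {
        List<List<Long>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += max) {
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + max, ids.size()))));
        }
        return out;
    }
}
```

 Each batch would then be interpolated into its own query, and the partial 
 results concatenated.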

[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152978#comment-14152978
 ] 

Damien Carol commented on HIVE-8231:


I don't think it's a cache issue. Stopping and restarting ALL daemons of the 
cluster changes nothing.

 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to reproduce the bug:
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
   
 +++--+--+
 |  col_name  | data_type  | comment  |
 +++--+--+
 | id | int|  |
 | idmagasin  | int|  |
 | zibzin | string |  |
 | cheque | int|  |
 | montant| double |  |
 | date   | timestamp  |  |
 | col_6  | string |  |
 | col_7  | string |  |
 | col_8  | string |  |
 +++--+--+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS Output  |
 +-+--+
 +-+--+
 No rows selected (0.01 seconds)
 {noformat}
 3. Insert values into the new table
 {noformat}
 insert into table encaissement_1b_64m VALUES (1, 1, 
 '8909', 1, 12.5, '12/05/2014', '','','');
 {noformat}
 4. Check
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
 +-+--+
 | id  |
 +-+--+
 +-+--+
 No rows selected (0.091 seconds)
 {noformat}
 There is already a problem: I don't see the inserted row.
 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS Output  |
 +-+--+
 | Found 1 items   |
 | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
 +-+--+
 2 rows selected (0.014 seconds)
 {noformat}
 6. Doing a major compaction solves the bug
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 
 'major';
 No rows affected (0.046 seconds)
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 ++--+
 | DFS Output  |
 ++--+
 | Found 1 items  |
 | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  |
 ++--+
 2 rows selected (0.02 seconds)
 {noformat}
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()

2014-09-30 Thread Sanghyun Yun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanghyun Yun updated HIVE-8282:
---
Attachment: HIVE-8282.patch

I added a null check and logging. Please review, [~yuzhih...@gmail.com]. :)

 Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
 -

 Key: HIVE-8282
 URL: https://issues.apache.org/jira/browse/HIVE-8282
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8282.patch


 In convertJoinMapJoin():
 {code}
 for (Operator<? extends OperatorDesc> parentOp : 
 joinOp.getParentOperators()) {
   if (parentOp instanceof MuxOperator) {
 return null;
   }
 }
 {code}
 NPE would result if convertJoinMapJoin() returns null:
 {code}
 MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
 bigTablePosition);
 MapJoinDesc joinDesc = mapJoinOp.getConf();
 {code}
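
 The guard being added can be shown with a tiny self-contained stand-in (the 
 real code deals with MapJoinOperator and MapJoinDesc; the names below are 
 invented):

```java
// convert() stands in for convertJoinMapJoin(), which returns null when a
// parent is a MuxOperator; the caller must check for null before
// dereferencing instead of calling getConf() unconditionally.
class NullGuard {
    static String convert(boolean parentIsMux) {
        return parentIsMux ? null : "mapJoinOp";
    }

    static String tryBucketMapJoin(boolean parentIsMux) {
        String mapJoinOp = convert(parentIsMux);
        if (mapJoinOp == null) {
            // log and fall back rather than throwing an NPE
            return "fallback";
        }
        return mapJoinOp + ".getConf()";
    }
}
```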



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()

2014-09-30 Thread Sanghyun Yun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanghyun Yun updated HIVE-8282:
---
Affects Version/s: 0.14.0
   Status: Patch Available  (was: Open)

 Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
 -

 Key: HIVE-8282
 URL: https://issues.apache.org/jira/browse/HIVE-8282
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8282.patch


 In convertJoinMapJoin():
 {code}
 for (Operator<? extends OperatorDesc> parentOp : 
 joinOp.getParentOperators()) {
   if (parentOp instanceof MuxOperator) {
 return null;
   }
 }
 {code}
 NPE would result if convertJoinMapJoin() returns null:
 {code}
 MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
 bigTablePosition);
 MapJoinDesc joinDesc = mapJoinOp.getConf();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153012#comment-14153012
 ] 

Damien Carol commented on HIVE-8231:


Another strange result when I run this query:
{code}
select ROW__ID, INPUT__FILE__NAME, * from foo7;
{code}
I get this result:
{noformat}
+---+-+--+--+
| row__id | input__file__name | foo7.id  |
+---+-+--+--+
| {transactionid:536,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:537,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:538,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:539,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:540,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:541,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:542,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:544,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:545,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
| {transactionid:546,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0  | 1  |
+---+-+--+--+
10 rows selected (0.168 seconds)
{noformat}

Which is wrong.

See here :
{noformat}
0: jdbc:hive2://nc-h04:1/casino dfs -ls 
/user/hive/warehouse/casino.db/foo7;
+-+--+
| DFS Output  |
+-+--+
| Found 4 items  |
| drwxr-xr-x   - hduser supergroup  0 2014-09-30 11:29 /user/hive/warehouse/casino.db/foo7/base_542   |
| drwxr-xr-x   - hduser supergroup  0 2014-09-30 11:30 /user/hive/warehouse/casino.db/foo7/delta_544_544  |
| drwxr-xr-x   - hduser supergroup  0 2014-09-30 11:30 /user/hive/warehouse/casino.db/foo7/delta_545_545  |
| drwxr-xr-x   - hduser supergroup  0 2014-09-30 11:30 /user/hive/warehouse/casino.db/foo7/delta_546_546  |
+-+--+
5 rows selected (0.025 seconds)
{noformat}


 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to show the bug :
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
   
 +++--+--+
 |  col_name  | data_type  | comment  |
 +++--+--+
 | id | int|  |
 | idmagasin  | int|  |
 | zibzin | string |  |
 | cheque | int|  |
 | montant| double |  |
 | date   | timestamp  |  |
 | col_6  | string |  |
 | col_7  | string |  |
 | col_8  | string |  

[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153022#comment-14153022
 ] 

Damien Carol commented on HIVE-8231:


With the block offset virtual column:
{noformat}
+---+-+--+--+--+
| row__id | input__file__name | block__offset__inside__file | foo7.id  |
+---+-+--+--+--+
| {transactionid:536,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 61   | 1  |
| {transactionid:537,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 122  | 1  |
| {transactionid:538,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 183  | 1  |
| {transactionid:539,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 244  | 1  |
| {transactionid:540,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 306  | 1  |
| {transactionid:541,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 367  | 1  |
| {transactionid:542,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 428  | 1  |
| {transactionid:544,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 489  | 1  |
| {transactionid:545,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 550  | 1  |
| {transactionid:546,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 612  | 1  |
| {transactionid:547,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 612  | 1  |
| {transactionid:548,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 612  | 1  |
| {transactionid:549,bucketid:0,rowid:0}  | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0  | 612  | 1  |
+---+-+--+--+--+
13 rows selected (0.162 seconds)
{noformat}

Columns {{input\_\_file\_\_name}} and {{block\_\_offset\_\_inside\_\_file}} have 
wrong values for the last 3 rows. These rows are in delta directories.

 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to reproduce the bug:
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
   
 +++--+--+
 |  col_name  | data_type  | comment  |
 +++--+--+
 | id | int|  |
 | idmagasin  | int|  |
 | zibzin | string |  |
 | cheque | int|  |
 | montant| double |  |
 | date   | timestamp  |  |
 | col_6  | string |  |
 | col_7  | string |  |
 | col_8  | string |  |
 +++--+--+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS Output  |
 +-+--+
 +-+--+
 No rows selected (0.01 seconds)
 {noformat}
 3. 

[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-09-30 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---
Attachment: HIVE-7689.9.patch

Rebased on the latest trunk.

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 I maintain a few patches to make the metastore work with a Postgres back end 
 in our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the 
 postgres metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-09-30 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---
Description: 
This patch fixes wrong lower-case table names in the Postgres metastore back end.

I maintain a few patches to make the metastore work with a Postgres back end in 
our production environment.
The main goal of this JIRA is to push these patches upstream.

This patch enables LOCKS, COMPACTION, and STATS on the postgres metastore.

  was:
I maintain a few patches to make the metastore work with a Postgres back end in 
our production environment.
The main goal of this JIRA is to push these patches upstream.

This patch enables LOCKS and COMPACTION and fixes an error in STATS on the 
postgres metastore.


 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 This patch fixes wrong lower-case table names in the Postgres metastore back end.
 I maintain a few patches to make the metastore work with a Postgres back end 
 in our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS, COMPACTION, and STATS on the postgres metastore.





[jira] [Updated] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end

2014-09-30 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---
Summary: Fix wrong lower case table names in Postgres Metastore back end  
(was: Enable Postgres as METASTORE back-end)

 Fix wrong lower case table names in Postgres Metastore back end
 ---

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 This patch fixes wrong lower-case table names in the Postgres Metastore back end.
 I maintain a few patches to make the Metastore work with a Postgres back end
 in our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS, COMPACTION and STATS on a Postgres metastore.





[jira] [Updated] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end

2014-09-30 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---
Description: 
Current 0.14 patch create table with lower case nmae.
This patch fixes wrong lower-case table names in the Postgres Metastore back end.

  was:
This patch fixes wrong lower-case table names in the Postgres Metastore back end.

I maintain a few patches to make the Metastore work with a Postgres back end in
our production environment.
The main goal of this JIRA is to push these patches upstream.

This patch enables LOCKS, COMPACTION and STATS on a Postgres metastore.


 Fix wrong lower case table names in Postgres Metastore back end
 ---

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 Current 0.14 patch create table with lower case nmae.
 This patch fixes wrong lower-case table names in the Postgres Metastore back end.





[jira] [Commented] (HIVE-8296) Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for adding rows

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153040#comment-14153040
 ] 

Hive QA commented on HIVE-8296:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671904/HIVE-8296.02.patch

{color:green}SUCCESS:{color} +1 6371 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1049/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1049/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1049/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671904

 Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for 
 adding rows
 

 Key: HIVE-8296
 URL: https://issues.apache.org/jira/browse/HIVE-8296
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8296.01.patch, HIVE-8296.02.patch


 We reuse the keys for the vectorized row batch and need a separate buffer
 (for strings) in order to reuse the batch for new values.
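For illustration only (invented names, not Hive's vectorization code), a minimal sketch of why the key bytes need their own buffer: if the batch keeps a reference into a shared byte buffer, refilling that buffer with new values silently clobbers the reused key.

```java
import java.nio.charset.StandardCharsets;

// Sketch: a "batch" that holds a reference to key bytes. With a shared buffer,
// copying a new value into it overwrites the key; with a separate buffer it
// does not. keyAfterRefill() reports what the key reads as afterwards.
public class TwoBufferDemo {
    static String keyAfterRefill(boolean separateBuffers) {
        byte[] keyBytes = "key1".getBytes(StandardCharsets.UTF_8);
        byte[] valueTarget = separateBuffers
                ? new byte[4]      // values get their own buffer
                : keyBytes;        // values reuse the key buffer
        byte[] newValue = "val9".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(newValue, 0, valueTarget, 0, newValue.length);
        return new String(keyBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(keyAfterRefill(false)); // val9: key was clobbered
        System.out.println(keyAfterRefill(true));  // key1: key survives
    }
}
```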





[jira] [Updated] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end

2014-09-30 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---
Description: 
Current 0.14 patch create table with lower case names.
This patch fix wrong lower case tables names in Postgres Metastore back end.

  was:
Current 0.14 patch create table with lower case nmae.
This patch fix wrong lower case tables names in Postgres Metastore back end.


 Fix wrong lower case table names in Postgres Metastore back end
 ---

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 Current 0.14 patch creates tables with lower-case names.
 This patch fixes wrong lower-case table names in the Postgres Metastore back end.





Review Request 26172: HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException

2014-09-30 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26172/
---

Review request for hive and Thejas Nair.


Bugs: HIVE-8299
https://issues.apache.org/jira/browse/HIVE-8299


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-8299


Diffs
-

  service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java a0f7667 
  service/src/java/org/apache/hive/service/auth/HttpAuthUtils.java 07e8c9a 
  service/src/java/org/apache/hive/service/auth/HttpCLIServiceUGIProcessor.java 
245d793 
  service/src/java/org/apache/hive/service/auth/TSetIpAddressProcessor.java 
0149dcf 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
4654acc 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
c4b273c 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
795115e 

Diff: https://reviews.apache.org/r/26172/diff/


Testing
---

Manually on a secure cluster.


Thanks,

Vaibhav Gumashta



[jira] [Updated] (HIVE-8299) HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException

2014-09-30 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-8299:
---
Attachment: HIVE-8299.1.patch

 HiveServer2 in http-kerberos  doAs=true is failing with 
 org.apache.hadoop.security.AccessControlException
 --

 Key: HIVE-8299
 URL: https://issues.apache.org/jira/browse/HIVE-8299
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8299.1.patch


 The issue is that it does a doAs at the processor level and fails at scratch
 dir creation before the session is opened. Since we are now using a proxy
 class to implement doAs at the HiveSession level, we should get rid of
 HttpCLIServiceUGIProcessor and the related classes that were used before.
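As a toy sketch of the proxy idea described above (this is plain JDK dynamic proxying, not HiveServer2's actual implementation; `Session` and `proxyFor` are made-up names), a proxy can wrap an interface so every call runs under a chosen user identity instead of switching identity per request:

```java
import java.lang.reflect.Proxy;

// A single-method session interface standing in for HiveSession.
interface Session { String whoRuns(); }

public class DoAsProxyDemo {
    // Wrap 'real' so each invocation is tagged with (conceptually, run as) 'user'.
    static Session proxyFor(String user, Session real) {
        return (Session) Proxy.newProxyInstance(
                Session.class.getClassLoader(), new Class<?>[]{Session.class},
                (proxy, method, args) -> user + ":" + method.invoke(real, args));
    }

    public static void main(String[] args) {
        Session s = proxyFor("alice", () -> "query");
        System.out.println(s.whoRuns()); // alice:query
    }
}
```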





[jira] [Updated] (HIVE-8299) HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException

2014-09-30 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-8299:
---
Status: Patch Available  (was: Open)

 HiveServer2 in http-kerberos  doAs=true is failing with 
 org.apache.hadoop.security.AccessControlException
 --

 Key: HIVE-8299
 URL: https://issues.apache.org/jira/browse/HIVE-8299
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8299.1.patch


 The issue is that it does a doAs at the processor level and fails at scratch
 dir creation before the session is opened. Since we are now using a proxy
 class to implement doAs at the HiveSession level, we should get rid of
 HttpCLIServiceUGIProcessor and the related classes that were used before.





[jira] [Commented] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153099#comment-14153099
 ] 

Hive QA commented on HIVE-8298:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671935/HIVE-8298.patch

{color:green}SUCCESS:{color} +1 6371 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1051/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1051/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1051/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671935

 Incorrect results for n-way join when join expressions are not in same order 
 across joins
 -

 Key: HIVE-8298
 URL: https://issues.apache.org/jira/browse/HIVE-8298
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.13.1
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Blocker
 Attachments: HIVE-8298.patch


 {code}
 select * from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join
 srcpart c on a.hr = c.hr and a.key = c.key;
 {code}
 is the minimal query that reproduces it.





[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

2014-09-30 Thread Zhichun Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153118#comment-14153118
 ] 

Zhichun Wu commented on HIVE-8151:
--

[~prasanth_j], after applying HIVE-8151.7.patch, the bug still exists; here
is the test case:
{code}
use test;

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.optimize.sort.dynamic.partition=true;

drop table if exists src1;
create table src1 (
key int,
val string
);
load data local inpath '../hive/examples/files/kv1.txt' overwrite into table 
src1;


drop table if exists hive13_dp1;
create table if not exists hive13_dp1 (
k1 int,
k2 int
)
PARTITIONED BY(`day` string COMMENT 'days')
STORED AS ORC;

insert overwrite table `hive13_dp1` partition(`day`)
select
key k1,
count(val) k2,
day `day`
from src1
group by day, key;
select * from hive13_dp1 limit 5;
{code}

 Dynamic partition sort optimization inserts record wrongly to partition when 
 used with GroupBy
 --

 Key: HIVE-8151
 URL: https://issues.apache.org/jira/browse/HIVE-8151
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
 HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch


 HIVE-6455 added the dynamic partition sort optimization. It added a
 startGroup() method to the FileSink operator to look for changes in the reduce
 key when creating partition directories. This method, however, is not
 reliable, as the key passed to startGroup() is different from the key passed
 to processOp(): startGroup() is called with the newly changed key, whereas
 processOp() is called with the previously aggregated key. This results in
 processOp() writing the last row of the previous group as the first row of
 the next group. It happens only when used with the group-by operator.
 The fix is to not rely on startGroup() and to do the partition directory
 creation in processOp() itself.
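The fix described above can be sketched in a few lines (illustrative only, not Hive's actual FileSinkOperator): detect the key change inside processOp() itself by remembering the previous key, rather than trusting a separately delivered startGroup() signal.

```java
// Minimal model of per-row partition-directory handling: processOp() returns
// true exactly when the incoming row starts a new partition directory.
public class KeyChangeDemo {
    private String previousKey;

    boolean processOp(String partitionKey) {
        boolean newPartition = !partitionKey.equals(previousKey);
        if (newPartition) {
            previousKey = partitionKey; // remember the key we are writing under
        }
        return newPartition;
    }

    public static void main(String[] args) {
        KeyChangeDemo sink = new KeyChangeDemo();
        System.out.println(sink.processOp("day=2014-09-29")); // true: new dir
        System.out.println(sink.processOp("day=2014-09-29")); // false: same group
        System.out.println(sink.processOp("day=2014-09-30")); // true: new dir
    }
}
```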





[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153149#comment-14153149
 ] 

Hive QA commented on HIVE-8196:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671936/HIVE-8196.6.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6371 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1052/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1052/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1052/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671936

 Joining on partition columns with fetch column stats enabled results in very
 small CE which negatively affects query performance
 -

 Key: HIVE-8196
 URL: https://issues.apache.org/jira/browse/HIVE-8196
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Blocker
  Labels: performance
 Fix For: 0.14.0

 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, 
 HIVE-8196.4.patch, HIVE-8196.5.patch, HIVE-8196.6.patch


 To get the best out of dynamic partition pruning, joins should be on the
 partitioning columns, which dynamically prunes the partitions from the fact
 table based on the qualifying column keys from the dimension table. However,
 this type of join negatively affects cardinality estimates with fetch column
 stats enabled.
 Currently we don't have statistics for partition columns, so NDV is set to
 the row count, which skews the estimated join selectivity.
 A workaround is to capture statistics for partition columns, or to use the
 number of partitions in case dynamic partitioning is used.
 StatsUtils.getColStatisticsFromExpression is where the distinct count gets
 set to the row count:
 {code}
   if (encd.getIsPartitionColOrVirtualCol()) {
 // vitual columns
 colType = encd.getTypeInfo().getTypeName();
 countDistincts = numRows;
 oi = encd.getWritableObjectInspector();
 {code}
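The effect of the inflated NDV can be seen with the textbook join-cardinality formula, |R join S| ~= |R| * |S| / max(ndv_R, ndv_S). This is a simplified sketch, not Hive's exact estimator, and the dimension-side numbers are made up for illustration:

```java
// With NDV wrongly set to the fact-table row count, the denominator explodes
// and the join estimate collapses to roughly the dimension-side row count.
public class JoinEstimateDemo {
    static long estimate(long rowsR, long rowsS, long ndvR, long ndvS) {
        return (rowsR * rowsS) / Math.max(ndvR, ndvS);
    }

    public static void main(String[] args) {
        long factRows = 550_076_554L; // store_sales row count from the plan
        long dimRows = 365L;          // qualifying date_dim rows (illustrative)
        // NDV set to the row count vs. a plausible partition-count NDV:
        System.out.println(estimate(factRows, dimRows, factRows, dimRows));
        System.out.println(estimate(factRows, dimRows, 2000L, 365L));
    }
}
```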
 Query used to repro the issue :
 {code}
 set hive.stats.fetch.column.stats=true;
 set hive.tez.dynamic.partition.pruning=true;
 explain select d_date 
 from store_sales, date_dim 
 where 
 store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
 date_dim.d_year = 1998;
 {code}
 Plan 
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 - Map 2 (BROADCAST_EDGE)
   DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_sold_date_sk is not null (type: boolean)
   Statistics: Num rows: 550076554 Data size: 47370018816 
 Basic stats: COMPLETE Column stats: COMPLETE
   Map Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {ss_sold_date_sk}
   1 {d_date_sk} {d_date}
 keys:
   0 ss_sold_date_sk (type: int)
   1 d_date_sk (type: int)
 outputColumnNames: _col22, _col26, _col28
 input vertices:
   1 Map 2
 Statistics: Num rows: 652 Data size: 66504 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Filter Operator
   predicate: (_col22 = _col26) (type: boolean)
   Statistics: Num rows: 326 Data size: 33252 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Select Operator
 expressions: _col28 (type: string)
 outputColumnNames: _col0
   

[jira] [Updated] (HIVE-8300) Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch]

2014-09-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8300:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch.

 Missing guava lib causes IllegalStateException when deserializing a task 
 [Spark Branch]
 ---

 Key: HIVE-8300
 URL: https://issues.apache.org/jira/browse/HIVE-8300
 Project: Hive
  Issue Type: Bug
  Components: Spark
 Environment: Spark-1.2.0-SNAPSHOT
Reporter: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-8300.1-spark.patch


 In spark-1.2, guava is shaded in spark-assembly, and we only ship hive-exec
 to the spark cluster, so the spark executor won't have (original) guava on
 its class path.
 This can cause problems when TaskRunner deserializes a task, throwing
 something like this:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
 stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
 (TID 3, node13-1): java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}
 We may have to verify this issue and ship guava to the spark cluster.





[jira] [Commented] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()

2014-09-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153220#comment-14153220
 ] 

Ted Yu commented on HIVE-8285:
--

+1

 Reference equality is used on boolean values in 
 PartitionPruner#removeTruePredciates()
 --

 Key: HIVE-8285
 URL: https://issues.apache.org/jira/browse/HIVE-8285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8285.patch


 {code}
   if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
       && eC.getValue() == Boolean.TRUE) {
 {code}
 equals() should be used in the above comparison.
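As a runnable illustration of the suggested fix (this is not Hive's code; the names here are made up), a Boolean instance that is not the canonical Boolean.TRUE fails an == check but passes equals():

```java
// byReference() is the fragile pattern flagged above; byEquals() is the fix.
public class BooleanEqualsDemo {
    static boolean byReference(Object value) {
        return value == Boolean.TRUE;      // reference equality: fragile
    }

    static boolean byEquals(Object value) {
        return Boolean.TRUE.equals(value); // value equality: the safe form
    }

    public static void main(String[] args) {
        @SuppressWarnings("deprecation")
        Boolean distinct = new Boolean(true); // a 'true' that is a separate object
        System.out.println(byReference(distinct)); // false
        System.out.println(byEquals(distinct));    // true
    }
}
```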





[jira] [Commented] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()

2014-09-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153232#comment-14153232
 ] 

Ted Yu commented on HIVE-8282:
--

lgtm

nit: 'bucket map join' was mentioned in the log message @ line 321.
It should appear in the new message as well.

 Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
 -

 Key: HIVE-8282
 URL: https://issues.apache.org/jira/browse/HIVE-8282
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8282.patch


 In convertJoinMapJoin():
 {code}
 for (Operator<? extends OperatorDesc> parentOp :
 joinOp.getParentOperators()) {
   if (parentOp instanceof MuxOperator) {
 return null;
   }
 }
 {code}
 NPE would result if convertJoinMapJoin() returns null:
 {code}
 MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
 bigTablePosition);
 MapJoinDesc joinDesc = mapJoinOp.getConf();
 {code}
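A toy sketch of the guard the report asks for (illustrative only; convert() stands in for convertJoinMapJoin(), which bails out with null on a MuxOperator parent): check the possibly-null result before dereferencing it.

```java
// describe() shows the null check that prevents the NPE on the getConf() call.
public class NullGuardDemo {
    static String convert(boolean hasMuxParent) {
        return hasMuxParent ? null : "mapjoin";
    }

    static String describe(boolean hasMuxParent) {
        String mapJoinOp = convert(hasMuxParent);
        if (mapJoinOp == null) {
            return "conversion aborted"; // dereferencing here would be an NPE
        }
        return "converted to " + mapJoinOp;
    }

    public static void main(String[] args) {
        System.out.println(describe(true));  // conversion aborted
        System.out.println(describe(false)); // converted to mapjoin
    }
}
```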





[jira] [Updated] (HIVE-8278) Restoring a graph representation of SparkPlan [Spark Branch]

2014-09-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8278:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Chao for the nice contribution.

 Restoring a graph representation of SparkPlan [Spark Branch]
 

 Key: HIVE-8278
 URL: https://issues.apache.org/jira/browse/HIVE-8278
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8278.1-spark.patch, HIVE-8278.2-spark.patch, 
 HIVE-8278.3-spark.patch


 HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator
 logic. As a side effect, however, the visual representation of SparkPlan got
 lost. Such a representation is helpful for debugging and performance
 profiling. In addition, it would also be good to separate plan generation
 from plan execution.





[jira] [Updated] (HIVE-8302) GroupByShuffler.java missing apache license header [Spark Branch]

2014-09-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8302:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Chao for the contribution.

 GroupByShuffler.java missing apache license header [Spark Branch]
 -

 Key: HIVE-8302
 URL: https://issues.apache.org/jira/browse/HIVE-8302
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8302.1-spark.patch








[jira] [Commented] (HIVE-8263) CBO : TPC-DS Q64 : item is joined last with store_sales while it should be first as it is the most selective

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153237#comment-14153237
 ] 

Hive QA commented on HIVE-8263:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671940/HIVE-8263.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6371 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_bigdata
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1053/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1053/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1053/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671940

 CBO : TPC-DS Q64 : item is joined last with store_sales while it should be
 first as it is the most selective
 -

 Key: HIVE-8263
 URL: https://issues.apache.org/jira/browse/HIVE-8263
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-8263.1.patch, Q64_cbo_on_explain_log.txt.zip


 The plan for TPC-DS Q64 shows that item is joined last with store_sales,
 while store_sales x item is the most selective join in the plan.
 Interestingly, predicate pushdown is applied on item, but item comes so late
 in the join order that the join selectivity calculation most likely gave too
 high a number, or the join was never considered.
 This is a subset of the logical plan showing that item was joined very last
 {code}
 HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$4], 
 _o__col4=[$5], _o__col5=[$6], _o__col6=[$7], _o__col7=[$8], _o__col8=[$9], 
 _o__col9=[$10], _o__col10=[$11], _o__col11=[$12], _o__col12=[$13], 
 _o__col13=[$14], _o__col14=[$15], _o__col15=[$16], _o__col16=[$22], 
 _o__col17=[$23], _o__col18=[$24], _o__col19=[$20], _o__col20=[$21]): rowcount 
 = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 
 990
 HiveFilterRel(condition=[=($21, $13)]): rowcount = 1.0, cumulative cost 
 = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 988
   HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], 
 _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], 
 _o__col12=[$12], _o__col15=[$13], _o__col16=[$14], _o__col17=[$15], 
 _o__col18=[$16], _o__col13=[$17], _o__col20=[$18], _o__col30=[$19], 
 _o__col120=[$20], _o__col150=[$21], _o__col160=[$22], _o__col170=[$23], 
 _o__col180=[$24]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 3571
 HiveJoinRel(condition=[AND(AND(=($1, $17), =($2, $18)), =($3, $19))], 
 joinType=[inner]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 3566
   HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], 
 _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], 
 _o__col12=[$12], _o__col15=[$15], _o__col16=[$16], _o__col17=[$17], 
 _o__col18=[$18]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 890
 HiveFilterRel(condition=[=($12, 2000)]): rowcount = 1.0, 
 cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 888
   HiveAggregateRel(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
 12, 13, 14}], agg#0=[count()], agg#1=[sum($15)], agg#2=[sum($16)], 
 agg#3=[sum($17)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 886
 HiveProjectRel($f0=[$53], $f1=[$50], $f2=[$27], $f3=[$28], 
 $f4=[$39], $f5=[$40], $f6=[$41], $f7=[$42], $f8=[$44], $f9=[$45], $f10=[$46], 
 $f11=[$47], $f12=[$21], $f13=[$23], $f14=[$25], $f15=[$9], $f16=[$10], 
 $f17=[$11]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 
 0.0 cpu, 0.0 io}, id = 884
   HiveProjectRel(ss_sold_date_sk=[$17], ss_item_sk=[$18], 
 ss_customer_sk=[$19], ss_cdemo_sk=[$20], ss_hdemo_sk=[$21], ss_addr_sk=[$22], 
 ss_store_sk=[$23], ss_promo_sk=[$24], 

[jira] [Created] (HIVE-8307) null character in columns.comments schema property breaks jobconf.xml

2014-09-30 Thread Carl Laird (JIRA)
Carl Laird created HIVE-8307:


 Summary: null character in columns.comments schema property breaks 
jobconf.xml
 Key: HIVE-8307
 URL: https://issues.apache.org/jira/browse/HIVE-8307
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.1, 0.13.0
Reporter: Carl Laird


It would appear that the fix for 
https://issues.apache.org/jira/browse/HIVE-6681 is causing the null character 
to show up in job config xml files:

I get the following when trying to insert into an elasticsearch backed table:

[Fatal Error] :336:51: Character reference #
14/06/17 14:40:11 FATAL conf.Configuration: error parsing conf file: 
org.xml.sax.SAXParseException; lineNumber: 336; columnNumber: 51; Character 
reference #
Exception in thread main java.lang.RuntimeException: 
org.xml.sax.SAXParseException; lineNumber: 336; columnNumber: 51; Character 
reference #
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1263)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1129)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1063)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:416)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:604)
at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:1273)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:667)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.xml.sax.SAXParseException; lineNumber: 336; columnNumber: 51; 
Character reference #
at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:251)
at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:300)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1181)
... 11 more
Execution failed with exit status: 1

Line 336 of jobconf.xml:
<property><name>columns.comments</name><value>&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;</value></property>

See https://groups.google.com/forum/#!msg/mongodb-user/lKbha0SzMP8/jvE8ZrJom4AJ 
for more discussion.
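One possible mitigation (a hypothetical sketch, not the actual Hive fix; `stripXmlIllegalChars` is a made-up name) is to strip characters that are illegal in XML 1.0, such as NUL, from a value like columns.comments before it is serialized into jobconf.xml:

```java
// Removes C0 control characters except tab (0x09), LF (0x0A) and CR (0x0D),
// which are the only ones permitted by the XML 1.0 Char production.
public class SanitizeDemo {
    static String stripXmlIllegalChars(String s) {
        return s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
    }

    public static void main(String[] args) {
        String comments = "\0\0\0"; // what columns.comments carried in the report
        System.out.println(stripXmlIllegalChars(comments).isEmpty()); // true
    }
}
```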






[jira] [Commented] (HIVE-6681) Describe table sometimes shows from deserializer for column comments

2014-09-30 Thread Carl Laird (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153278#comment-14153278
 ] 

Carl Laird commented on HIVE-6681:
--

I believe this fix has caused another issue: 
https://issues.apache.org/jira/browse/HIVE-8307



 Describe table sometimes shows from deserializer for column comments
 --

 Key: HIVE-6681
 URL: https://issues.apache.org/jira/browse/HIVE-6681
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6681.2.patch, HIVE-6681.3.patch, HIVE-6681.4.patch, 
 HIVE-6681.5.patch, HIVE-6681.6.patch, HIVE-6681.7.patch, HIVE-6681.8.patch, 
 HIVE-6681.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153279#comment-14153279
 ] 

Alan Gates commented on HIVE-8231:
--

Ok, I'm not sure if we're chasing the same bug or not.  But I'll keep chasing 
the one I see, and if we get lucky it will turn out to have the same root cause.

Could you turn on debug-level logging on your Hive client and HiveServer2 
instance, then do the insert and select that reproduce the error, and attach 
both logs? That would help me get an idea of where things may be going wrong.

 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to reproduce the bug:
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
 +------------+------------+----------+
 |  col_name  | data_type  | comment  |
 +------------+------------+----------+
 | id         | int        |          |
 | idmagasin  | int        |          |
 | zibzin     | string     |          |
 | cheque     | int        |          |
 | montant    | double     |          |
 | date       | timestamp  |          |
 | col_6      | string     |          |
 | col_7      | string     |          |
 | col_8      | string     |          |
 +------------+------------+----------+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-------------+
 | DFS Output  |
 +-------------+
 +-------------+
 No rows selected (0.01 seconds)
 {noformat}
 3. Insert values into the new table
 {noformat}
 insert into table encaissement_1b_64m VALUES (1, 1, 
 '8909', 1, 12.5, '12/05/2014', '','','');
 {noformat}
 4. Check
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
 +-----+
 | id  |
 +-----+
 +-----+
 No rows selected (0.091 seconds)
 {noformat}
 There is already a problem: I don't see the inserted row.
 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +---------------------------------------------------------------------------------------------------+
 | DFS Output                                                                                        |
 +---------------------------------------------------------------------------------------------------+
 | Found 1 items                                                                                     |
 | drwxr-xr-x   - hduser supergroup          0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
 +---------------------------------------------------------------------------------------------------+
 2 rows selected (0.014 seconds)
 {noformat}
 6. Doing a major compaction solves the bug
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
 'major';
 No rows affected (0.046 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +---------------------------------------------------------------------------------------------------+
 | DFS Output                                                                                        |
 +---------------------------------------------------------------------------------------------------+
 | Found 1 items                                                                                     |
 | drwxr-xr-x   - hduser supergroup          0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  |
 

[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153281#comment-14153281
 ] 

Alan Gates commented on HIVE-8231:
--

I mean JDBC client, not hive client.

 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to reproduce the bug:
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
 +------------+------------+----------+
 |  col_name  | data_type  | comment  |
 +------------+------------+----------+
 | id         | int        |          |
 | idmagasin  | int        |          |
 | zibzin     | string     |          |
 | cheque     | int        |          |
 | montant    | double     |          |
 | date       | timestamp  |          |
 | col_6      | string     |          |
 | col_7      | string     |          |
 | col_8      | string     |          |
 +------------+------------+----------+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-------------+
 | DFS Output  |
 +-------------+
 +-------------+
 No rows selected (0.01 seconds)
 {noformat}
 3. Insert values into the new table
 {noformat}
 insert into table encaissement_1b_64m VALUES (1, 1, 
 '8909', 1, 12.5, '12/05/2014', '','','');
 {noformat}
 4. Check
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
 +-----+
 | id  |
 +-----+
 +-----+
 No rows selected (0.091 seconds)
 {noformat}
 There is already a problem: I don't see the inserted row.
 5. When I check the HDFS directory, I see a {{delta_421_421}} folder
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +---------------------------------------------------------------------------------------------------+
 | DFS Output                                                                                        |
 +---------------------------------------------------------------------------------------------------+
 | Found 1 items                                                                                     |
 | drwxr-xr-x   - hduser supergroup          0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
 +---------------------------------------------------------------------------------------------------+
 2 rows selected (0.014 seconds)
 {noformat}
 6. Doing a major compaction solves the bug
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
 'major';
 No rows affected (0.046 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +---------------------------------------------------------------------------------------------------+
 | DFS Output                                                                                        |
 +---------------------------------------------------------------------------------------------------+
 | Found 1 items                                                                                     |
 | drwxr-xr-x   - hduser supergroup          0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  |
 +---------------------------------------------------------------------------------------------------+
 2 rows selected (0.02 seconds)
 {noformat}
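The {{delta_421_421}} and {{base_421}} directories in the listings above are the ORC ACID layout: inserts land in delta directories, and a major compaction rewrites them into a base. A reader is expected to merge the newest base with any later deltas, so rows sitting in a delta should be visible even before compaction runs. A simplified model of that directory resolution (the parsing and names are illustrative, not Hive's actual AcidUtils code):

```python
def resolve_acid_dirs(dirs):
    """Pick the newest base_N plus all delta_X_Y directories with X > N."""
    bases = sorted(int(d.split('_')[1]) for d in dirs if d.startswith('base_'))
    newest_base = bases[-1] if bases else -1
    deltas = [d for d in dirs if d.startswith('delta_')
              and int(d.split('_')[1]) > newest_base]
    base = 'base_%d' % newest_base if bases else None
    return base, sorted(deltas)

# Before compaction: no base yet, so the delta alone must be read.
before = resolve_acid_dirs(['delta_421_421'])
# After major compaction: everything lives in base_421.
after = resolve_acid_dirs(['base_421', 'delta_421_421'])
```

The symptom reported here (rows invisible until a major compaction) suggests the delta branch of this resolution was being skipped somewhere in the read path.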
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6905) Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns

2014-09-30 Thread Greg W (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153297#comment-14153297
 ] 

Greg W commented on HIVE-6905:
--

Now that HIVE-5317 is resolved, is it still conceivable that this feature 
(particularly the auto-increment component) will be available in Hive 0.14?

 Implement  Auto increment, primary-foreign Key, not null constraints and 
 default value in Hive Table columns
 

 Key: HIVE-6905
 URL: https://issues.apache.org/jira/browse/HIVE-6905
 Project: Hive
  Issue Type: New Feature
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Pardeep Kumar

 For Hive to replace a modern data warehouse based on an RDBMS, it must have 
 support for keys, constraints, auto-increment values, surrogate keys, not-null 
 features, etc. Many customers do not move their EDW to Hive because these 
 have been challenging to maintain in Hive.
 This must be implemented once https://issues.apache.org/jira/browse/HIVE-5317 
 for updates, deletes and inserts is done in Hive. This should be the next step 
 in enhancing Hive, taking it closer to very wide mainstream adoption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8298:
---
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. [~vikram.dixit] It would be good to have this in 0.14 as 
well.

 Incorrect results for n-way join when join expressions are not in same order 
 across joins
 -

 Key: HIVE-8298
 URL: https://issues.apache.org/jira/browse/HIVE-8298
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.13.1
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Blocker
 Fix For: 0.15.0

 Attachments: HIVE-8298.patch


 select *  from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join 
 srcpart c on a.hr = c.hr and a.key = c.key;
 is the minimal query that reproduces it.
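One way the quoted query can go wrong: in an n-way shuffle join, rows from every input must be partitioned on the join keys in the same order. If one join emits its keys as (key, hr) while another emits (hr, key), identical logical keys hash to different reducers and matching rows never meet. A toy illustration of that order sensitivity (this is not Hive's actual shuffle code):

```python
def partition(keys, n_reducers=4):
    # Rows can only be matched if they land on the same reducer bucket.
    return hash(tuple(keys)) % n_reducers

row = {'key': 238, 'hr': 11}

# Consistent key order: the row from both sides lands on the same bucket.
same = partition([row['key'], row['hr']]) == partition([row['key'], row['hr']])
# Reordering the key expressions generally changes the bucket, so rows
# that should join end up on different reducers.
reordered = partition([row['hr'], row['key']])
```

Normalizing the key-expression order across all joins in the chain, as this patch does in the logical optimizer, removes that mismatch.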



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6905) Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns

2014-09-30 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153321#comment-14153321
 ] 

Damien Carol commented on HIVE-6905:


[~grw] Wait a few days; I will create a new JIRA for a sequence generator after 
the 0.14 release. My implementation is less intrusive than the one in my 
previous comment.
In any case, this feature must wait for the 0.14 release because a few bugs 
are still there in the ACID code.
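A sequence generator of the kind mentioned here usually boils down to an atomic read-and-increment against a shared counter. A minimal thread-safe sketch (entirely hypothetical, not the proposed Hive design, which would need to persist the counter in the metastore):

```python
import threading

class SequenceGenerator:
    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def next_val(self):
        # The lock makes the read-and-increment atomic, so two concurrent
        # writers can never be handed the same surrogate key.
        with self._lock:
            value = self._next
            self._next += 1
            return value

seq = SequenceGenerator()
first, second = seq.next_val(), seq.next_val()
```

In a distributed setting the lock would have to become a metastore transaction, which is why this interacts with the ACID work mentioned above.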


 Implement  Auto increment, primary-foreign Key, not null constraints and 
 default value in Hive Table columns
 

 Key: HIVE-6905
 URL: https://issues.apache.org/jira/browse/HIVE-6905
 Project: Hive
  Issue Type: New Feature
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Pardeep Kumar

 For Hive to replace a modern data warehouse based on an RDBMS, it must have 
 support for keys, constraints, auto-increment values, surrogate keys, not-null 
 features, etc. Many customers do not move their EDW to Hive because these 
 have been challenging to maintain in Hive.
 This must be implemented once https://issues.apache.org/jira/browse/HIVE-5317 
 for updates, deletes and inserts is done in Hive. This should be the next step 
 in enhancing Hive, taking it closer to very wide mainstream adoption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end

2014-09-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153326#comment-14153326
 ] 

Brock Noland commented on HIVE-7689:


[~damien.carol] do you mean that without the double quotes the tables are 
created with lowercase names and thus do not work?

 Fix wrong lower case table names in Postgres Metastore back end
 ---

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 The current 0.14 code creates tables with lowercase names.
 This patch fixes the wrong lowercase table names in the Postgres metastore back end.
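For context on why the case matters: PostgreSQL folds unquoted identifiers to lowercase, while identifiers created with double quotes keep their exact case and must be quoted the same way ever after. A tiny model of that folding rule (simplified; real SQL lexing is more involved):

```python
def fold_identifier(ident):
    # PostgreSQL preserves the case of quoted identifiers and lowercases
    # everything else (the SQL standard folds to UPPERCASE instead).
    if ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1]
    return ident.lower()

# Metastore schema scripts create "TBLS" quoted, so an unquoted
# reference resolves to a different (nonexistent) relation:
created = fold_identifier('"TBLS"')
referenced = fold_identifier('TBLS')
```

That mismatch is the kind of failure the question above is probing: uppercase quoted tables in the schema, lowercase unquoted references in generated SQL.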



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4224) Upgrade to Thrift 1.0 when available

2014-09-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153329#comment-14153329
 ] 

Brock Noland commented on HIVE-4224:


[~nemon] that'd be great. Can you create a separate JIRA to do that upgrade?

 Upgrade to Thrift 1.0 when available
 

 Key: HIVE-4224
 URL: https://issues.apache.org/jira/browse/HIVE-4224
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, Metastore, Server Infrastructure
Affects Versions: 0.11.0
Reporter: Brock Noland
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.

2014-09-30 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153344#comment-14153344
 ] 

Vikram Dixit K commented on HIVE-8270:
--

+1 for 0.14.

 JDBC uber jar is missing some classes required in secure setup.
 ---

 Key: HIVE-8270
 URL: https://issues.apache.org/jira/browse/HIVE-8270
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
  Labels: TODOC14
 Fix For: 0.15.0

 Attachments: HIVE-8270.1.patch


 JDBC uber jar is missing some required classes for a secure setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins

2014-09-30 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153345#comment-14153345
 ] 

Vikram Dixit K commented on HIVE-8298:
--

+1 for 0.14.

 Incorrect results for n-way join when join expressions are not in same order 
 across joins
 -

 Key: HIVE-8298
 URL: https://issues.apache.org/jira/browse/HIVE-8298
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.13.1
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Blocker
 Fix For: 0.15.0

 Attachments: HIVE-8298.patch


 select *  from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join 
 srcpart c on a.hr = c.hr and a.key = c.key;
 is the minimal query that reproduces it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153346#comment-14153346
 ] 

Hive QA commented on HIVE-8290:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671955/HIVE-8290.2.patch

{color:green}SUCCESS:{color} +1 6380 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1054/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1054/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1054/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671955

 With DbTxnManager configured, all ORC tables forced to be transactional
 ---

 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8290.2.patch, HIVE-8290.patch


 Currently, once a user configures DbTxnManager to be the transaction manager, 
 all tables that use ORC are expected to be transactional.  This means they 
 all have to have buckets.  This most likely won't be what users want.
 We need to add a specific mark to a table so that users can indicate it 
 should be treated in a transactional way.
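The "specific mark" proposed here took the form of a table property named {{transactional}}, so a transaction manager can consult the table's properties instead of assuming every ORC table is ACID. A minimal sketch of that check (the helper function is illustrative, not Hive's actual code):

```python
def is_acid_table(table_properties):
    # A table opts in to ACID semantics only when it carries the explicit
    # marker property; merely being stored as ORC is no longer enough.
    return table_properties.get("transactional", "").lower() == "true"

orc_table = {"orc.compress": "ZLIB"}                            # plain ORC table
acid_table = {"transactional": "true", "orc.compress": "ZLIB"}  # opted in

plain_is_acid = is_acid_table(orc_table)
marked_is_acid = is_acid_table(acid_table)
```

With this check in place, DbTxnManager leaves unbucketed, non-transactional ORC tables alone, which is the behavior the description asks for.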



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8270:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

 JDBC uber jar is missing some classes required in secure setup.
 ---

 Key: HIVE-8270
 URL: https://issues.apache.org/jira/browse/HIVE-8270
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-8270.1.patch


 JDBC uber jar is missing some required classes for a secure setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8298:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

 Incorrect results for n-way join when join expressions are not in same order 
 across joins
 -

 Key: HIVE-8298
 URL: https://issues.apache.org/jira/browse/HIVE-8298
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.13.1
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8298.patch


 select *  from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join 
 srcpart c on a.hr = c.hr and a.key = c.key;
 is the minimal query that reproduces it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.

2014-09-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153358#comment-14153358
 ] 

Ashutosh Chauhan commented on HIVE-8270:


Committed to 0.14

 JDBC uber jar is missing some classes required in secure setup.
 ---

 Key: HIVE-8270
 URL: https://issues.apache.org/jira/browse/HIVE-8270
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-8270.1.patch


 JDBC uber jar is missing some required classes for a secure setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins

2014-09-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153354#comment-14153354
 ] 

Ashutosh Chauhan commented on HIVE-8298:


Committed to 0.14

 Incorrect results for n-way join when join expressions are not in same order 
 across joins
 -

 Key: HIVE-8298
 URL: https://issues.apache.org/jira/browse/HIVE-8298
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0, 0.13.1
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8298.patch


 select *  from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join 
 srcpart c on a.hr = c.hr and a.key = c.key;
 is the minimal query that reproduces it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]

2014-09-30 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-8180:
---
Attachment: HIVE-8180.3-spark.patch

Removed trailing spaces.

 Update SparkReduceRecordHandler for processing the vectors [spark branch]
 -

 Key: HIVE-8180
 URL: https://issues.apache.org/jira/browse/HIVE-8180
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
  Labels: Spark-M1
 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, 
 HIVE-8180.2-spark.patch, HIVE-8180.3-spark.patch


 Update SparkReduceRecordHandler for processing the vectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]

2014-09-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153437#comment-14153437
 ] 

Xuefu Zhang commented on HIVE-8180:
---

+1

 Update SparkReduceRecordHandler for processing the vectors [spark branch]
 -

 Key: HIVE-8180
 URL: https://issues.apache.org/jira/browse/HIVE-8180
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
  Labels: Spark-M1
 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, 
 HIVE-8180.2-spark.patch, HIVE-8180.3-spark.patch


 Update SparkReduceRecordHandler for processing the vectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8263) CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective

2014-09-30 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153446#comment-14153446
 ] 

Harish Butani commented on HIVE-8263:
-

Failure in 'groupby_bigdata' is not related to this patch.

 CBO : TPC-DS Q64 is item is joined last with store_sales while it should be 
 first as it is the most selective
 -

 Key: HIVE-8263
 URL: https://issues.apache.org/jira/browse/HIVE-8263
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-8263.1.patch, Q64_cbo_on_explain_log.txt.zip


 The plan for TPC-DS Q64 shows that item is joined last with store_sales, while 
 store_sales x item is the most selective join in the plan.
 Interestingly, predicate pushdown is applied on item, but item comes so late 
 in the join order, which most likely means that the calculation of the join 
 selectivity gave too high a number, or that it was never considered.
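The selectivity the comment refers to is typically estimated from column NDVs: the cardinality of an equi-join is roughly |R| * |S| / max(NDV_R(key), NDV_S(key)), and the optimizer should schedule the join pair with the smallest estimated output first. A rough sketch of that heuristic (the row counts and NDVs are made up; this is not Optiq/Calcite's actual cost model):

```python
def join_rows(rows_l, rows_r, ndv_l, ndv_r):
    # Textbook equi-join cardinality estimate.
    return rows_l * rows_r / max(ndv_l, ndv_r)

# Hypothetical stats as (row count, NDV of the join key).
store_sales = (1_000_000, 50_000)
item = (10_000, 10_000)
date_dim = (73_000, 73_000)

est_item = join_rows(store_sales[0], item[0], store_sales[1], item[1])
est_date = join_rows(store_sales[0], date_dim[0], store_sales[1], date_dim[1])
# With sane estimates, store_sales x item comes out far smaller and
# should be scheduled first, matching the complaint in this ticket.
```

If the selectivity computation overestimates the store_sales x item output (or never runs), the join lands at the end of the chain exactly as the quoted plan shows.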
 This is a subset of the logical plan showing that item was joined very last
 {code}
 HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$4], 
 _o__col4=[$5], _o__col5=[$6], _o__col6=[$7], _o__col7=[$8], _o__col8=[$9], 
 _o__col9=[$10], _o__col10=[$11], _o__col11=[$12], _o__col12=[$13], 
 _o__col13=[$14], _o__col14=[$15], _o__col15=[$16], _o__col16=[$22], 
 _o__col17=[$23], _o__col18=[$24], _o__col19=[$20], _o__col20=[$21]): rowcount 
 = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 
 990
 HiveFilterRel(condition=[=($21, $13)]): rowcount = 1.0, cumulative cost 
 = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 988
   HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], 
 _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], 
 _o__col12=[$12], _o__col15=[$13], _o__col16=[$14], _o__col17=[$15], 
 _o__col18=[$16], _o__col13=[$17], _o__col20=[$18], _o__col30=[$19], 
 _o__col120=[$20], _o__col150=[$21], _o__col160=[$22], _o__col170=[$23], 
 _o__col180=[$24]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 3571
 HiveJoinRel(condition=[AND(AND(=($1, $17), =($2, $18)), =($3, $19))], 
 joinType=[inner]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 3566
   HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], 
 _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], 
 _o__col12=[$12], _o__col15=[$15], _o__col16=[$16], _o__col17=[$17], 
 _o__col18=[$18]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 890
 HiveFilterRel(condition=[=($12, 2000)]): rowcount = 1.0, 
 cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 888
   HiveAggregateRel(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
 12, 13, 14}], agg#0=[count()], agg#1=[sum($15)], agg#2=[sum($16)], 
 agg#3=[sum($17)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 886
 HiveProjectRel($f0=[$53], $f1=[$50], $f2=[$27], $f3=[$28], 
 $f4=[$39], $f5=[$40], $f6=[$41], $f7=[$42], $f8=[$44], $f9=[$45], $f10=[$46], 
 $f11=[$47], $f12=[$21], $f13=[$23], $f14=[$25], $f15=[$9], $f16=[$10], 
 $f17=[$11]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 
 0.0 cpu, 0.0 io}, id = 884
   HiveProjectRel(ss_sold_date_sk=[$17], ss_item_sk=[$18], 
 ss_customer_sk=[$19], ss_cdemo_sk=[$20], ss_hdemo_sk=[$21], ss_addr_sk=[$22], 
 ss_store_sk=[$23], ss_promo_sk=[$24], ss_ticket_number=[$25], 
 ss_wholesale_cost=[$26], ss_list_price=[$27], ss_coupon_amt=[$28], 
 sr_item_sk=[$29], sr_ticket_number=[$30], c_customer_sk=[$31], 
 c_current_cdemo_sk=[$32], c_current_hdemo_sk=[$33], c_current_addr_sk=[$34], 
 c_first_shipto_date_sk=[$35], c_first_sales_date_sk=[$36], d_date_sk=[$37], 
 d_year=[$38], d_date_sk0=[$39], d_year0=[$40], d_date_sk1=[$41], 
 d_year1=[$42], s_store_sk=[$43], s_store_name=[$44], s_zip=[$45], 
 cd_demo_sk=[$46], cd_marital_status=[$47], cd_demo_sk0=[$48], 
 cd_marital_status0=[$49], p_promo_sk=[$0], hd_demo_sk=[$15], 
 hd_income_band_sk=[$16], hd_demo_sk0=[$13], hd_income_band_sk0=[$14], 
 ca_address_sk=[$6], ca_street_number=[$7], ca_street_name=[$8], ca_city=[$9], 
 ca_zip=[$10], ca_address_sk0=[$1], ca_street_number0=[$2], 
 ca_street_name0=[$3], ca_city0=[$4], ca_zip0=[$5], ib_income_band_sk=[$12], 
 ib_income_band_sk0=[$11], i_item_sk=[$51], i_current_price=[$52], 
 i_color=[$53], i_product_name=[$54], _o__col0=[$50]): 

[jira] [Updated] (HIVE-8250) Truncating table doesnt invalidate stats

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8250:
---
Status: Open  (was: Patch Available)

 Truncating table doesnt invalidate stats
 

 Key: HIVE-8250
 URL: https://issues.apache.org/jira/browse/HIVE-8250
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.1, 0.13.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8250.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8250) Truncating table doesnt invalidate stats

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8250:
---
Attachment: HIVE-8250.1.patch

Updated .q.out for failed test.

 Truncating table doesnt invalidate stats
 

 Key: HIVE-8250
 URL: https://issues.apache.org/jira/browse/HIVE-8250
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.0, 0.13.1
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8250.1.patch, HIVE-8250.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8250) Truncating table doesnt invalidate stats

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8250:
---
Status: Patch Available  (was: Open)

 Truncating table doesnt invalidate stats
 

 Key: HIVE-8250
 URL: https://issues.apache.org/jira/browse/HIVE-8250
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.1, 0.13.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8250.1.patch, HIVE-8250.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 26178: Truncating table doesnt invalidate stats

2014-09-30 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26178/
---

Review request for hive and Prasanth_J.


Bugs: HIVE-8250
https://issues.apache.org/jira/browse/HIVE-8250


Repository: hive-git


Description
---

Truncating table doesnt invalidate stats


Diffs
-

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
c95473c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java dc00d66 
  
ql/src/test/results/clientpositive/alter_numbuckets_partitioned_table_h23.q.out 
5047b23 

Diff: https://reviews.apache.org/r/26178/diff/


Testing
---


Thanks,

Ashutosh Chauhan
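The fix under review has to make truncation reset the table's stats parameters so later queries don't trust stale row counts. A toy model of the metastore-side behavior (the parameter names mirror Hive's basic-stats keys; the data structures are illustrative, not the actual MetaStoreUtils/DDLTask code):

```python
STATS_KEYS = ('numRows', 'rawDataSize', 'totalSize', 'numFiles')

def truncate_table(table):
    table['data'].clear()
    # Invalidate basic stats instead of leaving the old values behind,
    # so the optimizer doesn't plan against a phantom row count.
    for key in STATS_KEYS:
        table['parameters'][key] = '0'
    table['parameters']['COLUMN_STATS_ACCURATE'] = 'false'

t = {'data': [('a',), ('b',)],
     'parameters': {'numRows': '2', 'rawDataSize': '16',
                    'totalSize': '16', 'numFiles': '1',
                    'COLUMN_STATS_ACCURATE': 'true'}}
truncate_table(t)
```

The bug being fixed is the absence of the second half: truncation emptied the data but left `numRows` and friends untouched.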



[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-30 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153450#comment-14153450
 ] 

Eugene Koifman commented on HIVE-8290:
--

There is an unused import of hive_metastoreConstants.  Also, could you add a 
comment on ACID_TABLE_PROPERTY, basically the equivalent of the Description 
of this JIRA ticket?

This is minor, but would it make sense to move the constant to AcidInputFormat 
or some other more directly ACID-related class?

Otherwise, LGTM +1.

 With DbTxnManager configured, all ORC tables forced to be transactional
 ---

 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8290.2.patch, HIVE-8290.patch


 Currently, once a user configures DbTxnManager to be the transaction manager, 
 all tables that use ORC are expected to be transactional.  This means they 
 all have to have buckets.  This most likely won't be what users want.
 We need to add a specific mark to a table so that users can indicate it 
 should be treated in a transactional way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq

2014-09-30 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-8261:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 CBO : Predicate pushdown is removed by Optiq 
 -

 Key: HIVE-8261
 URL: https://issues.apache.org/jira/browse/HIVE-8261
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0, 0.13.1
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-8261.1.patch


 The plan for TPC-DS Q64 wasn't optimal; upon looking at the logical plan I 
 realized that predicate pushdown is not applied on date_dim d1.
 Interestingly, before Optiq we have the predicate pushed:
 {code}
 HiveFilterRel(condition=[=($5, $1)])
 HiveJoinRel(condition=[=($3, $6)], joinType=[inner])
   HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], 
 _o__col3=[$1])
 HiveFilterRel(condition=[=($0, 2000)])
   HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)])
 HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
   HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
 HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
   HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_wholesale_cost=[$11])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
 HiveProjectRel(d_date_sk=[$0], d_year=[$6])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
   HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 
 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), 
 between(false, $1, +(35, 1), +(35, 15)))])
 HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], 
 i_color=[$17])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
 HiveProjectRel(_o__col0=[$0])
   HiveAggregateRel(group=[{0}])
 HiveProjectRel($f0=[$0])
   HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], 
 joinType=[inner])
 HiveProjectRel(cs_item_sk=[$15], 
 cs_order_number=[$17])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
 HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
   HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1])
 HiveFilterRel(condition=[=($0, +(2000, 1))])
   HiveAggregateRel(group=[{0, 1}], agg#0=[count()])
 HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
   HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
 HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
   HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_wholesale_cost=[$11])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
 HiveProjectRel(d_date_sk=[$0], d_year=[$6])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
   HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 
 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), 
 between(false, $1, +(35, 1), +(35, 15)))])
 HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], 
 i_color=[$17])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
 HiveProjectRel(_o__col0=[$0])
   HiveAggregateRel(group=[{0}])
 HiveProjectRel($f0=[$0])
   HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], 
 joinType=[inner])
 HiveProjectRel(cs_item_sk=[$15], 
 cs_order_number=[$17])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
 HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
   
 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
 {code}
 After Optiq, however, the filter on date_dim gets pulled up the plan:
 {code}
   HiveFilterRel(condition=[=($5, $1)]): rowcount = 1.0, cumulative cost = 
 {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895
 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col00=[$4], _o__col10=[$5], _o__col30=[$6]): rowcount = 
 

[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts records into the wrong partition when used with GroupBy

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153456#comment-14153456
 ] 

Hive QA commented on HIVE-8151:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671975/HIVE-8151.7.patch

{color:green}SUCCESS:{color} +1 6374 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1055/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1055/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1055/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671975

 Dynamic partition sort optimization inserts records into the wrong partition 
 when used with GroupBy
 --

 Key: HIVE-8151
 URL: https://issues.apache.org/jira/browse/HIVE-8151
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
 HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch


 HIVE-6455 added the dynamic partition sort optimization. It added a startGroup() 
 method to the FileSink operator to look for changes in the reduce key when creating 
 partition directories. This method, however, is not reliable, as the key passed 
 to startGroup() is different from the key passed to processOp(): 
 startGroup() is called with the newly changed key, whereas processOp() is called 
 with the previously aggregated key. This results in processOp() writing the 
 last row of the previous group as the first row of the next group. This happens only 
 when used with a group by operator.
 The fix is to not rely on startGroup() and do the partition directory 
 creation in processOp() itself.
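The fix idea above can be sketched in plain Java (an illustrative simulation, not Hive's actual FileSinkOperator code): detecting the key change inside the per-row processing call itself guarantees each row is routed under the key it actually carries, with no callback/row mismatch possible.

```java
import java.util.*;

/**
 * Sketch of the HIVE-8151 fix idea (illustrative, not Hive code):
 * detect the partition-key change inside the per-row call, rather than
 * relying on a separate startGroup() callback that may see a different
 * key than the row being processed.
 */
public class GroupChangeSketch {
    // Routes rows to per-"partition directory" lists keyed by the reduce key.
    static Map<String, List<String>> route(List<String[]> keyedRows) {
        Map<String, List<String>> dirs = new LinkedHashMap<>();
        String currentKey = null;
        for (String[] kv : keyedRows) {
            String key = kv[0], row = kv[1];
            if (!key.equals(currentKey)) {
                // Key changed: open the new partition "directory" here,
                // in the same call that processes the row.
                currentKey = key;
                dirs.putIfAbsent(key, new ArrayList<>());
            }
            dirs.get(currentKey).add(row);
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"p=1", "a"}, new String[]{"p=1", "b"},
            new String[]{"p=2", "c"});
        System.out.println(route(rows)); // each row lands under its own key
    }
}
```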



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8182:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thank you for the contribution, Sergio! I have committed this to trunk!

 beeline fails when executing multiple-line queries with trailing spaces
 ---

 Key: HIVE-8182
 URL: https://issues.apache.org/jira/browse/HIVE-8182
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Sergio Peña
 Fix For: 0.14.0

 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch


 As the title indicates, when executing a multi-line query with trailing spaces, 
 beeline reports a syntax error: 
 Error: Error while compiling statement: FAILED: ParseException line 1:76 
 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4)
 If this query is put on a single line, beeline executes it successfully.
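A minimal sketch of the idea behind such a fix (illustrative plain Java, not the actual Beeline patch): trim trailing whitespace from each buffered line before concatenating, so the ';' statement terminator remains the last character of the accumulated query.

```java
/**
 * Sketch (illustrative, not Beeline code): trailing spaces on a line can
 * separate the ';' terminator from the end of the buffered multi-line
 * statement; stripping them before concatenation keeps detection simple.
 */
public class TrailingSpaceSketch {
    // Joins buffered lines, dropping trailing whitespace from each.
    static String joinLines(String[] lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line.replaceAll("\\s+$", "")).append(' ');
        }
        return sb.toString().trim();
    }

    // A statement is complete when the buffer ends with the terminator.
    static boolean isComplete(String buffered) {
        return buffered.endsWith(";");
    }

    public static void main(String[] args) {
        String[] q = {"select count(*) ", "from t ", "where c = 1;  "};
        System.out.println(isComplete(joinLines(q))); // terminator detected
    }
}
```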



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6148) Support arbitrary structs stored in HBase

2014-09-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6148:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you very much Swarnim! I have committed this to trunk!

 Support arbitrary structs stored in HBase
 -

 Key: HIVE-6148
 URL: https://issues.apache.org/jira/browse/HIVE-6148
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.12.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Fix For: 0.14.0

 Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, 
 HIVE-6148.3.patch.txt


 We should add support to be able to query arbitrary structs stored in HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces

2014-09-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8182:
---
Fix Version/s: (was: 0.14.0)
   0.15.0

 beeline fails when executing multiple-line queries with trailing spaces
 ---

 Key: HIVE-8182
 URL: https://issues.apache.org/jira/browse/HIVE-8182
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Sergio Peña
 Fix For: 0.15.0

 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch


 As the title indicates, when executing a multi-line query with trailing spaces, 
 beeline reports a syntax error: 
 Error: Error while compiling statement: FAILED: ParseException line 1:76 
 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4)
 If this query is put on a single line, beeline executes it successfully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6148) Support arbitrary structs stored in HBase

2014-09-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6148:
---
Fix Version/s: (was: 0.14.0)
   0.15.0

 Support arbitrary structs stored in HBase
 -

 Key: HIVE-6148
 URL: https://issues.apache.org/jira/browse/HIVE-6148
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.12.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Fix For: 0.15.0

 Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, 
 HIVE-6148.3.patch.txt


 We should add support to be able to query arbitrary structs stored in HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]

2014-09-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8262:
---
Attachment: HIVE-8262.1-spark.patch

 Create CacheTran that transforms the input RDD by caching it [Spark Branch]
 ---

 Key: HIVE-8262
 URL: https://issues.apache.org/jira/browse/HIVE-8262
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Attachments: HIVE-8262.1-spark.patch


 In a few cases we need to cache an RDD to avoid recomputing it, for better 
 performance. However, caching a map input RDD is different from caching a 
 regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the 
 input to MapWork, is to cache the result RDD that is transformed from the 
 original Hadoop RDD by applying a map function in which key, value pairs 
 are copied. Caching intermediate RDDs, such as those from a shuffle, is just 
 a matter of calling .cache().
 This task is to create a CacheTran to capture this, which can be used to plug 
 into the Spark plan when caching is desirable. 
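The need for the copy step comes from Hadoop's object reuse: a record reader hands back the same mutable Writable instance for every record, so caching raw references would leave every cached row pointing at the last record read. A self-contained sketch of that failure mode in plain Java (illustrative; no Hive or Spark APIs are used):

```java
import java.util.*;

/**
 * Sketch (illustrative, not Hive/Spark code) of why caching a map-input
 * RDD needs a per-record copy: the "reader" below reuses one mutable
 * holder, just as Hadoop RecordReaders reuse Writables.
 */
public class CacheCopySketch {
    // Stands in for a reused Writable value holder.
    static final class Holder { int value; }

    // Simulates a record reader that reuses a single Holder instance.
    static Iterator<Holder> reusingReader(int[] data) {
        final Holder shared = new Holder();
        final int[] i = {0};
        return new Iterator<Holder>() {
            public boolean hasNext() { return i[0] < data.length; }
            public Holder next() { shared.value = data[i[0]++]; return shared; }
        };
    }

    // Caching references: every cached entry ends up as the last record.
    static List<Integer> cacheByReference(int[] data) {
        List<Holder> cached = new ArrayList<>();
        for (Iterator<Holder> it = reusingReader(data); it.hasNext(); ) {
            cached.add(it.next());               // BUG: same object each time
        }
        List<Integer> out = new ArrayList<>();
        for (Holder h : cached) out.add(h.value);
        return out;
    }

    // The approach the ticket describes: a map step that copies each pair.
    static List<Integer> cacheByCopy(int[] data) {
        List<Integer> cached = new ArrayList<>();
        for (Iterator<Holder> it = reusingReader(data); it.hasNext(); ) {
            cached.add(it.next().value);         // copy before caching
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(cacheByReference(data)); // all entries collapsed
        System.out.println(cacheByCopy(data));      // values preserved
    }
}
```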



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]

2014-09-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8262:
---
Status: Patch Available  (was: Open)

 Create CacheTran that transforms the input RDD by caching it [Spark Branch]
 ---

 Key: HIVE-8262
 URL: https://issues.apache.org/jira/browse/HIVE-8262
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Attachments: HIVE-8262.1-spark.patch


 In a few cases we need to cache an RDD to avoid recomputing it, for better 
 performance. However, caching a map input RDD is different from caching a 
 regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the 
 input to MapWork, is to cache the result RDD that is transformed from the 
 original Hadoop RDD by applying a map function in which key, value pairs 
 are copied. Caching intermediate RDDs, such as those from a shuffle, is just 
 a matter of calling .cache().
 This task is to create a CacheTran to capture this, which can be used to plug 
 into the Spark plan when caching is desirable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-09-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153473#comment-14153473
 ] 

Alan Gates commented on HIVE-8290:
--

bq. This is minor, but would it make sense to move the constant to 
AcidInputFormat or some other more directly ACID related class?
I didn't see a general place to put table parameter keys.  According to the 
Hive jedi master (Ashutosh), there is no central place for them.  I agree it 
makes sense to collect ACID related ones into one place.  In addition to 
ACID_TABLE_PROPERTY there's NO_AUTO_COMPACT in Initiator.  I'll file a separate 
ticket to collect those together, and then the patch to do that will be trivial.

 With DbTxnManager configured, all ORC tables forced to be transactional
 ---

 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8290.2.patch, HIVE-8290.patch


 Currently, once a user configures DbTxnManager to be the transaction manager, 
 all tables that use ORC are expected to be transactional.  This means they 
 all have to have buckets.  This most likely won't be what users want.
 We need to add a specific mark to a table so that users can indicate it 
 should be treated in a transactional way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8308) Acid related table properties should be defined in one place

2014-09-30 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8308:


 Summary: Acid related table properties should be defined in one 
place
 Key: HIVE-8308
 URL: https://issues.apache.org/jira/browse/HIVE-8308
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor


Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT 
are defined in the classes that use them.  Since these are both potential table 
properties and they are both ACID-related, it makes sense to collect them 
together.  There's no central place for table properties at this point.
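The consolidation could look something like the following (a sketch only: the class name, constant names, and property value strings here are assumptions for illustration, not Hive's actual identifiers):

```java
/**
 * Sketch of a single holder for ACID-related table property keys.
 * All names and string values below are illustrative assumptions,
 * not Hive's actual constants.
 */
public final class AcidTableProperties {
    private AcidTableProperties() {}   // constants holder; no instances

    // Marks a table as transactional (today: SemanticAnalyzer.ACID_TABLE_PROPERTY).
    public static final String TRANSACTIONAL = "transactional";

    // Disables automatic compaction for a table (today: Initiator.NO_AUTO_COMPACT).
    public static final String NO_AUTO_COMPACTION = "no_auto_compaction";

    public static void main(String[] args) {
        System.out.println(TRANSACTIONAL + ", " + NO_AUTO_COMPACTION);
    }
}
```

Callers would then reference one class for every ACID-related key instead of reaching into unrelated optimizer or compactor classes.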



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance

2014-09-30 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153477#comment-14153477
 ] 

Prasanth J commented on HIVE-8196:
--

The last test failures are unrelated.

 Joining on partition columns with fetch column stats enabled results in very 
 small CE which negatively affects query performance 
 -

 Key: HIVE-8196
 URL: https://issues.apache.org/jira/browse/HIVE-8196
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Blocker
  Labels: performance
 Fix For: 0.14.0

 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, 
 HIVE-8196.4.patch, HIVE-8196.5.patch, HIVE-8196.6.patch


 To make the best of dynamic partition pruning, joins should be on the 
 partitioning columns, which results in dynamically pruning the partitions from 
 the fact table based on the qualifying column keys from the dimension table. 
 However, this type of join negatively affects cardinality estimates when fetch 
 column stats is enabled.
 Currently we don't have statistics for partition columns, and as a result NDV 
 is set to the row count, which negatively affects the estimated join 
 selectivity.
 A workaround is to capture statistics for partition columns, or to use the 
 number of partitions in case dynamic partitioning is used.
 StatsUtils.getColStatisticsFromExpression is where the distinct count gets 
 set to the row count: 
 {code}
   if (encd.getIsPartitionColOrVirtualCol()) {
 // virtual columns
 colType = encd.getTypeInfo().getTypeName();
 countDistincts = numRows;
 oi = encd.getWritableObjectInspector();
 {code}
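The effect of that assignment can be seen with a back-of-the-envelope calculation. Assuming the common textbook join cardinality formula |R| * |S| / max(ndv(R.k), ndv(S.k)) (a simplification; Hive's actual estimator is more involved), inflating a key's NDV to the row count collapses the estimate:

```java
/**
 * Back-of-the-envelope sketch (illustrative, not Hive's estimator):
 * inner-join cardinality estimated as |R| * |S| / max(ndv(R.k), ndv(S.k)).
 * Setting the partition column's NDV to the row count shrinks the
 * estimate drastically, which is the symptom described in this ticket.
 */
public class NdvSketch {
    static long joinEstimate(long rowsR, long ndvR, long rowsS, long ndvS) {
        return rowsR * rowsS / Math.max(ndvR, ndvS);
    }

    public static void main(String[] args) {
        long factRows = 550_076_554L, dimRows = 73_049L; // sizes from the plan below
        // With a plausible NDV for the date key (assumed value):
        System.out.println(joinEstimate(factRows, 1_826, dimRows, 1_826));
        // With NDV inflated to the row count, as in the snippet above:
        System.out.println(joinEstimate(factRows, factRows, dimRows, dimRows));
    }
}
```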
 Query used to repro the issue :
 {code}
 set hive.stats.fetch.column.stats=true;
 set hive.tez.dynamic.partition.pruning=true;
 explain select d_date 
 from store_sales, date_dim 
 where 
 store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
 date_dim.d_year = 1998;
 {code}
 Plan 
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 - Map 2 (BROADCAST_EDGE)
   DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_sold_date_sk is not null (type: boolean)
   Statistics: Num rows: 550076554 Data size: 47370018816 
 Basic stats: COMPLETE Column stats: COMPLETE
   Map Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {ss_sold_date_sk}
   1 {d_date_sk} {d_date}
 keys:
   0 ss_sold_date_sk (type: int)
   1 d_date_sk (type: int)
 outputColumnNames: _col22, _col26, _col28
 input vertices:
   1 Map 2
 Statistics: Num rows: 652 Data size: 66504 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Filter Operator
   predicate: (_col22 = _col26) (type: boolean)
   Statistics: Num rows: 326 Data size: 33252 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Select Operator
 expressions: _col28 (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 326 Data size: 30644 Basic 
 stats: COMPLETE Column stats: COMPLETE
 File Output Operator
   compressed: false
   Statistics: Num rows: 326 Data size: 30644 Basic 
 stats: COMPLETE Column stats: COMPLETE
   table:
   input format: 
 org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   serde: 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 Execution mode: vectorized
 Map 2
 Map Operator Tree:
 TableScan
   alias: date_dim
   filterExpr: (d_date_sk is not null and (d_year = 1998)) 
 (type: boolean)
   Statistics: Num rows: 73049 Data size: 81741831 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (d_date_sk is not null and 

Review Request 26181: HIVE-8262 - Create CacheTran that transforms the input RDD by caching it [Spark Branch]

2014-09-30 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26181/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8262
https://issues.apache.org/jira/browse/HIVE-8262


Repository: hive-git


Description
---

In a few cases we need to cache an RDD to avoid recomputing it, for better 
performance. However, caching a map input RDD is different from caching a 
regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the 
input to MapWork, is to cache the result RDD that is transformed from the 
original Hadoop RDD by applying a map function in which key, value pairs are 
copied. Caching intermediate RDDs, such as those from a shuffle, is just a 
matter of calling .cache().
This task is to create a CacheTran to capture this, which can be used to plug 
into the Spark plan when caching is desirable. 


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CachedTran.java PRE-CREATION 

Diff: https://reviews.apache.org/r/26181/diff/


Testing
---


Thanks,

Chao Sun



[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance

2014-09-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8196:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and branch 0.14.

 Joining on partition columns with fetch column stats enabled results in very 
 small CE which negatively affects query performance 
 -

 Key: HIVE-8196
 URL: https://issues.apache.org/jira/browse/HIVE-8196
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Blocker
  Labels: performance
 Fix For: 0.14.0

 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, 
 HIVE-8196.4.patch, HIVE-8196.5.patch, HIVE-8196.6.patch


 To make the best of dynamic partition pruning, joins should be on the 
 partitioning columns, which results in dynamically pruning the partitions from 
 the fact table based on the qualifying column keys from the dimension table. 
 However, this type of join negatively affects cardinality estimates when fetch 
 column stats is enabled.
 Currently we don't have statistics for partition columns, and as a result NDV 
 is set to the row count, which negatively affects the estimated join 
 selectivity.
 A workaround is to capture statistics for partition columns, or to use the 
 number of partitions in case dynamic partitioning is used.
 StatsUtils.getColStatisticsFromExpression is where the distinct count gets 
 set to the row count: 
 {code}
   if (encd.getIsPartitionColOrVirtualCol()) {
 // virtual columns
 colType = encd.getTypeInfo().getTypeName();
 countDistincts = numRows;
 oi = encd.getWritableObjectInspector();
 {code}
 Query used to repro the issue :
 {code}
 set hive.stats.fetch.column.stats=true;
 set hive.tez.dynamic.partition.pruning=true;
 explain select d_date 
 from store_sales, date_dim 
 where 
 store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
 date_dim.d_year = 1998;
 {code}
 Plan 
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 - Map 2 (BROADCAST_EDGE)
   DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_sold_date_sk is not null (type: boolean)
   Statistics: Num rows: 550076554 Data size: 47370018816 
 Basic stats: COMPLETE Column stats: COMPLETE
   Map Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {ss_sold_date_sk}
   1 {d_date_sk} {d_date}
 keys:
   0 ss_sold_date_sk (type: int)
   1 d_date_sk (type: int)
 outputColumnNames: _col22, _col26, _col28
 input vertices:
   1 Map 2
 Statistics: Num rows: 652 Data size: 66504 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Filter Operator
   predicate: (_col22 = _col26) (type: boolean)
   Statistics: Num rows: 326 Data size: 33252 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Select Operator
 expressions: _col28 (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 326 Data size: 30644 Basic 
 stats: COMPLETE Column stats: COMPLETE
 File Output Operator
   compressed: false
   Statistics: Num rows: 326 Data size: 30644 Basic 
 stats: COMPLETE Column stats: COMPLETE
   table:
   input format: 
 org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   serde: 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 Execution mode: vectorized
 Map 2
 Map Operator Tree:
 TableScan
   alias: date_dim
   filterExpr: (d_date_sk is not null and (d_year = 1998)) 
 (type: boolean)
   Statistics: Num rows: 73049 Data size: 81741831 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (d_date_sk is not 

[jira] [Updated] (HIVE-7939) Refactoring GraphTran to make it conform to SparkTran interface. [Spark Branch]

2014-09-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-7939:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

No longer needed since {{GraphTran}} is removed.

 Refactoring GraphTran to make it conform to SparkTran interface. [Spark 
 Branch]
 ---

 Key: HIVE-7939
 URL: https://issues.apache.org/jira/browse/HIVE-7939
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-7939.1-spark.patch


 Currently, {{GraphTran}} uses its own {{execute}} method, which executes the 
 operator plan in a DFS fashion, and does something special for union. The 
 goal for this JIRA is to do some refactoring and make it conform to the 
 {{SparkTran}} interface.
 The initial idea is to use varargs for {{SparkTran::transform}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-7525) Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch]

2014-09-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao resolved HIVE-7525.

Resolution: Fixed

 Research to find out if it's possible to submit Spark jobs concurrently using 
 shared SparkContext [Spark Branch]
 

 Key: HIVE-7525
 URL: https://issues.apache.org/jira/browse/HIVE-7525
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao

 Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit 
 multiple Spark jobs concurrently using a shared SparkContext. SparkClient's 
 code can be manipulated for this test. Here is the process:
 1. Transform rdd1 to rdd2 using some transformation.
 2. Call rdd2.cache() to persist it in memory.
 3. In two threads, run accordingly:
 Thread a. rdd2 -> rdd3; rdd3.foreach()
 Thread b. rdd2 -> rdd4; rdd4.foreach()
 It would be nice to find out about the monitoring and error reporting aspects.
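The three-step process above can be mimicked with plain Java threads (an illustrative stand-in for Spark jobs sharing a SparkContext; no Spark APIs are used): materialize the shared intermediate once, then derive two results from it concurrently.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

/**
 * Sketch of the experiment (illustrative, not SparkClient code): rdd2 is
 * computed once and "cached" as a materialized list; two threads then
 * derive rdd3 and rdd4 from it concurrently, as the process describes.
 */
public class SharedCacheSketch {
    public static List<Integer> run() {
        // Step 1 + 2: rdd1 -> rdd2, then cache (here: materialize once).
        List<Integer> rdd2 = IntStream.rangeClosed(1, 5)
                .map(x -> x * 2).boxed().collect(Collectors.toList());

        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Step 3: thread a derives rdd3, thread b derives rdd4, concurrently.
        Future<Integer> rdd3Sum = pool.submit(() ->
                rdd2.stream().mapToInt(x -> x + 1).sum());
        Future<Integer> rdd4Sum = pool.submit(() ->
                rdd2.stream().mapToInt(x -> x * x).sum());
        try {
            return Arrays.asList(rdd3Sum.get(), rdd4Sum.get());
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // both derived results, from one shared input
    }
}
```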



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8276) Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]

2014-09-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao reassigned HIVE-8276:
--

Assignee: Chao

 Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]
 -

 Key: HIVE-8276
 URL: https://issues.apache.org/jira/browse/HIVE-8276
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao

 Currently ReduceTran captures both shuffle and reduce-side processing. Per 
 HIVE-8118, sometimes the output RDD from a shuffle needs to be cached for 
 better performance. Thus, it makes sense to separate shuffle from reduce and 
 create a ShuffleTran class.
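The proposed split can be sketched with plain Java collections (illustrative only; the interface and class names merely mirror the ones discussed here, and Hive's real SparkTran API differs): a shuffle-only step whose grouped output could be cached independently, feeding a separate reduce-only step.

```java
import java.util.*;

/**
 * Sketch of separating shuffle from reduce (illustrative, not Hive code):
 * ShuffleTran groups rows by key -- its output is what one might cache --
 * and ReduceTran consumes the grouped output independently.
 */
public class ShuffleTranSketch {
    interface Tran<I, O> { O transform(I input); }

    // Shuffle-only step: group values by key; its output can be cached.
    static class ShuffleTran
            implements Tran<List<String[]>, Map<String, List<String>>> {
        public Map<String, List<String>> transform(List<String[]> rows) {
            Map<String, List<String>> grouped = new TreeMap<>();
            for (String[] kv : rows)
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            return grouped;
        }
    }

    // Reduce-only step, now decoupled from the shuffle.
    static class ReduceTran
            implements Tran<Map<String, List<String>>, Map<String, Integer>> {
        public Map<String, Integer> transform(Map<String, List<String>> grouped) {
            Map<String, Integer> counts = new TreeMap<>();
            grouped.forEach((k, v) -> counts.put(k, v.size()));
            return counts;
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"a", "x"}, new String[]{"b", "y"}, new String[]{"a", "z"});
        Map<String, List<String>> shuffled = new ShuffleTran().transform(rows);
        System.out.println(new ReduceTran().transform(shuffled));
    }
}
```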



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7857) Hive query fails after Tez session times out

2014-09-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153513#comment-14153513
 ] 

Gunther Hagleitner commented on HIVE-7857:
--

+1. [~vikram.dixit] hive-14?

 Hive query fails after Tez session times out
 

 Key: HIVE-7857
 URL: https://issues.apache.org/jira/browse/HIVE-7857
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch


 Originally reported by [~deepesh]
 Steps to reproduce:
 1. Open the Hive CLI, ensuring that HIVE_AUX_JARS_PATH has hcatalog-core.jar 
 in the path.
 2. Keep it idle for more than 5 minutes (the default Tez session 
 timeout), so that the Tez session times out.
 3. Run a Hive on Tez query; the query fails. Here is a sample CLI session:
 {noformat}
 hive> select from_unixtime(unix_timestamp(), dd-MMM-) from 
 vectortab10korc limit 1;
 Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22
 Total jobs = 1
 Launching Job 1 out of 1
 Tez session was closed. Reopening...
 Session re-established.
 Status: Running (application id: application_1403688364015_1930)
 Map 1: -/-
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Status: Failed
 Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, 
 diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, 
 diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 
 Info:Container container_1403688364015_1930_01_02 COMPLETED with 
 diagnostics set to [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container 
 container_1403688364015_1930_01_03 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container 
 container_1403688364015_1930_01_04 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container 
 container_1403688364015_1930_01_05 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ]], Vertex failed as one or more tasks failed. failedTasks:1]
 DAG failed due to vertex failure. failedVertices:1 killedVertices:0
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.tez.TezTask
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7857) Hive query fails after Tez session times out

2014-09-30 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153518#comment-14153518
 ] 

Vikram Dixit K commented on HIVE-7857:
--

Yes. Will be required in 0.14 as well.

 Hive query fails after Tez session times out
 

 Key: HIVE-7857
 URL: https://issues.apache.org/jira/browse/HIVE-7857
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch


 Originally reported by [~deepesh]
 Steps to reproduce:
 1. Open the Hive CLI, ensuring that HIVE_AUX_JARS_PATH has hcatalog-core.jar 
 in the path.
 2. Keep it idle for more than 5 minutes (the default Tez session 
 timeout), so that the Tez session times out.
 3. Run a Hive on Tez query; the query fails. Here is a sample CLI session:
 {noformat}
 hive> select from_unixtime(unix_timestamp(), dd-MMM-) from 
 vectortab10korc limit 1;
 Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22
 Total jobs = 1
 Launching Job 1 out of 1
 Tez session was closed. Reopening...
 Session re-established.
 Status: Running (application id: application_1403688364015_1930)
 Map 1: -/-
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Status: Failed
 Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, 
 diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, 
 diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 
 Info:Container container_1403688364015_1930_01_02 COMPLETED with 
 diagnostics set to [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container 
 container_1403688364015_1930_01_03 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container 
 container_1403688364015_1930_01_04 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container 
 container_1403688364015_1930_01_05 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ]], Vertex failed as one or more tasks failed. failedTasks:1]
 DAG failed due to vertex failure. failedVertices:1 killedVertices:0
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.tez.TezTask
 {noformat}





[jira] [Updated] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

2014-09-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8151:
-
Attachment: HIVE-8151.8.patch

Rebase patch after HIVE-8196 commit.

 Dynamic partition sort optimization inserts record wrongly to partition when 
 used with GroupBy
 --

 Key: HIVE-8151
 URL: https://issues.apache.org/jira/browse/HIVE-8151
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
 HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, 
 HIVE-8151.8.patch


 HIVE-6455 added dynamic partition sort optimization. It added a startGroup() 
 method to the FileSink operator to look for changes in the reduce key when creating 
 partition directories. This method, however, is not reliable, as the key passed 
 to startGroup() is different from the key passed to processOp(): 
 startGroup() is called with the newly changed key, whereas processOp() is called 
 with the previously aggregated key. This will result in processOp() writing the 
 last row of the previous group as the first row of the next group. This happens only 
 when used with the group by operator.
 The fix is to not rely on startGroup() and do the partition directory 
 creation in processOp() itself.
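The fix described above can be sketched as follows. This is a hypothetical, self-contained illustration (not Hive's actual FileSinkOperator code): the operator remembers the last reduce key it wrote and opens a new partition "directory" only when processOp() sees the key change, so the directory is always derived from the row actually being written.

```java
// Hypothetical sketch of key-change detection inside processOp() itself,
// instead of relying on startGroup(). Names are illustrative.
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class PartitionKeyTracker {
    private Object lastKey;                          // key of the previously written row
    private final List<Object> dirsCreated = new ArrayList<>();

    // Called once per row; creates a "directory" only when the key changes.
    public void processOp(Object row, Object reduceKey) {
        if (!Objects.equals(reduceKey, lastKey)) {
            dirsCreated.add(reduceKey);              // stand-in for mkdir + new writer
            lastKey = reduceKey;
        }
        // ... write row using the writer for lastKey ...
    }

    public List<Object> getDirsCreated() {
        return dirsCreated;
    }
}
```

Because the check happens on the same key as the write, the last row of a group can never leak into the next group's directory.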





[jira] [Updated] (HIVE-7857) Hive query fails after Tez session times out

2014-09-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7857:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to both 0.14 and trunk.

 Hive query fails after Tez session times out
 

 Key: HIVE-7857
 URL: https://issues.apache.org/jira/browse/HIVE-7857
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch


 Originally reported by [~deepesh]
 Steps to reproduce:
 Open the Hive CLI, ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar 
 in the path.
 Keep it idle for more than 5 minutes (this is the default tez session 
 timeout). Essentially Tez session should time out.
 Run a Hive on Tez query, the query fails. Here is a sample CLI session:
 {noformat}
 hive> select from_unixtime(unix_timestamp(), 'dd-MMM-yyyy') from 
 vectortab10korc limit 1;
 Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22
 Total jobs = 1
 Launching Job 1 out of 1
 Tez session was closed. Reopening...
 Session re-established.
 Status: Running (application id: application_1403688364015_1930)
 Map 1: -/-
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Map 1: 0/1
 Status: Failed
 Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, 
 diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, 
 diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 
 Info:Container container_1403688364015_1930_01_02 COMPLETED with 
 diagnostics set to [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container 
 container_1403688364015_1930_01_03 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container 
 container_1403688364015_1930_01_04 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container 
 container_1403688364015_1930_01_05 COMPLETED with diagnostics set to 
 [Resource 
 hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar
  changed on src filesystem (expected 1403741969169, was 1403742347351
 ]], Vertex failed as one or more tasks failed. failedTasks:1]
 DAG failed due to vertex failure. failedVertices:1 killedVertices:0
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.tez.TezTask
 {noformat}





[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153534#comment-14153534
 ] 

Hive QA commented on HIVE-8180:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12672070/HIVE-8180.3-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6511 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/182/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/182/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-182/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12672070

 Update SparkReduceRecordHandler for processing the vectors [spark branch]
 -

 Key: HIVE-8180
 URL: https://issues.apache.org/jira/browse/HIVE-8180
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
  Labels: Spark-M1
 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, 
 HIVE-8180.2-spark.patch, HIVE-8180.3-spark.patch


 Update SparkReduceRecordHandler for processing the vectors.





[jira] [Created] (HIVE-8309) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific

2014-09-30 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-8309:


 Summary: CBO: Fix OB by removing constraining DT, Use external 
names for col Aliases, Remove unnecessary Selects, Make DT Name counter query 
specific
 Key: HIVE-8309
 URL: https://issues.apache.org/jira/browse/HIVE-8309
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran








[jira] [Commented] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]

2014-09-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153537#comment-14153537
 ] 

Xuefu Zhang commented on HIVE-8262:
---

Let's put this one on hold until we find out if it's simpler just to put a 
caching flag in other SparkTran subclasses. 

 Create CacheTran that transforms the input RDD by caching it [Spark Branch]
 ---

 Key: HIVE-8262
 URL: https://issues.apache.org/jira/browse/HIVE-8262
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Attachments: HIVE-8262.1-spark.patch


 In a few cases we need to cache an RDD to avoid recomputing it, for better 
 performance. However, caching a map input RDD is different from caching a 
 regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the 
 input to MapWork, is to cache the result RDD that is transformed from the 
 original Hadoop RDD by applying a map function in which key/value pairs 
 are copied. Caching intermediate RDDs, such as one from a shuffle, is just 
 a matter of calling .cache().
 This task is to create a CacheTran to capture this, which can be used to plug 
 in Spark Plan when caching is desirable. 
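The reason the map-and-copy step matters can be shown without Spark at all. The sketch below is plain Java with hypothetical names: a Hadoop-style reader reuses one mutable holder object per record (the behavior behind SPARK-3693), so caching references without copying makes every cached element alias the last record, while copying each value first preserves them.

```java
// Self-contained illustration of why map-input records must be copied
// before caching. MutableHolder stands in for a reused Writable.
import java.util.ArrayList;
import java.util.List;

public class CacheCopyDemo {
    static class MutableHolder { int value; }

    // Caching references: every cached element points at the same holder,
    // so all of them end up reflecting the final record.
    public static List<Integer> cacheWithoutCopy(int[] records) {
        MutableHolder reused = new MutableHolder();
        List<MutableHolder> cache = new ArrayList<>();
        for (int r : records) { reused.value = r; cache.add(reused); }
        List<Integer> out = new ArrayList<>();
        for (MutableHolder h : cache) out.add(h.value);
        return out;
    }

    // The map-and-copy approach the description calls for: copy each
    // value into a fresh object before caching it.
    public static List<Integer> cacheWithCopy(int[] records) {
        MutableHolder reused = new MutableHolder();
        List<Integer> cache = new ArrayList<>();
        for (int r : records) { reused.value = r; cache.add(reused.value); }
        return cache;
    }
}
```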





[jira] [Created] (HIVE-8310) RetryingHMSHandler is not used when kerberos auth enabled

2014-09-30 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-8310:
---

 Summary: RetryingHMSHandler is not used when kerberos auth enabled
 Key: HIVE-8310
 URL: https://issues.apache.org/jira/browse/HIVE-8310
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Blocker
 Fix For: 0.14.0


RetryingHMSHandler is not being used when kerberos auth is enabled, after the 
changes in HIVE-3255. The changes in HIVE-4996 also removed the lower-level 
retrying layer, RetryingRawStore. This means that in kerberos mode, retries are 
not done for database query failures.
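For context, the retry wrapper being bypassed works on the dynamic-proxy pattern. The toy version below is illustrative only (the names and the interface are hypothetical, not Hive's actual code): a JDK proxy intercepts every call on the handler interface and retries it when the underlying store throws.

```java
// Minimal sketch of a retrying dynamic proxy, the general idea behind
// wrapping a metastore handler so transient database failures are retried.
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RetryingProxyDemo {
    public interface Store { String query(); }

    @SuppressWarnings("unchecked")
    public static <T> T withRetries(Class<T> iface, T target, int maxAttempts) {
        InvocationHandler h = (proxy, method, args) -> {
            Throwable last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (Exception e) {
                    // unwrap InvocationTargetException to the real cause
                    last = e.getCause() != null ? e.getCause() : e;
                    // real code would sleep/backoff between attempts
                }
            }
            throw last;
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, h);
    }
}
```

If the proxy is never installed, as described above, the first failure propagates straight to the caller.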






[jira] [Commented] (HIVE-8310) RetryingHMSHandler is not used when kerberos auth enabled

2014-09-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153546#comment-14153546
 ] 

Thejas M Nair commented on HIVE-8310:
-

[~vikram.dixit] This will be a very useful fix for hive 0.14; it will make 
the metastore more resilient to database failures. It is a regression.


 RetryingHMSHandler is not used when kerberos auth enabled
 -

 Key: HIVE-8310
 URL: https://issues.apache.org/jira/browse/HIVE-8310
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Blocker
 Fix For: 0.14.0


 RetryingHMSHandler is not being used when kerberos auth is enabled, after 
 the changes in HIVE-3255. The changes in HIVE-4996 also removed the lower-level 
 retrying layer, RetryingRawStore. This means that in kerberos mode, retries 
 are not done for database query failures.





[jira] [Commented] (HIVE-8263) CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective

2014-09-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153548#comment-14153548
 ] 

Ashutosh Chauhan commented on HIVE-8263:


+1 
[~vikram.dixit] It will be good to have this in 0.14 as well.

 CBO : TPC-DS Q64 is item is joined last with store_sales while it should be 
 first as it is the most selective
 -

 Key: HIVE-8263
 URL: https://issues.apache.org/jira/browse/HIVE-8263
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-8263.1.patch, Q64_cbo_on_explain_log.txt.zip


 Plan for TPC-DS Q64 shows that item is joined last with store_sales while 
 store_sales x item is the most selective join in the plan.
 Interestingly, predicate push down is applied on item, yet item comes very late 
 in the join, which most likely means that the calculation of the join selectivity 
 gave too high a number or that this join order was never considered.
 This is a subset of the logical plan showing that item was joined very last
 {code}
 HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$4], 
 _o__col4=[$5], _o__col5=[$6], _o__col6=[$7], _o__col7=[$8], _o__col8=[$9], 
 _o__col9=[$10], _o__col10=[$11], _o__col11=[$12], _o__col12=[$13], 
 _o__col13=[$14], _o__col14=[$15], _o__col15=[$16], _o__col16=[$22], 
 _o__col17=[$23], _o__col18=[$24], _o__col19=[$20], _o__col20=[$21]): rowcount 
 = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 
 990
 HiveFilterRel(condition=[=($21, $13)]): rowcount = 1.0, cumulative cost 
 = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 988
   HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], 
 _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], 
 _o__col12=[$12], _o__col15=[$13], _o__col16=[$14], _o__col17=[$15], 
 _o__col18=[$16], _o__col13=[$17], _o__col20=[$18], _o__col30=[$19], 
 _o__col120=[$20], _o__col150=[$21], _o__col160=[$22], _o__col170=[$23], 
 _o__col180=[$24]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 3571
 HiveJoinRel(condition=[AND(AND(=($1, $17), =($2, $18)), =($3, $19))], 
 joinType=[inner]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 3566
   HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], 
 _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], 
 _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], 
 _o__col12=[$12], _o__col15=[$15], _o__col16=[$16], _o__col17=[$17], 
 _o__col18=[$18]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 890
 HiveFilterRel(condition=[=($12, 2000)]): rowcount = 1.0, 
 cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 888
   HiveAggregateRel(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
 12, 13, 14}], agg#0=[count()], agg#1=[sum($15)], agg#2=[sum($16)], 
 agg#3=[sum($17)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 
 rows, 0.0 cpu, 0.0 io}, id = 886
 HiveProjectRel($f0=[$53], $f1=[$50], $f2=[$27], $f3=[$28], 
 $f4=[$39], $f5=[$40], $f6=[$41], $f7=[$42], $f8=[$44], $f9=[$45], $f10=[$46], 
 $f11=[$47], $f12=[$21], $f13=[$23], $f14=[$25], $f15=[$9], $f16=[$10], 
 $f17=[$11]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 
 0.0 cpu, 0.0 io}, id = 884
   HiveProjectRel(ss_sold_date_sk=[$17], ss_item_sk=[$18], 
 ss_customer_sk=[$19], ss_cdemo_sk=[$20], ss_hdemo_sk=[$21], ss_addr_sk=[$22], 
 ss_store_sk=[$23], ss_promo_sk=[$24], ss_ticket_number=[$25], 
 ss_wholesale_cost=[$26], ss_list_price=[$27], ss_coupon_amt=[$28], 
 sr_item_sk=[$29], sr_ticket_number=[$30], c_customer_sk=[$31], 
 c_current_cdemo_sk=[$32], c_current_hdemo_sk=[$33], c_current_addr_sk=[$34], 
 c_first_shipto_date_sk=[$35], c_first_sales_date_sk=[$36], d_date_sk=[$37], 
 d_year=[$38], d_date_sk0=[$39], d_year0=[$40], d_date_sk1=[$41], 
 d_year1=[$42], s_store_sk=[$43], s_store_name=[$44], s_zip=[$45], 
 cd_demo_sk=[$46], cd_marital_status=[$47], cd_demo_sk0=[$48], 
 cd_marital_status0=[$49], p_promo_sk=[$0], hd_demo_sk=[$15], 
 hd_income_band_sk=[$16], hd_demo_sk0=[$13], hd_income_band_sk0=[$14], 
 ca_address_sk=[$6], ca_street_number=[$7], ca_street_name=[$8], ca_city=[$9], 
 ca_zip=[$10], ca_address_sk0=[$1], ca_street_number0=[$2], 
 ca_street_name0=[$3], ca_city0=[$4], ca_zip0=[$5], ib_income_band_sk=[$12], 
 ib_income_band_sk0=[$11], i_item_sk=[$51], i_current_price=[$52], 
 i_color=[$53], i_product_name=[$54], 

[jira] [Created] (HIVE-8311) Driver is encoding transaction information too late

2014-09-30 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8311:


 Summary: Driver is encoding transaction information too late
 Key: HIVE-8311
 URL: https://issues.apache.org/jira/browse/HIVE-8311
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


Currently Driver is obtaining the transaction information and encoding it in 
the conf in runInternal.  But this is too late, as the query has already been 
planned.  Either we need to change the plan when this info is obtained or we 
need to obtain it at compile time.  This bug was introduced by HIVE-8203.





[jira] [Commented] (HIVE-8311) Driver is encoding transaction information too late

2014-09-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153560#comment-14153560
 ] 

Alan Gates commented on HIVE-8311:
--

[~vikram.dixit] I'd like to get this into 0.14, as it produces wrong results.  
I should have a patch in a few hours.

 Driver is encoding transaction information too late
 ---

 Key: HIVE-8311
 URL: https://issues.apache.org/jira/browse/HIVE-8311
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


 Currently Driver is obtaining the transaction information and encoding it in 
 the conf in runInternal.  But this is too late, as the query has already been 
 planned.  Either we need to change the plan when this info is obtained or we 
 need to obtain it at compile time.  This bug was introduced by HIVE-8203.





[jira] [Commented] (HIVE-8265) Build failure on hadoop-1

2014-09-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153565#comment-14153565
 ] 

Szehon Ho commented on HIVE-8265:
-

[~vikram.dixit] I would like to get this fixed for 0.14.  Can you help take a 
look at this patch, if you have the cycles?  Thanks.

 Build failure on hadoop-1 
 --

 Key: HIVE-8265
 URL: https://issues.apache.org/jira/browse/HIVE-8265
 Project: Hive
  Issue Type: Task
  Components: Tests
Affects Versions: 0.14.0
Reporter: Navis
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-8265.1.patch.txt, HIVE-8265.2.patch


 no pre-commit-tests
 Fails from CustomPartitionVertex and TestHive.





Re: Review Request 26178: Truncating table doesnt invalidate stats

2014-09-30 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26178/#review54998
---

Ship it!


Ship It!

- Prasanth_J


On Sept. 30, 2014, 5:47 p.m., Ashutosh Chauhan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26178/
 ---
 
 (Updated Sept. 30, 2014, 5:47 p.m.)
 
 
 Review request for hive and Prasanth_J.
 
 
 Bugs: HIVE-8250
 https://issues.apache.org/jira/browse/HIVE-8250
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Truncating table doesnt invalidate stats
 
 
 Diffs
 -
 
   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
 c95473c 
   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java dc00d66 
   
 ql/src/test/results/clientpositive/alter_numbuckets_partitioned_table_h23.q.out
  5047b23 
 
 Diff: https://reviews.apache.org/r/26178/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ashutosh Chauhan
 




[jira] [Resolved] (HIVE-7293) Hive-trunk does not build against JDK8 with generic class checks

2014-09-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved HIVE-7293.
---
Resolution: Not a Problem

Builds are succeeding on JDK8.

 Hive-trunk does not build against JDK8 with generic class checks
 

 Key: HIVE-7293
 URL: https://issues.apache.org/jira/browse/HIVE-7293
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: java version 1.8.0
 Java(TM) SE Runtime Environment (build 1.8.0-b132)
 Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: Vectorization

 The current build and tests on my laptop are failing due to generic argument 
 mismatch errors.
 {code}
 hive-trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPGreaterThan.java:[46,82]
  incompatible types
 found   : 
 java.lang.Class<org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleScalarGreaterDoubleColumn>
 required: java.lang.Class<? extends 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression>
 {code}
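The compiler error above is about assigning an exact Class reference where a bounded wildcard is required, which only converts when the type argument really extends the bound. A tiny self-contained illustration (class names here are stand-ins for the vectorized-expression classes in the error, not Hive code):

```java
// Demonstrates Class<Sub> converting to Class<? extends Base>, the shape
// of the generic check that JDK8's stricter inference was enforcing.
public class WildcardDemo {
    public static class VectorExpression {}
    public static class FilterGreater extends VectorExpression {}

    // Accepts the Class object of any subclass of VectorExpression.
    public static String register(Class<? extends VectorExpression> cls) {
        return cls.getSimpleName();
    }
}
```

`register(FilterGreater.class)` compiles because FilterGreater extends the bound; a Class of an unrelated type is rejected at compile time.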





[jira] [Commented] (HIVE-8296) Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for adding rows

2014-09-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153580#comment-14153580
 ] 

Gopal V commented on HIVE-8296:
---

LGTM - +1.

[~vikram.dixit]: this is necessary for 0.14 over the HIVE-8156 fix.

 Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for 
 adding rows
 

 Key: HIVE-8296
 URL: https://issues.apache.org/jira/browse/HIVE-8296
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8296.01.patch, HIVE-8296.02.patch


 We reuse the keys for the vectorized row batch and need to use a separate 
 buffer (for strings) to reuse the batch for new values.





[jira] [Commented] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153616#comment-14153616
 ] 

Hive QA commented on HIVE-8285:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671999/HIVE-8285.patch

{color:green}SUCCESS:{color} +1 6373 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1056/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1056/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1056/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12671999

 Reference equality is used on boolean values in 
 PartitionPruner#removeTruePredciates()
 --

 Key: HIVE-8285
 URL: https://issues.apache.org/jira/browse/HIVE-8285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8285.patch


 {code}
   if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
       && eC.getValue() == Boolean.TRUE) {
 {code}
 equals() should be used in the above comparison.
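The trap is easy to reproduce. In the sketch below (illustrative, not Hive code), a Boolean that did not come from autoboxing or Boolean.valueOf is a distinct object, so `==` against Boolean.TRUE compares references and can be false even when the value is true:

```java
// Reference equality vs value equality for boxed Boolean.
public class BooleanEqualityDemo {
    // The buggy pattern: compares object identity.
    public static boolean byReference(Boolean v) { return v == Boolean.TRUE; }

    // The fix: compares the boolean value (and is null-safe this way around).
    public static boolean byEquals(Boolean v) { return Boolean.TRUE.equals(v); }
}
```

Autoboxed `true` happens to intern to Boolean.TRUE, which is why the reference check often appears to work until a deserialized or explicitly constructed Boolean shows up.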





[jira] [Commented] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]

2014-09-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153622#comment-14153622
 ] 

Hive QA commented on HIVE-8262:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12672078/HIVE-8262.1-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6509 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/183/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/183/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-183/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12672078

 Create CacheTran that transforms the input RDD by caching it [Spark Branch]
 ---

 Key: HIVE-8262
 URL: https://issues.apache.org/jira/browse/HIVE-8262
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Attachments: HIVE-8262.1-spark.patch


 In a few cases we need to cache an RDD to avoid recomputing it, for better 
 performance. However, caching a map input RDD is different from caching a 
 regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the 
 input to MapWork, is to cache the result RDD that is transformed from the 
 original Hadoop RDD by applying a map function in which key/value pairs 
 are copied. Caching intermediate RDDs, such as one from a shuffle, is just 
 a matter of calling .cache().
 This task is to create a CacheTran to capture this, which can be used to plug 
 in Spark Plan when caching is desirable. 





[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153627#comment-14153627
 ] 

Alan Gates commented on HIVE-8231:
--

I definitely think we are seeing separate issues.  I have filed a new issue, 
HIVE-8311, for what I am seeing.

 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0


 Steps to show the bug :
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
   
 +++--+--+
 |  col_name  | data_type  | comment  |
 +++--+--+
 | id | int|  |
 | idmagasin  | int|  |
 | zibzin | string |  |
 | cheque | int|  |
 | montant| double |  |
 | date   | timestamp  |  |
 | col_6  | string |  |
 | col_7  | string |  |
 | col_8  | string |  |
 +++--+--+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS Output  |
 +-+--+
 +-+--+
 No rows selected (0.01 seconds)
 {noformat}
 3. Insert values into the new table
 {noformat}
 insert into table encaissement_1b_64m VALUES (1, 1, 
 '8909', 1, 12.5, '12/05/2014', '','','');
 {noformat}
 4. Check
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
 +-+--+
 | id  |
 +-+--+
 +-+--+
 No rows selected (0.091 seconds)
 {noformat}
 There is already a problem: I don't see the inserted row.
 5. When I'm checking HDFS directory, I see {{delta_421_421}} folder
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-+--+
 | DFS 
 Output  |
 +-+--+
 | Found 1 items   
 |
 | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:17 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421
   |
 +-+--+
 2 rows selected (0.014 seconds)
 {noformat}
 6. Doing a major compaction solves the bug
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 
 'major';
 No rows affected (0.046 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 ++--+
 | DFS Output  
|
 ++--+
 | Found 1 items   
|
 | drwxr-xr-x   - hduser supergroup  0 2014-09-23 12:21 
 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  
 |
 ++--+
 2 rows selected (0.02 seconds)
 {noformat}
  





[jira] [Updated] (HIVE-8309) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific

2014-09-30 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8309:
-
Attachment: HIVE-8309.patch

 CBO: Fix OB by removing constraining DT, Use external names for col Aliases, 
 Remove unnecessary Selects, Make DT Name counter query specific
 

 Key: HIVE-8309
 URL: https://issues.apache.org/jira/browse/HIVE-8309
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-8309.patch








[jira] [Updated] (HIVE-8306) Map join sizing done by auto.convert.join.noconditionaltask.size doesn't take into account Hash table overhead and results in OOM

2014-09-30 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8306:
--
Priority: Minor  (was: Critical)

 Map join sizing done by auto.convert.join.noconditionaltask.size doesn't take 
 into account Hash table overhead and results in OOM
 -

 Key: HIVE-8306
 URL: https://issues.apache.org/jira/browse/HIVE-8306
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Minor
 Fix For: 0.14.0

 Attachments: query64_oom_trim.txt


 When hive.auto.convert.join.noconditionaltask = true, we check 
 noconditionaltask.size, and if the sum of the table sizes in the map join is 
 less than noconditionaltask.size, the planner generates a map join. The issue 
 is that this calculation does not take into account the overhead introduced 
 by the HashTable implementation, so if the sum of the input sizes is smaller 
 than noconditionaltask.size by only a small margin, queries will hit OOM.
 TPC-DS query 64 is a good example of this issue: the non-conditional task 
 size is set to 1,280,000,000 while the sum of the inputs is 1,012,379,321, 
 which is about 20% smaller than the threshold.
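 The sizing gap described above suggests an interim mitigation: leave headroom 
 below the conversion threshold until the hash table overhead is modeled. A 
 hedged sketch (the threshold value below is illustrative, not a 
 recommendation from this ticket):
 {code}
 set hive.auto.convert.join=true;
 set hive.auto.convert.join.noconditionaltask=true;
 -- Size the threshold so that sum(small-table sizes) plus ~20-25%
 -- hash table overhead (bucket array, WriteBuffers, object headers)
 -- still fits in the executor heap; e.g. instead of 1,280,000,000:
 set hive.auto.convert.join.noconditionaltask.size=1000000000;
 {code}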
 
 Vertex
 {code}
Map 28 <- Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 14 (BROADCAST_EDGE), Map 15 (BROADCAST_EDGE), Map 16 (BROADCAST_EDGE), Map 24 (BROADCAST_EDGE), Map 26 (BROADCAST_EDGE), Map 30 (BROADCAST_EDGE), Map 31 (BROADCAST_EDGE), Map 32 (BROADCAST_EDGE), Map 39 (BROADCAST_EDGE), Map 40 (BROADCAST_EDGE), Map 43 (BROADCAST_EDGE), Map 45 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE)
 {code}
 Exception
 {code}
 , TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:169)
 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
 	at java.security.AccessController.doPrivileged(Native Method)
 	at javax.security.auth.Subject.doAs(Subject.java:415)
 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 	at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 	at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:206)
 	at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:182)
 	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeKey(MapJoinBytesTableContainer.java:189)
 	at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:200)
 	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:267)
 	at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:114)
 	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:184)
 	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:210)
 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1036)
 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040)
 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040)
 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040)
 	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37)
 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:186)
 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:164)
 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
 {code}

[jira] [Updated] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

2014-09-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8151:
-
Status: Open  (was: Patch Available)

 Dynamic partition sort optimization inserts record wrongly to partition when 
 used with GroupBy
 --

 Key: HIVE-8151
 URL: https://issues.apache.org/jira/browse/HIVE-8151
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
 HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, 
 HIVE-8151.8.patch


 HIVE-6455 added the dynamic partition sort optimization. It added a 
 startGroup() method to the FileSink operator to look for changes in the 
 reduce key when creating partition directories. This method, however, is not 
 reliable, because the key passed to startGroup() differs from the key passed 
 to processOp(): startGroup() is called with the newly changed key, whereas 
 processOp() is called with the previously aggregated key. As a result, 
 processOp() writes the last row of the previous group as the first row of the 
 next group. This happens only when used with a group-by operator.
 The fix is to not rely on startGroup() and instead create the partition 
 directory in processOp() itself.
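 The key mismatch described above can be sketched as follows (an illustrative 
 simplification, not the actual FileSink code; names like dirFor() are 
 hypothetical):
 {code}
 // Buggy shape: the target directory switches in startGroup(), which is
 // called with the NEW group's key, while processOp() still receives the
 // row aggregated under the OLD key -- so that row lands in the wrong dir.
 void startGroup(Key newKey) {
   currentPartitionDir = dirFor(newKey);   // switches one row too early
 }
 void processOp(Row row) {
   write(currentPartitionDir, row);
 }

 // Fix per the description: detect the key change in processOp() itself,
 // so the directory switches exactly when the row's own key changes.
 void processOp(Key key, Row row) {
   if (!key.equals(lastKey)) {
     currentPartitionDir = dirFor(key);
     lastKey = key;
   }
   write(currentPartitionDir, row);
 }
 {code}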





[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

2014-09-30 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153665#comment-14153665
 ] 

Prasanth J commented on HIVE-8151:
--

[~wzc1989] Thanks for providing the test case. It looks like there is some 
issue with casting before writing the file. I will put up a fix for it soon in 
the next version of this patch.

 Dynamic partition sort optimization inserts record wrongly to partition when 
 used with GroupBy
 --

 Key: HIVE-8151
 URL: https://issues.apache.org/jira/browse/HIVE-8151
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
 HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, 
 HIVE-8151.8.patch


 HIVE-6455 added the dynamic partition sort optimization. It added a 
 startGroup() method to the FileSink operator to look for changes in the 
 reduce key when creating partition directories. This method, however, is not 
 reliable, because the key passed to startGroup() differs from the key passed 
 to processOp(): startGroup() is called with the newly changed key, whereas 
 processOp() is called with the previously aggregated key. As a result, 
 processOp() writes the last row of the previous group as the first row of the 
 next group. This happens only when used with a group-by operator.
 The fix is to not rely on startGroup() and instead create the partition 
 directory in processOp() itself.





[jira] [Updated] (HIVE-8231) Error when insert into empty table with ACID

2014-09-30 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-8231:
---
Attachment: a_beeline_insert.txt
a_hiveserver2_insert.txt
b_beeline_insert.txt
b_hiveserver2_insert.txt

Attached a few files.

Use case:
1. drop table if exists foo7 (no log)
2. create table foo7 (id int) STORED AS ORC (no log)
3. insert into table foo7 VALUES(1) (log a_hiveserver2 and a_beeline)
4. select * from foo7  (log b_hiveserver2 and b_beeline)

 Error when insert into empty table with ACID
 

 Key: HIVE-8231
 URL: https://issues.apache.org/jira/browse/HIVE-8231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
 Fix For: 0.14.0

 Attachments: a_beeline_insert.txt, a_hiveserver2_insert.txt, 
 b_beeline_insert.txt, b_hiveserver2_insert.txt


 Steps to show the bug :
 1. create table 
 {code}
 create table encaissement_1b_64m like encaissement_1b;
 {code}
 2. check table 
 {code}
 desc encaissement_1b_64m;
 dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
 {code}
 everything is ok:
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> desc encaissement_1b_64m;
 +------------+------------+----------+--+
 |  col_name  | data_type  | comment  |
 +------------+------------+----------+--+
 | id         | int        |          |
 | idmagasin  | int        |          |
 | zibzin     | string     |          |
 | cheque     | int        |          |
 | montant    | double     |          |
 | date       | timestamp  |          |
 | col_6      | string     |          |
 | col_7      | string     |          |
 | col_8      | string     |          |
 +------------+------------+----------+--+
 9 rows selected (0.158 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-------------+--+
 | DFS Output  |
 +-------------+--+
 +-------------+--+
 No rows selected (0.01 seconds)
 {noformat}
 3. Insert values into the new table
 {noformat}
 insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','','');
 {noformat}
 4. Check
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> select id from encaissement_1b_64m;
 +-----+--+
 | id  |
 +-----+--+
 +-----+--+
 No rows selected (0.091 seconds)
 {noformat}
 There is already a problem: I don't see the inserted row.
 5. When I check the HDFS directory, I see the {{delta_421_421}} folder
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-------------+--+
 | DFS Output  |
 +-------------+--+
 | Found 1 items |
 | drwxr-xr-x   - hduser supergroup          0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 |
 +-------------+--+
 2 rows selected (0.014 seconds)
 {noformat}
 6. Doing a major compaction solves the bug
 {noformat}
 0: jdbc:hive2://nc-h04:1/casino> alter table encaissement_1b_64m compact 'major';
 No rows affected (0.046 seconds)
 0: jdbc:hive2://nc-h04:1/casino> dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
 +-------------+--+
 | DFS Output  |
 +-------------+--+
 | Found 1 items |
 | drwxr-xr-x   - hduser supergroup          0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 |
 +-------------+--+
 2 rows selected (0.02 seconds)
 {noformat}

[jira] [Updated] (HIVE-8309) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific

2014-09-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8309:
---
Status: Patch Available  (was: Open)

 CBO: Fix OB by removing constraining DT, Use external names for col Aliases, 
 Remove unnecessary Selects, Make DT Name counter query specific
 

 Key: HIVE-8309
 URL: https://issues.apache.org/jira/browse/HIVE-8309
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-8309.patch








[jira] [Created] (HIVE-8312) Implicit type conversion on Join keys

2014-09-30 Thread Lin Liu (JIRA)
Lin Liu created HIVE-8312:
-

 Summary: Implicit type conversion on Join keys
 Key: HIVE-8312
 URL: https://issues.apache.org/jira/browse/HIVE-8312
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Lin Liu








[jira] [Updated] (HIVE-8311) Driver is encoding transaction information too late

2014-09-30 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8311:
-
Status: Patch Available  (was: Open)

 Driver is encoding transaction information too late
 ---

 Key: HIVE-8311
 URL: https://issues.apache.org/jira/browse/HIVE-8311
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8311.patch


 Currently Driver is obtaining the transaction information and encoding it in 
 the conf in runInternal.  But this is too late, as the query has already been 
 planned.  Either we need to change the plan when this info is obtained or we 
 need to obtain it at compile time.  This bug was introduced by HIVE-8203.





[jira] [Updated] (HIVE-8311) Driver is encoding transaction information too late

2014-09-30 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8311:
-
Attachment: HIVE-8311.patch

This patch moves the encoding of the transaction information from runInternal 
to compile.

 Driver is encoding transaction information too late
 ---

 Key: HIVE-8311
 URL: https://issues.apache.org/jira/browse/HIVE-8311
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8311.patch


 Currently Driver is obtaining the transaction information and encoding it in 
 the conf in runInternal.  But this is too late, as the query has already been 
 planned.  Either we need to change the plan when this info is obtained or we 
 need to obtain it at compile time.  This bug was introduced by HIVE-8203.





[jira] [Updated] (HIVE-8312) Implicit type conversion on Join keys

2014-09-30 Thread Lin Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HIVE-8312:
--
Description: 
Suppose we have a query as follows.

SELECT 
FROM A LEFT SEMI JOIN B
ON (A.col1 = B.col2)
WHERE ...

If A.col1 is of STRING type but B.col2 is BIGINT or DOUBLE, Hive finds the 
common compatible type (here, DOUBLE) for both columns and does an implicit 
type conversion.

However, this implicit conversion from STRING to DOUBLE can produce NULL 
values, which can in turn generate unexpected results, such as skew.

I just wonder: is this case by design? If so, what is the logic? If not, how 
can we solve it?
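
One way to make the behavior explicit (a hedged sketch using the table and 
column names from the example above; the select list is illustrative) is to 
cast in the join condition so that the comparison type is stated in the query:

{code}
-- Sketch: compare as strings. CAST(BIGINT AS STRING) never yields NULL,
-- though it only matches when A.col1 holds the canonical text form
-- (e.g. '42', no leading zeros). Alternatively CAST(A.col1 AS BIGINT)
-- keeps numeric semantics, with NULL-on-parse-failure now explicit.
SELECT A.*
FROM A LEFT SEMI JOIN B
ON (A.col1 = CAST(B.col2 AS STRING))
WHERE ...
{code}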

 Implicit type conversion on Join keys
 -

 Key: HIVE-8312
 URL: https://issues.apache.org/jira/browse/HIVE-8312
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Lin Liu

 Suppose we have a query as follows.
 
 SELECT 
 FROM A LEFT SEMI JOIN B
 ON (A.col1 = B.col2)
 WHERE ...
 
 If A.col1 is of STRING type but B.col2 is BIGINT or DOUBLE, Hive finds the 
 common compatible type (here, DOUBLE) for both columns and does an implicit 
 type conversion.
 However, this implicit conversion from STRING to DOUBLE can produce NULL 
 values, which can in turn generate unexpected results, such as skew.
 I just wonder: is this case by design? If so, what is the logic? If not, how 
 can we solve it?




