[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7826: - Attachment: HIVE-7826.2.patch .2 removes an unnecessary addition of the hive conf template. Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in (select ... from ...)). The resulting joins in Hive require a full table scan of the large table, though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straightforward to send the values needed to prune to the application master, where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straightforward:
- Insert a synthetic condition for each join representing "x in (keys of the other side of the join)"
- These conditions will be pushed as far down as possible
- If the condition hits a table scan and the column involved is a partition column: set up an operator to send key events to the AM
- Else: remove the synthetic predicate
-- This message was sent by Atlassian JIRA (v6.2#6252)
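The final pruning step described above reduces to a set-membership test: once the distinct join keys from the dimension side reach the AM, any partition of the fact table whose value is not among them needs no splits. A minimal sketch under that assumption (class and method names are hypothetical, not Hive's actual implementation):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of dynamic pruning at the AM: keep only partitions
// whose value appears among the join keys sent from the other side.
public class DynamicPruneSketch {
    public static List<String> prune(List<String> partitionValues, Set<String> joinKeys) {
        List<String> kept = new ArrayList<>();
        for (String partition : partitionValues) {
            if (joinKeys.contains(partition)) {
                kept.add(partition); // only surviving partitions get splits
            }
        }
        return kept;
    }
}
```

For example, with daily partitions and a subquery that resolves to a single day, only that one partition would be scanned instead of the whole table.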
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105090#comment-14105090 ] Gunther Hagleitner commented on HIVE-7254: -- Thanks, I appreciate it! Just verified - the tez shared tests are back. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a per-driver "directory" configuration, and it runs all the qfiles under that directory for that driver. For example, CLIDriver is configured with the directory ql/src/test/queries/clientpositive. However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under the directory, so we have to use the "include" configuration to hard-code a list of tests for them to run. This duplicates the list of each miniDriver's tests already in the /itests/qtest pom file, and the two can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105092#comment-14105092 ] Ashutosh Chauhan commented on HIVE-7812: [~owen.omalley] Can you create an RB request for this? Also cc: [~gopalv], who has spent some time in this part of the code while dealing with locality issues. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is in ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24920: CBO path doesn't handle null expr in select list correctly.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24920/ --- Review request for hive and John Pullokkaran. Repository: hive Description --- CBO path doesn't handle null expr in select list correctly. Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java 1619294 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/TypeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1619294 branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619294 branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619294 Diff: https://reviews.apache.org/r/24920/diff/ Testing --- added new test. Thanks, Ashutosh Chauhan
Hive on Tez Counters
Hi, I need info on where I can get detailed job counters for Hive on Tez. I am running this on an HDP cluster with Hive 0.13 and see only the following job counters for Hive on Tez in the YARN application logs, which I got through (yarn logs -applicationId ...).
a. I cannot see any ReduceOperator counters, and DESERIALIZE_ERRORS is the only counter present in MapOperator.
b. CPU_MILLISECONDS is negative in some cases. Is CPU_MILLISECONDS accurate?
c. What does COMMITTED_HEAP_BYTES indicate?
d. Is there any other place I should be checking the counters?
[[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0]
[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0]
[*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]]
Thanks Suma
[jira] [Created] (HIVE-7827) [CBO] null expr in select list is not handled correctly
Ashutosh Chauhan created HIVE-7827: -- Summary: [CBO] null expr in select list is not handled correctly Key: HIVE-7827 URL: https://issues.apache.org/jira/browse/HIVE-7827 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan select null from t1 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24920: CBO path doesn't handle null expr in select list correctly.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24920/ --- (Updated Aug. 21, 2014, 6:28 a.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-7827 https://issues.apache.org/jira/browse/HIVE-7827 Repository: hive Description (updated) --- CBO path doesn't handle null expr in select list correctly Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java 1619294 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/TypeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1619294 branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619294 branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619294 Diff: https://reviews.apache.org/r/24920/diff/ Testing --- added new test. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7827) [CBO] null expr in select list is not handled correctly
[ https://issues.apache.org/jira/browse/HIVE-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7827: --- Attachment: HIVE-7827.patch [CBO] null expr in select list is not handled correctly --- Key: HIVE-7827 URL: https://issues.apache.org/jira/browse/HIVE-7827 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7827.patch select null from t1 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105099#comment-14105099 ] Hive QA commented on HIVE-7805: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663200/HIVE-7805.patch {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 6099 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_multiscan_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigFilterProjection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/425/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/425/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-425/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663200 Support running multiple scans in hbase-handler --- Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains Assignee: Andrew Mains Attachments: HIVE-7805.patch Currently, the HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of: {code} struct<bucket int, time timestamp> {code} if one wants to push down the predicate: {code} bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670 {code} it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for all buckets between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact. -- This message was sent by Atlassian JIRA (v6.2#6252)
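The decomposition the issue asks for can be sketched as turning the IN-list into one disjoint scan range per bucket over the shared time window, instead of one scan spanning all buckets from 1 to 100. A toy illustration (names are hypothetical, not the actual HBaseKeyFactory API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit one (bucket, startTime, endTime) range per bucket
// in the IN-list, so each scan covers only the data it actually needs.
public class ScanDecomposer {
    public static List<long[]> decompose(int[] buckets, long tsStart, long tsEnd) {
        List<long[]> ranges = new ArrayList<>();
        for (int bucket : buckets) {
            // each disjoint scan covers exactly one bucket over [tsStart, tsEnd)
            ranges.add(new long[]{bucket, tsStart, tsEnd});
        }
        return ranges;
    }
}
```

In the real handler these ranges would become start/stop row keys for separate HBase scans; the point is that three narrow scans replace one scan covering roughly 100 buckets of history.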
[jira] [Commented] (HIVE-4523) round() function with specified decimal places not consistent with mysql
[ https://issues.apache.org/jira/browse/HIVE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105103#comment-14105103 ] Zhan Zhang commented on HIVE-4523: -- I also hit the same problem with the new round UDF in Spark: org.apache.hadoop.hive.ql.exec.UDFArgumentException: ROUND second argument only takes constant. When Spark initializes the UDF, it does not know this UDF needs special handling. round should follow the same contract as other UDFs, taking an ObjectInspector instead of a ConstantObjectInspector. Can we file a JIRA to get this fixed? round() function with specified decimal places not consistent with mysql - Key: HIVE-4523 URL: https://issues.apache.org/jira/browse/HIVE-4523 Project: Hive Issue Type: Improvement Components: UDF Affects Versions: 0.7.1 Reporter: Fred Desing Assignee: Xuefu Zhang Priority: Minor Fix For: 0.13.0 Attachments: HIVE-4523.1.patch, HIVE-4523.2.patch, HIVE-4523.3.patch, HIVE-4523.4.patch, HIVE-4523.5.patch, HIVE-4523.6.patch, HIVE-4523.7.patch, HIVE-4523.8.patch, HIVE-4523.patch
// hive
hive> select round(150.000, 2) from temp limit 1;
150.0
hive> select round(150, 2) from temp limit 1;
150.0
// mysql
mysql> select round(150.000, 2) from DUAL limit 1;
round(150.000, 2)
150.00
mysql> select round(150, 2) from DUAL limit 1;
round(150, 2)
150
http://dev.mysql.com/doc/refman/5.1/en/mathematical-functions.html#function_round -- This message was sent by Atlassian JIRA (v6.2#6252)
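The inconsistency above comes down to the result type: Hive 0.7's round() returned a double, which collapses trailing zeros (150.0), while MySQL's ROUND on a decimal keeps the requested scale (150.00). A small sketch contrasting the two behaviors for decimal input (helper names are mine, not Hive's):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Two hypothetical helpers contrasting the behaviors reported above.
public class RoundDemo {
    // MySQL-style for decimals: scale is preserved, so round(150.000, 2) -> "150.00"
    public static String mysqlRound(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    // Hive-0.7-style: double in, double out, trailing zeros collapse -> 150.0
    public static double hiveLikeRound(double value, int scale) {
        double factor = Math.pow(10, scale);
        return Math.round(value * factor) / factor;
    }
}
```

Note that MySQL additionally keeps integer inputs integral (round(150, 2) -> 150); the sketch only covers the decimal case the report highlights.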
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: (was: HIVE-7730.002.patch) Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.- Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730.002.patch Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.- Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105127#comment-14105127 ] Ashutosh Chauhan commented on HIVE-7736: +1 improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch The current implementation of column stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' column stats for a table with 2 columns but 2000 partitions: ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the column stats update speed for all the partitions of a table. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24602/ --- (Updated Aug. 21, 2014, 8:40 a.m.) Review request for hive. Bugs: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Repository: hive-git Description (updated) --- I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables these features: * LOCKS on the Postgres metastore * COMPACTION on the Postgres metastore * TRANSACTIONS on the Postgres metastore * a fix for the metastore update script for Postgres Diffs - metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 2ebd3b0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 524a7a4 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 30cf814 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java f74f683 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 Diff: https://reviews.apache.org/r/24602/diff/ Testing --- Using the patched version in production. Concurrency is enabled with DbTxnManager. Thanks, Damien Carol
[jira] [Commented] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats
[ https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105245#comment-14105245 ] Hive QA commented on HIVE-7420: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663310/HIVE-7420.5.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 6207 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[3] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[4] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[5] org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/436/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/436/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-436/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12663310 Parameterize tests for HCatalog Pig interfaces for testing against all storage formats -- Key: HIVE-7420 URL: https://issues.apache.org/jira/browse/HIVE-7420 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7420-without-HIVE-7457.2.patch, HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420-without-HIVE-7457.4.patch, HIVE-7420-without-HIVE-7457.5.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, HIVE-7420.3.patch, HIVE-7420.4.patch, HIVE-7420.5.patch Currently, HCatalog tests only test against RCFile with a few testing against ORC. The tests should be covering other Hive storage formats as well. HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with all Hive storage formats and with that patch, all test suites built on HCatMapReduceTest are running and passing against Sequence File, Text, and ORC in addition to RCFile. Similar changes should be made to make the tests for HCatLoader and HCatStorer generic so that they can be run against all Hive storage formats. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam reassigned HIVE-7821: -- Assignee: Chinna Rao Lalam StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7820) union_null.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105289#comment-14105289 ] Hive QA commented on HIVE-7820: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663327/HIVE-7820.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/437/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/437/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-437/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663327 union_null.q is not deterministic -- Key: HIVE-7820 URL: https://issues.apache.org/jira/browse/HIVE-7820 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7820.1.patch, HIVE-7820.1.patch union_null.q selects 10 rows from a subquery which returns many rows. Since the subquery does not have an order by, the 10 results returned vary. This problem exists on trunk and spark. We'll fix on trunk and merge to spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105301#comment-14105301 ] Lianhui Wang commented on HIVE-7384: I think current Spark already supports hash by join_col and sort by {join_col, tag}, because in Spark the map side's shuffle writer hashes by Key.hashCode and sorts by Key, and in Hive the HiveKey class already defines the hashcode. So it can support hash by HiveKey.hashCode and sort by HiveKey's bytes. Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complication in reduce-side join, which extensively utilizes key tags and shuffle behavior. Our design principle prefers making the Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on the wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
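The shuffle contract described in the comment can be sketched in two parts: rows for the same join key land in the same reducer via the key's hash, and within a reducer they arrive sorted by (join key, table tag) so one side's rows are seen before the other's. A simplified illustration (types reduced to strings; not Hive's actual HiveKey code):

```java
import java.util.List;

// Sketch of the reduce-side join shuffle contract: partition by the join
// key's hash, order within a partition by (join key, table tag).
public class ShuffleSketch {
    public static int partition(String joinKey, int numReducers) {
        // non-negative hash, as a map-side shuffle writer would compute it
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void sortByKeyThenTag(List<String[]> rows) {
        // row[0] = join key, row[1] = table tag
        rows.sort((a, b) -> {
            int c = a[0].compareTo(b[0]);
            return c != 0 ? c : a[1].compareTo(b[1]);
        });
    }
}
```

The tag-secondary sort is what lets the join operator buffer the smaller-tagged side per key before streaming the other side past it.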
[jira] [Commented] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()
[ https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105321#comment-14105321 ] Hive QA commented on HIVE-7598: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663321/HIVE-7598.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/438/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/438/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-438/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663321 Potential null pointer dereference in MergeTask#closeJob() -- Key: HIVE-7598 URL: https://issues.apache.org/jira/browse/HIVE-7598 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-7598.patch The call to Utilities.mvFileToFinalPath() passes null as the second-to-last parameter, conf. The null gets passed on to createEmptyBuckets(), which dereferences conf directly: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
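A minimal illustration of the fix pattern (names hypothetical, not the actual MergeTask code): either supply a non-null conf at the call site, or fail fast at the boundary with a descriptive message instead of letting createEmptyBuckets() hit the NPE deep inside:

```java
import java.util.Objects;

// Hypothetical guard: surface the null 'conf' at the method boundary instead
// of deep inside the bucket-creation logic.
public class NullGuardDemo {
    public static String closeJob(Object conf) {
        Objects.requireNonNull(conf,
            "conf must not be null: needed for getCompressed()/getTableInfo()");
        return "ok"; // stands in for the real merge/close work
    }
}
```

The resulting exception message points directly at the missing argument, which is far easier to debug than a bare NullPointerException from a field access.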
[jira] [Updated] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7772: - Attachment: HIVE-7772-spark.patch Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7772: - Status: Patch Available (was: Open) Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105327#comment-14105327 ] Rui Li commented on HIVE-7772: -- This patch adds some simple cases. Other cases require join or union to be ready. I also found some errors in the output file for some cases (e.g. enforce_order.q): {noformat} [Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30015]: Stats aggregator of type counter cannot be connected to {noformat} I think this is related to HIVE-7761, so I left these cases out as well. Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7824) CLIServer.getOperationStatus eats ExceutionException
[ https://issues.apache.org/jira/browse/HIVE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105350#comment-14105350 ] Hive QA commented on HIVE-7824: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663336/HIVE-7824.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/439/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/439/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-439/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663336 CLIServer.getOperationStatus eats ExceutionException Key: HIVE-7824 URL: https://issues.apache.org/jira/browse/HIVE-7824 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Attachments: HIVE-7824.patch, HIVE-7824.patch, HIVE-7824.patch ExceutionException has a cause member which could be anything including serious errors and thus it should be logged. 
The other lines are escape exceptions and can be logged at trace level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105352#comment-14105352 ] Hive QA commented on HIVE-7772: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663387/HIVE-7772-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5983 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/74/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/74/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-74/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663387 Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Attachment: HIVE-7702-spark.patch Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105357#comment-14105357 ] Chinna Rao Lalam commented on HIVE-7702: Join-related query files will be handled in HIVE-7816: filter_join_breaktask.q,\ filter_join_breaktask2.q Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Status: Patch Available (was: Open) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7816) Enable join tests which Tez executes
[ https://issues.apache.org/jira/browse/HIVE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7816: --- Description: {noformat} auto_join0.q,\ auto_join1.q,\ cross_join.q,\ cross_product_check_1.q,\ cross_product_check_2.q,\ {noformat} {noformat} filter_join_breaktask.q,\ filter_join_breaktask2.q {noformat} was: {noformat} auto_join0.q,\ auto_join1.q,\ cross_join.q,\ cross_product_check_1.q,\ cross_product_check_2.q,\ {noformat} Enable join tests which Tez executes Key: HIVE-7816 URL: https://issues.apache.org/jira/browse/HIVE-7816 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland {noformat} auto_join0.q,\ auto_join1.q,\ cross_join.q,\ cross_product_check_1.q,\ cross_product_check_2.q,\ {noformat} {noformat} filter_join_breaktask.q,\ filter_join_breaktask2.q {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105379#comment-14105379 ] Hive QA commented on HIVE-7702: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663390/HIVE-7702-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5984 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-75/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663390 Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries, however there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. 
A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7823) HIVE-6185 removed Partition.getPartition
[ https://issues.apache.org/jira/browse/HIVE-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105400#comment-14105400 ] Hive QA commented on HIVE-7823: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663332/HIVE-7823.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/440/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/440/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-440/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663332 HIVE-6185 removed Partition.getPartition Key: HIVE-7823 URL: https://issues.apache.org/jira/browse/HIVE-7823 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-7823.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7824) CLIServer.getOperationStatus eats ExceutionException
[ https://issues.apache.org/jira/browse/HIVE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7824: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for the review! I committed this trivial patch to trunk. CLIServer.getOperationStatus eats ExceutionException Key: HIVE-7824 URL: https://issues.apache.org/jira/browse/HIVE-7824 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-7824.patch, HIVE-7824.patch, HIVE-7824.patch ExceutionException has a cause member which could be anything including serious errors and thus it should be logged. The other lines are escape exceptions and can be logged at trace. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105447#comment-14105447 ] Brock Noland commented on HIVE-7702: Nice work [~chinnalalam]!! Looks like insert_into2 fails. Looking at the diff, I see a bunch of odd characters at the bottom. Thank you!! Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7815) Reduce Side Join with single reducer [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105456#comment-14105456 ] Brock Noland commented on HIVE-7815: Nice [~szehon]! I think union_null failed since it's non-deterministic. I have a patch up on HIVE-7820 to fix this. Reduce Side Join with single reducer [Spark Branch] --- Key: HIVE-7815 URL: https://issues.apache.org/jira/browse/HIVE-7815 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7815-spark.patch This is the first part of the reduce-side join work, see HIVE-7384 for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7828) Fix CLIDriver test parquet_join.q
Brock Noland created HIVE-7828: -- Summary: Fix CLIDriver test parquet_join.q Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105458#comment-14105458 ] Brock Noland commented on HIVE-7821: Hi [~chinnalalam], Thank you for picking this up! I should have mentioned, I created this one for Suhas who has recently joined the project team. Would you mind if he takes this one? StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105464#comment-14105464 ] Brock Noland commented on HIVE-7772: Nice work [~lirui]!! I think union_null failed since it's non-deterministic. I have a patch up on HIVE-7820 to fix this. Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Satish reassigned HIVE-7821: -- Assignee: Suhas Satish (was: Chinna Rao Lalam) StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7820) union_null.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105472#comment-14105472 ] Brock Noland commented on HIVE-7820: Note this patch should be committed to trunk and merged to spark. union_null.q is not deterministic -- Key: HIVE-7820 URL: https://issues.apache.org/jira/browse/HIVE-7820 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7820.1.patch, HIVE-7820.1.patch union_null.q selects 10 rows from a subquery which returns many rows. Since the subquery does not have an ORDER BY, the 10 results returned vary. This problem exists on trunk and spark. We'll fix on trunk and merge to spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
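The flakiness is the usual LIMIT-without-ORDER-BY issue. A hedged sketch of the shape of the problem and the fix (illustrative only, not the literal contents of union_null.q):
{noformat}
-- Non-deterministic: any 10 rows of the union may come back
SELECT x FROM (
  SELECT key AS x FROM src
  UNION ALL
  SELECT NULL AS x FROM src
) t LIMIT 10;

-- Deterministic: an explicit ORDER BY pins which 10 rows are returned
SELECT x FROM (
  SELECT key AS x FROM src
  UNION ALL
  SELECT NULL AS x FROM src
) t ORDER BY x LIMIT 10;
{noformat}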
[jira] [Created] (HIVE-7829) Entity.getLocation can throw an NPE
Brock Noland created HIVE-7829: -- Summary: Entity.getLocation can throw an NPE Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Attachments: HIVE-7892.patch It's possible for the getDataLocation methods which Entity.getLocation calls to return null, resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7829: --- Assignee: Brock Noland Status: Patch Available (was: Open) Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7892.patch It's possible for the getDataLocation methods which Entity.getLocation calls to return null, resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7829: --- Attachment: HIVE-7892.patch Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Attachments: HIVE-7892.patch It's possible for the getDataLocation methods which Entity.getLocation calls to return null, resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24934: HIVE-7829 - Entity.getLocation can throw an NPE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24934/ --- Review request for hive and Szehon Ho. Repository: hive-git Description --- Very simple change to return null if location cannot be obtained Diffs - ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java aafeaab Diff: https://reviews.apache.org/r/24934/diff/ Testing --- Thanks, Brock Noland
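The change the review describes ("return null if location cannot be obtained") can be sketched as follows. This is a hypothetical, simplified stand-in for Hive's actual Entity class — the name EntitySketch and its fields are illustrative only, not Hive's real API:

```java
import java.net.URI;

// Hypothetical sketch of the null-safe pattern the review describes:
// rather than dereferencing a possibly-null data location (and hitting
// an NPE), getLocation() simply propagates null to the caller.
public class EntitySketch {
    private final URI dataLocation; // may be null, e.g. for entities with no storage

    public EntitySketch(URI dataLocation) {
        this.dataLocation = dataLocation;
    }

    // Returns the entity's location, or null if none can be obtained.
    public URI getLocation() {
        return dataLocation;
    }

    public static void main(String[] args) {
        EntitySketch stored = new EntitySketch(URI.create("hdfs:///warehouse/t"));
        EntitySketch virtual = new EntitySketch(null);
        System.out.println(stored.getLocation());  // hdfs:///warehouse/t
        System.out.println(virtual.getLocation()); // null
    }
}
```

Callers then check for null instead of catching an NPE.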
[jira] [Commented] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105481#comment-14105481 ] Mohit Sabharwal commented on HIVE-7735: --- Updated patch after rebase (to address VirtualColumn conflict) Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-7735: -- Attachment: HIVE-7735.3.patch Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105511#comment-14105511 ] Alan Gates commented on HIVE-7689: -- I will review this but I'll need to test it against other backends (MySQL, Oracle). It will be a week or so until I get to it. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105512#comment-14105512 ] Brock Noland commented on HIVE-7828: Caused by the fact that HIVE-7513 was committed between the time the HIVE-7629 .out file was generated and the time it was committed. Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105514#comment-14105514 ] Sergio Peña commented on HIVE-7373: --- [~leftylev] Here's the statement. Does it need more explanation or examples? Prior to 0.14, Hive used to trim trailing zeros for decimal numbers. Currently, the trailing zeros are preserved up to what the scale allows (HIVE-7373). Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In decimal context, number 3.140 has a different semantic meaning from number 3.14. Removing trailing zeroes makes the meaning lost. 2. In an extreme case, 0.0 has (p, s) as (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
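The (1,1) failure mode in point 2 above can be traced step by step (a hedged illustration of the behavior described, not actual Hive output):
{noformat}
-- Column declared DECIMAL(1,1): one total digit, all of it after the point.
-- Old behavior (trim trailing zeros):
--   input 0.0  --trim-->  0     needs (p,s) = (1,0), column is (1,1)  ->  NULL
-- New behavior (preserve zeros up to scale):
--   input 0.0  --keep-->  0.0   fits (p,s) = (1,1)                    ->  stored as 0.0
{noformat}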
[jira] [Assigned] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-7828: -- Assignee: Brock Noland Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7828: --- Attachment: HIVE-7828.patch Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105523#comment-14105523 ] Brock Noland commented on HIVE-7828: [~alangates] since you reviewed HIVE-7513 would you mind reviewing this one? Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7828: --- Summary: TestCLIDriver.parquet_join.q is failing on trunk (was: Fix CLIDriver test parquet_join.q) TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105529#comment-14105529 ] Owen O'Malley commented on HIVE-7812: - [~ashutoshc] RB posted as https://reviews.apache.org/r/24937/ Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Attachment: HIVE-7812.patch Updated patch rebased against current trunk. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7807) Refer to umask property using FsPermission.UMASK_LABEL.
[ https://issues.apache.org/jira/browse/HIVE-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105539#comment-14105539 ] Hive QA commented on HIVE-7807: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663210/HIVE-7807.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6099 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/442/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/442/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-442/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12663210 Refer to umask property using FsPermission.UMASK_LABEL. --- Key: HIVE-7807 URL: https://issues.apache.org/jira/browse/HIVE-7807 Project: Hive Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Attachments: HIVE-7807.1.patch Currently in {{org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission}} the umask property is referred to using {{fs.permissions.umask-mode}}, which is only available in Hadoop 2.x. The property {{dfs.umaskmode}} is used in 1.x for the same purpose. Also, dfs.umaskmode was not deprecated in 1.x according to HADOOP-8727.
This JIRA is to change the umask property references to {{FsPermission.UMASK_LABEL}}, which always points to the proper property in each Hadoop version (0.23.x, 1.x, 2.x). -- This message was sent by Atlassian JIRA (v6.2#6252)
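The version-portability point above can be sketched in plain Java. This is a hedged illustration: the `Map` stands in for Hadoop's `Configuration`, and `UMASK_LABEL` is pinned here to the 2.x property name for the demo; real code references Hadoop's `FsPermission.UMASK_LABEL` constant, which resolves to the right name for the running version.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why referencing one constant beats hard-coding a property name:
// the name differs between Hadoop lines, but the constant tracks it.
public class UmaskLookup {
    // Version-specific property names, as described in the JIRA.
    static final String HADOOP1_PROP = "dfs.umaskmode";
    static final String HADOOP2_PROP = "fs.permissions.umask-mode";
    // Stand-in for FsPermission.UMASK_LABEL; pinned to the 2.x name here.
    static final String UMASK_LABEL = HADOOP2_PROP;

    static String getUmask(Map<String, String> conf) {
        // Code that reads the constant stays correct on every Hadoop line;
        // "022" mirrors the conventional default umask.
        return conf.getOrDefault(UMASK_LABEL, "022");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(UMASK_LABEL, "077");
        System.out.println(getUmask(conf)); // 077
    }
}
```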
[jira] [Commented] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105548#comment-14105548 ] Hive QA commented on HIVE-7829: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663413/HIVE-7892.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/443/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/443/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-443/ Messages:
{noformat}
This message was trimmed, see log for full details
warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN LPAREN KW_NULL using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE StringLiteral using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:115:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:127:5: Decision can match input such as KW_PARTITION KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:138:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:149:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:166:7: Decision can match input such as STAR using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_STRUCT using multiple alternatives: 4, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_ARRAY using multiple alternatives: 2, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_UNIONTYPE using multiple alternatives: 5, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_NULL using multiple alternatives: 1, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_TRUE using multiple alternatives: 3, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_DATE StringLiteral using multiple alternatives: 2, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_FALSE using multiple alternatives: 3, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200):
[jira] [Commented] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105549#comment-14105549 ] Owen O'Malley commented on HIVE-7663: - +1 although you need to look at the unit test failures. OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
[ https://issues.apache.org/jira/browse/HIVE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7680: -- Attachment: HIVE-7680.2.patch prev build #430 failed with message HIVE-7680 is not Patch Available. Exiting. + exit 1 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/430/console Attached patch #2 again Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) - Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.13.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-7680.2.patch, HIVE-7680.2.patch, HIVE-7680.patch 1. Some JDBC clients call the method setEscapeProcessing(false) (e.g. SQL Workbench). It looks like setEscapeProcessing(false) should do nothing. So, let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, then the JDBC client runs the insert statement and shows that it executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which means a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
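The behavior the issue proposes can be sketched as follows. This is a hedged stand-in in plain Java so it compiles without a Hive dependency; the real HiveStatement implements java.sql.Statement, whose getMoreResults()/getUpdateCount() contracts are what points 2 and 3 invoke.

```java
// Sketch of the proposed HiveStatement behavior (stand-in class, not Hive's).
public class StatementSketch {
    private boolean escapeProcessing = true;

    // Point 1: accept the call as a no-op instead of throwing SQLException.
    public void setEscapeProcessing(boolean enable) {
        this.escapeProcessing = enable; // remembered but otherwise ignored
    }

    // Point 2: Hive never produces multiple ResultSets, so false is always correct.
    public boolean getMoreResults() {
        return false;
    }

    // Point 3: per the JDBC contract, -1 means "no more results"; Hive cannot
    // report an affected-row count, so -1 is more truthful than 0 after an INSERT.
    public int getUpdateCount() {
        return -1;
    }

    public static void main(String[] args) {
        StatementSketch s = new StatementSketch();
        s.setEscapeProcessing(false);           // no exception thrown
        System.out.println(s.getMoreResults()); // false
        System.out.println(s.getUpdateCount()); // -1
    }
}
```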
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105602#comment-14105602 ] Ashutosh Chauhan commented on HIVE-7812: Left some comments on RB. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch, HIVE-7812.patch Currently CombineHiveInputFormat complains when called on an ACID directory. Modify CombineHiveInputFormat so that HiveInputFormat is used instead if the directory is in ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24909: HIVE-7807: Refer to umask property using FsPermission.UMASK_LABEL
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24909/#review51186 --- Ship it! Ship It! - Brock Noland On Aug. 20, 2014, 8:37 p.m., Venki Korukanti wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24909/ --- (Updated Aug. 20, 2014, 8:37 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-7807 https://issues.apache.org/jira/browse/HIVE-7807 Repository: hive-git Description --- Refer to JIRA HIVE-7807 and HIVE-7001 for details. Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestUtilitiesDfs.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/test/org/apache/hadoop/hive/ql/exec/TestUtilities.java bf3fd88 Diff: https://reviews.apache.org/r/24909/diff/ Testing --- Note: I had to create a separate test file for using MiniDFS in itests module. Existing TestUtilities in ql needs hadoop-test and few other dependencies which are not needed so far. So decided not to add extra test dependencies in ql/pom.xml for just one test. Thanks, Venki Korukanti
[jira] [Updated] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete
[ https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7646: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch checked in. Thanks Eugene. Modify parser to support new grammar for Insert,Update,Delete - Key: HIVE-7646 URL: https://issues.apache.org/jira/browse/HIVE-7646 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7646.1.patch, HIVE-7646.2.patch, HIVE-7646.3.patch, HIVE-7646.patch Need the parser to recognize constructs such as: {code:sql} INSERT INTO Cust (Customer_Number, Balance, Address) VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave'); {code} {code:sql} DELETE FROM Cust WHERE Balance > 5.0 {code} {code:sql} UPDATE Cust SET column1=value1,column2=value2,... WHERE some_column=some_value {code} Also useful: {code:sql} select a,b from values((1,2),(3,4)) as FOO(a,b) {code} This makes writing tests easier. Some references: http://dev.mysql.com/doc/refman/5.6/en/insert.html http://msdn.microsoft.com/en-us/library/dd776382.aspx http://www.postgresql.org/docs/9.1/static/sql-values.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7281) DbTxnManager acquiring wrong level of lock for dynamic partitioning
[ https://issues.apache.org/jira/browse/HIVE-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105624#comment-14105624 ] Ashutosh Chauhan commented on HIVE-7281: Filing a separate ticket and unlinking it from this one is fine, only that it will be left to the mercy of someone who will do it : ) Fixing the root cause is always better (in this case deleting the error-prone DummyPartitions), but the immediate bug can be fixed by the current patch. +1 DbTxnManager acquiring wrong level of lock for dynamic partitioning --- Key: HIVE-7281 URL: https://issues.apache.org/jira/browse/HIVE-7281 Project: Hive Issue Type: Bug Components: Locking, Transactions Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7281.patch Currently DbTxnManager.acquireLocks() locks the DUMMY_PARTITION for dynamic partitioning. But this is not adequate. This will not prevent drop operations on partitions being written to. The lock should be at the table level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7830) CBO: Some UDF(case, lead, lag..) doesn't get translated correctly
Laljo John Pullokkaran created HIVE-7830: Summary: CBO: Some UDF(case, lead, lag..) doesn't get translated correctly Key: HIVE-7830 URL: https://issues.apache.org/jira/browse/HIVE-7830 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7827) [CBO] null expr in select list is not handled correctly
[ https://issues.apache.org/jira/browse/HIVE-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7827: --- Status: Patch Available (was: Open) [CBO] null expr in select list is not handled correctly --- Key: HIVE-7827 URL: https://issues.apache.org/jira/browse/HIVE-7827 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7827.patch select null from t1 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7829: --- Attachment: HIVE-7829.1.patch Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7829.1.patch, HIVE-7892.patch It's possible for the getDataLocation methods that Entity.getLocation calls to return null, and as such Entity.getLocation can throw an NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
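The null-guard shape of the fix can be sketched as below. This is a hedged illustration: the `Table` type here is a minimal stub standing in for Hive's metadata classes, not the real `org.apache.hadoop.hive.ql.metadata.Table`; the point is only that a possibly-null data location is propagated instead of dereferenced.

```java
import java.net.URI;

// Sketch of guarding Entity.getLocation against a null data location.
public class EntitySketch {
    // Hypothetical stub for the object whose getDataLocation may return null.
    static class Table {
        private final URI dataLocation; // may legitimately be null
        Table(URI loc) { this.dataLocation = loc; }
        URI getDataLocation() { return dataLocation; }
    }

    private final Table table;
    EntitySketch(Table t) { this.table = t; }

    // Unguarded code of the form table.getDataLocation().toString() NPEs;
    // returning null instead hands the decision to the caller.
    public URI getLocation() {
        return table == null ? null : table.getDataLocation();
    }

    public static void main(String[] args) {
        EntitySketch e = new EntitySketch(new Table(null));
        System.out.println(e.getLocation()); // null, no NPE
    }
}
```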
[jira] [Updated] (HIVE-7571) RecordUpdater should read virtual columns from row
[ https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7571: - Status: Open (was: Patch Available) Found an issue in that OrcRecordUpdater isn't properly selecting the object inspector to use. RecordUpdater should read virtual columns from row -- Key: HIVE-7571 URL: https://issues.apache.org/jira/browse/HIVE-7571 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch Currently RecordUpdater.update and delete take rowid and original transaction as parameters. These values are already present in the row as part of the new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer from there. And the writer will already have to handle skipping ROW__ID when writing, so it needs to be aware of that column anyway. We could instead read the values from ROW__ID and then remove it from the object inspector in FileSinkOperator, but this will be hard in the vectorization case where rows are being dealt with 10k at a time. For these reasons it makes more sense to do this work in the writer. -- This message was sent by Atlassian JIRA (v6.2#6252)
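The API change described above can be sketched as follows. This is a hedged stand-in: `RowId`, the `Map`-shaped row, and the method signature are all simplified illustrations of the idea (identifiers travel inside the row's ROW__ID virtual column), not Hive's actual ORC ACID classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of HIVE-7571: the updater reads rowid/original transaction from the
// row's ROW__ID virtual column instead of taking them as parameters.
public class UpdaterSketch {
    static final String ROW_ID_COL = "ROW__ID";

    // Hypothetical stand-in for the ROW__ID struct from HIVE-7513.
    static class RowId {
        final long originalTxn;
        final long rowId;
        RowId(long t, long r) { originalTxn = t; rowId = r; }
    }

    // Old shape: update(originalTxn, rowId, row). New shape: just the row.
    static RowId update(Map<String, Object> row) {
        RowId id = (RowId) row.get(ROW_ID_COL);
        // The writer already has to skip ROW__ID when persisting data columns.
        row.remove(ROW_ID_COL);
        return id;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put(ROW_ID_COL, new RowId(42L, 7L));
        row.put("value", "x");
        RowId id = update(row);
        System.out.println(id.rowId);                    // 7
        System.out.println(row.containsKey(ROW_ID_COL)); // false
    }
}
```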
Re: Review Request 24918: HIVE-7791 - Enable tests on Spark branch (1) [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24918/#review51187 --- Hey Brock, thanks, just some pretty minor comments below. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/24918/#comment89218 I'm just curious, was this to fix a test? Also is it necessary to catch and wrap in RuntimeException here? ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java https://reviews.apache.org/r/24918/#comment89216 This is a bit strange, as the resolve() method returns null. Not sure if we should assign to a variable as of now? - Szehon Ho On Aug. 21, 2014, 12:26 a.m., Brock Noland wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24918/ --- (Updated Aug. 21, 2014, 12:26 a.m.) Review request for hive. Repository: hive-git Description --- Enable tests Diffs - itests/src/test/resources/testconfiguration.properties ecb8b74 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java d16f1be ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java dc621cf ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregator.java 026f4e0 ql/src/test/results/clientpositive/spark/alter_merge_orc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/alter_merge_stats_orc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket3.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket4.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/count.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/create_merge_compressed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ctas.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/custom_input_output_format.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out PRE-CREATION Diff: 
https://reviews.apache.org/r/24918/diff/ Testing --- Verified output vs MR Thanks, Brock Noland
Re: Review Request 24934: HIVE-7829 - Entity.getLocation can throw an NPE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24934/#review51189 --- Ship it! Ship It! - Szehon Ho On Aug. 21, 2014, 3:47 p.m., Brock Noland wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24934/ --- (Updated Aug. 21, 2014, 3:47 p.m.) Review request for hive and Szehon Ho. Repository: hive-git Description --- Very simple change to return null if location cannot be obtained Diffs - ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java aafeaab Diff: https://reviews.apache.org/r/24934/diff/ Testing --- Thanks, Brock Noland
[jira] [Commented] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105642#comment-14105642 ] Szehon Ho commented on HIVE-7829: - +1 pending tests Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7829.1.patch, HIVE-7892.patch It's possible for the getDataLocation methods that Entity.getLocation calls to return null, and as such Entity.getLocation can throw an NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Attachment: HIVE-7736.3.patch regenerate the patch (rebase), wait for QA tests improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7807) Refer to umask property using FsPermission.UMASK_LABEL.
[ https://issues.apache.org/jira/browse/HIVE-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7807: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you very much Venki! I have committed this to trunk! Refer to umask property using FsPermission.UMASK_LABEL. --- Key: HIVE-7807 URL: https://issues.apache.org/jira/browse/HIVE-7807 Project: Hive Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.14.0 Attachments: HIVE-7807.1.patch Currently in {{org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission}} the umask property is referred to using {{fs.permissions.umask-mode}}, which is only available in Hadoop 2.x. The property {{dfs.umaskmode}} is used in 1.x for the same purpose. Also, dfs.umaskmode was not deprecated in 1.x according to HADOOP-8727. This JIRA is to change the umask property references to {{FsPermission.UMASK_LABEL}}, which always points to the proper property in each Hadoop version (0.23.x, 1.x, 2.x). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Attachment: HIVE-7654.7.patch regenerate the patch (rebase), wait for QA tests A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
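To make "extrapolating" a missing column stat concrete, here is a purely illustrative sketch; the actual method HIVE-7654 proposes is specified in the attached design doc, not here. It shows one simple scheme for an additive stat: scale the per-partition average from the partitions that have stats up to the full partition count.

```java
import java.util.Arrays;

// Illustrative only: derive a table-level additive stat (e.g. a null count)
// when column stats exist for just some partitions, by assuming the missing
// partitions behave like the known ones on average.
public class StatsExtrapolation {
    static long extrapolateNullCount(long[] knownNullCounts, int totalPartitions) {
        long sum = Arrays.stream(knownNullCounts).sum();
        // Scale the per-partition average up to all partitions.
        return Math.round((double) sum / knownNullCounts.length * totalPartitions);
    }

    public static void main(String[] args) {
        // Stats known for 2 of 4 partitions: average 15 nulls/partition -> 60 total.
        System.out.println(extrapolateNullCount(new long[]{10, 20}, 4)); // 60
    }
}
```

Non-additive stats (min, max, distinct-value counts) need different aggregation rules, which is why the real proposal handles each stat type separately.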
[jira] [Commented] (HIVE-7820) union_null.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105658#comment-14105658 ] Szehon Ho commented on HIVE-7820: - +1, thanks Brock union_null.q is not deterministic -- Key: HIVE-7820 URL: https://issues.apache.org/jira/browse/HIVE-7820 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7820.1.patch, HIVE-7820.1.patch union_null.q selects 10 rows from a subquery which returns many rows. Since the subquery does not have an ORDER BY, the 10 rows returned vary. This problem exists on trunk and spark. We'll fix on trunk and merge to spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7831) Research commented out unset in Utiltities
Brock Noland created HIVE-7831: -- Summary: Research commented out unset in Utiltities Key: HIVE-7831 URL: https://issues.apache.org/jira/browse/HIVE-7831 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7671) What happened to HCatalog?
[ https://issues.apache.org/jira/browse/HIVE-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-7671: Assignee: Alan Gates What happened to HCatalog? -- Key: HIVE-7671 URL: https://issues.apache.org/jira/browse/HIVE-7671 Project: Hive Issue Type: Bug Reporter: Sebb Assignee: Alan Gates According to the Incubator website, HCatalog graduated to become part of Hive in Feb 2013, yet I could find no references to HCatalog on the Hive website, and there are still downloads on the Incubator mirrors: https://dist.apache.org/repos/dist/release/incubator/hcatalog/ The Incubator HCatalog website redirects to Hive, so it would help if there were some mention of what happened to it. Also if the HCatalog downloads are no longer relevant, they should be deleted from the incubator mirror. They will continue to be available from the archives website if there is a need to keep links to them for historic purposes: http://archive.apache.org/dist/incubator/hcatalog/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105681#comment-14105681 ] Alan Gates commented on HIVE-7828: -- +1, looks correct. TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105690#comment-14105690 ] Alan Gates commented on HIVE-7663: -- TestDDLWithRemoteMetastoreSecondNamenode and TestHCatLoader pass fine for me when I run them locally. The other two fail for me on trunk and with the patch, both on my mac and on linux. I don't think any of those test anything I changed, since nothing but the streaming library is using the OrcRecordUpdater at this point. OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Status: Open (was: Patch Available) A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Attachment: HIVE-7654.8.patch A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Status: Open (was: Patch Available) improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
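To see why updating 2000 partitions takes 600+ seconds, compare one metastore round trip per partition against a single batched call. The class and method names below are invented for illustration; they are not the real metastore client API:

```python
class FakeMetastore:
    """Stand-in for a metastore client that counts round trips
    (hypothetical API, only to illustrate why batching helps)."""
    def __init__(self):
        self.calls = 0

    def update_partition_column_statistics(self, part, stats):
        self.calls += 1  # one RPC per partition

    def update_partition_column_statistics_batch(self, all_stats):
        self.calls += 1  # one RPC for the whole table

def update_one_by_one(ms, parts):
    # Current shape of the problem: N partitions -> N round trips.
    for p in parts:
        ms.update_partition_column_statistics(p, {})

def update_batched(ms, parts):
    # Desired shape: a single call carrying all partitions' stats.
    ms.update_partition_column_statistics_batch({p: {} for p in parts})
```

With 2000 partitions the first approach issues 2000 calls, the second one.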
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Status: Patch Available (was: Open) wait for QA A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch In a PARTITIONED table, there are many partitions. For example: create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state, locid of partition(year='2001'): analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column stats for the whole table loc_orc. However, we may not have the column stats for some partitions, e.g., partition(year='2002'), and we may not have the column stats for some columns, e.g., zip bigint for partition(year='2001'). We propose a method to extrapolate the missing column stats for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Hive on Tez Counters
I'll let Hive folks answer the questions about the Hive counters. In terms of the CPU counter - that was a bug in Tez 0.4.0, which has been fixed in 0.5.0. COMMITTED_HEAP_BYTES just represents the memory available to the JVM (Runtime.getRuntime().totalMemory()). This will only vary if the VM is started with different Xms and Xmx options. In terms of Tez, the application logs are currently the best place. Hive may expose these in a more accessible manner though. On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Hi, I need info on where I can get detailed job counters for Hive on Tez. I am running this on an HDP cluster with Hive 0.13 and see only the following job counters for Hive on Tez in the YARN application logs, which I got through (yarn logs -applicationId ...): a. I cannot see any ReduceOperator counters, and DESERIALIZE_ERRORS is the only counter present in MapOperator. b. CPU_MILLISECONDS is in some cases negative. Is CPU_MILLISECONDS accurate? c. What does COMMITTED_HEAP_BYTES indicate? d. Is there any other place I should be checking the counters? [[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0] [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0] [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]] Thanks Suma
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Status: Patch Available (was: Open) wait for QA tests improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch, HIVE-7736.4.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Attachment: HIVE-7736.4.patch improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch, HIVE-7736.4.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105711#comment-14105711 ] Hive QA commented on HIVE-7735: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663414/HIVE-7735.3.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/444/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/444/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-444/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663414 Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. 
Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105720#comment-14105720 ] Szehon Ho commented on HIVE-7384: - Thanks for the comment, I had a similar thought initially, but then saw that sortByKey does a re-partitioning (range-partition), as it has to achieve total order. I think we need something that does sorting within a partition. Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complication in reduce-side join, which extensively utilizes key tag and shuffle behavior. Our design principle prefers to making Hive implementation work out of box also, which might requires new functionality from Spark. The tasks is to research into this area, identifying requirements for Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105752#comment-14105752 ] Brock Noland commented on HIVE-7828: Thanks Alan TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Attachment: HIVE-7702.1-spark.patch Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Status: Open (was: Patch Available) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Status: Patch Available (was: Open) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7831) Research commented out unset in Utiltities
[ https://issues.apache.org/jira/browse/HIVE-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7831: --- Description: We did the following in HIVE-7370 {noformat} // TODO HIVE-7831 // conf.unset(FsPermission.UMASK_LABEL); {noformat} We should understand that. Research commented out unset in Utiltities -- Key: HIVE-7831 URL: https://issues.apache.org/jira/browse/HIVE-7831 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland We did the following in HIVE-7370 {noformat} // TODO HIVE-7831 // conf.unset(FsPermission.UMASK_LABEL); {noformat} We should understand that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105759#comment-14105759 ] Chinna Rao Lalam commented on HIVE-7702: insert_into2.q.out is corrected. Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7222) Support timestamp column statistics in ORC and extend PPD for timestamp
[ https://issues.apache.org/jira/browse/HIVE-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7222: - Attachment: HIVE-7222.1.patch Renamed the patch for Hive QA to pickup. Support timestamp column statistics in ORC and extend PPD for timestamp --- Key: HIVE-7222 URL: https://issues.apache.org/jira/browse/HIVE-7222 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Daniel Dai Labels: orcfile Attachments: HIVE-7222-1.patch, HIVE-7222.1.patch Add column statistics for timestamp columns in ORC. Also extend predicate pushdown to support timestamp column evaluation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7735: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks Mohit for the contribution! Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
Prasanth J created HIVE-7832: Summary: Do ORC dictionary check at a finer level and preserve encoding across stripes Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Currently the ORC dictionary check happens while writing the stripe: just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use a dictionary is not preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
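The per-stripe check described above boils down to a simple ratio test. A minimal sketch, assuming the cutoff is the 0.8 default of hive.exec.orc.dictionary.key.size.threshold (the threshold value is an assumption here, not stated in the issue):

```python
def keep_dictionary(distinct_entries, non_null_rows, threshold=0.8):
    """Sketch of ORC's per-stripe dictionary decision: keep dictionary
    encoding only while the ratio of distinct dictionary entries to
    non-null rows stays at or below the threshold. The 0.8 default is
    assumed, mirroring hive.exec.orc.dictionary.key.size.threshold."""
    if non_null_rows == 0:
        return True  # nothing written yet; keep trying the dictionary
    return distinct_entries / non_null_rows <= threshold
```

A column with 100 distinct values over 1000 rows keeps its dictionary; one with 900 distinct values (nearly all unique) discards it, avoiding the O(log n) insertions the issue describes.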
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7832: - Attachment: HIVE-7832.1.patch Do ORC dictionary check at a finer level and preserve encoding across stripes - Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7832.1.patch Currently the ORC dictionary check happens while writing the stripe: just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use a dictionary is not preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105813#comment-14105813 ] Mostafa Mokhtar commented on HIVE-7723: --- Ping! [~gopalv] [~hagleitn] Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case, as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hashCode is case-sensitive, which is an undesired behavior. {code}
public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
  // If the input is already present, make sure the new parent is added to the input.
  if (inputs.contains(newInput)) {
    for (ReadEntity input : inputs) {
      if (input.equals(newInput)) {
        if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
          input.getParents().addAll(newInput.getParents());
          input.setDirect(input.isDirect() || newInput.isDirect());
        }
        return input;
      }
    }
    assert false;
  } else {
    inputs.add(newInput);
    return newInput;
  }
  // make compile happy
  return null;
}
{code} This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk = cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in
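The lookup inefficiency and the proposed fix are easy to mimic outside Java. Below is an illustrative Python analogue (not Hive code): the first function reproduces the contains-then-iterate pattern, which costs O(n) per lookup, while the second keys a hash map by the entity so the canonical stored instance comes back in O(1) average time, which is exactly the get operation that Set lacks.

```python
def add_input_linear(inputs, new_input):
    # Mirrors the Java snippet: scan the whole collection to find the
    # stored element equal to new_input -- O(n) on every call.
    for existing in inputs:
        if existing == new_input:
            return existing  # the real code merges parents here
    inputs.append(new_input)
    return new_input

def add_input_hashed(inputs, new_input):
    # Shape of the proposed fix: a dict keyed by the entity returns the
    # canonical stored instance in O(1) average time (or inserts it).
    return inputs.setdefault(new_input, new_input)
```

With thousands of partition ReadEntity objects, the linear scan runs once per added input, which is what showed up as ~40% of CPU in the profile.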
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7832: - Status: Patch Available (was: Open) Do ORC dictionary check at a finer level and preserve encoding across stripes - Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7832.1.patch Currently the ORC dictionary check happens while writing the stripe: just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use a dictionary is not preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105822#comment-14105822 ] Brock Noland commented on HIVE-7821: Hi [~chinnalalam] just an FYI that HIVE-7793 is available! :) StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105838#comment-14105838 ] Brock Noland commented on HIVE-7702: Hi Chinna, Thank you! Using git and the following command I was able to compare the results against MR: {noformat}
git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'diff {} $(echo {} | perl -pe s@/spark@@g)'
{noformat} Do you know if the differences are due to sorting order or correctness? Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105842#comment-14105842 ] Lefty Leverenz commented on HIVE-7373: -- Thanks [~spena], that's clear and doesn't need examples except for math functions. Even those should probably just have examples in these comments, then the wiki can refer to them here. Did you implement the same thing BigDecimal does? {quote} This is what BigDecimal returns when doing some basic maths: 3.140 * 1. = 3.140 3.140 / 1. = 3.14 3.140 + 1. = 4.1400 3.140 - 1. = 2.1400 3.140 * 3.140 = 9.859600 {quote} Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as sort of standardization. This is questionable in theory and problematic in practice. 1. In decimal context, number 3.14 has a different semantic meaning from number 3.14. Removing trailing zeroes makes the meaning lost. 2. In a extreme case, 0.0 has (p, s) as (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
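The BigDecimal scale behavior discussed above can be reproduced with Python's decimal module, which follows the same rules; this is only an illustration of the semantics the patch preserves, since Hive itself uses Java's BigDecimal/HiveDecimal:

```python
from decimal import Decimal

# Trailing zeros are part of a decimal's scale, so they survive arithmetic:
a = Decimal("3.140")  # scale 3
b = Decimal("1.000")  # scale 3

print(a * b)  # 3.140000 -- multiplication adds the scales (3 + 3 = 6)
print(a + b)  # 4.140    -- addition keeps the larger scale
print(a - b)  # 2.140

# Values compare equal even when their scales differ, which is why dropping
# trailing zeros loses information without changing equality:
print(Decimal("0.0") == Decimal("0.00"))  # True
```

This also shows the problem case from the description: 0.0 and 0.00 are equal in value, but normalizing 0.0 to 0 changes its (precision, scale) from (1, 1) to (1, 0).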