[jira] [Assigned] (HIVE-7541) Support union all on Spark

2014-07-30 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang reassigned HIVE-7541:
-

Assignee: Na Yang

 Support union all on Spark
 --

 Key: HIVE-7541
 URL: https://issues.apache.org/jira/browse/HIVE-7541
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Na Yang

 For the union all operator, we will use Spark's union transformation. Refer to 
 the design doc on the wiki for more information.
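 As a rough illustration, the mapping onto Spark's API could be as simple as 
 the following Scala sketch (a standalone example, not Hive's actual 
 implementation; the input RDDs are hypothetical stand-ins for the two union 
 branches):
 {code}
 import org.apache.spark.{SparkConf, SparkContext}

 val sc = new SparkContext(new SparkConf().setAppName("union-all-sketch").setMaster("local"))
 val left  = sc.parallelize(Seq("a", "b"))  // first branch of the union
 val right = sc.parallelize(Seq("b", "c"))  // second branch of the union
 // RDD.union concatenates partitions without a shuffle and keeps duplicates,
 // which matches UNION ALL semantics
 left.union(right).collect().foreach(println)  // a, b, b, c
 {code}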



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078962#comment-14078962
 ] 

Lefty Leverenz commented on HIVE-7436:
--

bq.  Is HADOOP_CLASSPATH documented anywhere for Hive?

Grepping the Hive wiki reveals three docs that mention HADOOP_CLASSPATH, but 
none that documents it for Hive itself:

* [HCatalog InputOutput -- Running MapReduce with HCatalog (see first example) 
| 
https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog]
* [Install WebHCat -- Hadoop Distributed Cache (see templeton.override.jars, 
which is the last config in the section) | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+InstallWebHCat#WebHCatInstallWebHCat-HadoopDistributedCache]
* [WebHCat Configuration -- Configuration Variables (see 
templeton.override.jars, which is 5th in the table) | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables]

 Load Spark configuration into Hive driver
 -

 Key: HIVE-7436
 URL: https://issues.apache.org/jira/browse/HIVE-7436
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, 
 HIVE-7436-Spark.3.patch


 Load Spark configuration into the Hive driver. There are 3 ways to set up 
 Spark configurations:
 # Java property.
 # Configure properties in the Spark configuration file (spark-defaults.conf).
 # Hive configuration file (hive-site.xml).
 Configurations later in this list have higher priority and overwrite earlier 
 configurations with the same property name.
 Please refer to [http://spark.apache.org/docs/latest/configuration.html] for 
 all configurable properties of Spark. You can configure Spark in Hive in the 
 following ways:
 # Configure through the Spark configuration file.
 #* Create spark-defaults.conf and place it in the /etc/spark/conf 
 configuration directory. Configure properties in spark-defaults.conf in Java 
 properties format.
 #* Set the $SPARK_CONF_DIR environment variable to the location of 
 spark-defaults.conf.
 export SPARK_CONF_DIR=/etc/spark/conf
 #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
 export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
 # Configure through the Hive configuration file.
 #* Edit hive-site.xml in the Hive conf directory and configure the 
 spark-defaults.conf properties in XML format.
 Hive driver default Spark properties:
 ||name||default value||description||
 |spark.master|local|Spark master URL.|
 |spark.app.name|Hive on Spark|Default Spark application name.|
 NO PRECOMMIT TESTS. This is for spark-branch only.
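 For illustration, the hive-site.xml form of one of the defaults above might 
 look like this (property name and value taken from the table; shown only as 
 an example of the XML format):
 {code:xml}
 <property>
   <name>spark.master</name>
   <value>local</value>
 </property>
 {code}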



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7497) Fix some default values in HiveConf

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078973#comment-14078973
 ] 

Lefty Leverenz commented on HIVE-7497:
--

Good, that makes sense.  Thanks [~dongc].

(I'd fix your env smiley but it's fun -- let the parenthesis remain open.)

 Fix some default values in HiveConf
 ---

 Key: HIVE-7497
 URL: https://issues.apache.org/jira/browse/HIVE-7497
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Dong Chen
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7497.1.patch, HIVE-7497.patch


 HIVE-5160 resolves an env variable at runtime by calling System.getenv(). As 
 long as the variable is not defined when you run the build, null is returned 
 and the path is not placed in hive-default.template. However, if it is 
 defined, it will populate hive-default.template with a path which will be 
 different based on the user running the build. We should use 
 $\{system:HIVE_CONF_DIR\} instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7029) Vectorize ReduceWork

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078977#comment-14078977
 ] 

Hive QA commented on HIVE-7029:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658529/HIVE-7029.7.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5835 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/97/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/97/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-97/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658529

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7553) avoid the scheduling maintenance window for every jar change

2014-07-30 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-7553:
--

 Summary: avoid the scheduling maintenance window for every jar 
change
 Key: HIVE-7553
 URL: https://issues.apache.org/jira/browse/HIVE-7553
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


When a user needs to refresh an existing jar or add a new jar to HS2, HS2 
needs to be restarted. As HS2 is a service exposed to clients, this requires 
scheduling a maintenance window for every jar change. It would be great if we 
could avoid that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver

2014-07-30 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079007#comment-14079007
 ] 

Chengxiang Li commented on HIVE-7436:
-

[~xuefuz] HADOOP_CONF_DIR is added to HADOOP_CLASSPATH in hadoop-config.sh, as 
is HIVE_CONF_DIR in hive-config.sh. If we only load the Spark configuration 
file from the classpath, there are 2 choices:
# Export SPARK_CONF_DIR and add it to HADOOP_CLASSPATH manually (see the 
sketch below).
# Commit a patch which would add SPARK_CONF_DIR to HADOOP_CLASSPATH in the 
Hive scripts (such as hive-config.sh), then export SPARK_CONF_DIR.
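For reference, choice 1 amounts to the same two commands already quoted in the 
issue description below:
{noformat}
export SPARK_CONF_DIR=/etc/spark/conf
export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
{noformat}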

My concerns about supporting loading of the Spark configuration file from 
SPARK_CONF_DIR at the implementation level are:
# Hadoop, Hive, and Hive on Tez actually only load configuration files from 
the classpath.
# It may introduce more complexity; for example, what should we do if 
different Spark configuration files are available in both SPARK_CONF_DIR and 
HADOOP_CLASSPATH?

The way Hive on Tez is configured is similar to the current Hive on Spark 
approach: [Hive on Tez 
Configuration|http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html]



 Load Spark configuration into Hive driver
 -

 Key: HIVE-7436
 URL: https://issues.apache.org/jira/browse/HIVE-7436
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, 
 HIVE-7436-Spark.3.patch


 Load Spark configuration into the Hive driver. There are 3 ways to set up 
 Spark configurations:
 # Java property.
 # Configure properties in the Spark configuration file (spark-defaults.conf).
 # Hive configuration file (hive-site.xml).
 Configurations later in this list have higher priority and overwrite earlier 
 configurations with the same property name.
 Please refer to [http://spark.apache.org/docs/latest/configuration.html] for 
 all configurable properties of Spark. You can configure Spark in Hive in the 
 following ways:
 # Configure through the Spark configuration file.
 #* Create spark-defaults.conf and place it in the /etc/spark/conf 
 configuration directory. Configure properties in spark-defaults.conf in Java 
 properties format.
 #* Set the $SPARK_CONF_DIR environment variable to the location of 
 spark-defaults.conf.
 export SPARK_CONF_DIR=/etc/spark/conf
 #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
 export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
 # Configure through the Hive configuration file.
 #* Edit hive-site.xml in the Hive conf directory and configure the 
 spark-defaults.conf properties in XML format.
 Hive driver default Spark properties:
 ||name||default value||description||
 |spark.master|local|Spark master URL.|
 |spark.app.name|Hive on Spark|Default Spark application name.|
 NO PRECOMMIT TESTS. This is for spark-branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7519) Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079025#comment-14079025
 ] 

Hive QA commented on HIVE-7519:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658531/HIVE-7519.1.patch

{color:red}ERROR:{color} -1 due to 31 failed/errored test(s), 5838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_role_grant2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_print_header
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_non_string_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.parse.TestParse.testParse_case_sensitivity
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testsequencefile
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample7
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/98/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/98/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-98/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 31 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658531

 Refactor QTestUtil to remove its duplication with QFileClient for qtest setup 
 and teardown 
 ---

 Key: HIVE-7519
 URL: https://issues.apache.org/jira/browse/HIVE-7519
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7519.1.patch, HIVE-7519.patch


 QTestUtil hard codes the creation and dropping of source tables for qtests. 
 QFileClient does the same thing but in a better way, using q_test_init.sql and 
 q_test_cleanup.sql scripts. As QTestUtil is growing quite large, it makes 
 sense to refactor it to use QFileClient's approach. This will also remove 
 duplication of code serving the same purpose.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7544) Changes related to TEZ-1288 (FastTezSerialization)

2014-07-30 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-7544:
---

Attachment: HIVE-7544.1.patch

 Changes related to TEZ-1288 (FastTezSerialization)
 --

 Key: HIVE-7544
 URL: https://issues.apache.org/jira/browse/HIVE-7544
 Project: Hive
  Issue Type: Sub-task
  Components: Tez
Affects Versions: 0.14.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: HIVE-7544.1.patch


 Add ability to make use of TezBytesWritableSerialization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-4934) ntile function has to be the last thing in the select list

2014-07-30 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke reassigned HIVE-4934:
--

Assignee: Lars Francke

 ntile function has to be the last thing in the select list
 --

 Key: HIVE-4934
 URL: https://issues.apache.org/jira/browse/HIVE-4934
 Project: Hive
  Issue Type: Bug
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
 FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
 Only COMPLETE mode supported for NTile function
 SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
 ...works...
 {code}
 I'm not sure whether this is a bug or necessary behavior. Either way, the 
 error message is not helpful, as it's not documented anywhere what 
 {{COMPLETE}} mode is. A cursory glance at the code didn't help me either.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-4934) ntile function has to be the last thing in the select list

2014-07-30 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-4934.


Resolution: Fixed

This was a misunderstanding on my part. I'll add a sentence to the 
documentation to clear this up for others.

 ntile function has to be the last thing in the select list
 --

 Key: HIVE-4934
 URL: https://issues.apache.org/jira/browse/HIVE-4934
 Project: Hive
  Issue Type: Bug
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
 FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
 Only COMPLETE mode supported for NTile function
 SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
 ...works...
 {code}
 I'm not sure whether this is a bug or necessary behavior. Either way, the 
 error message is not helpful, as it's not documented anywhere what 
 {{COMPLETE}} mode is. A cursory glance at the code didn't help me either.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4934) Improve documentation of OVER clause

2014-07-30 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated HIVE-4934:
---

Description: 
{code}
CREATE TABLE test (foo INT);
SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
Only COMPLETE mode supported for NTile function

SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
...works...
{code}

I'm not sure whether this is a bug or necessary behavior. Either way, the error 
message is not helpful, as it's not documented anywhere what {{COMPLETE}} mode 
is. A cursory glance at the code didn't help me either.

Edit: It is not a bug; it wasn't clear to me that the OVER clause only applies 
to the directly preceding function.

  was:
{code}
CREATE TABLE test (foo INT);
SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
Only COMPLETE mode supported for NTile function

SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
...works...
{code}

I'm not sure whether this is a bug or necessary behavior. Either way, the error 
message is not helpful, as it's not documented anywhere what {{COMPLETE}} mode 
is. A cursory glance at the code didn't help me either.


 Improve documentation of OVER clause
 

 Key: HIVE-4934
 URL: https://issues.apache.org/jira/browse/HIVE-4934
 Project: Hive
  Issue Type: Bug
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
 FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
 Only COMPLETE mode supported for NTile function
 SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
 ...works...
 {code}
 I'm not sure whether this is a bug or necessary behavior. Either way, the 
 error message is not helpful, as it's not documented anywhere what 
 {{COMPLETE}} mode is. A cursory glance at the code didn't help me either.
 Edit: It is not a bug; it wasn't clear to me that the OVER clause only 
 applies to the directly preceding function.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4934) Improve documentation of OVER clause

2014-07-30 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated HIVE-4934:
---

Summary: Improve documentation of OVER clause  (was: ntile function has to 
be the last thing in the select list)

 Improve documentation of OVER clause
 

 Key: HIVE-4934
 URL: https://issues.apache.org/jira/browse/HIVE-4934
 Project: Hive
  Issue Type: Bug
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
 FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
 Only COMPLETE mode supported for NTile function
 SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
 ...works...
 {code}
 I'm not sure whether this is a bug or necessary behavior. Either way, the 
 error message is not helpful, as it's not documented anywhere what 
 {{COMPLETE}} mode is. A cursory glance at the code didn't help me either.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 23799: HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli

2014-07-30 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23799/
---

(Updated July 30, 2014, 8:30 a.m.)


Review request for hive.


Changes
---

1. Use hadoop.io.utils to close streams
2. Change the integration test due to code changes
3. Add a quotedCsv format instead of an option, according to the discussion
4. Add one constructor parameter to specify the quoting status


Bugs: HIVE-7390
https://issues.apache.org/jira/browse/HIVE-7390


Repository: hive-git


Description
---

HIVE-7390: refactor csv output format with in RFC mode and add one more option 
to support formatting as the csv format in hive cli


Diffs (updated)
-

  beeline/pom.xml 6ec1d1aff3f35c097aa6054aae84faf2d63854f1 
  beeline/src/java/org/apache/hive/beeline/BeeLine.java 
528a98e29c23421f9352bdf7c5edd3a9fae0e3ea 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
7853c3f38f3c3fb9ae0b9939c714f1dc940ba053 
  beeline/src/main/resources/BeeLine.properties 
390d062b8dc52dfa790c7351f3db44c1e0dd7e37 
  
itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 
bd97aff5959fd9040fc0f0a1f6b782f2aa6f 
  pom.xml b5a5697e6a3b689c2b244ba0338be541261eaa3d 

Diff: https://reviews.apache.org/r/23799/diff/


Testing
---


Thanks,

cheng xu



[jira] [Commented] (HIVE-7432) Remove deprecated Avro's Schema.parse usages

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079067#comment-14079067
 ] 

Hive QA commented on HIVE-7432:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658545/HIVE-7432.patch

{color:red}ERROR:{color} -1 due to 66 failed/errored test(s), 5838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_change_schema
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_decimal_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_evolved_schemas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_sanity_test
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_literal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_serde
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeArrays
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeBytes
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeEnums
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeFixed
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeMapWithNullablePrimitiveValues
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeMapsWithPrimitiveKeys
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableEnums
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeRecords
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeUnions
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeVoidType
org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyCaching
org.apache.hadoop.hive.serde2.avro.TestAvroObjectInspectorGenerator.convertsNullableEnum
org.apache.hadoop.hive.serde2.avro.TestAvroObjectInspectorGenerator.objectInspectorsAreCached
org.apache.hadoop.hive.serde2.avro.TestAvroSerde.initializeDoesNotReuseSchemasFromConf
org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS
org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.getTypeFromNullableTypePositiveCase
org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.isNullableTypeAcceptsNullableUnions
org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.noneOptionWorksForSpecifyingSchemas
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeArraysWithNullableComplexElements
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeArraysWithNullablePrimitiveElements
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeBooleans
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeBytes
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeDecimals
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeDoubles
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeEnums
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeFixed
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeFloats
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeInts
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeListOfDecimals
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeLists
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMapOfDecimals
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMaps
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMapsWithNullableComplexValues
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMapsWithNullablePrimitiveValues
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableBytes
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableDecimals
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableEnums
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableFixed

[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079068#comment-14079068
 ] 

Hive QA commented on HIVE-7509:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658568/HIVE-7509.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/100/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/100/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-100/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-100/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java'
Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestThatEvolvedSchemasActAsWeWant.java'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaResolutionProblem.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
hwi/target common/target common/src/gen service/target contrib/target 
serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1614583.

At revision 1614583.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658568

 Fast stripe level merging for ORC
 

[jira] [Updated] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-7390:
---

Attachment: HIVE-7390.4.patch

code changes according to the discussion

 Make quote character optional and configurable in BeeLine CSV/TSV output
 

 Key: HIVE-7390
 URL: https://issues.apache.org/jira/browse/HIVE-7390
 Project: Hive
  Issue Type: New Feature
  Components: Clients
Affects Versions: 0.13.1
Reporter: Jim Halfpenny
Assignee: Ferdinand Xu
 Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
 HIVE-7390.4.patch, HIVE-7390.patch


 Currently, when either the CSV or TSV output format is used in beeline, each 
 column is wrapped in single quotes. Quote wrapping of columns should be 
 optional, and the user should be able to choose the character used to wrap 
 the columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4933) Document how aliases work with the OVER clause

2014-07-30 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated HIVE-4933:
---

Summary: Document how aliases work with the OVER clause  (was: Can't use 
alias directly before OVER clause)

 Document how aliases work with the OVER clause
 --

 Key: HIVE-4933
 URL: https://issues.apache.org/jira/browse/HIVE-4933
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 hive> SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test;
 MismatchedTokenException(175!=110)
   at 
 org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983)
   at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 
 'bar' in from clause{code}
 The same happens without the {{AS}} but it works when leaving out the alias 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4933) Document how aliases work with the OVER clause

2014-07-30 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079077#comment-14079077
 ] 

Lars Francke commented on HIVE-4933:


The proper usage turns out to be

{code:sql}
SELECT SUM(foo) OVER (PARTITION BY foo) AS bar FROM test;
{code}

I have added documentation to the Wiki for this.

 Document how aliases work with the OVER clause
 --

 Key: HIVE-4933
 URL: https://issues.apache.org/jira/browse/HIVE-4933
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 hive> SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test;
 MismatchedTokenException(175!=110)
   at 
 org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983)
   at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 
 'bar' in from clause{code}
 The same happens without the {{AS}} but it works when leaving out the alias 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-4933) Document how aliases work with the OVER clause

2014-07-30 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke reassigned HIVE-4933:
--

Assignee: Lars Francke

 Document how aliases work with the OVER clause
 --

 Key: HIVE-4933
 URL: https://issues.apache.org/jira/browse/HIVE-4933
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 hive> SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test;
 MismatchedTokenException(175!=110)
   at 
 org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284)
   at 
 org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983)
   at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 
 'bar' in from clause{code}
 The same happens without the {{AS}} but it works when leaving out the alias 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7327) Refactoring: make Hive map side data processing reusable

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-7327.
---

Resolution: Won't Fix

Closed as won't fix. Will reopen if the need comes back.

 Refactoring: make Hive map side data processing reusable
 

 Key: HIVE-7327
 URL: https://issues.apache.org/jira/browse/HIVE-7327
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang

 ExecMapper is Hive's mapper implementation for MapReduce. Table rows are read 
 by the MR framework and processed by the ExecMapper.map() method, which 
 invokes Hive's map-side operator tree starting from MapOperator. This task is 
 to extract the map-side data processing offered by the operator tree so that 
 it can be used by other execution engines such as Spark. This is purely a 
 refactoring of the existing code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7328) Refactoring: make Hive reduce side data processing reusable

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-7328.
---

Resolution: Won't Fix

Closed as won't fix. Will reopen if the need comes back.

 Refactoring: make Hive reduce side data processing reusable
 ---

 Key: HIVE-7328
 URL: https://issues.apache.org/jira/browse/HIVE-7328
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang

 ExecReducer is Hive's reducer implementation for MapReduce. Table rows are 
 shuffled by the MR framework to ExecReducer and further processed by the 
 ExecReducer.reduce() method, which invokes Hive's reduce-side operator tree. 
 This task is to extract the reduce-side data processing offered by the 
 operator tree so that it can be reused by other execution engines such as 
 Spark. This is purely a refactoring of the existing code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7552) Collect spark job statistic through spark metrics[Spark Branch]

2014-07-30 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7552:


Description: 
MR/Tez use counters to collect job statistics, while Spark does not use 
accumulators for the same purpose. Instead, Spark stores task metrics 
information in TaskMetrics and sends it back to the scheduler. We could get 
Spark job statistics by combining all the TaskMetrics with a SparkListener.
NO PRECOMMIT TESTS. This is for spark-branch only.
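For illustration, a minimal Scala sketch of this approach (the collector class 
and the aggregated field are hypothetical; assumes the Spark 1.x listener API):
{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// hypothetical collector: aggregates per-task metrics on the Hive driver side
class JobMetricsCollector extends SparkListener {
  var totalExecutorRunTime = 0L
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {  // metrics may be absent, e.g. for failed tasks
      totalExecutorRunTime += metrics.executorRunTime
    }
  }
}

// registered on the driver's SparkContext before submitting the job:
// sc.addSparkListener(new JobMetricsCollector)
{code}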

  was:
MR/Tez use counters to collect job statistics, while Spark has a configurable 
metrics system based on the Coda Hale Metrics Library. We could collect Spark 
job statistics through the Spark metrics system on the Hive driver side.
NO PRECOMMIT TESTS. This is for spark-branch only.


 Collect spark job statistic through spark metrics[Spark Branch]
 ---

 Key: HIVE-7552
 URL: https://issues.apache.org/jira/browse/HIVE-7552
 Project: Hive
  Issue Type: New Feature
  Components: Spark
Reporter: Chengxiang Li

 MR/Tez use counters to collect job statistics, while Spark does not use 
 accumulators for the same purpose. Instead, Spark stores task metrics 
 information in TaskMetrics and sends it back to the scheduler. We could get 
 Spark job statistics by combining all the TaskMetrics with a SparkListener.
 NO PRECOMMIT TESTS. This is for spark-branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver

2014-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079131#comment-14079131
 ] 

Xuefu Zhang commented on HIVE-7436:
---

[~chengxiang li], I guess expecting spark-defaults.conf on the Hadoop 
classpath is fine for now, though we might need to go back, revisit, and 
brainstorm this again. Note that we don't have to follow exactly what Tez did 
in every aspect, but I agree it can serve as a good reference point, giving 
users a similar experience.


 Load Spark configuration into Hive driver
 -

 Key: HIVE-7436
 URL: https://issues.apache.org/jira/browse/HIVE-7436
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, 
 HIVE-7436-Spark.3.patch


 Load Spark configuration into the Hive driver. There are 3 ways to set up 
 Spark configurations:
 # Java property.
 # Configure properties in the Spark configuration file (spark-defaults.conf).
 # Hive configuration file (hive-site.xml).
 Configurations later in this list have higher priority and overwrite earlier 
 configurations with the same property name.
 Please refer to [http://spark.apache.org/docs/latest/configuration.html] for 
 all configurable properties of Spark. You can configure Spark in Hive in the 
 following ways:
 # Configure through the Spark configuration file.
 #* Create spark-defaults.conf and place it in the /etc/spark/conf 
 configuration directory. Configure properties in spark-defaults.conf in Java 
 properties format.
 #* Set the $SPARK_CONF_DIR environment variable to the location of 
 spark-defaults.conf.
 export SPARK_CONF_DIR=/etc/spark/conf
 #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
 export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
 # Configure through the Hive configuration file.
 #* Edit hive-site.xml in the Hive conf directory and configure the 
 spark-defaults.conf properties in XML format.
 Hive driver default Spark properties:
 ||name||default value||description||
 |spark.master|local|Spark master URL.|
 |spark.app.name|Hive on Spark|Default Spark application name.|
 NO PRECOMMIT TESTS. This is for spark-branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver

2014-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079150#comment-14079150
 ] 

Xuefu Zhang commented on HIVE-7436:
---

One more question: where did you see that tez-site.xml is read from the 
classpath by Hive, in the code or in documentation somewhere? I wasn't able to 
find it in either.

 Load Spark configuration into Hive driver
 -

 Key: HIVE-7436
 URL: https://issues.apache.org/jira/browse/HIVE-7436
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, 
 HIVE-7436-Spark.3.patch


 Load Spark configuration into the Hive driver. There are 3 ways to set up 
 Spark configurations:
 # Java property.
 # Configure properties in the Spark configuration file (spark-defaults.conf).
 # Hive configuration file (hive-site.xml).
 Configurations later in this list have higher priority and overwrite earlier 
 configurations with the same property name.
 Please refer to [http://spark.apache.org/docs/latest/configuration.html] for 
 all configurable properties of Spark. You can configure Spark in Hive in the 
 following ways:
 # Configure through the Spark configuration file.
 #* Create spark-defaults.conf and place it in the /etc/spark/conf 
 configuration directory. Configure properties in spark-defaults.conf in Java 
 properties format.
 #* Set the $SPARK_CONF_DIR environment variable to the location of 
 spark-defaults.conf.
 export SPARK_CONF_DIR=/etc/spark/conf
 #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
 export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
 # Configure through the Hive configuration file.
 #* Edit hive-site.xml in the Hive conf directory and configure the 
 spark-defaults.conf properties in XML format.
 Hive driver default Spark properties:
 ||name||default value||description||
 |spark.master|local|Spark master URL.|
 |spark.app.name|Hive on Spark|Default Spark application name.|
 NO PRECOMMIT TESTS. This is for spark-branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 23799: HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli

2014-07-30 Thread Lars Francke

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23799/#review49091
---


In general this feels a bit awkward. I think better CSV/TSV support is a good 
idea, but quotedCsv seems misleading, as the old csv and tsv formats now quote 
as well if the separator is contained in the column value.


beeline/src/java/org/apache/hive/beeline/BeeLine.java
https://reviews.apache.org/r/23799/#comment85924

Missing space here and next line



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85920

Remove this and the call to getSeparator; it can just be separator.



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85915

Can be converted to a variable arity function (e.g. String... vals)



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85916

Rename to writer?



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85917

Same as above: Can be converted to variable arity method



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85918

...variable arity



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85919

Remove this and probably replace the call to isSingleQuoted with just 
singleQuoted; no need to go through a simple getter.



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85923

Missing spaces around the else



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85922

I'd either remove the getter and setters entirely or they need changing so 
that things are properly updated when separator/singleQuoted/csvPreference are 
changed.

Example: Someone passes in a CsvPreference with a different separator than 
the one set in here.

I think part of this patch needs to be the removal of all these simple 
(getter/)setters.

If you don't want that then you need some verification logic that things 
make sense.



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java
https://reviews.apache.org/r/23799/#comment85921

This is not a getter but a setter.


- Lars Francke


On July 30, 2014, 8:30 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23799/
 ---
 
 (Updated July 30, 2014, 8:30 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7390
 https://issues.apache.org/jira/browse/HIVE-7390
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7390: refactor csv output format with in RFC mode and add one more 
 option to support formatting as the csv format in hive cli
 
 
 Diffs
 -
 
   beeline/pom.xml 6ec1d1aff3f35c097aa6054aae84faf2d63854f1 
   beeline/src/java/org/apache/hive/beeline/BeeLine.java 
 528a98e29c23421f9352bdf7c5edd3a9fae0e3ea 
   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
 7853c3f38f3c3fb9ae0b9939c714f1dc940ba053 
   beeline/src/main/resources/BeeLine.properties 
 390d062b8dc52dfa790c7351f3db44c1e0dd7e37 
   
 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
  bd97aff5959fd9040fc0f0a1f6b782f2aa6f 
   pom.xml b5a5697e6a3b689c2b244ba0338be541261eaa3d 
 
 Diff: https://reviews.apache.org/r/23799/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 cheng xu
 




[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079180#comment-14079180
 ] 

Hive QA commented on HIVE-7532:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658566/HIVE-7532.2.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5823 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/101/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/101/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-101/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658566

 allow disabling direct sql per query with external metastore
 

 Key: HIVE-7532
 URL: https://issues.apache.org/jira/browse/HIVE-7532
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Navis
 Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.patch.txt


 Currently, with an external metastore, direct SQL can only be disabled globally 
 via metastore config. Perhaps it makes sense to have the ability to 
 propagate the setting per query from the client to override the metastore 
 setting, e.g. if one particular query causes it to fail.
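
 For illustration, a sketch of what the per-query override could look like from 
 a client session; the metaconf: prefix and the hive.metastore.try.direct.sql 
 property name are assumptions about the eventual mechanism, not something 
 confirmed in this thread:

{code}
-- assumed syntax: override the metastore-side default for this session only
set metaconf:hive.metastore.try.direct.sql=false;
-- ... run the query that fails under direct SQL; metastore calls fall back to JDO ...
-- restore the default afterwards
set metaconf:hive.metastore.try.direct.sql=true;
{code}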



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079216#comment-14079216
 ] 

Lars Francke commented on HIVE-7390:


As noted in my review, I'm not too sure about adding another format, especially 
if it's called quotedCSV, because that implies the others aren't using quoting, 
but they actually are when needed.

The old way sometimes produces invalid CSV (when quote or delimiter chars 
exist in the data), so I think it's a good idea to fix this (and super-csv 
seems to solve that). I'm not sure preserving the old functionality is worth 
anything. And if you do, then maybe deprecate it and name it `deprecatedCSV` or 
something like that.

I'd be in favor of two options instead (similar to what was suggested 
originally)
* Delimiter
* Quoting character

Maybe even a third: quoting mode. I'm in favor of always adding quotes, as it 
makes parsing easier (no need to check for quoted/unquoted columns etc.). If 
we don't add that, I'd vote in favor of changing the current quoting mode to the 
AlwaysQuote mode.
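
For reference, a minimal sketch of the two quote modes, assuming super-csv 2.x (the class names below are super-csv's, not anything from the patch):

{code}
import java.io.StringWriter;

import org.supercsv.io.CsvListWriter;
import org.supercsv.prefs.CsvPreference;
import org.supercsv.quote.AlwaysQuoteMode;

public class QuoteModeDemo {
  public static void main(String[] args) throws Exception {
    // Normal mode: quote only when needed (delimiter, quote char, or newline
    // in the value) -- the "quoting when needed" behavior mentioned above.
    CsvPreference normal = CsvPreference.STANDARD_PREFERENCE;
    // Always mode: wrap every column in quotes, so parsers never have to
    // distinguish quoted from unquoted columns.
    CsvPreference always = new CsvPreference.Builder('"', ',', "\n")
        .useQuoteMode(new AlwaysQuoteMode()).build();

    for (CsvPreference pref : new CsvPreference[] {normal, always}) {
      StringWriter out = new StringWriter();
      CsvListWriter writer = new CsvListWriter(out, pref);
      writer.write("plain", "has,comma");
      writer.close();
      // normal mode: plain,"has,comma"   always mode: "plain","has,comma"
      System.out.print(out);
    }
  }
}
{code}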

 Make quote character optional and configurable in BeeLine CSV/TSV output
 

 Key: HIVE-7390
 URL: https://issues.apache.org/jira/browse/HIVE-7390
 Project: Hive
  Issue Type: New Feature
  Components: Clients
Affects Versions: 0.13.1
Reporter: Jim Halfpenny
Assignee: Ferdinand Xu
 Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
 HIVE-7390.4.patch, HIVE-7390.patch


 Currently when either the CSV or TSV output formats are used in beeline each 
 column is wrapped in single quotes. Quote wrapping of columns should be 
 optional and the user should be able to choose the character used to wrap the 
 columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079292#comment-14079292
 ] 

Hive QA commented on HIVE-7547:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658571/HIVE-7547.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5825 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/102/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/102/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-102/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658571

 Add ipAddress and userName to ExecHook
 --

 Key: HIVE-7547
 URL: https://issues.apache.org/jira/browse/HIVE-7547
 Project: Hive
  Issue Type: New Feature
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7547.2.patch, HIVE-7547.patch


 Auditing tools should be able to know about the ipAddress and userName of the 
 user executing operations.  
 These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


hive udf cannot recognize generic method

2014-07-30 Thread Dan Fan
Hi there

I am writing a Hive UDF. The input could be string, int, double, etc., and
the return type is based on the input type. I was trying to use a generic method;
however, Hive does not seem to recognize it.
Here is the piece of code I have as an example.


  public <T> T evaluate(final T s, final String column_name, final int bitmap)
      throws Exception {

    if (s instanceof Double) {
      return (T) new Double(-1.0);
    } else if (s instanceof Integer) {
      return (T) new Integer(-1);
    }

    …..
  }


Does anyone know if Hive supports the generic method, or do I have to override 
the evaluate method for each type of input?


Thanks


Dan



[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079334#comment-14079334
 ] 

Hive QA commented on HIVE-6437:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658572/HIVE-6437.6.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5838 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/103/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/103/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-103/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658572

 DefaultHiveAuthorizationProvider should not initialize a new HiveConf
 -

 Key: HIVE-6437
 URL: https://issues.apache.org/jira/browse/HIVE-6437
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.13.0
Reporter: Harsh J
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, 
 HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, 
 HIVE-6437.6.patch.txt


 During a HS2 connection, every SessionState initializes a new 
 DefaultHiveAuthorizationProvider object (on stock configs).
 In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that 
 may prove too expensive and is unnecessary, since SessionState itself 
 sends a fully applied HiveConf to it in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner

2014-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7554:
--

 Summary: Parquet Hive should resolve column names in case 
insensitive manner
 Key: HIVE-7554
 URL: https://issues.apache.org/jira/browse/HIVE-7554
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7554:
---

Attachment: HIVE-7554.patch

 Parquet Hive should resolve column names in case insensitive manner
 ---

 Key: HIVE-7554
 URL: https://issues.apache.org/jira/browse/HIVE-7554
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7554.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner

2014-07-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079398#comment-14079398
 ] 

Brock Noland commented on HIVE-7554:


Patch cleans up whitespace.

 Parquet Hive should resolve column names in case insensitive manner
 ---

 Key: HIVE-7554
 URL: https://issues.apache.org/jira/browse/HIVE-7554
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7554.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079418#comment-14079418
 ] 

Hive QA commented on HIVE-7446:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658576/HIVE-7446.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5840 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-104/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658576

 Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
 --

 Key: HIVE-7446
 URL: https://issues.apache.org/jira/browse/HIVE-7446
 Project: Hive
  Issue Type: New Feature
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7446.patch


 HIVE-6806 adds native support for creating hive table stored as Avro. It 
 would be good to add support to ALTER TABLE .. ADD COLUMN to Avro backed 
 tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf

2014-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079544#comment-14079544
 ] 

Thejas M Nair commented on HIVE-6437:
-

[~navis] The latest patch also has this change in 
SQLStdHiveAccessController.java, making the admin role comparison case sensitive. 
But role names are not case sensitive in SQL std auth mode (as is also documented 
in the wiki).
{code}
-if (!HiveMetaStore.ADMIN.equalsIgnoreCase(role.getRoleName())) {
+if (!HiveMetaStore.ADMIN.equals(role.getRoleName())) {
{code}

 DefaultHiveAuthorizationProvider should not initialize a new HiveConf
 -

 Key: HIVE-6437
 URL: https://issues.apache.org/jira/browse/HIVE-6437
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.13.0
Reporter: Harsh J
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, 
 HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, 
 HIVE-6437.6.patch.txt


 During a HS2 connection, every SessionState initializes a new 
 DefaultHiveAuthorizationProvider object (on stock configs).
 In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that 
 may prove too expensive and is unnecessary, since SessionState itself 
 sends a fully applied HiveConf to it in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7545) Tableau connecting with MapR ODBC driver cannot get more than 43 columns

2014-07-30 Thread Venkata krishnan Sowrirajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata krishnan Sowrirajan resolved HIVE-7545.
---

Resolution: Invalid

 Tableau connecting with MapR ODBC driver cannot get more than 43 columns
 

 Key: HIVE-7545
 URL: https://issues.apache.org/jira/browse/HIVE-7545
 Project: Hive
  Issue Type: Bug
 Environment: Tableau connecting using MapR ODBC driver - Windows
Reporter: Venkata krishnan Sowrirajan
 Fix For: 0.13.1


 Hive table with 170 columns and 1 million rows.
 When I query all 170 columns of the hive table from Tableau using the MapR ODBC 
 driver, it cannot query more than 43 columns. After that it gives an error 
 saying: 
 [MapR][HiveODBC] (35) Error from Hive: error code: '10007' error message: 
 'Error while compiling statement: FAILED: SemanticException [Error 10007]: 
 Ambiguous column reference c_43'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7549) Code cleanup of Task.java and HiveInputFormat.java

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079564#comment-14079564
 ] 

Hive QA commented on HIVE-7549:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658575/HIVE-7549.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5838 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/105/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/105/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-105/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658575

 Code cleanup of Task.java and HiveInputFormat.java
 --

 Key: HIVE-7549
 URL: https://issues.apache.org/jira/browse/HIVE-7549
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Attachments: HIVE-7549.patch


 While working on Hive + Spark I noticed some ugly code which I've seen before 
 but neglected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7509:
-

Attachment: HIVE-7509.5.patch

Thanks [~leftylev] for your comments. I fixed them in the .5 patch. 

 Fast stripe level merging for ORC
 -

 Key: HIVE-7509
 URL: https://issues.apache.org/jira/browse/HIVE-7509
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
 HIVE-7509.4.patch, HIVE-7509.5.patch


 Similar to HIVE-1950, add support for fast stripe level merging of ORC files 
 through CONCATENATE command and conditional merge task. This fast merging is 
 ideal for merging many small ORC files to a larger file without decompressing 
 and decoding the data of small orc files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf

2014-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079594#comment-14079594
 ] 

Thejas M Nair commented on HIVE-6437:
-

Can you also please update the review board with the new patch?


 DefaultHiveAuthorizationProvider should not initialize a new HiveConf
 -

 Key: HIVE-6437
 URL: https://issues.apache.org/jira/browse/HIVE-6437
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.13.0
Reporter: Harsh J
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, 
 HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, 
 HIVE-6437.6.patch.txt


 During a HS2 connection, every SessionState initializes a new 
 DefaultHiveAuthorizationProvider object (on stock configs).
 In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that 
 may prove too expensive and is unnecessary, since SessionState itself 
 sends a fully applied HiveConf to it in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 23953: HIVE-7519: Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown

2014-07-30 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23953/#review49132
---

Ship it!


Looks good to me, pending test fixes.

- Szehon Ho


On July 29, 2014, 11:46 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23953/
 ---
 
 (Updated July 29, 2014, 11:46 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7519
 https://issues.apache.org/jira/browse/HIVE-7519
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7519: Refactor QTestUtil to remove its duplication with QFileClient for 
 qtest setup and teardown
 
 
 Diffs
 -
 
   ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 
 33f227fe6eb0ea6df936775f02e4339ed496f6ad 
   data/conf/hive-site.xml fe8080addcadac4d52868866457dd038ea8d3d91 
   data/conf/tez/hive-site.xml 0c99bb6914bd26de26cef77cf29cf37f070098dc 
   data/scripts/q_test_cleanup.sql 31bd7205d85916ea352f715f2fd1462efc788208 
   data/scripts/q_test_init.sql 12afdf391132e3fdd219aaa581e1f2e210d6dee2 
   hbase-handler/src/test/templates/TestHBaseCliDriver.vm 
 01d596aa6591ddccff016436c7f31324b3896d00 
   hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm 
 45c73389cb26d0d461080cc146c5d74aee199c4e 
   
 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestLocationQueries.java
  9edd7f30ff91bf7e01a2f52699192994fe0829f5 
   itests/qtest/pom.xml 249956fc170c0cef2b8f98454fa952c498b9e29e 
   itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 
 96a0de2829c2ec065b7835b12c4932d1278f9a84 
   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
 2fefa067791bd74412c0b4efb697dc0d8bb03cd7 
   ql/src/test/templates/TestCliDriver.vm 
 4776c75c16329c7d3f6f1a032eef192d553cc3cc 
   ql/src/test/templates/TestCompareCliDriver.vm 
 f6f43b847fdd4039328632ef70d841fce9006d6d 
   ql/src/test/templates/TestNegativeCliDriver.vm 
 991d5ac1b2fde66dbe60b39c853916577449b1a4 
   ql/src/test/templates/TestParse.vm c476536940dc3a48000bf4e60e0b551ec7904d63 
   ql/src/test/templates/TestParseNegative.vm 
 f62f17e4df5c1439d3787fc5c361804121bfcaf1 
 
 Diff: https://reviews.apache.org/r/23953/diff/
 
 
 Testing
 ---
 
 qTests.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Commented] (HIVE-7519) Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown

2014-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079606#comment-14079606
 ] 

Szehon Ho commented on HIVE-7519:
-

+1, pending tests.  This is good code cleanup.

 Refactor QTestUtil to remove its duplication with QFileClient for qtest setup 
 and teardown 
 ---

 Key: HIVE-7519
 URL: https://issues.apache.org/jira/browse/HIVE-7519
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7519.1.patch, HIVE-7519.patch


 QTestUtil hard-codes the creation and dropping of source tables for qtests. 
 QFileClient does the same thing but in a better way: it uses the q_test_init.sql 
 and q_test_cleanup.sql scripts. As QTestUtil is growing quite large, it makes 
 sense to refactor it to use QFileClient's approach. This will also remove 
 duplication of code serving the same purpose.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7550) Extend cached evaluation to multiple expressions

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079706#comment-14079706
 ] 

Hive QA commented on HIVE-7550:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658580/HIVE-7550.1.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5838 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs
org.apache.hadoop.hive.cli.TestCompareCliDriver.testCompareCliDriver_vectorized_math_funcs
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/106/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/106/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-106/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658580

 Extend cached evaluation to multiple expressions
 

 Key: HIVE-7550
 URL: https://issues.apache.org/jira/browse/HIVE-7550
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7550.1.patch.txt


 Currently, hive.cache.expr.evaluation caches per expression, but the cache 
 context might be shared across multiple expressions (e.g., the same upper(col) 
 appearing in both the SELECT list and a WHERE predicate). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7555) inner join is being resolved as cartesian product

2014-07-30 Thread J. Tipan Verella (JIRA)
J. Tipan Verella created HIVE-7555:
--

 Summary: inner join is being resolved as cartesian product
 Key: HIVE-7555
 URL: https://issues.apache.org/jira/browse/HIVE-7555
 Project: Hive
  Issue Type: Bug
 Environment: CentOS
Reporter: J. Tipan Verella


I believe this is a bug, because I have not been able to find a way around the 
issue described in the following Stack Overflow question: 

http://stackoverflow.com/questions/25020190/hive-query-returns-cartesian-product-instead-of-inner-join


The issue is as follows (repeated from SO for convenience).
This is the type of query I am sending to Hive:

SELECT BigTable.nicefield,LargeTable.* 
FROM LargeTable INNER JOIN BigTable 
ON (
LargeTable.joinfield1of4 = BigTable.joinfield1of4 
AND LargeTable.joinfield2of4 = BigTable.joinfield2of4 
)   
WHERE LargeTable.joinfield3of4=20140726 AND LargeTable.joinfield4of4=15 AND 
BigTable.joinfield3of4=20140726 AND BigTable.joinfield4of4=15
AND LargeTable.filterfiled1of2=123456
AND LargeTable.filterfiled2of2=98765
AND LargeTable.joinfield2of4=12 
AND LargeTable.joinfield1of4='iwanttolikehive'   

It returns `2418025` rows.  The issue is that 

SELECT *  
FROM LargeTable 
WHERE joinfield3of4=20140726 AND joinfield4of4=15
AND filterfiled1of2=123456 
AND filterfiled2of2=98765
AND joinfield2of4=12 
AND joinfield1of4='iwanttolikehive'

returns `1555` rows, and so does:

SELECT *  
FROM BigTable 
WHERE joinfield3of4=20140726 AND joinfield4of4=15
AND joinfield2of4=12 
AND joinfield1of4='iwanttolikehive'


Note that **1555^2 = 2418025**.

Feel free to discard this issue if it is not a bug, but please provide a 
solution on SO.

Thank you.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: hive udf cannot recognize generic method

2014-07-30 Thread Jason Dere
Sounds like you are using the older-style UDF class. In that case, yes, you 
would have to override evaluate() for each type of input.
You could also try extending the GenericUDF class - that would allow you to 
handle all types in a single method, though it may be a bit more complicated 
(you can look at the Hive code for some examples).
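
A minimal sketch of the GenericUDF approach, assuming the behavior described in the question (return -1 in the type of the first argument); the class and method body here are illustrative, while the overridden methods are GenericUDF's real API:

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericMinusOne extends GenericUDF {
  private PrimitiveObjectInspector inputOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] args)
      throws UDFArgumentException {
    // initialize() sees the actual argument types at compile time, so one
    // class can handle string, int, double, etc. with a single evaluate().
    inputOI = (PrimitiveObjectInspector) args[0];
    return PrimitiveObjectInspectorFactory
        .getPrimitiveJavaObjectInspector(inputOI.getPrimitiveCategory());
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    switch (inputOI.getPrimitiveCategory()) {
    case DOUBLE:
      return -1.0d;
    case INT:
      return -1;
    default:
      return null; // extend per type as needed
    }
  }

  @Override
  public String getDisplayString(String[] children) {
    return "generic_minus_one(...)";
  }
}
{code}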


On Jul 30, 2014, at 7:43 AM, Dan Fan d...@appnexus.com wrote:

 Hi there 
 
 I am writing a Hive UDF. The input could be string, int, double, etc., and
 the return type is based on the input type. I was trying to use a generic method; 
 however, Hive does not seem to recognize it. 
 Here is the piece of code I have as an example.
 
   public <T> T evaluate(final T s, final String column_name, final int
       bitmap) throws Exception {
     if (s instanceof Double) {
       return (T) new Double(-1.0);
     } else if (s instanceof Integer) {
       return (T) new Integer(-1);
     }
     …..
   }
 
 Does anyone know if Hive supports the generic method, or do I have to override 
 the evaluate method for each type of input? 
 
 Thanks 
 
 Dan
 




[jira] [Updated] (HIVE-7549) Code cleanup of Task.java and HiveInputFormat.java

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7549:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you for the review Ashutosh! I have committed this to trunk.

 Code cleanup of Task.java and HiveInputFormat.java
 --

 Key: HIVE-7549
 URL: https://issues.apache.org/jira/browse/HIVE-7549
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7549.patch


 While working on Hive + Spark I noticed some ugly code which I've seen before 
 but neglected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079776#comment-14079776
 ] 

Szehon Ho commented on HIVE-7390:
-

Thanks for the details, I was just reading the earlier comments and wrongly 
assumed that the two valid CSV options are double quotes and no quotes at 
all.  You're right that normal quote mode still means quotes sometimes, so my 
proposed naming didn't make sense, sorry about that Ferdinand.

So we should:
# Fix the current CSV to conform by using super-csv (like the patch I 
originally looked at in HIVE-7434).  No debate on that.
# See what CSV options (if any) we are going to expose

I'd still try to keep it simple if possible.  Can we expose quote mode only 
(always, normal)?  I'm not sure if delimiter and quote character would add that 
much value, but I'm not a heavy CSV user.  Thoughts?

 Make quote character optional and configurable in BeeLine CSV/TSV output
 

 Key: HIVE-7390
 URL: https://issues.apache.org/jira/browse/HIVE-7390
 Project: Hive
  Issue Type: New Feature
  Components: Clients
Affects Versions: 0.13.1
Reporter: Jim Halfpenny
Assignee: Ferdinand Xu
 Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
 HIVE-7390.4.patch, HIVE-7390.patch


 Currently when either the CSV or TSV output formats are used in beeline each 
 column is wrapped in single quotes. Quote wrapping of columns should be 
 optional and the user should be able to choose the character used to wrap the 
 columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook

2014-07-30 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24084/#review49141
---



itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java
https://reviews.apache.org/r/24084/#comment85986

If the hook does not run, these two NPE. Let's have a not-null assertion 
first.
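
A sketch of the suggested ordering, with hypothetical field names (the real test stores whatever the hook captured):

{code}
import org.junit.Assert;

public class HookAssertions {
  // Hypothetical: values the hook under test is expected to populate.
  static String ipAddress;
  static String userName;

  static void verifyHookRan() {
    // Assert not-null first, so a hook that never fired fails with a clear
    // message instead of an NPE further down.
    Assert.assertNotNull("hook did not set ipAddress", ipAddress);
    Assert.assertNotNull("hook did not set userName", userName);
  }
}
{code}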



ql/src/java/org/apache/hadoop/hive/ql/Driver.java
https://reviews.apache.org/r/24084/#comment85987

Let's put this in javadoc format



service/src/java/org/apache/hive/service/cli/CLIService.java
https://reviews.apache.org/r/24084/#comment85988

If this should not happen, should we throw these?


- Brock Noland


On July 30, 2014, 2:13 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24084/
 ---
 
 (Updated July 30, 2014, 2:13 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7547
 https://issues.apache.org/jira/browse/HIVE-7547
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Passing the ipAddress and userName (already calculated in ThriftCLIService 
 for other purposes) through several layers down to the hooks.
 
 
 Diffs
 -
 
   
 itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java
  PRE-CREATION 
   
 itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 
   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
 de54ca1 
   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
 9785e95 
   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
 5c87bcb 
 
 Diff: https://reviews.apache.org/r/24084/diff/
 
 
 Testing
 ---
 
 Added tests in both kerberos and non-kerberos mode.
 
 
 Thanks,
 
 Szehon Ho
 




Re: Why does SMB join generate hash table locally, even if input tables are large?

2014-07-30 Thread Pala M Muthaia
+hive-users


On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia mchett...@rocketfuelinc.com
 wrote:

 Hi,

 I am testing SMB join for 2 large tables. The tables are bucketed and
 sorted on the join column. I notice that even though the table is large,
 Hive attempts to generate a hash table for the 'small' table locally,
 similar to a map join. Since the table is large in my case, the client runs
 out of memory and the query fails.

 I am using Hive 0.12 with the following settings:

 set hive.optimize.bucketmapjoin=true;
 set hive.optimize.bucketmapjoin.sortedmerge=true;
 set hive.input.format =
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

 My test query does a simple join and a select, no subqueries/nested
 queries etc.

 I understand why a (bucket) map join requires hash table generation, but
 why is that included for an SMB join? Shouldn't an SMB join just spin up one
 mapper for each bucket and perform a sort merge join directly on the mapper?


 Thanks,
 pala






[jira] [Created] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7556:
-

 Summary: Fix code style, license header, tabs, etc. [Spark Branch]
 Key: HIVE-7556
 URL: https://issues.apache.org/jira/browse/HIVE-7556
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7556.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7556:
--

Attachment: HIVE-7556.patch

 Fix code style, license header, tabs, etc. [Spark Branch]
 -

 Key: HIVE-7556
 URL: https://issues.apache.org/jira/browse/HIVE-7556
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7556.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7556:
--

Status: Patch Available  (was: Open)

 Fix code style, license header, tabs, etc. [Spark Branch]
 -

 Key: HIVE-7556
 URL: https://issues.apache.org/jira/browse/HIVE-7556
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7556.patch


 NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7556:
--

Description: NO PRECOMMIT TESTS. This is for spark branch only.

 Fix code style, license header, tabs, etc. [Spark Branch]
 -

 Key: HIVE-7556
 URL: https://issues.apache.org/jira/browse/HIVE-7556
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7556.patch


 NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7556:
--

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to spark branch.

 Fix code style, license header, tabs, etc. [Spark Branch]
 -

 Key: HIVE-7556
 URL: https://issues.apache.org/jira/browse/HIVE-7556
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: spark-branch

 Attachments: HIVE-7556.patch


 NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7029) Vectorize ReduceWork

2014-07-30 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079821#comment-14079821
 ] 

Matt McCline commented on HIVE-7029:


Temporarily turned off the (Tez) dynpart_sort_opt_vectorization.q test.  Created 
https://issues.apache.org/jira/browse/HIVE-7557 to cover that issue.

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7557) When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails

2014-07-30 Thread Matt McCline (JIRA)
Matt McCline created HIVE-7557:
--

 Summary: When reduce is vectorized, 
dynpart_sort_opt_vectorization.q under Tez fails
 Key: HIVE-7557
 URL: https://issues.apache.org/jira/browse/HIVE-7557
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Rajesh Balamohan



Turned off dynpart_sort_opt_vectorization.q (Tez), since it fails when reduce is 
vectorized, in order to get HIVE-7029 checked in.

Stack trace:
{code}
Container released by application, 
AttemptID:attempt_1406747677386_0003_2_00_00_2 Info:Error: 
java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing vector batch (tag=0) [Error getting row data with exception 
java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
at 
org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551)
 ]
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
at 
org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing vector batch (tag=0) [Error getting row data with exception 
java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
at 
org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551)
 ]
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:382)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
... 6 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing vector batch (tag=0) [Error getting row data with exception 
java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to 

[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: In Progress  (was: Patch Available)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, 
 HIVE-7029.8.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Attachment: HIVE-7029.8.patch

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, 
 HIVE-7029.8.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: Patch Available  (was: In Progress)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, 
 HIVE-7029.8.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7526) Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

2014-07-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-7526:
---

Attachment: HIVE-7526.3.patch

An attempt to fix the last patch by moving the groupBy op to ShuffleTran.
Also, since SparkTran::transform may now have input/output value types other 
than BytesWritable, we need to make it generic as well.

Also added a CompTran class, which is basically a composition of 
transformations. It offers better type compatibility than ChainedTran.

This is NOT the perfect solution, and it may be subject to further change.
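
For illustration, a sketch of the composition idea; the signatures below are assumed for the sketch, not taken from the actual patch:

{code}
import org.apache.spark.api.java.JavaPairRDD;

// Generic over input/output key and value types instead of fixing them to
// BytesWritable.
interface SparkTran<KI, VI, KO, VO> {
  JavaPairRDD<KO, VO> transform(JavaPairRDD<KI, VI> input);
}

// Composing two trans yields another tran. The intermediate types KM/VM must
// line up at compile time, which is the type compatibility a flat chained
// list cannot enforce.
class CompTran<KI, VI, KM, VM, KO, VO> implements SparkTran<KI, VI, KO, VO> {
  private final SparkTran<KI, VI, KM, VM> first;
  private final SparkTran<KM, VM, KO, VO> second;

  CompTran(SparkTran<KI, VI, KM, VM> first, SparkTran<KM, VM, KO, VO> second) {
    this.first = first;
    this.second = second;
  }

  @Override
  public JavaPairRDD<KO, VO> transform(JavaPairRDD<KI, VI> input) {
    return second.transform(first.transform(input));
  }
}
{code}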

 Research to use groupby transformation to replace Hive existing 
 partitionByKey and SparkCollector combination
 -

 Key: HIVE-7526
 URL: https://issues.apache.org/jira/browse/HIVE-7526
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Attachments: HIVE-7526.2.patch, HIVE-7526.3.patch, HIVE-7526.patch


 Currently SparkClient shuffles data by calling paritionByKey(). This 
 transformation outputs key, value tuples. However, Hive's ExecMapper 
 expects key, iteratorvalue tuples, and Spark's groupByKey() seems 
 outputing this directly. Thus, using groupByKey, we may be able to avoid its 
 own key clustering mechanism (in HiveReduceFunction). This research is to 
 have a try.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7348) Beeline could not parse ; separated queries provided with -e option

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079827#comment-14079827
 ] 

Hive QA commented on HIVE-7348:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658582/HIVE-7348.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.beeline.TestBeeLineWithArgs.testBeelineHiveConfVariable
org.apache.hive.beeline.TestBeeLineWithArgs.testBeelineHiveVariable
org.apache.hive.beeline.TestBeeLineWithArgs.testBeelineMultiHiveVariable
org.apache.hive.beeline.TestBeeLineWithArgs.testNullDefault
org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmpty
org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmptyCmdArg
org.apache.hive.beeline.TestBeeLineWithArgs.testNullNonEmpty
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgradeDryRun
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/107/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/107/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658582

 Beeline could not parse ; separated queries provided with -e option
 ---

 Key: HIVE-7348
 URL: https://issues.apache.org/jira/browse/HIVE-7348
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7348.patch


 Beeline could not parse ; separated queries provided with -e option. This 
 works fine on hive cli.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7558) HCatLoader reuses credentials across jobs

2014-07-30 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HIVE-7558:
--

 Summary: HCatLoader reuses credentials across jobs
 Key: HIVE-7558
 URL: https://issues.apache.org/jira/browse/HIVE-7558
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Thiruvel Thirumoolan
 Fix For: 0.14.0


HCatLoader reuses the credentials of stage-1 in stage-2 for some Pig queries. 
This causes stage-2 to fail if stage-2 runs for more than 10 minutes. Pig 
queries which load data using HCatLoader, filter only by partition columns, and 
do an order by will run into this problem. Exceptions will be very similar to 
the following:

2014-07-22 17:28:49,337 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
ERROR 2997: Unable to recreate exception from backed error: 
AttemptID:attemptid Info:RemoteTrace: 
org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache
at org.apache.hadoop.ipc.Client.call(Client.java:1095)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
at $Proxy7.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
at $Proxy7.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:734)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:281)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
 at LocalTrace: 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache
at 
org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
at 
org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:823)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:224)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
at 
org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1476)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1472)
at java.security.AccessController.doPrivileged(Native Method)
at 

[jira] [Assigned] (HIVE-7558) HCatLoader reuses credentials across jobs

2014-07-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan reassigned HIVE-7558:
--

Assignee: Thiruvel Thirumoolan

 HCatLoader reuses credentials across jobs
 -

 Key: HIVE-7558
 URL: https://issues.apache.org/jira/browse/HIVE-7558
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.14.0


 HCatLoader reuses the credentials of stage-1 in stage-2 for some Pig 
 queries. This causes stage-2 to fail if stage-2 runs for more than 10 minutes. 
 Pig queries which load data using HCatLoader, filter only by partition 
 columns, and do an order by will run into this problem. Exceptions will be 
 very similar to the following:
 2014-07-22 17:28:49,337 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
 ERROR 2997: Unable to recreate exception from backed error: 
 AttemptID:attemptid Info:RemoteTrace: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
 (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache
   at org.apache.hadoop.ipc.Client.call(Client.java:1095)
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
   at $Proxy7.getFileInfo(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
   at $Proxy7.getFileInfo(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1305)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:734)
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:281)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
  at LocalTrace: 
   org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
 token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in 
 cache
   at 
 org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
   at 
 org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:823)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:224)
   at 
 org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
   at 
 org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
   at 

[jira] [Updated] (HIVE-7558) HCatLoader reuses credentials across jobs

2014-07-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-7558:
---

Attachment: HIVE-7558.patch

Attaching patch. Do not copy the job's credentials into HCatLoader's objects.
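
For readers unfamiliar with the failure mode, here is a hypothetical sketch of 
the pattern the patch removes (the class and method below are illustrative, 
not the actual HCatLoader source):

{noformat}
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.Credentials;

// Hypothetical illustration only -- not the actual HCatLoader code.
public class CachingLoader {
  // Tokens captured while planning stage-1...
  private final Credentials cached = new Credentials();

  public void setLocation(String location, Job job) {
    cached.addAll(job.getCredentials());
    // ...are re-injected into every later job. Once the stage-1
    // delegation tokens are cancelled, stage-2's container
    // localization fails with InvalidToken, as in the trace below.
    job.getCredentials().addAll(cached);
  }
}
{noformat}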

 HCatLoader reuses credentials across jobs
 -

 Key: HIVE-7558
 URL: https://issues.apache.org/jira/browse/HIVE-7558
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.14.0

 Attachments: HIVE-7558.patch


 HCatLoader reuses the credentials of stage-1 in stage-2 for some Pig 
 queries. This causes stage-2 to fail if it runs for more than 10 minutes. 
 Pig queries that load data using HCatLoader, filter only by partition 
 columns, and do an order by will run into this problem. Exceptions will be 
 very similar to the following:
 2014-07-22 17:28:49,337 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
 ERROR 2997: Unable to recreate exception from backed error: 
 AttemptID:attemptid Info:RemoteTrace: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
 (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache
   at org.apache.hadoop.ipc.Client.call(Client.java:1095)
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
   at $Proxy7.getFileInfo(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
   at $Proxy7.getFileInfo(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1305)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:734)
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:281)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
  at LocalTrace: 
   org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
 token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in 
 cache
   at 
 org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
   at 
 org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:823)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:224)
   at 
 org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
   at 
 org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
   at 
 

[jira] [Commented] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079912#comment-14079912
 ] 

Thejas M Nair commented on HIVE-7547:
-

[~szehon] SessionState already provides username and IP address. (IP address 
part was added recently as part of HIVE-7416).
I think SessionState is a good place to store and retrieve this session 
information.
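
A minimal sketch of what a hook could read, assuming the SessionState 
accessors are named getUserName() and getUserIpAddress() (the latter per 
HIVE-7416; the names are assumptions based on the comment above):

{noformat}
import org.apache.hadoop.hive.ql.session.SessionState;

// Minimal sketch; accessor names are assumptions, not confirmed API.
SessionState ss = SessionState.get();
String userName = ss.getUserName();
String ipAddress = ss.getUserIpAddress();  // added by HIVE-7416
{noformat}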


 Add ipAddress and userName to ExecHook
 --

 Key: HIVE-7547
 URL: https://issues.apache.org/jira/browse/HIVE-7547
 Project: Hive
  Issue Type: New Feature
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7547.2.patch, HIVE-7547.patch


 Auditing tools should be able to know about the ipAddress and userName of the 
 user executing operations.  
 These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7443) Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079933#comment-14079933
 ] 

Hive QA commented on HIVE-7443:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658595/HIVE-7443.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5838 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/108/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/108/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-108/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658595

 Fix HiveConnection to communicate with Kerberized Hive JDBC server and 
 alternative JDKs
 ---

 Key: HIVE-7443
 URL: https://issues.apache.org/jira/browse/HIVE-7443
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Security
Affects Versions: 0.12.0, 0.13.1
 Environment: Kerberos
 Run Hive server2 and client with IBM JDK7.1
Reporter: Yu Gao
Assignee: Yu Gao
 Attachments: HIVE-7443.patch


 Hive Kerberos authentication has been enabled in my cluster. I ran kinit to 
 initialize the current login user's ticket cache successfully, and then tried 
 to use beeline to connect to Hive Server2, but failed. After I manually added 
 some logging to catch the failure, this is the exception that caused it:
 beeline  !connect 
 jdbc:hive2://hiveserver.host:1/default;principal=hive/hiveserver.host@REALM.COM
  org.apache.hive.jdbc.HiveDriver
 scan complete in 2ms
 Connecting to 
 jdbc:hive2://hiveserver.host:1/default;principal=hive/hiveserver.host@REALM.COM
 Enter password for 
 jdbc:hive2://hiveserver.host:1/default;principal=hive/hiveserver.host@REALM.COM:
 14/07/17 15:12:45 ERROR jdbc.HiveConnection: Failed to open client transport
 javax.security.sasl.SaslException: Failed to open client transport [Caused by 
 java.io.IOException: Could not instantiate SASL transport]
 at 
 org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:78)
 at 
 org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:342)
 at 
 org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:200)
 at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:178)
 at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 at java.sql.DriverManager.getConnection(DriverManager.java:582)
 at java.sql.DriverManager.getConnection(DriverManager.java:198)
 at 
 org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
 at 
 org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:186)
 at org.apache.hive.beeline.Commands.connect(Commands.java:959)
 at org.apache.hive.beeline.Commands.connect(Commands.java:880)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
 at java.lang.reflect.Method.invoke(Method.java:619)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:44)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:801)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
 at java.lang.reflect.Method.invoke(Method.java:619)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: java.io.IOException: Could not instantiate SASL transport
 at 
 
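For reference, the beeline transcript above corresponds to the following 
minimal JDBC sketch (the port is an assumption -- the default HiveServer2 
port 10000 -- and the principal is the one from the report):

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;

// Minimal sketch; assumes a kinit'ed ticket cache and port 10000.
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection conn = DriverManager.getConnection(
    "jdbc:hive2://hiveserver.host:10000/default;"
    + "principal=hive/hiveserver.host@REALM.COM");
{noformat}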

[jira] [Commented] (HIVE-6988) Hive changes for tez-0.5.x compatibility

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079937#comment-14079937
 ] 

Hive QA commented on HIVE-6988:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658598/HIVE-6988.6.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/109/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/109/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-109/

Messages:
{noformat}
 This message was trimmed, see log for full details 
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: 
Decision can match input such as LPAREN KW_CASE KW_ARRAY using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: 
Decision can match input such as LPAREN KW_NOT SmallintLiteral using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: 
Decision can match input such as LPAREN KW_NOT TinyintLiteral using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: 
Decision can match input such as LPAREN LPAREN Number using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:115:5: 
Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:127:5: 
Decision can match input such as KW_PARTITION KW_BY LPAREN using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:138:5: 
Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:149:5: 
Decision can match input such as KW_SORT KW_BY LPAREN using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:166:7: 
Decision can match input such as STAR using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:179:5: 
Decision can match input such as KW_UNIONTYPE using multiple alternatives: 5, 
6

As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:179:5: 
Decision can match input such as KW_STRUCT using multiple alternatives: 4, 6

As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:179:5: 
Decision can match input such as KW_ARRAY using multiple alternatives: 2, 6

As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:261:5: 
Decision can match input such as KW_DATE StringLiteral using multiple 
alternatives: 2, 3

As a result, alternative(s) 3 were disabled for that input
warning(200): IdentifiersParser.g:261:5: 
Decision can match input such as KW_NULL using multiple alternatives: 1, 8

As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:261:5: 
Decision can match input such as KW_FALSE using multiple alternatives: 3, 8

As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:261:5: 
Decision can match input such as KW_TRUE using multiple alternatives: 3, 8

As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:393:5: 
Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT 
KW_INTO using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: 
Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL 
KW_VIEW using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: 
Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP 
KW_BY using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: 
Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER 
KW_BY using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: 
Decision can match input such as KW_BETWEEN KW_MAP LPAREN using multiple 
alternatives: 8, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): 

[jira] [Commented] (HIVE-7526) Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

2014-07-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079952#comment-14079952
 ] 

Brock Noland commented on HIVE-7526:


Thank you [~csun]! May I ask you to upload the patch to https://reviews.apache.org 
and post the link here?

 Research to use groupby transformation to replace Hive existing 
 partitionByKey and SparkCollector combination
 -

 Key: HIVE-7526
 URL: https://issues.apache.org/jira/browse/HIVE-7526
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao
 Attachments: HIVE-7526.2.patch, HIVE-7526.3.patch, HIVE-7526.patch


 Currently SparkClient shuffles data by calling partitionByKey(). This 
 transformation outputs <key, value> tuples. However, Hive's ExecMapper 
 expects <key, iterator<value>> tuples, and Spark's groupByKey() seems to 
 output this directly. Thus, using groupByKey, we may be able to avoid Hive's 
 own key-clustering mechanism (in HiveReduceFunction). This task is to 
 research and try that approach.
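
A minimal sketch of the difference in Spark's Java API (the RDD, helper, and 
partition count are illustrative):

{noformat}
import org.apache.hadoop.io.BytesWritable;
import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;

// Minimal sketch; hiveMapOutput() stands in for Hive's map-side output.
int numPartitions = 4;
JavaPairRDD<BytesWritable, BytesWritable> mapOutput = hiveMapOutput();

// Current approach: keys are partitioned, but values still arrive as
// individual <key, value> tuples that Hive must cluster itself.
JavaPairRDD<BytesWritable, BytesWritable> partitioned =
    mapOutput.partitionBy(new HashPartitioner(numPartitions));

// groupByKey: Spark clusters values per key, yielding the
// <key, iterator<value>> shape the description above calls for.
JavaPairRDD<BytesWritable, Iterable<BytesWritable>> grouped =
    mapOutput.groupByKey(numPartitions);
{noformat}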



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7545) Tableau connecting with MapR ODBC driver cannot get more than 43 columns

2014-07-30 Thread George Chow (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079955#comment-14079955
 ] 

George Chow commented on HIVE-7545:
---

Is it possible to include the query so that it can be reproduced with a similar 
table? The error message looks to originate from Hive (SQLOperation::prepare).




 Tableau connecting with MapR ODBC driver cannot get more than 43 columns
 

 Key: HIVE-7545
 URL: https://issues.apache.org/jira/browse/HIVE-7545
 Project: Hive
  Issue Type: Bug
 Environment: Tableau connecting using MapR ODBC driver - Windows
Reporter: Venkata krishnan Sowrirajan
 Fix For: 0.13.1


 Hive table with 170 columns and 1 million rows.
 When I query all 170 columns of the table from Tableau using the MapR ODBC 
 driver, it cannot fetch more than 43 columns. Beyond that it gives an 
 error saying: 
 [MapR][HiveODBC] (35) Error from Hive: error code: '10007' error message: 
 'Error while compiling statement: FAILED: SemanticException [Error 10007]: 
 Ambiguous column reference c_43'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7559) Move configuration from SparkClient to HiveConf

2014-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7559:
--

 Summary: Move configuration from SparkClient to HiveConf
 Key: HIVE-7559
 URL: https://issues.apache.org/jira/browse/HIVE-7559
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Priority: Minor


The SparkClient class has some configuration keys and defaults. These should be 
moved to HiveConf.
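
A minimal sketch of the direction (the key name and default below are 
hypothetical placeholders, not the actual SparkClient constants):

{noformat}
import org.apache.hadoop.hive.conf.HiveConf;

// Hypothetical sketch: centralize a key that SparkClient hardcodes today.
public static final String SPARK_MASTER_KEY = "spark.master";
public static final String SPARK_MASTER_DEFAULT = "local";

// SparkClient would then read it through the shared HiveConf:
HiveConf hiveConf = new HiveConf();
String master = hiveConf.get(SPARK_MASTER_KEY, SPARK_MASTER_DEFAULT);
{noformat}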



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7560) Fix exception handling in POC code

2014-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7560:
--

 Summary: Fix exception handling in POC code
 Key: HIVE-7560
 URL: https://issues.apache.org/jira/browse/HIVE-7560
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland


The POC code just printed exceptions to stderr. We should either:

1) LOG at INFO/WARN/ERROR
2) Or rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error
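
A minimal sketch of the two options (the logger, doWork(), and the exception 
type are illustrative):

{noformat}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Minimal sketch; doWork() is an illustrative placeholder.
private static final Log LOG = LogFactory.getLog(SparkClient.class);

try {
  doWork();
} catch (IOException e) {
  // Option 1: log at an appropriate level and continue.
  LOG.error("Spark job failed", e);
  // Option 2: for fatal errors, rethrow wrapped in a runtime exception.
  throw new RuntimeException(e);
}
{noformat}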



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7561) Move from assert to Guava Preconditions.* in Hive on Spark

2014-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7561:
--

 Summary: Move from assert to Guava Preconditions.* in Hive on Spark
 Key: HIVE-7561
 URL: https://issues.apache.org/jira/browse/HIVE-7561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland


Hive uses the assert keyword all over the place. The problem is that 
assertions rarely take effect, since they must be explicitly enabled with the 
JVM's -ea flag. In the Spark code, e.g. GenSparkUtils, let's use 
Preconditions.*.
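
A minimal sketch of the replacement (Guava is already on Hive's classpath; the 
variable names are illustrative):

{noformat}
import com.google.common.base.Preconditions;

// Before: silently skipped unless the JVM runs with -ea.
assert parent != null : "parent cannot be null";

// After: always checked, fails fast with a clear message.
Preconditions.checkNotNull(parent, "parent cannot be null");
Preconditions.checkState(!children.isEmpty(), "expected at least one child");
{noformat}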



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7561) Move from assert to Guava Preconditions.* in Hive on Spark

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7561:
---

Labels: StarterProject  (was: newbie)

 Move from assert to Guava Preconditions.* in Hive on Spark
 --

 Key: HIVE-7561
 URL: https://issues.apache.org/jira/browse/HIVE-7561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
  Labels: StarterProject

 Hive uses the assert keyword all over the place. The problem is that 
 assertions rarely take effect, since they must be explicitly enabled with the 
 JVM's -ea flag. In the Spark code, e.g. GenSparkUtils, let's use 
 Preconditions.*.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7560) Fix exception handling in POC code

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7560:
---

Labels: StarterProject  (was: newbie)

 Fix exception handling in POC code
 --

 Key: HIVE-7560
 URL: https://issues.apache.org/jira/browse/HIVE-7560
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
  Labels: StarterProject

 The POC code just printed exceptions to stderr. We should either:
 1) LOG at INFO/WARN/ERROR
 2) Or rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7559) Move configuration from SparkClient to HiveConf

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7559:
---

Labels: StarterProject  (was: newbie)

 Move configuration from SparkClient to HiveConf
 ---

 Key: HIVE-7559
 URL: https://issues.apache.org/jira/browse/HIVE-7559
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Priority: Minor
  Labels: StarterProject

 The SparkClient class has some configuration keys and defaults. These should 
 be moved to HiveConf.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7560) StarterProject: Fix exception handling in POC code

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7560:
---

Summary: StarterProject: Fix exception handling in POC code  (was: Fix 
exception handling in POC code)

 StarterProject: Fix exception handling in POC code
 --

 Key: HIVE-7560
 URL: https://issues.apache.org/jira/browse/HIVE-7560
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
  Labels: StarterProject

 The POC code just printed exceptions to stderr. We should either:
 1) LOG at INFO/WARN/ERROR
 2) Or rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7561) StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7561:
---

Summary: StarterProject: Move from assert to Guava Preconditions.* in Hive 
on Spark  (was: Move from assert to Guava Preconditions.* in Hive on Spark)

 StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark
 --

 Key: HIVE-7561
 URL: https://issues.apache.org/jira/browse/HIVE-7561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
  Labels: StarterProject

 Hive uses the assert keyword all over the place. The problem is that 
 assertions rarely take effect, since they must be explicitly enabled with the 
 JVM's -ea flag. In the Spark code, e.g. GenSparkUtils, let's use 
 Preconditions.*.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7559) StarterProject: Move configuration from SparkClient to HiveConf

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7559:
---

Summary: StarterProject: Move configuration from SparkClient to HiveConf  
(was: Move configuration from SparkClient to HiveConf)

 StarterProject: Move configuration from SparkClient to HiveConf
 ---

 Key: HIVE-7559
 URL: https://issues.apache.org/jira/browse/HIVE-7559
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Priority: Minor
  Labels: StarterProject

 The SparkClient class has some configuration keys and defaults. These should 
 be moved to HiveConf.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7503) Support Hive's multi-table insert query with Spark

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-7503:
-

Assignee: Xuefu Zhang

 Support Hive's multi-table insert query with Spark
 --

 Key: HIVE-7503
 URL: https://issues.apache.org/jira/browse/HIVE-7503
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang

 For Hive's multi-insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert.  When we implement this with Spark, it would 
 be nice if all the inserts could happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with staging, the inserts will happen sequentially, hurting performance.
 This task is to find out what it takes in Spark to enable this without 
 requiring staging of the source or sequential insertion. If this has to be 
 solved in Hive, find the optimal way to do it.
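
To make the staging concern concrete, a minimal Spark sketch (the source 
loader, filter helpers, and output paths are all illustrative): without 
persist() each action recomputes the source, and even with it the actions 
still run one after another from the driver.

{noformat}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// Minimal sketch; loadSourceTable() and the paths are illustrative.
JavaRDD<String> source = loadSourceTable();
source.persist(StorageLevel.MEMORY_AND_DISK());  // stage the source once

// One action per insert target; they still execute sequentially.
source.filter(r -> matchesTargetA(r)).saveAsTextFile("/warehouse/target_a");
source.filter(r -> matchesTargetB(r)).saveAsTextFile("/warehouse/target_b");
{noformat}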



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7503) Support Hive's multi-table insert query with Spark

2014-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080076#comment-14080076
 ] 

Xuefu Zhang commented on HIVE-7503:
---

Assigned to myself for initial research.

 Support Hive's multi-table insert query with Spark
 --

 Key: HIVE-7503
 URL: https://issues.apache.org/jira/browse/HIVE-7503
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang

 For Hive's multi-insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert.  When we implement this with Spark, it would 
 be nice if all the inserts could happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with staging, the inserts will happen sequentially, hurting performance.
 This task is to find out what it takes in Spark to enable this without 
 requiring staging of the source or sequential insertion. If this has to be 
 solved in Hive, find the optimal way to do it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-07-30 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reopened HIVE-7506:
--


 MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
 table (or a partition of a table)
 --

 Key: HIVE-7506
 URL: https://issues.apache.org/jira/browse/HIVE-7506
 Project: Hive
  Issue Type: New Feature
  Components: Database/Schema
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Critical
   Original Estimate: 252h
  Remaining Estimate: 252h

 Two motivations:
 (1) CBO depends heavily on the statistics of a column in a table (or a 
 partition of a table). If we would like to test whether CBO chooses the best 
 plan under different statistics, it would be time-consuming to load the 
 whole table and build the statistics from the ground up.
 (2) As the database runs, the statistics of a column in a table (or a partition 
 of a table) may change. We need a mechanism to keep them synchronized. 
 We propose the following command to achieve that:
 ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
 STATISTICS col_statistics [COMMENT col_comment]
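
A hypothetical instantiation of the proposed grammar (the table, partition, 
column, and statistics literal are all made up for illustration; the final 
syntax may differ):

{noformat}
ALTER TABLE sales PARTITION (ds='2014-07-30') COLUMN amount
UPDATE STATISTICS ('numDVs'='10000','numNulls'='0')
COMMENT 'hand-set stats for CBO plan testing';
{noformat}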



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7488) pass column names being used for inputs to authorization api

2014-07-30 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080118#comment-14080118
 ] 

Jason Dere commented on HIVE-7488:
--

+1. Test failures not related?

 pass column names being used for inputs to authorization api
 

 Key: HIVE-7488
 URL: https://issues.apache.org/jira/browse/HIVE-7488
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7488.1.patch, HIVE-7488.2.patch, 
 HIVE-7488.3.patch.txt, HIVE-7488.4.patch, HIVE-7488.5.patch, HIVE-7488.6.patch


 HivePrivilegeObject in the authorization API has support for columns, but the 
 columns being used are not being populated for non-grant/revoke queries.
 This is to enable any implementation of the API to use this column 
 information for its authorization decisions.
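
A minimal sketch of how an authorization implementation could consume this 
(the getColumns()/getObjectName() accessors and the isAllowed() helper are 
assumptions, not confirmed API):

{noformat}
import java.util.List;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;

// Minimal sketch; method and helper names are assumptions.
for (HivePrivilegeObject input : inputObjects) {
  List<String> columns = input.getColumns();  // populated by this patch
  if (columns != null && !isAllowed(user, input.getObjectName(), columns)) {
    throw new HiveAccessControlException(
        user + " lacks SELECT on " + input.getObjectName() + " " + columns);
  }
}
{noformat}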



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-07-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080125#comment-14080125
 ] 

Gunther Hagleitner commented on HIVE-7506:
--

[~damien.carol] I think the use for this is different from analyze. The ability 
to update certain stats without scanning any data or hacking the 
backend db is useful in a number of cases. It helps (esp. for CBO work) to set 
up unit tests quickly and verify both CBO and the stats subsystem. It also 
helps when experimenting with the system if you're just trying out Hive/Hadoop 
on a small cluster. Finally, it gives you a quick and clean way to fix things 
when something has gone wrong with stats in your environment.

 MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
 table (or a partition of a table)
 --

 Key: HIVE-7506
 URL: https://issues.apache.org/jira/browse/HIVE-7506
 Project: Hive
  Issue Type: New Feature
  Components: Database/Schema
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
   Original Estimate: 252h
  Remaining Estimate: 252h

 Two motivations:
 (1) CBO depends heavily on the statistics of a column in a table (or a 
 partition of a table). If we would like to test whether CBO chooses the best 
 plan under different statistics, it would be time-consuming to load the 
 whole table and build the statistics from the ground up.
 (2) As the database runs, the statistics of a column in a table (or a partition 
 of a table) may change. We need a mechanism to keep them synchronized. 
 We propose the following command to achieve that:
 ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
 STATISTICS col_statistics [COMMENT col_comment]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080128#comment-14080128
 ] 

Lars Francke commented on HIVE-7390:


You summed it up nicely, thanks.

The original intention of this issue was to make the quote character optional 
and configurable, so Jim must have had a use case for that. I can't think of a 
good one atm.

I can however think of a good reason for a configurable delimiter. Comma, 
semicolon or tab occur relatively frequently in data, but some other character 
(\001 or |) might not occur in the data, and being able to pick it as the 
delimiter makes parsing much simpler (just split on the delimiter instead 
of looking for quoted strings etc.). This is especially interesting when you 
then want to mount another table on that data in Hive or post-process it in any 
other simple way where you don't have access to a full-fledged CSV parsing 
library.

So: picking the delimiter is often very helpful in avoiding a whole class of 
parsing issues, because it allows consumers to just split on the delimiter.

I think that we can easily address the most common issues with two changes:

1. Fix the current CSV and TSV formats. As you say: no debate on that.
2. Allow the delimiter to be specified and keep the normal quoting mode.

That allows everyone who really understands their data to avoid quoting, while 
everyone else gets properly formatted CSVs for a full CSV parser. In the 
same vein I think that {{surroundingSpacesNeedQuotes}} should stay disabled.

But as I said: This is kinda hijacking Jim's original issue...
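
To make the parsing argument concrete, a minimal sketch (the data and 
delimiter choice are illustrative):

{noformat}
// With a delimiter guaranteed absent from the data, consumers can just split:
String line = "1|Alice|2014-07-30";
String[] fields = line.split("\\|");  // 3 fields, no CSV library needed

// With a quoted comma format, a naive split breaks on embedded delimiters:
String quoted = "1,\"Smith, Alice\",2014-07-30";
// quoted.split(",") wrongly yields 4 fields, so a real CSV parser
// is needed to honor the quotes.
{noformat}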

 Make quote character optional and configurable in BeeLine CSV/TSV output
 

 Key: HIVE-7390
 URL: https://issues.apache.org/jira/browse/HIVE-7390
 Project: Hive
  Issue Type: New Feature
  Components: Clients
Affects Versions: 0.13.1
Reporter: Jim Halfpenny
Assignee: Ferdinand Xu
 Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
 HIVE-7390.4.patch, HIVE-7390.patch


 Currently when either the CSV or TSV output formats are used in beeline each 
 column is wrapped in single quotes. Quote wrapping of columns should be 
 optional and the user should be able to choose the character used to wrap the 
 columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Affects Version/s: tez-branch

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.4.patch, HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

This patch works with tez-0.5 only. Since only the tez branch has been upgraded 
to that version, the patch is applicable only to that Hive branch.

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.4.patch, HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Component/s: Tez

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.4.patch, HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: (was: HIVE-7096.4.patch)

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: (was: HIVE-7096.4.patch)

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.4.patch, HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080178#comment-14080178
 ] 

Hive QA commented on HIVE-7509:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658680/HIVE-7509.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5842 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/110/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/110/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-110/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658680

 Fast stripe level merging for ORC
 -

 Key: HIVE-7509
 URL: https://issues.apache.org/jira/browse/HIVE-7509
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
 HIVE-7509.4.patch, HIVE-7509.5.patch


 Similar to HIVE-1950, add support for fast stripe-level merging of ORC files 
 through the CONCATENATE command and the conditional merge task. This fast 
 merging is ideal for combining many small ORC files into a larger file without 
 decompressing and decoding their data.
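
For context, a minimal usage sketch of the command the merge hooks into (the 
table and partition names are illustrative):

{noformat}
-- Illustrative only: merge the small ORC files of one partition in place.
ALTER TABLE orc_events PARTITION (ds='2014-07-30') CONCATENATE;
{noformat}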



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: (was: HIVE-7096.4.patch)

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.4.patch, HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

 Support grouped splits in Tez partitioned broadcast join
 

 Key: HIVE-7096
 URL: https://issues.apache.org/jira/browse/HIVE-7096
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
 HIVE-7096.4.patch, HIVE-7096.tez.branch.patch


 Same checks for schema + deser + file format done in HiveSplitGenerator need 
 to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080204#comment-14080204
 ] 

Lefty Leverenz commented on HIVE-7509:
--

Good doc fixes, thanks [~prasanth_j].

+1 for docs only.

 Fast stripe level merging for ORC
 -

 Key: HIVE-7509
 URL: https://issues.apache.org/jira/browse/HIVE-7509
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
 HIVE-7509.4.patch, HIVE-7509.5.patch


 Similar to HIVE-1950, add support for fast stripe-level merging of ORC files 
 through the CONCATENATE command and the conditional merge task. This fast 
 merging is ideal for combining many small ORC files into a larger file without 
 decompressing and decoding their data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24084/
---

(Updated July 30, 2014, 11:40 p.m.)


Review request for hive.


Changes
---

Incorporating Brock's and Thejas's review comments.  As Thejas pointed out, it 
turns out ipAddress is already stored in SessionState, so using that the code 
becomes a lot cleaner.

However, the ipAddress calculated in TSetIpAddressProcessor doesn't work in 
Kerberos mode, so I am fixing it so it's set in all modes.


Bugs: HIVE-7547
https://issues.apache.org/jira/browse/HIVE-7547


Repository: hive-git


Description
---

Passing the ipAddress and userName (already calculated in ThriftCLIService for 
other purposes) through several layers down to the hooks.


Diffs (updated)
-

  
itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java
 PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 
  service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
816bea4 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
5c87bcb 

Diff: https://reviews.apache.org/r/24084/diff/


Testing
---

Added tests in both kerberos and non-kerberos mode.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7547:


Attachment: HIVE-7547.3.patch

Thanks Thejas for pointing that out.  I refactored the code to use SessionState.

The SessionState's ipAddress didn't seem to be set for Kerberos mode, so I'm 
also changing how it's being set to work for all modes.  Let me know if it's 
not right.

 Add ipAddress and userName to ExecHook
 --

 Key: HIVE-7547
 URL: https://issues.apache.org/jira/browse/HIVE-7547
 Project: Hive
  Issue Type: New Feature
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.patch


 Auditing tools should be able to know about the ipAddress and userName of the 
 user executing operations.  
 These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24084/
---

(Updated July 30, 2014, 11:46 p.m.)


Review request for hive.


Bugs: HIVE-7547
https://issues.apache.org/jira/browse/HIVE-7547


Repository: hive-git


Description
---

Passing the ipAddress and userName (already calculated in ThriftCLIService for 
other purposes) through several layers down to the hooks.


Diffs (updated)
-

  
itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java
 PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 
  service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
816bea4 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
5c87bcb 

Diff: https://reviews.apache.org/r/24084/diff/


Testing
---

Added tests in both kerberos and non-kerberos mode.


Thanks,

Szehon Ho



[jira] [Created] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7562:
--

 Summary: Cleanup ExecReducer
 Key: HIVE-7562
 URL: https://issues.apache.org/jira/browse/HIVE-7562
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
 Attachments: HIVE-7562.patch

ExecReducer declares its member variables in arbitrary order and with inconsistent visibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7547:


Attachment: HIVE-7547.4.patch

 Add ipAddress and userName to ExecHook
 --

 Key: HIVE-7547
 URL: https://issues.apache.org/jira/browse/HIVE-7547
 Project: Hive
  Issue Type: New Feature
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.4.patch, 
 HIVE-7547.patch


 Auditing tools should be able to know about the ipAddress and userName of the 
 user executing operations.  
 These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7562:
---

Attachment: HIVE-7562.patch

 Cleanup ExecReducer
 ---

 Key: HIVE-7562
 URL: https://issues.apache.org/jira/browse/HIVE-7562
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
 Attachments: HIVE-7562.patch


 ExecReducer declares its member variables in arbitrary order and with inconsistent visibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

