[jira] [Created] (HIVE-4966) Introduce Collect_Map UDAF

2013-07-31 Thread Harish Butani (JIRA)
Harish Butani created HIVE-4966:
---

 Summary: Introduce Collect_Map UDAF
 Key: HIVE-4966
 URL: https://issues.apache.org/jira/browse/HIVE-4966
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani


Similar to Collect_Set. For example, on a Txn table:
{noformat}
Txn(customer, product, amt)

select customer, collect_map(product, amt)
from txn
group by customer
{noformat}

This would give you an activity map for each customer.

Other thoughts:
- have explode do the inverse on maps just as it does for sets today.
- introduce a table function that outputs each value as a column, so in the 
example above you get an activity matrix instead of a map. 
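A minimal Python sketch of the proposed semantics (the overwrite-on-duplicate-key behavior is an assumption of this sketch, not part of the proposal):

```python
from collections import defaultdict

def collect_map(rows):
    # Fold (customer, product, amt) rows into one product->amt map per
    # customer. A later amt for the same product overwrites the earlier
    # one here; the real UDAF would have to pick a duplicate-key policy.
    result = defaultdict(dict)
    for customer, product, amt in rows:
        result[customer][product] = amt
    return dict(result)

txn = [
    ("alice", "book", 12.0),
    ("alice", "pen", 2.5),
    ("bob", "book", 9.0),
]
print(collect_map(txn))
# {'alice': {'book': 12.0, 'pen': 2.5}, 'bob': {'book': 9.0}}
```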

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2564) Set dbname at JDBC URL or properties

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724921#comment-13724921
 ] 

Hive QA commented on HIVE-2564:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595112/HIVE-2564.2.patch

{color:green}SUCCESS:{color} +1 2750 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/257/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/257/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
  Labels: patch
 Attachments: HIVE-2564.1.patch, HIVE-2564.2.patch, hive-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a use DBNAME statement.
 I think it would be better to also allow specifying a database name in the JDBC URL 
 or in the connection properties.
 Therefore, I'll attach the patch.
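As an illustration of the idea, a toy parser that pulls a database name out of a Hive JDBC URL. The `jdbc:hive://` prefix and the fallback to `default` are assumptions for this sketch, not the actual driver behavior:

```python
def parse_hive_jdbc_db(url):
    # Illustrative only: split an optional database name off a Hive JDBC
    # URL such as jdbc:hive://host:10000/mydb.
    prefix = "jdbc:hive://"
    if not url.startswith(prefix):
        raise ValueError("not a Hive JDBC URL: " + url)
    host_port, _, db = url[len(prefix):].partition("/")
    return db or "default"

print(parse_hive_jdbc_db("jdbc:hive://localhost:10000/sales"))  # sales
print(parse_hive_jdbc_db("jdbc:hive://localhost:10000"))        # default
```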



[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck

2013-07-31 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724939#comment-13724939
 ] 

Chris Drome commented on HIVE-4574:
---

Thanks for your thoughts. We are concerned with 0.10 at this point, but your 
point is taken. Looking forward to HIVE-1511!

 XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
 --

 Key: HIVE-4574
 URL: https://issues.apache.org/jira/browse/HIVE-4574
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4574.1.patch


 In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to 
 java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe 
 because it uses a static WeakHashMap that can be used from multiple 
 threads. See -
 http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46
 Concurrent access to HashMap implementations that are not thread safe can 
 sometimes result in infinite loops and other problems. If JDK 7 is in use, it 
 makes sense to synchronize calls to XMLEncoder.writeObject.
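The proposed fix, serializing all writeObject calls, can be sketched with a lock guarding a stand-in encoder. The class and function names here are illustrative, not Hive's actual code:

```python
import threading

class SynchronizedEncoder:
    # Serialize every write_object call through one lock so a
    # non-thread-safe encoder is never entered concurrently. The
    # encode_fn stand-in plays the role of XMLEncoder.writeObject.
    def __init__(self, encode_fn):
        self._encode = encode_fn
        self._lock = threading.Lock()

    def write_object(self, obj):
        with self._lock:
            return self._encode(obj)

enc = SynchronizedEncoder(repr)
results = []
threads = [threading.Thread(target=lambda i=i: results.append(enc.write_object(i)))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['0', '1', '2', '3', '4', '5', '6', '7']
```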



[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-31 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724946#comment-13724946
 ] 

Alexey Zotov commented on HIVE-3442:


Yep, I can add this info to the 
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe page. But the proposed 
approach has a defect: if the DataNode (some_datanode_address:50075) is down, you 
won't be able to query the data from Hive. I'm working on an improvement to 
this approach.


 AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
 external table
 ---

 Key: HIVE-3442
 URL: https://issues.apache.org/jira/browse/HIVE-3442
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Fix For: 0.10.0


 After creating a table and loading data into it, I could check that the table was 
 created successfully, and the data is inside:
 DROP TABLE IF EXISTS ml_items;
 CREATE TABLE ml_items(id INT,
   title STRING,
   release_date STRING,
   video_release_date STRING,
   imdb_url STRING,
   unknown_genre TINYINT,
   action TINYINT,
   adventure TINYINT,
   animation TINYINT,
   children TINYINT,
   comedy TINYINT,
   crime TINYINT,
   documentary TINYINT,
   drama TINYINT,
   fantasy TINYINT,
   film_noir TINYINT,
   horror TINYINT,
   musical TINYINT,
   mystery TINYINT,
   romance TINYINT,
   sci_fi TINYINT,
   thriller TINYINT,
   war TINYINT,
   western TINYINT)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
   STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
 select * from ml_items ORDER BY id ASC;
 However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working:
 DROP TABLE IF EXISTS ml_items_as_avro;
 CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
 describe ml_items_as_avro;
 INSERT OVERWRITE TABLE ml_items_as_avro
   SELECT id, title,
 imdb_url, unknown_genre, action, adventure, animation, children, comedy, 
 crime,
 documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
 sci_fi, thriller, war, western
   FROM ml_items;
 ml_items_as_avro is not created with the expected schema, as shown in the 
 describe ml_items_as_avro output. The output is below:
 PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
 PREHOOK: type: DROPTABLE
 POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
 POSTHOOK: type: DROPTABLE
 PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
 PREHOOK: type: CREATETABLE
 POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
 POSTHOOK: type: CREATETABLE
 POSTHOOK: Output: default@ml_items_as_avro
 PREHOOK: query: describe ml_items_as_avro
 PREHOOK: type: DESCTABLE
 POSTHOOK: query: describe ml_items_as_avro
 POSTHOOK: type: DESCTABLE
 error_error_error_error_error_error_error   string  from deserializer
 cannot_determine_schema string  from deserializer
 check   string  from deserializer
 schema  string  from deserializer
 url string  from deserializer
 and string  from deserializer
 literal string  from deserializer
 FAILED: 

[jira] [Commented] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync paralllelsim

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725042#comment-13725042
 ] 

Hudson commented on HIVE-4920:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #38 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/38/])
HIVE-4920 PTest2 handle Spot Price increases gracefully and improve rsync 
paralllelsim (Brock Noland via egc)

Submitted by:   Brock Noland
Reviewed by: Edward Capriolo (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508707)
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/client/PTestClient.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/request/TestStartRequest.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/ExecutionController.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/TestExecutor.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/CleanupPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Constants.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Drone.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ExecutionPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutor.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutorBuilder.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JIRAService.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JUnitReportParser.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/LogDirectoryCleaner.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PTest.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Phase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PrepPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ReportingPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/ExecutionContextConfiguration.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/QFileTestBatch.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestConfiguration.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudComputeService.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudExecutionContextProvider.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/AbstractSSHCommand.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/RSyncCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/SSHCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/main/resources/batch-exec.vm
* /hive/trunk/testutils/ptest2/src/main/resources/log4j.properties
* /hive/trunk/testutils/ptest2/src/main/resources/source-prep.vm
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/api/server/TestTestExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/AbstractTestPhase.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockLocalCommandFactory.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockRSyncCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockSSHCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.testExecute.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingQFile.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingUnitTest.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingQFileTest.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingUnitTest.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestHostExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestPhase.java
* 

[jira] [Commented] (HIVE-4962) fix eclipse template broken by HIVE-3256

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725075#comment-13725075
 ] 

Hudson commented on HIVE-4962:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #110 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/110/])
Hive-4962 Fix eclipse templates broken from ASM changes (Yin Huai via egc)

Submitted by: Yin Huai  
Reviewed by: Edward Capriolo (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508706)
* /hive/trunk/eclipse-templates/.classpath
* /hive/trunk/eclipse-templates/.classpath._hbase


 fix eclipse template broken by HIVE-3256
 

 Key: HIVE-4962
 URL: https://issues.apache.org/jira/browse/HIVE-4962
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-4962.txt






[jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF

2013-07-31 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725298#comment-13725298
 ] 

Carter Shanklin commented on HIVE-4966:
---

Hi Harish,

I recently found a need for a collect_array UDF that would maintain ordering and 
duplicates. I actually just changed a few things in collect_set. Do you 
think that a collect_array would be generally useful? If so, would it make 
sense to combine these into one UDAF to minimize code duplication?
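A rough Python sketch of the semantic difference being discussed. The first-occurrence ordering in the set variant is an assumption of this sketch; the real collect_set makes no order guarantee:

```python
def collect_set(values):
    # Drops duplicates; this sketch keeps first-occurrence order.
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

def collect_array(values):
    # Keeps both ordering and duplicates, as proposed in the comment.
    return list(values)

vals = ["a", "b", "a", "c", "b"]
print(collect_set(vals))    # ['a', 'b', 'c']
print(collect_array(vals))  # ['a', 'b', 'a', 'c', 'b']
```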

 Introduce Collect_Map UDAF
 --

 Key: HIVE-4966
 URL: https://issues.apache.org/jira/browse/HIVE-4966
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani

 Similar to Collect_Set. For e.g. on a Txn table
 {noformat}
 Txn(customer, product, amt)
 select customer, collect_map(product, amt)
 from txn
 group by customer
 {noformat}
 Would give you an activity map for each customer.
 Other thoughts:
 - have explode do the inverse on maps just as it does for sets today.
 - introduce a table function that outputs each value as a column. So in the 
 e.g. above you get an activity matrix instead of a map. 



Re: TestCliDriver Failed Test

2013-07-31 Thread Brock Noland
Try doing a very-clean. I think you just have an old version of DN in your
ivy cache, and based on my experience with ivy, it cannot handle that.


On Wed, Jul 31, 2013 at 9:37 AM, nikolaus.st...@researchgate.net wrote:

 Hi,

 When running the following command:

 ant test -Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true

 on a clean hive-trunk checkout, I get the following failed test:

 test:
  [echo] Project: ql
 [junit] WARNING: multiple versions of ant detected in path for junit
 [junit]  jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
 [junit]  and jar:file:/Users/niko/Repos/hive-trunk/build/ivy/lib/hadoop0.20S.shim/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
 [junit] Hive history file=/Users/niko/Repos/hive-trunk/build/ql/tmp/hive_job_log_604cbdc7-f546-4a74-bba2-43f7c2885811_1343059998.txt
 [junit] 2013-07-31 07:19:49.366 java[15847:1203] Unable to load realm info from SCDynamicStore
 [junit] Exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
 [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
 [junit] org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 [junit] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:875)
 [junit] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:851)
 [junit] at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:513)
 [junit] at org.apache.hadoop.hive.cli.TestCliDriver.<clinit>(TestCliDriver.java:48)
 [junit] at java.lang.Class.forName0(Native Method)
 [junit] at java.lang.Class.forName(Class.java:171)
 [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:373)
 [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
 [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
 [junit] Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 [junit] at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1212)
 [junit] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
 [junit] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
 [junit] at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2357)
 [junit] at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2368)
 [junit] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:869)
 [junit] ... 8 more
 [junit] Caused by: java.lang.reflect.InvocationTargetException
 [junit] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 [junit] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 [junit] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 [junit] at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 [junit] at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1210)
 [junit] ... 13 more
 [junit] Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
 [junit] NestedThrowables:
 [junit] java.lang.reflect.InvocationTargetException
 [junit] at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
 [junit] at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
 [junit] at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
 [junit] at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:266)
 [junit] at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:295)
 [junit] at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:228)
 [junit] at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:203)
 [junit] at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
 [junit] at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 [junit] at org.apache.hadoop.hive.metastore.RetryingRawStore.
 

[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725332#comment-13725332
 ] 

Hudson commented on HIVE-4525:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java


 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.12.0

 Attachments: D10755.1.patch, D10755.2.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.
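The 2038 boundary follows directly from the 31-bit seconds field; a quick check in Python:

```python
from datetime import datetime, timezone

# 31 bits of seconds since the Unix epoch give this upper bound, which
# is where the "certain point in 2038" comes from.
max_seconds = 2**31 - 1
limit = datetime.fromtimestamp(max_seconds, tz=timezone.utc)
print(limit.isoformat())  # 2038-01-19T03:14:07+00:00
```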



[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725328#comment-13725328
 ] 

Hudson commented on HIVE-3256:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-3256: Update asm version in Hive (Ashutosh Chauhan via Brock Noland) 
(brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508506)
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/metastore/ivy.xml


 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Zhenxiao Luo
Assignee: Ashutosh Chauhan
 Fix For: 0.12.0

 Attachments: HIVE-3256.patch


 Hive trunk is currently using asm version 3.1, while Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?



[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725330#comment-13725330
 ] 

Hudson commented on HIVE-3264:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-3264 : Add support for binary dataype to AvroSerde (Eli Reisman & Mark 
Wagner via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508528)
* /hive/trunk/data/files/csv.txt
* /hive/trunk/ql/src/test/queries/clientpositive/avro_nullable_fields.q
* /hive/trunk/ql/src/test/results/clientpositive/avro_nullable_fields.q.out
* /hive/trunk/ql/src/test/results/clientpositive/avro_schema_literal.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java


 Add support for binary dataype to AvroSerde
 ---

 Key: HIVE-3264
 URL: https://issues.apache.org/jira/browse/HIVE-3264
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
 HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch


 When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
 byte array type is converted to an array of small ints.  Now that HIVE-2380 is 
 in, this step isn't necessary and we can convert both Avro's bytes type and 
 probably its fixed type to Hive's binary type.
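A toy illustration of the two representations (the function names are made up for this sketch, not the AvroSerde's actual API):

```python
def bytes_as_small_int_array(payload):
    # The old workaround sketched: with no binary type, Avro bytes were
    # surfaced as an array of small ints (0-255 each).
    return list(payload)  # iterating bytes in Python 3 yields ints

def bytes_as_binary(payload):
    # With a native binary type (HIVE-2380), bytes can pass through.
    return payload

data = b"\x00\x7f\xff"
print(bytes_as_small_int_array(data))  # [0, 127, 255]
print(bytes_as_binary(data) == data)   # True
```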



[jira] [Commented] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync paralllelsim

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725327#comment-13725327
 ] 

Hudson commented on HIVE-4920:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-4920 PTest2 handle Spot Price increases gracefully and improve rsync 
paralllelsim (Brock Noland via egc)

Submitted by:   Brock Noland
Reviewed by: Edward Capriolo (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508707)
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/client/PTestClient.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/request/TestStartRequest.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/ExecutionController.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/TestExecutor.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/CleanupPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Constants.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Drone.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ExecutionPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutor.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutorBuilder.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JIRAService.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JUnitReportParser.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/LogDirectoryCleaner.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PTest.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Phase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PrepPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ReportingPhase.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/ExecutionContextConfiguration.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/QFileTestBatch.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestConfiguration.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudComputeService.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudExecutionContextProvider.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/AbstractSSHCommand.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/RSyncCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/SSHCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/main/resources/batch-exec.vm
* /hive/trunk/testutils/ptest2/src/main/resources/log4j.properties
* /hive/trunk/testutils/ptest2/src/main/resources/source-prep.vm
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/api/server/TestTestExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/AbstractTestPhase.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockLocalCommandFactory.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockRSyncCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockSSHCommandExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.testExecute.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingQFile.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingUnitTest.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingQFileTest.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingUnitTest.approved.txt
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestHostExecutor.java
* /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestPhase.java
* 

[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725329#comment-13725329
 ] 

Hudson commented on HIVE-4928:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-4928 : Date literals do not work properly in partition spec clause (Jason 
Dere via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508534)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/partition_date2.q
* /hive/trunk/ql/src/test/results/clientpositive/partition_date2.q.out


 Date literals do not work properly in partition spec clause
 ---

 Key: HIVE-4928
 URL: https://issues.apache.org/jira/browse/HIVE-4928
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.12.0

 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch


 The partition spec parsing doesn't do any real evaluation of the 
 values in the partition spec, instead just taking the text value of the 
 ASTNode representing the partition value. This works fine for string/numeric 
 literals (expression tree below):
 (TOK_PARTVAL region 99)
 But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression 
 tree below):
 (TOK_DATELITERAL '1999-12-31')
 In this case the parser/analyzer uses TOK_DATELITERAL as the partition 
 column value, when it should really get the value of the child of the 
 DATELITERAL token.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4962) fix eclipse template broken by HIVE-3256

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725333#comment-13725333
 ] 

Hudson commented on HIVE-4962:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
Hive-4962 Fix eclipse templates broken from ASM changes (Yin Huai via egc)

Submitted by: Yin Huai  
Reviewed by: Edward Capriolo (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508706)
* /hive/trunk/eclipse-templates/.classpath
* /hive/trunk/eclipse-templates/.classpath._hbase


 fix eclipse template broken by HIVE-3256
 

 Key: HIVE-4962
 URL: https://issues.apache.org/jira/browse/HIVE-4962
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-4962.txt






[jira] [Commented] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725331#comment-13725331
 ] 

Hudson commented on HIVE-2702:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-2702 : Enhance listPartitionsByFilter to add support for integral types 
both for equality and non-equality (Sergey Shelukhin via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508539)
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out


 Enhance listPartitionsByFilter to add support for integral types both for 
 equality and non-equality
 ---

 Key: HIVE-2702
 URL: https://issues.apache.org/jira/browse/HIVE-2702
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Aniket Mokashi
Assignee: Sergey Shelukhin
 Fix For: 0.12.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, 
 HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, 
 HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, 
 HIVE-2702.patch, HIVE-2702-v0.patch


 listPartitionsByFilter supports only string partition keys. This is because 
 it's explicitly specified in generateJDOFilterOverPartitions in 
 ExpressionTree.java:
 {code}
 //Can only support partitions whose types are string
 if( ! table.getPartitionKeys().get(partitionColumnIndex).
     getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) {
   throw new MetaException
     ("Filtering is supported only on partition keys of type string");
 }
 {code}
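A minimal sketch of the relaxed check this enhancement implies: allow integral partition-key types as well as string. The type-name list and helper names below are illustrative, not Hive's actual constants or code.

```java
import java.util.Arrays;
import java.util.List;

public class FilterTypeCheck {
    // Hypothetical stand-ins for the serde type-name constants.
    static final List<String> FILTERABLE =
        Arrays.asList("string", "tinyint", "smallint", "int", "bigint");

    // Old check: only string partition keys may be filtered.
    static boolean filterableOld(String type) {
        return type.equals("string");
    }

    // New check: integral partition keys pass too, so both equality and
    // non-equality filters can be pushed down for them.
    static boolean filterableNew(String type) {
        return FILTERABLE.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(filterableOld("int")); // false: old code throws MetaException here
        System.out.println(filterableNew("int")); // true: filter pushdown allowed
    }
}
```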



[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-31 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4952:
--

Attachment: HIVE-4952.D11889.2.patch

yhuai updated the revision HIVE-4952 [jira] When hive.join.emit.interval is 
small, queries optimized by Correlation Optimizer may generate wrong results.

- Merge remote-tracking branch 'origin/trunk' into HIVE-4952
- Merge branch 'trunk' of https://github.com/apache/hive into HIVE-4952
- update comments

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11889

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11889?vs=36531&id=36657#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  ql/src/test/queries/clientpositive/correlationoptimizer15.q
  ql/src/test/results/clientpositive/correlationoptimizer15.q.out

To: JIRA, yhuai


 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, 
 replay.txt


 If we have a query like this ...
 {code:sql}
 SELECT xx.key, xx.cnt, yy.key
 FROM
 (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
 y.key) group by x.key) xx
 JOIN src yy
 ON xx.key=yy.key;
 {code}
 After Correlation Optimizer, the operator tree in the reducer will be 
 {code}
        JOIN2
          |
          |
         MUX
        /   \
       /     \
     GBY      |
      |       |
    JOIN1     |
       \     /
        \   /
        DEMUX
 {code}
 For JOIN2, the right table will arrive at this operator first. If 
 hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even if 
 it has not got any row from the left table. The logic related to 
 hive.join.emit.interval in JoinOperator assumes that inputs will be ordered 
 by the tag. But, if a query has been optimized by the Correlation Optimizer, 
 this assumption may not hold for those JoinOperators inside the reducer.
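The tag-ordering assumption described above can be stated as a tiny check (illustrative only, not Hive code): rows reaching a JoinOperator are expected in non-decreasing tag order, which the DEMUX/MUX rewrite can violate.

```java
public class TagOrderCheck {
    // Returns true if rows arrive in non-decreasing tag order -- the
    // assumption JoinOperator's hive.join.emit.interval logic relies on.
    static boolean orderedByTag(int[] tags) {
        for (int i = 1; i < tags.length; i++) {
            if (tags[i] < tags[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Normal shuffle: left rows (tag 0) arrive before right rows (tag 1).
        System.out.println(orderedByTag(new int[]{0, 0, 1, 1})); // true
        // After the correlation rewrite, JOIN2 may see tag-1 rows first, so
        // emitting after a small interval produces rows with no left match.
        System.out.println(orderedByTag(new int[]{1, 1, 0}));    // false
    }
}
```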



Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-31 Thread agateaaa
Thanks Nitin

There aren't too many connections in CLOSE_WAIT state, only one or two when we
run into this. Most likely it's because of a dropped connection.

I could not find any read or write timeouts we can set for the thrift
server which will tell thrift to hold on to the client connection.
See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem
to have been implemented yet. We do have a client connection timeout set,
but cannot find an equivalent setting for the server.

We have a suspicion that this happens when we run two client processes
which modify two distinct partitions of the same hive table. We put in a
workaround so that the two hive client processes never run together and so
far things look ok but we will keep monitoring.

Could it be because hive metastore server is not thread safe, would running
two alter table statements on two distinct partitions of the same table
using two client connections cause problems like these, where hive
metastore server closes or drops a wrong client connection and leaves the
other hanging?

Agateaaa




On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

 The mentioned flow is called when you have unsecure mode of thrift
 metastore client-server connection. So one way to avoid this is have a
 secure way.

 code
 public boolean process(final TProtocol in, final TProtocol out)
 throws TException {
 setIpAddress(in);
 ...
 ...
 ...
 @Override
  protected void setIpAddress(final TProtocol in) {
 TUGIContainingTransport ugiTrans =
 (TUGIContainingTransport)in.getTransport();
 Socket socket = ugiTrans.getSocket();
 if (socket != null) {
   setIpAddress(socket);

 /code


 From the above code snippet, it looks like the null pointer exception is
 not handled if the getSocket returns null.
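A hedged sketch of the missing guard: when the transport is not a TSocket, getSocket() returns null, and the address lookup should be skipped rather than dereferenced. This is a simplified stand-in, not Hive's actual TUGIBasedProcessor code.

```java
import java.net.Socket;

public class IpAddressGuard {
    // Simplified stand-in for setIpAddress: return the peer address, or a
    // placeholder when the transport exposes no underlying Socket.
    static String ipAddressOf(Socket socket) {
        if (socket == null) {
            // The guard the stack trace above shows is missing: getSocket()
            // returns null for non-TSocket transports.
            return "unknown";
        }
        return socket.getInetAddress() != null
            ? socket.getInetAddress().getHostAddress()
            : "unknown";
    }

    public static void main(String[] args) {
        System.out.println(ipAddressOf(null)); // unknown (no NullPointerException)
    }
}
```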

 can you check what's the ulimit setting on the server? If it's set to default
 can you set it to unlimited and restart the hcat server. (This is just a wild
 guess).

 also the getSocket method suggests: "If the underlying TTransport is an
 instance of TSocket, it returns the Socket object which it contains.
 Otherwise it returns null."

 so someone from the thrift gurus needs to tell us what's happening. I have no
 knowledge at this depth

 may be Ashutosh or Thejas will be able to help on this.




 From the netstat CLOSE_WAIT, it looks like the hive metastore server has
 not closed the connection (do not know why yet); maybe the hive dev guys
 can help. Are there too many connections in CLOSE_WAIT state?



 On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

  Looking at the hive metastore server logs see errors like these:
 
  2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
  (TThreadPoolServer.java:run(182)) - Error occurred during processing of
  message.
  java.lang.NullPointerException
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
  at
 
 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
 
  at approximately the same time as we see timeout or connection reset errors.
 
  Don't know if this is the cause or the side effect of the connection
  timeout/connection reset errors. Does anybody have any pointers or
  suggestions?
 
  Thanks
 
 
  On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:
 
   Thanks Nitin!
  
   We have a similar setup (identical hcatalog and hive server versions) on
   another production environment and don't see any errors (it's been running
   ok for a few months)
  
   Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive
   0.10 soon.
  
   I did see that the last time we ran into this problem, doing a
   netstat -ntp | grep :1 showed that the server was holding on to one socket
   connection in CLOSE_WAIT state for a long time
   (hive metastore server is running on port 1). Don't know if that's
   relevant here or not
  
   Can you suggest any hive configuration settings we can tweak, or
   networking tools/tips we can use to narrow this down?
  
   Thanks
   Agateaaa
  
  
  
  
   On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com
  wrote:
  
    Is there any chance you can do an update on a test environment with
    hcat-0.5 and hive-0.11 (or 0.10) and see if you can reproduce the issue?
   
    We used to see this error when there was load on the hcat server or some
    network issue connecting to the server (the second one was a rare
    occurrence)
  
  
   On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com
 wrote:
  
   Hi All:
  
   We are running into frequent problem using HCatalog 0.4.1 

[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Attachment: (was: HIVE-4870.patch)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for Fetch Task 
 (FetchWork). Map Reduce Task (MapredWork) already does this. 
 Patch adds Partition Description info to Fetch Task.



[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Attachment: HIVE-4870.patch

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for Fetch Task 
 (FetchWork). Map Reduce Task (MapredWork) already does this. 
 Patch adds Partition Description info to Fetch Task.



[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Status: Open  (was: Patch Available)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for Fetch Task 
 (FetchWork). Map Reduce Task (MapredWork) already does this. 
 Patch adds Partition Description info to Fetch Task.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Status: Open  (was: Patch Available)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.1.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Status: Patch Available  (was: Open)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.1.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Attachment: HIVE-4950.1.patch

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.1.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Status: Open  (was: Patch Available)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.1.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Resolved] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran resolved HIVE-4950.
--

Resolution: Not A Problem

Suspending the child JVM is already supported using --debug:childSuspend

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Attachment: (was: HIVE-4950.1.patch)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Attachment: (was: HIVE-4950.patch)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions & is slow

2013-07-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725486#comment-13725486
 ] 

Sergey Shelukhin commented on HIVE-4051:


{quote}Has anyone tested this patch on Derby, PostgreSQL, or Oracle? Until it's 
verified to work on these DBs I think this new code should be disabled by 
default.{quote}
I tested on Derby and MySQL so far.
Note that full fallback is there, so it could have a 3-position switch, or two 
settings: the current on/off staying the same, plus an "on, but turn off [for 
some grace period?] on first error" setting. The latter could be the default, 
so in case it fails it goes back to DN and doesn't introduce a lot of extra 
load. What do you think?

 Hive's metastore suffers from 1+N queries when querying partitions & is slow
 

 Key: HIVE-4051
 URL: https://issues.apache.org/jira/browse/HIVE-4051
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
 Environment: RHEL 6.3 / EC2 C1.XL
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, 
 HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch


 Hive's query client takes a long time to initialize & start planning queries 
 because of delays in creating all the MTable/MPartition objects.
 For a hive db with 1800 partitions, the metastore took 6-7 seconds to 
 initialize - firing approximately 5900 queries to the mysql database.
 Several of those queries fetch exactly one row to create a single object on 
 the client.
 The following 12 queries were repeated for each partition, generating a storm 
 of SQL queries 
 {code}
 4 Query SELECT 
 `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
 `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
 `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 
 `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
 FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
 `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
 THIS.`INTEGER_IDX`=0
 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
 THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
 `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
 `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
 =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
 =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN 
 `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = 
 `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM 
 `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT 
 (`A0`.`STRING_LIST_ID_KID` IS NULL)
 {code}
 This data is not detached or cached, so this operation is performed during 
 every query plan for the partitions, even in the same hive client.
 The queries are automatically generated by JDO/DataNucleus which makes it 
 nearly impossible to rewrite it into a single denormalized join operation & 
 process it locally.
 Attempts to optimize this with JDO fetch-groups did not bear fruit in 
 improving the query count.
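The 1+N pattern above can be contrasted with a batched fetch in one round trip, e.g. a single IN-list query over all storage-descriptor IDs instead of one query per partition. The table and column names are taken from the query log above; the helper itself is hypothetical, not the actual HIVE-4051 patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BatchedPartitionQuery {
    // One statement covering all partitions, rather than ~12 statements each.
    static String batchedSdQuery(List<Integer> sdIds) {
        String inList = sdIds.stream()
            .map(String::valueOf)
            .collect(Collectors.joining(","));
        return "SELECT `SD_ID`, `INPUT_FORMAT`, `LOCATION`, `OUTPUT_FORMAT` "
             + "FROM `SDS` WHERE `SD_ID` IN (" + inList + ")";
    }

    public static void main(String[] args) {
        // 1800 partitions collapse into a single round trip to MySQL.
        System.out.println(batchedSdQuery(Arrays.asList(4871, 4872, 4873)));
    }
}
```

Because DataNucleus generates the per-object queries itself, this kind of hand-written batched SQL is exactly what JDO makes hard to express, which is the crux of the issue.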



[jira] [Created] (HIVE-4967) Don't serialize unnecessary fields in query plan

2013-07-31 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-4967:
--

 Summary: Don't serialize unnecessary fields in query plan
 Key: HIVE-4967
 URL: https://issues.apache.org/jira/browse/HIVE-4967
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


There are quite a few fields which need not be serialized since they are 
initialized anyway in the backend. We need not serialize them in our plan.
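The effect of marking such fields can be illustrated with plain Java serialization: a transient field is skipped on write and comes back as its type's default. This is an illustrative demo of the `transient` keyword, not Hive's actual plan serialization path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class PlanNode implements Serializable {
        String name = "TS_0";              // part of the plan: serialized
        transient int runtimeCounter = 42; // backend-initialized: skipped

        static PlanNode roundTrip(PlanNode n)
                throws IOException, ClassNotFoundException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(n);
            oos.flush();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (PlanNode) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlanNode copy = PlanNode.roundTrip(new PlanNode());
        System.out.println(copy.name);           // TS_0
        System.out.println(copy.runtimeCounter); // 0: transient field not written
    }
}
```

The deserialized object gets the field's default value (0 here), so anything the backend re-initializes anyway is safe to mark transient and drop from the plan payload.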



[jira] [Updated] (HIVE-4967) Don't serialize unnecessary fields in query plan

2013-07-31 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4967:
---

Attachment: HIVE-4967.patch

Patch adds the transient keyword to all such fields.

 Don't serialize unnecessary fields in query plan
 

 Key: HIVE-4967
 URL: https://issues.apache.org/jira/browse/HIVE-4967
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-4967.patch


 There are quite a few fields which need not be serialized since they are 
 initialized anyway in the backend. We need not serialize them in our plan.



[jira] [Updated] (HIVE-4967) Don't serialize unnecessary fields in query plan

2013-07-31 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4967:
---

Status: Patch Available  (was: Open)

Ready for review. Already ran through full test suite. All tests passed.

 Don't serialize unnecessary fields in query plan
 

 Key: HIVE-4967
 URL: https://issues.apache.org/jira/browse/HIVE-4967
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-4967.patch


 There are quite a few fields which need not be serialized since they are 
 initialized anyway in the backend. We need not serialize them in our plan.



[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Status: Open  (was: Patch Available)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for Fetch Task 
 (FetchWork). Map Reduce Task (MapredWork) already does this. 
 Patch adds Partition Description info to Fetch Task.



[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4870.1.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Attachment: HIVE-4870.1.patch

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4870.1.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Attachment: (was: HIVE-4870.patch)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4870.1.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725513#comment-13725513
 ] 

Hive QA commented on HIVE-4952:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595206/HIVE-4952.D11889.2.patch

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/259/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/259/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, 
 replay.txt


 If we have a query like this ...
 {code:sql}
 SELECT xx.key, xx.cnt, yy.key
 FROM
 (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
 y.key) group by x.key) xx
 JOIN src yy
 ON xx.key=yy.key;
 {code}
 After Correlation Optimizer, the operator tree in the reducer will be 
 {code}
   JOIN2
     |
     |
    MUX
    / \
   /   \
 GBY    |
  |     |
 JOIN1  |
   \   /
    \ /
   DEMUX
 {code}
 For JOIN2, the right table will arrive at this operator first. If 
 hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even 
 though it has not received any rows from the left table. The logic related to 
 hive.join.emit.interval in JoinOperator assumes that inputs are ordered 
 by tag. But if a query has been optimized by the Correlation Optimizer, this 
 assumption may not hold for the JoinOperators inside the reducer.
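
A toy model of why the tag-ordering assumption matters (hypothetical names; this is not Hive's JoinOperator): a streaming join that buffers the tag-0 side and eagerly emits each tag-1 row against whatever has been buffered so far silently loses pairs when the tag-1 side arrives first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a streaming two-way join. Each row is {tag, value}.
// Correct only when the input is ordered by tag: all tag-0 rows first.
public class EmitOrderSketch {
    static List<String> streamJoin(List<int[]> rows) {
        List<Integer> buffered = new ArrayList<>(); // tag-0 side
        List<String> out = new ArrayList<>();
        for (int[] r : rows) {
            if (r[0] == 0) {
                buffered.add(r[1]);          // left side: buffer
            } else {
                for (int l : buffered) {     // right side: emit eagerly,
                    out.add(l + "," + r[1]); // like a small emit interval
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tag-ordered input: the join result is complete.
        System.out.println(streamJoin(Arrays.asList(
                new int[]{0, 1}, new int[]{1, 9}))); // [1,9]
        // Right table first (as after the Correlation Optimizer): pair lost.
        System.out.println(streamJoin(Arrays.asList(
                new int[]{1, 9}, new int[]{0, 1}))); // []
    }
}
```

In this toy, a large emit interval would correspond to holding the tag-1 rows back until the end, which hides the bug; a small interval emits immediately and surfaces it.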

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4968) Broken plan in MapJoin

2013-07-31 Thread Yin Huai (JIRA)
Yin Huai created HIVE-4968:
--

 Summary: Broken plan in MapJoin
 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai


{code:sql}
SELECT tmp3.key, tmp3.value, tmp3.count
FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
  FROM (SELECT key, value
FROM src) tmp1
  JOIN (SELECT count(*) as count
FROM src) tmp2
  ) tmp3;
{code}
The plan is executable.

{code:sql}
SELECT tmp3.key, tmp3.value, tmp3.count
FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
  FROM (SELECT *
FROM src) tmp1
  JOIN (SELECT count(*) as count
FROM src) tmp2
  ) tmp3;
{code}
The plan is executable.

{code}
SELECT tmp4.key, tmp4.value, tmp4.count
FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
  FROM (SELECT *
FROM (SELECT key, value
  FROM src) tmp1 ) tmp2
  JOIN (SELECT count(*) as count
FROM src) tmp3
  ) tmp4;
{code}
The plan is not executable.

The plan related to the MapJoin is
{code}
 Stage: Stage-5
Map Reduce Local Work
  Alias -> Map Local Tables:
tmp4:tmp2:tmp1:src 
  Fetch Operator
limit: -1
  Alias -> Map Local Operator Tree:
tmp4:tmp2:tmp1:src 
  TableScan
alias: src
Select Operator
  expressions:
expr: key
type: string
expr: value
type: string
  outputColumnNames: _col0, _col1
  HashTable Sink Operator
condition expressions:
  0 
  1 {_col0}
handleSkewJoin: false
keys:
  0 []
  1 []
Position of Big Table: 1

  Stage: Stage-4
Map Reduce
  Alias -> Map Operator Tree:
$INTNAME 
Map Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0 
1 {_col0}
  handleSkewJoin: false
  keys:
0 []
1 []
  outputColumnNames: _col2
  Position of Big Table: 1
  Select Operator
expressions:
  expr: _col0
  type: string
  expr: _col1
  type: string
  expr: _col2
  type: bigint
outputColumnNames: _col0, _col1, _col2
File Output Operator
  compressed: false
  GlobalTableId: 0
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Local Work:
Map Reduce Local Work
{code}
The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, 
_col1, _col2'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-31 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4388:
---

Attachment: HIVE-4388.patch

Uploading to get a full test run.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, 
 HIVE-4388-wip.txt


 Currently we build by default against HBase 0.92. When you run against Hadoop 
 2 (-Dhadoop.mr.rev=23), builds fail because of HBASE-5963.
 HIVE-3861 upgrades the HBase version used. This gets you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but then fails with HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4968) Broken plan in MapJoin

2013-07-31 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725551#comment-13725551
 ] 

Yin Huai commented on HIVE-4968:


I am having difficulty summarizing this problem in a concise and precise way... 
I will update the summary once I find a good one.

 Broken plan in MapJoin
 --

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai

 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
    Alias -> Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
    Alias -> Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
    Alias -> Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, 
 _col1, _col2'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-07-31 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4789:
---

Attachment: HIVE-4789.2.patch.txt

I applied Sean's patch and then re-ran the tests with overwrite turned on. 
Attaching here to get another test run.

 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt


 HIVE-3953 fixed using partitioned Avro tables for anything that used the 
 MapOperator, but operations that rely on the FetchOperator still fail with the 
 same error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-07-31 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4789:
---

Status: Patch Available  (was: Open)

 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt


 HIVE-3953 fixed using partitioned Avro tables for anything that used the 
 MapOperator, but operations that rely on the FetchOperator still fail with the 
 same error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-31 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4954:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Thank you for your contribution! I have committed this to trunk.

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.12.0

 Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt


   protected static final ArrayList<String> RANKING_FUNCS = new 
 ArrayList<String>();
   static {
 RANKING_FUNCS.add("rank");
 RANKING_FUNCS.add("dense_rank");
 RANKING_FUNCS.add("percent_rank");
 RANKING_FUNCS.add("cume_dist");
   };
 Move this logic to annotations.
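
One hedged sketch of the annotation approach (RankingFunction, GenericUDAFRankSketch, and GenericUDAFSumSketch are illustrative names, not Hive's actual API): mark each ranking UDAF class with a runtime annotation and replace the hardcoded name list with an annotation lookup.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation for ranking window functions.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RankingFunction {}

@RankingFunction
class GenericUDAFRankSketch {}   // stand-in for a ranking UDAF

class GenericUDAFSumSketch {}    // stand-in for a non-ranking UDAF

public class AnnotationSketch {
    // Replaces RANKING_FUNCS.contains(name) with an annotation check,
    // so new ranking functions need no translator change.
    static boolean isRankingFunction(Class<?> udafClass) {
        return udafClass.isAnnotationPresent(RankingFunction.class);
    }

    public static void main(String[] args) {
        System.out.println(isRankingFunction(GenericUDAFRankSketch.class)); // true
        System.out.println(isRankingFunction(GenericUDAFSumSketch.class));  // false
    }
}
```

The design benefit is locality: the "is this a ranking function" fact lives on the function class itself instead of in a list the translator must keep in sync.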

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725609#comment-13725609
 ] 

Hive QA commented on HIVE-4870:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595226/HIVE-4870.1.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 2748 tests executed
*Failed tests:*
{noformat}
org.apache.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union22
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/260/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/260/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4870.1.patch


 Explain extended does not include partition information for Fetch Task 
 (FetchWork). Map Reduce Task (MapredWork)already does this. 
 Patch includes Partition Description info to Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Tony Murphy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4794:
--

Attachment: HIVE-4794.3.patch

Updated comments.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 hive-4794.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Tony Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725623#comment-13725623
 ] 

Tony Murphy commented on HIVE-4794:
---

Ran the tests now that all dependent patches are in and the merge with trunk is 
complete. The tests pass 100%.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 hive-4794.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 13021: Vectorization Tests

2013-07-31 Thread tony murphy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13021/
---

(Updated July 31, 2013, 7:26 p.m.)


Review request for hive, Eric Hanson, Jitendra Pandey, Remus Rusanu, and 
Sarvesh Sakalanaga.


Changes
---

updated comments


Bugs: HIVE-4794
https://issues.apache.org/jira/browse/HIVE-4794


Repository: hive-git


Description
---

These tests cover all types, aggregates, and operators currently supported for 
vectorization. The queries are executed over a specially crafted data set that 
covers all the interesting classes of batch for each type: all nulls, repeating 
value, no nulls, and random values, to fully exercise the vectorization stack. 
The queries were stabilized against a text test oracle in order to validate 
results.

This patch depends on: 
HIVE-4525
HIVE-4922
HIVE-4931


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java 97436c5 
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 79390a9 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/AllVectorTypesRecord.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/OrcFileGenerator.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_0.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_10.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_11.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_12.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_13.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_14.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_15.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_16.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_9.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_0.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_10.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_12.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_13.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_14.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_15.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_16.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_9.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/13021/diff/


Testing
---


Thanks,

tony murphy



[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Tony Murphy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4794:
--

Status: Patch Available  (was: Open)

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 hive-4794.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Tony Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725632#comment-13725632
 ] 

Tony Murphy commented on HIVE-4794:
---

I ran the tests in the vectorization branch, which just pulled from trunk. As 
far as I know, we don't have precommit testing for branches yet.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 hive-4794.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Tony Murphy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4794:
--

Attachment: HIVE-4794.3-vectorization.patch

Fix patch format so the precommit build can apply it.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 HIVE-4794.3-vectorization.patch, hive-4794.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4968) When deduplicating multiple SelectOperators, we should update the RowResolver accordingly

2013-07-31 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725677#comment-13725677
 ] 

Yin Huai commented on HIVE-4968:


The summary has been updated.

 When deduplicating multiple SelectOperators, we should update the RowResolver 
 accordingly
 --

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai

 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
    Alias -> Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
    Alias -> Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
    Alias -> Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, 
 _col1, _col2'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update the RowResolver accordingly

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4968:
---

Summary: When deduplicating multiple SelectOperators, we should update the 
RowResolver accordingly  (was: Broken plan in MapJoin)

 When deduplicating multiple SelectOperators, we should update the RowResolver 
 accordingly
 --

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai

 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
    Alias -> Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
    Alias -> Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
    Alias -> Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, 
 _col1, _col2'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: TestCliDriver Failed Test

2013-07-31 Thread nikolaus . stahl
Thanks Noland. I tried that, but now I'm getting more errors (see  
below). It seems the Java compiler isn't recognizing the packages this  
test depends on. Here's the relevant output after running the same test  
as before with the very-clean option (i.e.: ant very-clean test  
-Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true):


set-test-classpath:

compile-test:
 [echo] Project: ql
[javac] Compiling 105 source files to  
/Users/niko/Repos/hive-trunk/build/ql/test/classes
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:21: package org.apache.hadoop.hive.metastore does not  
exist
[javac] import static  
org.apache.hadoop.hive.metastore.MetaStoreUtils.DEFAULT_DATABASE_NAME;

[javac]   ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:21: static import only from classes and  
interfaces
[javac] import static  
org.apache.hadoop.hive.metastore.MetaStoreUtils.DEFAULT_DATABASE_NAME;

[javac] ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:55: package org.apache.hadoop.hive.cli does not  
exist

[javac] import org.apache.hadoop.hive.cli.CliDriver;
[javac]  ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:56: package org.apache.hadoop.hive.cli does not  
exist

[javac] import org.apache.hadoop.hive.cli.CliSessionState;
[javac]  ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:57: package org.apache.hadoop.hive.common.io does not  
exist

[javac] import org.apache.hadoop.hive.common.io.CachingPrintStream;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:58: package org.apache.hadoop.hive.conf does not  
exist

[javac] import org.apache.hadoop.hive.conf.HiveConf;
[javac]   ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:59: package org.apache.hadoop.hive.metastore does not  
exist

[javac] import org.apache.hadoop.hive.metastore.MetaStoreUtils;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:60: package org.apache.hadoop.hive.metastore.api does not  
exist

[javac] import org.apache.hadoop.hive.metastore.api.Index;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:76: package org.apache.hadoop.hive.serde does not  
exist

[javac] import org.apache.hadoop.hive.serde.serdeConstants;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:77: package org.apache.hadoop.hive.serde2.thrift does not  
exist

[javac] import org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:78: package org.apache.hadoop.hive.serde2.thrift.test does not  
exist

[javac] import org.apache.hadoop.hive.serde2.thrift.test.Complex;
[javac] ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:79: package org.apache.hadoop.hive.shims does not  
exist

[javac] import org.apache.hadoop.hive.shims.HadoopShims;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:80: package org.apache.hadoop.hive.shims does not  
exist

[javac] import org.apache.hadoop.hive.shims.ShimLoader;
[javac]^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:112: cannot find  
symbol

[javac] symbol  : class HiveConf
[javac] location: class org.apache.hadoop.hive.ql.QTestUtil
[javac]   protected HiveConf conf;
[javac] ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:117: cannot find  
symbol

[javac] symbol  : class CliDriver
[javac] location: class org.apache.hadoop.hive.ql.QTestUtil
[javac]   private CliDriver cliDriver;
[javac]   ^
[javac]  
/Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:118: package HadoopShims does not  
exist

[javac]   private HadoopShims.MiniMrShim mr = null;
[javac]  ^
[javac]  

[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-07-31 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725695#comment-13725695
 ] 

Xuefu Zhang commented on HIVE-4844:
---

[~jdere]:

1. I'm not sure which way is better, but adding additional columns seems 
cleaner to me.

2. I may be off topic on inheritance. What I tried to say is that some types, 
for instance STRING, CHAR, and VARCHAR, are very similar and may share a lot 
of implementation. The same would apply to DECIMAL and DECIMAL(p,s). However, 
I haven't figured out the implications yet. Please share your insights. 
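To make the sharing idea concrete, here is a hypothetical sketch (Python; all class names are invented for illustration and are not Hive's actual type classes) of a plain string type with CHAR/VARCHAR as parameterized variants that reuse a common base implementation:

```python
# Hypothetical illustration of CHAR/VARCHAR sharing a string base type.
class StringType:
    def normalize(self, value: str) -> str:
        return value  # plain STRING: no length constraint

class VarcharType(StringType):
    def __init__(self, max_length: int):
        self.max_length = max_length

    def normalize(self, value: str) -> str:
        return value[: self.max_length]  # truncate to the declared max length

class CharType(VarcharType):
    def normalize(self, value: str) -> str:
        # Fixed length: truncate, then pad with spaces (SQL CHAR semantics).
        return super().normalize(value).ljust(self.max_length)

print(VarcharType(5).normalize("hadoop"))  # 'hadoo'
print(repr(CharType(5).normalize("hi")))   # 'hi   '
```

Whether inheritance or composition is the right sharing mechanism for the actual ObjectInspector/TypeInfo machinery is exactly the open question in this thread.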

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-07-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725702#comment-13725702
 ] 

Edward Capriolo commented on HIVE-4844:
---

Ideally, a field declared without parameters would not require storing any 
additional data in the metastore. 

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4968) When deduplicate multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4968:
--

Attachment: HIVE-4968.D11901.1.patch

yhuai requested code review of HIVE-4968 [jira] When deduplicate multiple 
SelectOperators, we should update RowResolver accordingly.

Reviewers: JIRA

Merge remote-tracking branch 'origin/trunk' into HIVE-4968

SELECT tmp3.key, tmp3.value, tmp3.count
FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
  FROM (SELECT key, value
FROM src) tmp1
  JOIN (SELECT count(*) as count
FROM src) tmp2
  ) tmp3;

The plan is executable.

SELECT tmp3.key, tmp3.value, tmp3.count
FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
  FROM (SELECT *
FROM src) tmp1
  JOIN (SELECT count(*) as count
FROM src) tmp2
  ) tmp3;

The plan is executable.

SELECT tmp4.key, tmp4.value, tmp4.count
FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
  FROM (SELECT *
FROM (SELECT key, value
  FROM src) tmp1 ) tmp2
  JOIN (SELECT count(*) as count
FROM src) tmp3
  ) tmp4;

The plan is not executable.

The plan related to the MapJoin is

 Stage: Stage-5
Map Reduce Local Work
  Alias - Map Local Tables:
tmp4:tmp2:tmp1:src
  Fetch Operator
limit: -1
  Alias - Map Local Operator Tree:
tmp4:tmp2:tmp1:src
  TableScan
alias: src
Select Operator
  expressions:
expr: key
type: string
expr: value
type: string
  outputColumnNames: _col0, _col1
  HashTable Sink Operator
condition expressions:
  0
  1 {_col0}
handleSkewJoin: false
keys:
  0 []
  1 []
Position of Big Table: 1

  Stage: Stage-4
Map Reduce
  Alias - Map Operator Tree:
$INTNAME
Map Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0
1 {_col0}
  handleSkewJoin: false
  keys:
0 []
1 []
  outputColumnNames: _col2
  Position of Big Table: 1
  Select Operator
expressions:
  expr: _col0
  type: string
  expr: _col1
  type: string
  expr: _col2
  type: bigint
outputColumnNames: _col0, _col1, _col2
File Output Operator
  compressed: false
  GlobalTableId: 0
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Local Work:
Map Reduce Local Work

The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
_col2'
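To illustrate the bookkeeping at stake, here is a hypothetical sketch (Python, not Hive code; all names are invented) of merging two stacked SELECT operators: the surviving operator must keep the composed column mapping of both projections, otherwise downstream operators, like the MapJoin above, see fewer columns than they should.

```python
# Hypothetical sketch of deduplicating two adjacent SELECT operators.
# Each projection maps an output column name to its input column name.
def merge_selects(outer, inner):
    """Compose the outer projection over the inner one, keeping every
    output column so the merged operator's schema stays complete."""
    return {out_col: inner[in_col] for out_col, in_col in outer.items()}

inner = {"_col0": "key", "_col1": "value"}    # SELECT key, value FROM src
outer = {"_col0": "_col0", "_col1": "_col1"}  # SELECT * over the inner select
merged = merge_selects(outer, inner)
print(sorted(merged))  # both output columns survive the merge
```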

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11901

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
  ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q
  ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/28407/

To: JIRA, yhuai


 When deduplicate multiple SelectOperators, we should update RowResolver 
 accordingly
 --

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   

[jira] [Updated] (HIVE-4968) When deduplicate multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4968:
---

Status: Patch Available  (was: Open)

 When deduplicate multiple SelectOperators, we should update RowResolver 
 accordingly
 --

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias - Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias - Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
 _col2'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4968:
---

Summary: When deduplicating multiple SelectOperators, we should update 
RowResolver accordingly  (was: When deduplicate multiple SelectOperators, we 
should update RowResolver accordingly)

 When deduplicating multiple SelectOperators, we should update RowResolver 
 accordingly
 

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias - Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias - Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
 _col2'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725754#comment-13725754
 ] 

Hive QA commented on HIVE-4870:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595226/HIVE-4870.1.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 2749 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union22
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/262/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/262/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4870.1.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork). The Map Reduce Task (MapredWork) already does this. 
 The patch adds partition description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725783#comment-13725783
 ] 

Phabricator commented on HIVE-4968:
---

ashutoshc has accepted the revision HIVE-4968 [jira] When deduplicate multiple 
SelectOperators, we should update RowResolver accordingly.

  Looks good. Some minor comments.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java:399 You are not 
using this method. Let's not add it.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java:104 
Can you add a comment saying something like: we need to set the row resolver of 
the parent from the child (which is in the parse context) to preserve the 
column mappings.
  Feel free to improve on the wording here.

REVISION DETAIL
  https://reviews.facebook.net/D11901

BRANCH
  HIVE-4968

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, yhuai


 When deduplicating multiple SelectOperators, we should update RowResolver 
 accordingly
 

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias - Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias - Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
 _col2'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4968:
---

Status: Open  (was: Patch Available)

 When deduplicating multiple SelectOperators, we should update RowResolver 
 accordingly
 

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias - Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias - Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
 _col2'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4960) lastAlias in CommonJoinOperator is not used

2013-07-31 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725787#comment-13725787
 ] 

Phabricator commented on HIVE-4960:
---

ashutoshc has accepted the revision HIVE-4960 [jira] lastAlias in 
CommonJoinOperator is not used.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D11895

BRANCH
  HIVE-4960

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, yhuai


 lastAlias in CommonJoinOperator is not used
 ---

 Key: HIVE-4960
 URL: https://issues.apache.org/jira/browse/HIVE-4960
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Minor
 Attachments: HIVE-4960.D11895.1.patch


 In CommonJoinOperator, there is an object called lastAlias. Its initial value 
 is 'null'. After tracing the usage of this object, I found that no place 
 changes its value. Also, it is only used in processOp in JoinOperator and 
 MapJoinOperator, as
 {code}
 if ((lastAlias == null) || (!lastAlias.equals(alias))) {
   nextSz = joinEmitInterval;
 }
 {code}
 Since lastAlias will always be null, we will assign joinEmitInterval to 
 nextSz every time we get a row. Later in processOp, we have
 {code}
 nextSz = getNextSize(nextSz);
 {code}
 Because we reset the value of nextSz to joinEmitInterval every time we get a 
 row, it seems that getNextSize will not be used as expected.
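The effect can be modeled outside Hive with a hypothetical sketch (Python; this is a simplified model of the logic quoted above, not the actual CommonJoinOperator code, and the doubling in next_size is an assumed growth rule): when lastAlias is never updated, nextSz is reset to joinEmitInterval on every row, so the growth that getNextSize is meant to provide never takes effect.

```python
# Hypothetical model of the nextSz logic described in this issue.
def next_size(sz):
    """Stand-in for getNextSize: grow the emit interval geometrically."""
    return sz * 2

def next_sz_sequence(n_rows, join_emit_interval=4, update_last_alias=False):
    """Return the nextSz value observed for each of n_rows rows of one alias.
    update_last_alias=False mimics the reported bug (lastAlias stays null)."""
    sizes = []
    next_sz = join_emit_interval
    last_alias = None
    for _ in range(n_rows):
        alias = "a"  # a single alias throughout
        if last_alias is None or last_alias != alias:
            next_sz = join_emit_interval  # with the bug, this fires every row
        sizes.append(next_sz)
        next_sz = next_size(next_sz)
        if update_last_alias:
            last_alias = alias  # remembering the alias stops the spurious reset
    return sizes

print(next_sz_sequence(4))                          # [4, 4, 4, 4]: growth never happens
print(next_sz_sequence(4, update_last_alias=True))  # [4, 8, 16, 32]: intended back-off
```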

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data

2013-07-31 Thread Venki Korukanti (JIRA)
Venki Korukanti created HIVE-4969:
-

 Summary: HCatalog HBaseHCatStorageHandler is not returning all the 
data
 Key: HIVE-4969
 URL: https://issues.apache.org/jira/browse/HIVE-4969
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Venki Korukanti
Priority: Critical



Repro steps:
1) Create an HCatalog table mapped to HBase table.

hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
 STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
 TBLPROPERTIES('hbase.table.name' ='studentHBase',  
   'hbase.columns.mapping' =
':key,onecf:name,twocf:age,threecf:gpa');


2) Load the following data from Pig.

cat student_data
1^Asarah laertes^A23^A2.40
2^Atom allen^A72^A1.57
3^Abob ovid^A61^A2.67
4^Aethan nixon^A38^A2.15
5^Acalvin robinson^A28^A2.53
6^Airene ovid^A65^A2.56
7^Ayuri garcia^A36^A1.65
8^Acalvin nixon^A41^A1.04
9^Ajessica davidson^A48^A2.11
10^Akatie king^A39^A1.05


grunt> A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);

grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();

3) Now from HBase, do a scan on the studentHBase table:
hbase(main):026:0> scan 'studentPig', {LIMIT => 5}

4) From Pig, access the data in the table:
grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
grunt> STORE A INTO '/user/root/studentPig';


5) Verify the output written in StudentPig
hadoop fs -cat /user/root/studentPig/part-r-0
1  23
2  72
3  61
4  38
5  28
6  65
7  36
8  41
9  48
10 39

The data returned has only two fields (rownum and age).


Problem:
While reading data from the HBase table, HbaseSnapshotRecordReader gets each 
data row in a Result (org.apache.hadoop.hbase.client.Result) object and 
processes the KeyValue fields in it. After processing, it creates another 
Result object out of the processed KeyValue array. The problem is that this 
KeyValue array is not sorted, while the Result object expects its input 
KeyValue array to be sorted. When we call Result.getValue(), it returns no 
value for some of the fields because it does a binary search on an unordered 
array.
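The failure mode can be reproduced with a hypothetical sketch (Python's bisect standing in for the binary search inside Result.getValue(); this is not actual HBase code, and the column qualifiers are the ones from the repro above): a binary search over an unsorted key array misses entries that are present, while sorting the array first restores the lookups.

```python
import bisect

def get_value(keys, key):
    """Binary-search lookup over `keys`, mirroring how Result.getValue()
    locates a cell; correct only if `keys` is sorted."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return keys[i]
    return None  # "no value", even though the key may actually be present

# Column qualifiers in processing order, i.e. not sorted:
unsorted = ["twocf:age", "onecf:name", "threecf:gpa"]
print(get_value(unsorted, "onecf:name"))          # None: the lookup misses it
print(get_value(sorted(unsorted), "onecf:name"))  # onecf:name: sorting fixes it
```

This is why only some fields (here, effectively rownum and age) come back: whether a given cell is found depends on where binary search happens to probe in the unordered array.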










--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data

2013-07-31 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-4969:
--

Description: 
Repro steps:
1) Create an HCatalog table mapped to HBase table.

hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
 STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
 TBLPROPERTIES('hbase.table.name' ='studentHBase',  
   'hbase.columns.mapping' =
':key,onecf:name,twocf:age,threecf:gpa');


2) Load the following data from Pig.

cat student_data
1^Asarah laertes^A23^A2.40
2^Atom allen^A72^A1.57
3^Abob ovid^A61^A2.67
4^Aethan nixon^A38^A2.15
5^Acalvin robinson^A28^A2.53
6^Airene ovid^A65^A2.56
7^Ayuri garcia^A36^A1.65
8^Acalvin nixon^A41^A1.04
9^Ajessica davidson^A48^A2.11
10^Akatie king^A39^A1.05


grunt> A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);

grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();

3) Now from HBase, do a scan on the studentHBase table:
hbase(main):026:0> scan 'studentPig', {LIMIT => 5}

4) From Pig, access the data in the table:
grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
grunt> STORE A INTO '/user/root/studentPig';


5) Verify the output written in StudentPig
hadoop fs -cat /user/root/studentPig/part-r-0
1  23
2  72
3  61
4  38
5  28
6  65
7  36
8  41
9  48
10 39

The data returned has only two fields (rownum and age).


Problem:
While reading data from the HBase table, HbaseSnapshotRecordReader gets each 
data row in a Result (org.apache.hadoop.hbase.client.Result) object and 
processes the KeyValue fields in it. After processing, it creates another 
Result object out of the processed KeyValue array. The problem is that this 
KeyValue array is not sorted, while the Result object expects its input 
KeyValue array to be sorted. When we call Result.getValue(), it returns no 
value for some of the fields because it does a binary search on an unordered 
array.










  was:

Repro steps:
1) Create an HCatalog table mapped to HBase table.

hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
 STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
 TBLPROPERTIES('hbase.table.name' ='studentHBase',  
   'hbase.columns.mapping' =
':key,onecf:name,twocf:age,threecf:gpa');


2) Load the following data from Pig.

cat student_data
1^Asarah laertes^A23^A2.40
2^Atom allen^A72^A1.57
3^Abob ovid^A61^A2.67
4^Aethan nixon^A38^A2.15
5^Acalvin robinson^A28^A2.53
6^Airene ovid^A65^A2.56
7^Ayuri garcia^A36^A1.65
8^Acalvin nixon^A41^A1.04
9^Ajessica davidson^A48^A2.11
10^Akatie king^A39^A1.05


grunt A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);

grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();

3) Now from HBase do a scan on the studentHBase table
hbase(main):026:0 scan 'studentPig', {LIMIT = 5}

4) From pig access the data in table
grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
grunt STORE A INTO '/user/root/studentPig';


5) Verify the output written in StudentPig
hadoop fs -cat /user/root/studentPig/part-r-0
1  23
2  72
3  61
4  38
5  28
6  65
7  36
8  41
9  48
10 39

The data returned only two fields (rownum and age).


Problem:
While reading the data from HBase table, HbaseSnapshotRecordReader gets data 
row in Result (org.apache.hadoop.hbase.client.Result) object and processes the 
KeyValue fields in it. After processing it creates another Result object out of 
the processed KeyValue array. Problem here is KeyValue array is not sorted. 
Result object expects the input KeyValue array to have sorted elements. When we 
call the Result.getValue() it returns no value for some of the fields as it 
does a binary search on unordered array.











 HCatalog HBaseHCatStorageHandler is not returning all the data
 --

 Key: HIVE-4969
 URL: https://issues.apache.org/jira/browse/HIVE-4969
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Venki Korukanti
Priority: Critical

 Repro steps:
 1) Create an HCatalog table mapped to HBase table.
 hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
  STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
  TBLPROPERTIES('hbase.table.name' ='studentHBase',  
'hbase.columns.mapping' =  

[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data

2013-07-31 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-4969:
--

Attachment: HIVE-4969-1.patch

 HCatalog HBaseHCatStorageHandler is not returning all the data
 --

 Key: HIVE-4969
 URL: https://issues.apache.org/jira/browse/HIVE-4969
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Venki Korukanti
Priority: Critical
 Fix For: 0.11.1, 0.12.0

 Attachments: HIVE-4969-1.patch


 Repro steps:
 1) Create an HCatalog table mapped to HBase table.
 hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
  STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
  TBLPROPERTIES('hbase.table.name' ='studentHBase',  
'hbase.columns.mapping' =
 ':key,onecf:name,twocf:age,threecf:gpa');
 2) Load the following data from Pig.
 cat student_data
 1^Asarah laertes^A23^A2.40
 2^Atom allen^A72^A1.57
 3^Abob ovid^A61^A2.67
 4^Aethan nixon^A38^A2.15
 5^Acalvin robinson^A28^A2.53
 6^Airene ovid^A65^A2.56
 7^Ayuri garcia^A36^A1.65
 8^Acalvin nixon^A41^A1.04
 9^Ajessica davidson^A48^A2.11
 10^Akatie king^A39^A1.05
 grunt A = LOAD 'student_data' AS 
 (rownum:int,name:chararray,age:int,gpa:float);
 grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
 3) Now from HBase do a scan on the studentHBase table
 hbase(main):026:0 scan 'studentPig', {LIMIT = 5}
 4) From pig access the data in table
 grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
 grunt STORE A INTO '/user/root/studentPig';
 5) Verify the output written in StudentPig
 hadoop fs -cat /user/root/studentPig/part-r-0
 1  23
 2  72
 3  61
 4  38
 5  28
 6  65
 7  36
 8  41
 9  48
 10 39
 The data returned has only two fields (rownum and age).
 Problem:
 While reading the data from HBase table, HbaseSnapshotRecordReader gets data 
 row in Result (org.apache.hadoop.hbase.client.Result) object and processes 
 the KeyValue fields in it. After processing, it creates another Result object 
 out of the processed KeyValue array. Problem here is KeyValue array is not 
 sorted. Result object expects the input KeyValue array to have sorted 
 elements. When we call the Result.getValue() it returns no value for some of 
 the fields as it does a binary search on un-ordered array.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data

2013-07-31 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-4969:
--

Attachment: (was: HIVE-4969-1.patch)

 HCatalog HBaseHCatStorageHandler is not returning all the data
 --

 Key: HIVE-4969
 URL: https://issues.apache.org/jira/browse/HIVE-4969
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Venki Korukanti
Priority: Critical
 Fix For: 0.11.1, 0.12.0


 Repro steps:
 1) Create an HCatalog table mapped to HBase table.
 hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
  STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
  TBLPROPERTIES('hbase.table.name' ='studentHBase',  
'hbase.columns.mapping' =
 ':key,onecf:name,twocf:age,threecf:gpa');
 2) Load the following data from Pig.
 cat student_data
 1^Asarah laertes^A23^A2.40
 2^Atom allen^A72^A1.57
 3^Abob ovid^A61^A2.67
 4^Aethan nixon^A38^A2.15
 5^Acalvin robinson^A28^A2.53
 6^Airene ovid^A65^A2.56
 7^Ayuri garcia^A36^A1.65
 8^Acalvin nixon^A41^A1.04
 9^Ajessica davidson^A48^A2.11
 10^Akatie king^A39^A1.05
 grunt A = LOAD 'student_data' AS 
 (rownum:int,name:chararray,age:int,gpa:float);
 grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
 3) Now from HBase do a scan on the studentHBase table
 hbase(main):026:0 scan 'studentPig', {LIMIT = 5}
 4) From pig access the data in table
 grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
 grunt STORE A INTO '/user/root/studentPig';
 5) Verify the output written in StudentPig
 hadoop fs -cat /user/root/studentPig/part-r-0
 1  23
 2  72
 3  61
 4  38
 5  28
 6  65
 7  36
 8  41
 9  48
 10 39
 The data returned has only two fields (rownum and age).
 Problem:
 While reading the data from HBase table, HbaseSnapshotRecordReader gets data 
 row in Result (org.apache.hadoop.hbase.client.Result) object and processes 
 the KeyValue fields in it. After processing, it creates another Result object 
 out of the processed KeyValue array. Problem here is KeyValue array is not 
 sorted. Result object expects the input KeyValue array to have sorted 
 elements. When we call the Result.getValue() it returns no value for some of 
 the fields as it does a binary search on un-ordered array.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data

2013-07-31 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725807#comment-13725807
 ] 

Venki Korukanti commented on HIVE-4969:
---

Attached a patch that sorts the KeyValue array before creating the HBase Result object.

 HCatalog HBaseHCatStorageHandler is not returning all the data
 --

 Key: HIVE-4969
 URL: https://issues.apache.org/jira/browse/HIVE-4969
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Venki Korukanti
Priority: Critical
 Fix For: 0.11.1, 0.12.0

 Attachments: HIVE-4969-1.patch


 Repro steps:
 1) Create an HCatalog table mapped to HBase table.
 hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
  STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
  TBLPROPERTIES('hbase.table.name' ='studentHBase',  
'hbase.columns.mapping' =
 ':key,onecf:name,twocf:age,threecf:gpa');
 2) Load the following data from Pig.
 cat student_data
 1^Asarah laertes^A23^A2.40
 2^Atom allen^A72^A1.57
 3^Abob ovid^A61^A2.67
 4^Aethan nixon^A38^A2.15
 5^Acalvin robinson^A28^A2.53
 6^Airene ovid^A65^A2.56
 7^Ayuri garcia^A36^A1.65
 8^Acalvin nixon^A41^A1.04
 9^Ajessica davidson^A48^A2.11
 10^Akatie king^A39^A1.05
 grunt A = LOAD 'student_data' AS 
 (rownum:int,name:chararray,age:int,gpa:float);
 grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
 3) Now from HBase do a scan on the studentHBase table
 hbase(main):026:0 scan 'studentPig', {LIMIT = 5}
 4) From pig access the data in table
 grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
 grunt STORE A INTO '/user/root/studentPig';
 5) Verify the output written in StudentPig
 hadoop fs -cat /user/root/studentPig/part-r-0
 1  23
 2  72
 3  61
 4  38
 5  28
 6  65
 7  36
 8  41
 9  48
 10 39
 The data returned has only two fields (rownum and age).
 Problem:
 While reading the data from HBase table, HbaseSnapshotRecordReader gets data 
 row in Result (org.apache.hadoop.hbase.client.Result) object and processes 
 the KeyValue fields in it. After processing, it creates another Result object 
 out of the processed KeyValue array. Problem here is KeyValue array is not 
 sorted. Result object expects the input KeyValue array to have sorted 
 elements. When we call the Result.getValue() it returns no value for some of 
 the fields as it does a binary search on un-ordered array.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data

2013-07-31 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-4969:
--

Attachment: HIVE-4969-1.patch

 HCatalog HBaseHCatStorageHandler is not returning all the data
 --

 Key: HIVE-4969
 URL: https://issues.apache.org/jira/browse/HIVE-4969
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Venki Korukanti
Priority: Critical
 Fix For: 0.11.1, 0.12.0

 Attachments: HIVE-4969-1.patch


 Repro steps:
 1) Create an HCatalog table mapped to HBase table.
 hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
  STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
  TBLPROPERTIES('hbase.table.name' ='studentHBase',  
'hbase.columns.mapping' =
 ':key,onecf:name,twocf:age,threecf:gpa');
 2) Load the following data from Pig.
 cat student_data
 1^Asarah laertes^A23^A2.40
 2^Atom allen^A72^A1.57
 3^Abob ovid^A61^A2.67
 4^Aethan nixon^A38^A2.15
 5^Acalvin robinson^A28^A2.53
 6^Airene ovid^A65^A2.56
 7^Ayuri garcia^A36^A1.65
 8^Acalvin nixon^A41^A1.04
 9^Ajessica davidson^A48^A2.11
 10^Akatie king^A39^A1.05
 grunt A = LOAD 'student_data' AS 
 (rownum:int,name:chararray,age:int,gpa:float);
 grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
 3) Now from HBase do a scan on the studentHBase table
 hbase(main):026:0 scan 'studentPig', {LIMIT = 5}
 4) From pig access the data in table
 grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
 grunt STORE A INTO '/user/root/studentPig';
 5) Verify the output written in StudentPig
 hadoop fs -cat /user/root/studentPig/part-r-0
 1  23
 2  72
 3  61
 4  38
 5  28
 6  65
 7  36
 8  41
 9  48
 10 39
 The data returned has only two fields (rownum and age).
 Problem:
 While reading the data from HBase table, HbaseSnapshotRecordReader gets data 
 row in Result (org.apache.hadoop.hbase.client.Result) object and processes 
 the KeyValue fields in it. After processing, it creates another Result object 
 out of the processed KeyValue array. Problem here is KeyValue array is not 
 sorted. Result object expects the input KeyValue array to have sorted 
 elements. When we call the Result.getValue() it returns no value for some of 
 the fields as it does a binary search on un-ordered array.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4967) Don't serialize unnecessary fields in query plan

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725845#comment-13725845
 ] 

Hive QA commented on HIVE-4967:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595224/HIVE-4967.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2749 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/263/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/263/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Don't serialize unnecessary fields in query plan
 

 Key: HIVE-4967
 URL: https://issues.apache.org/jira/browse/HIVE-4967
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-4967.patch


 There are quite a few fields which need not be serialized, since they are 
 initialized in the backend anyway. We need not serialize them in our plan.
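
As a sketch of the general idea (plain Java serialization with made-up class names; Hive's actual plan serialization mechanism differs), a field that the backend re-initializes anyway can simply be marked transient so it is skipped:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PlanNodeDemo {
    static class PlanNode implements Serializable {
        String operatorName;  // must travel with the plan to the backend
        // Rebuilt on the backend anyway, so don't ship it with the plan.
        transient String cachedState = "big backend-only state";
        PlanNode(String name) { operatorName = name; }
    }

    // Serializes and deserializes a node, as if shipping the plan to a task.
    static PlanNode roundTrip(PlanNode n) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(n);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (PlanNode) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlanNode copy = roundTrip(new PlanNode("MapJoin"));
        System.out.println(copy.operatorName);        // survives serialization
        System.out.println(copy.cachedState == null); // transient field dropped
    }
}
```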

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4968:
--

Attachment: HIVE-4968.D11901.2.patch

yhuai updated the revision HIVE-4968 [jira] When deduplicating multiple 
SelectOperators, we should update RowResolver accordingly.

  addressed Ashutosh's comments

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11901

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11901?vs=36669&id=36693#toc

BRANCH
  HIVE-4968

ARCANIST PROJECT
  hive

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
  ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q
  ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out

To: JIRA, ashutoshc, yhuai


 When deduplicating multiple SelectOperators, we should update RowResolver 
 accordingly
 

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch, HIVE-4968.D11901.2.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias -> Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias -> Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
 _col2'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4968:
---

Status: Patch Available  (was: Open)

addressed Ashutosh's comments

 When deduplicating multiple SelectOperators, we should update RowResolver 
 accordingly
 

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch, HIVE-4968.D11901.2.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias -> Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias -> Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, 
 _col2'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF

2013-07-31 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725882#comment-13725882
 ] 

Harish Butani commented on HIVE-4966:
-

For my understanding, are you adding a new function collect_array, or are you 
enhancing collect_set to have a dedup=true/false option?

The signatures of collect_map and collect_set/array are different, so we have 
to expose them as separate functions.

But open to sharing a single implementation. Makes sense. What specifically do 
you have in mind? 
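
The collect_map semantics in question amount to a group-by that folds each group's (key, value) pairs into a map. In plain Java (illustrative only, not the proposed UDAF code):

```java
import java.util.Map;
import java.util.TreeMap;

public class CollectMapDemo {
    // Folds rows of (customer, product, amt) into customer -> {product -> amt},
    // i.e. what "select customer, collect_map(product, amt) ... group by customer"
    // would return on the Txn table from the issue description.
    static Map<String, Map<String, Double>> collect(String[][] txns) {
        Map<String, Map<String, Double>> activity = new TreeMap<>();
        for (String[] t : txns) {
            activity.computeIfAbsent(t[0], k -> new TreeMap<>())
                    .put(t[1], Double.parseDouble(t[2]));
        }
        return activity;
    }

    public static void main(String[] args) {
        String[][] txns = {
            {"alice", "book", "12.50"},
            {"alice", "pen", "1.25"},
            {"bob",   "book", "9.99"},
        };
        System.out.println(collect(txns));
        // {alice={book=12.5, pen=1.25}, bob={book=9.99}}
    }
}
```

A collect_set/collect_array implementation would share the same fold structure, accumulating into a set or list instead of a map.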


 Introduce Collect_Map UDAF
 --

 Key: HIVE-4966
 URL: https://issues.apache.org/jira/browse/HIVE-4966
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani

 Similar to Collect_Set. For e.g. on a Txn table
 {noformat}
 Txn(customer, product, amt)
 select customer, collect_map(product, amt)
 from txn
 group by customer
 {noformat}
 Would give you an activity map for each customer.
 Other thoughts:
 - have explode do the inverse on maps just as it does for sets today.
 - introduce a table function that outputs each value as a column. So in the 
 e.g. above you get an activity matrix instead of a map. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2482) Convenience UDFs for binary data type

2013-07-31 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-2482:
--

Attachment: HIVE-2482.1.patch

I've implemented the hex, encoding, and base64 UDFs along with unit tests.

I've also changed Unhex to return a binary instead of wrapping its output as a 
string. This is an incompatible change, but I think it's ultimately the right 
thing to do.
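
The hex and base64 conversions can be sketched with JDK facilities alone (the actual patch wraps such logic in Hive UDF classes; the class and method names here are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryCodecDemo {
    // binary -> hex string, one "%02X" pair per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // binary -> base64 string via the JDK codec.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] data = "Hive".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(data));     // 48697665
        System.out.println(toBase64(data));  // SGl2ZQ==
    }
}
```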

 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Attachments: HIVE-2482.1.patch


 HIVE-2380 introduced binary data type in Hive. It will be good to have 
 following udfs to make it more useful:
 * UDF's to convert to/from hex string
 * UDF's to convert to/from string using a specific encoding
 * UDF's to convert to/from base64 string
 * UDF's to convert to/from non-string types using a particular serde

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4970) BinaryConverter does not respect nulls

2013-07-31 Thread Mark Wagner (JIRA)
Mark Wagner created HIVE-4970:
-

 Summary: BinaryConverter does not respect nulls
 Key: HIVE-4970
 URL: https://issues.apache.org/jira/browse/HIVE-4970
 Project: Hive
  Issue Type: Bug
Reporter: Mark Wagner
Assignee: Mark Wagner


Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not 
handle null values the same as the other converters.
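
The contract in question can be sketched as follows (hypothetical method, not the actual PrimitiveObjectInspectorConverter code): the other primitive converters return null for a null input rather than attempting the conversion.

```java
import java.nio.charset.StandardCharsets;

public class NullSafeConverterDemo {
    // Converts a string to its binary (byte[]) form, propagating null the way
    // the other primitive converters do instead of dereferencing it.
    static byte[] convert(String value) {
        if (value == null) {
            return null;  // null in, null out -- no NullPointerException
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(convert(null) == null);   // true
        System.out.println(convert("hive").length);  // 4
    }
}
```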

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4970) BinaryConverter does not respect nulls

2013-07-31 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-4970:
--

Attachment: HIVE-4970.1.patch

This patch makes BinaryConverter match the other primitive converters.

 BinaryConverter does not respect nulls
 --

 Key: HIVE-4970
 URL: https://issues.apache.org/jira/browse/HIVE-4970
 Project: Hive
  Issue Type: Bug
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: HIVE-4970.1.patch


 Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not 
 handle null values the same as the other converters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4879) Window functions that imply order can only be registered at compile time

2013-07-31 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4879:
--

Attachment: HIVE-4879.4.patch.txt

 Window functions that imply order can only be registered at compile time
 

 Key: HIVE-4879
 URL: https://issues.apache.org/jira/browse/HIVE-4879
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.11.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.12.0

 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, 
 HIVE-4879.3.patch.txt, HIVE-4879.4.patch.txt


 Adding an annotation for impliesOrder

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF

2013-07-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725943#comment-13725943
 ] 

Edward Capriolo commented on HIVE-4966:
---

I have a working collect here 

https://github.com/edwardcapriolo/hive-collect/blob/master/src/main/java/com/jointhegrid/udf/collect/GenericUDAFCollect.java

I was going to add it to Hive, but you can if you would like.

 Introduce Collect_Map UDAF
 --

 Key: HIVE-4966
 URL: https://issues.apache.org/jira/browse/HIVE-4966
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani

 Similar to Collect_Set. For e.g. on a Txn table
 {noformat}
 Txn(customer, product, amt)
 select customer, collect_map(product, amt)
 from txn
 group by customer
 {noformat}
 Would give you an activity map for each customer.
 Other thoughts:
 - have explode do the inverse on maps just as it does for sets today.
 - introduce a table function that outputs each value as a column. So in the 
 e.g. above you get an activity matrix instead of a map. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4970) BinaryConverter does not respect nulls

2013-07-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725944#comment-13725944
 ] 

Edward Capriolo commented on HIVE-4970:
---

Ideally we would like a q test to exercise the code and possibly a standard 
unit test as well.

 BinaryConverter does not respect nulls
 --

 Key: HIVE-4970
 URL: https://issues.apache.org/jira/browse/HIVE-4970
 Project: Hive
  Issue Type: Bug
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: HIVE-4970.1.patch


 Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not 
 handle null values the same as the other converters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4971) Unit test failure in TestVectorTimestampExpressions

2013-07-31 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4971:
---

Description: Unit test testVectorUDFUnixTimeStampLong is failing in 
TestVectorTimestampExpressions.  (was: Unit test is failing in 
TestVectorTimestampExpressions,

failure message="expected:<-2> but was:<-1>" 
type="junit.framework.AssertionFailedError"
junit.framework.AssertionFailedError: expected:<-2> but was:<-1>
  at junit.framework.Assert.fail(Assert.java:47)
  at junit.framework.Assert.failNotEquals(Assert.java:282)
  at junit.framework.Assert.assertEquals(Assert.java:64)
  at junit.framework.Assert.assertEquals(Assert.java:136)
  at junit.framework.Assert.assertEquals(Assert.java:142)
  at 
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.compareToUDFUnixTimeStampLong(TestVectorTimestampExpressions.java:495)
  at 
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.verifyUDFUnixTimeStampLong(TestVectorTimestampExpressions.java:513)
  at 
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong(TestVectorTimestampExpressions.java:546))

 Unit test failure in TestVectorTimestampExpressions
 ---

 Key: HIVE-4971
 URL: https://issues.apache.org/jira/browse/HIVE-4971
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Gopal V

 Unit test testVectorUDFUnixTimeStampLong is failing in 
 TestVectorTimestampExpressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-07-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725947#comment-13725947
 ] 

Edward Capriolo commented on HIVE-2482:
---

It is nice that you have written traditional JUnit tests, which can stay, but we 
normally do this with a q file. You can look at the developer guide on the wiki 
to understand how to write these. I can help you along as well because...I do 
not want your UDFs ending up like mine :)

https://issues.apache.org/jira/browse/HIVE-1262 



 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Attachments: HIVE-2482.1.patch


 HIVE-2380 introduced the binary data type in Hive. It would be good to have 
 the following UDFs to make it more useful:
 * UDFs to convert to/from hex string
 * UDFs to convert to/from string using a specific encoding
 * UDFs to convert to/from base64 string
 * UDFs to convert to/from non-string types using a particular serde
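As a rough sketch of what the proposed hex and base64 conversions would 
compute (the class and method names here are hypothetical illustrations, not 
the UDF signatures under review), in plain Java:

```java
import java.util.Base64;

// Illustrative helpers mirroring the proposed to/from hex and base64
// conversions for Hive's binary type. Names are invented for this sketch.
public class BinaryConv {
    // Encode bytes as a lowercase hex string, two characters per byte.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Decode a hex string (even length assumed) back to bytes.
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Base64 encode/decode via the JDK codec.
    public static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static byte[] fromBase64(String s) {
        return Base64.getDecoder().decode(s);
    }
}
```

A Hive UDF wrapping each of these would simply adapt the byte[]/String pair to 
the corresponding writable types.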



[jira] [Updated] (HIVE-4541) Run check-style on the branch and fix style issues.

2013-07-31 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4541:
---

Attachment: HIVE-4541.2.patch

The attached patch also fixes many issues in the templates.

 Run check-style on the branch and fix style issues.
 ---

 Key: HIVE-4541
 URL: https://issues.apache.org/jira/browse/HIVE-4541
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: vectorization-branch

 Attachments: HIVE-4541.1.patch, HIVE-4541.2.patch


 We should run check style on the entire branch and fix issues before the 
 branch is merged back to the trunk.



[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725951#comment-13725951
 ] 

Hive QA commented on HIVE-4388:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595231/HIVE-4388.patch

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/264/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/264/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, 
 HIVE-4388-wip.txt


 Currently we're building against HBase 0.92 by default. When you run against 
 Hadoop 2 (-Dhadoop.mr.rev=23), builds fail because of HBASE-5963.
 HIVE-3861 upgrades the version of HBase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but then fails with 
 HBASE-6396.



[jira] [Assigned] (HIVE-4914) filtering via partition name should be done inside metastore server

2013-07-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-4914:
--

Assignee: Sergey Shelukhin

 filtering via partition name should be done inside metastore server
 ---

 Key: HIVE-4914
 URL: https://issues.apache.org/jira/browse/HIVE-4914
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 Currently, if filter pushdown is impossible (which is the case for most 
 queries), the client gets all partition names from the metastore, filters 
 them, and asks for the partitions by name for the filtered set.
 The metastore server code should do that instead: it should check whether 
 pushdown is possible and do it if so; otherwise it should fall back to 
 name-based filtering.
 This saves the round trip that ships all partition names from the server to 
 the client, and also removes the need for pushdown viability checks on both 
 sides.
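The name-based filtering referred to above is essentially a loop like the 
following (a conceptual sketch, not metastore code; the class name, method 
name, and partition-name format are illustrative). The point of the proposed 
change is to run this loop inside the server so only matching names cross the 
wire:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Conceptual sketch of name-based partition filtering. Today this loop runs
// on the client over the full name list; moving it into the metastore server
// means only the filtered names (or their partitions) are sent back.
public class PartitionNameFilter {
    public static List<String> filterByName(List<String> allNames,
                                            Predicate<String> filter) {
        List<String> matched = new ArrayList<>();
        for (String name : allNames) {   // e.g. "ds=2013-07-31/hr=00"
            if (filter.test(name)) {
                matched.add(name);
            }
        }
        return matched;
    }
}
```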



[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725986#comment-13725986
 ] 

Hive QA commented on HIVE-4789:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595235/HIVE-4789.2.patch.txt

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/265/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/265/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt


 HIVE-3953 fixed the use of partitioned Avro tables for anything that goes 
 through the MapOperator, but paths that rely on the FetchOperator still fail 
 with the same error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}



Re: Review Request 10698: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-07-31 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10698/
---

(Updated Aug. 1, 2013, 3:23 a.m.)


Review request for hive and Carl Steinbach.


Changes
---

Rebased the patch; added more test cases for set and dfs commands.


Bugs: HIVE-4395
https://issues.apache.org/jira/browse/HIVE-4395


Repository: hive-git


Description
---

Support fetch-from-start for HiveServer2 fetch operations:
 - Handle the new fetch orientation for the various HS2 operations.
 - Add support for resetting the read position in the Hive driver.
 - Enable scroll cursors, with support for positioning the cursor at the start 
of the result set.
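A toy model of the fetch-position behavior described above (the class and 
method names are invented for illustration; this is not HiveServer2 code): a 
fetch loop that normally only advances (NEXT) gains the ability to reset its 
read position to the start of the result set (FIRST).

```java
import java.util.List;

// Minimal sketch of a result set whose read position can be reset,
// mirroring TFetchOrientation.NEXT vs. TFetchOrientation.FIRST.
public class ResettableFetch {
    private final List<String> rows;
    private int position = 0;

    public ResettableFetch(List<String> rows) {
        this.rows = rows;
    }

    // FETCH_NEXT: return up to maxRows rows from the current position.
    public List<String> fetchNext(int maxRows) {
        int end = Math.min(position + maxRows, rows.size());
        List<String> batch = rows.subList(position, end);
        position = end;
        return batch;
    }

    // FETCH_FIRST: reset the read position to the beginning of the result set.
    public void resetToFirst() {
        position = 0;
    }
}
```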


Diffs (updated)
-

  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 00f4351 
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 61985d1 
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 982ceb8 
  jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java 1042125 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 2a3ee24 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 2a6b944 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java df2ccf1 
  ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c 
  service/src/java/org/apache/hive/service/cli/operation/DfsOperation.java 
a8b8ed4 
  
service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
 581e69c 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
af87a90 
  
service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
 0fe01c0 
  
service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java 
bafe40c 
  
service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
 eaf867e 
  
service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 
d9d0e9c 
  
service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
 2daa9cd 
  
service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
 0a8825e 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 6f4b8dc 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
976a1ef 

Diff: https://reviews.apache.org/r/10698/diff/


Testing
---

Added new JDBC test cases.


Thanks,

Prasad Mujumdar



[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-07-31 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4395:
--

Status: Patch Available  (was: Open)

rebased the patch

 Support TFetchOrientation.FIRST for HiveServer2 FetchResults
 

 Key: HIVE-4395
 URL: https://issues.apache.org/jira/browse/HIVE-4395
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch


 Currently HiveServer2 only supports fetching the next row 
 (TFetchOrientation.NEXT). This ticket is to implement support for 
 TFetchOrientation.FIRST, which resets the fetch position to the beginning of 
 the result set. 



[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-07-31 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4395:
--

Attachment: HIVE-4395.2.patch

 Support TFetchOrientation.FIRST for HiveServer2 FetchResults
 

 Key: HIVE-4395
 URL: https://issues.apache.org/jira/browse/HIVE-4395
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch


 Currently HiveServer2 only supports fetching the next row 
 (TFetchOrientation.NEXT). This ticket is to implement support for 
 TFetchOrientation.FIRST, which resets the fetch position to the beginning of 
 the result set. 



[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726015#comment-13726015
 ] 

Hive QA commented on HIVE-4794:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595252/HIVE-4794.3-vectorization.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 3490 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table_json
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_column
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/266/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/266/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 HIVE-4794.3-vectorization.patch, hive-4794.patch






[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-31 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726019#comment-13726019
 ] 

Yin Huai commented on HIVE-2206:


Thanks, [~sershe]. I will make the change.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, 
 HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, 
 HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, it generates 
 a separate MapReduce job for every operation that may need to shuffle the 
 data (e.g. join and aggregation operations). However, those operations may 
 involve the correlations explained below and thus can be executed in a single 
 MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
 its child nodes if it has the same partition key as that child node.
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized 
 if it satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and it can 
 leverage the existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 There is more work that can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self joins. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc
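As an illustration of the core idea (not Hive code; the class and method names 
are invented for this sketch): when two aggregations share the same input and 
the same partition key, i.e. input plus transit correlation, they can share a 
single shuffle. Here both COUNT and SUM per key are computed in one grouped 
pass instead of two separate jobs:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of merging two correlated aggregations into one grouped pass.
// Each input row is {key, value}; the result maps key -> {count, sum},
// which two separate MR jobs would otherwise each shuffle for.
public class SharedShuffleSketch {
    public static Map<String, long[]> countAndSumByKey(List<String[]> rows) {
        Map<String, long[]> agg = new HashMap<>();
        for (String[] row : rows) {
            long[] acc = agg.computeIfAbsent(row[0], k -> new long[2]);
            acc[0] += 1;                       // COUNT(*)
            acc[1] += Long.parseLong(row[1]);  // SUM(value)
        }
        return agg;
    }
}
```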



[jira] [Created] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator

2013-07-31 Thread Yin Huai (JIRA)
Yin Huai created HIVE-4972:
--

 Summary: update code generated by thrift for DemuxOperator and 
MuxOperator
 Key: HIVE-4972
 URL: https://issues.apache.org/jira/browse/HIVE-4972
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai


HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. 
queryplan.thrift has been updated, but the code generated by thrift should 
also be updated.



[jira] [Updated] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator

2013-07-31 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4972:
--

Attachment: HIVE-4972.D11907.1.patch

yhuai requested code review of HIVE-4972 [jira] update code generated by 
thrift for DemuxOperator and MuxOperator.

Reviewers: JIRA

initial commit

HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. 
queryplan.thrift has been updated, but the code generated by thrift should 
also be updated.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11907

AFFECTED FILES
  ql/src/gen/thrift/gen-cpp/queryplan_types.cpp
  ql/src/gen/thrift/gen-cpp/queryplan_types.h
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/gen/thrift/gen-php/Types.php
  ql/src/gen/thrift/gen-py/queryplan/ttypes.py
  ql/src/gen/thrift/gen-rb/queryplan_types.rb

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/28455/

To: JIRA, yhuai


 update code generated by thrift for DemuxOperator and MuxOperator
 -

 Key: HIVE-4972
 URL: https://issues.apache.org/jira/browse/HIVE-4972
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4972.D11907.1.patch


 HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. 
 queryplan.thrift has been updated, but the code generated by thrift should 
 also be updated.



[jira] [Updated] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4972:
---

Status: Patch Available  (was: Open)

 update code generated by thrift for DemuxOperator and MuxOperator
 -

 Key: HIVE-4972
 URL: https://issues.apache.org/jira/browse/HIVE-4972
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4972.D11907.1.patch


 HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. 
 queryplan.thrift has been updated, but the code generated by thrift should 
 also be updated.



[jira] [Updated] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator

2013-07-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4972:
---

Affects Version/s: 0.12.0

 update code generated by thrift for DemuxOperator and MuxOperator
 -

 Key: HIVE-4972
 URL: https://issues.apache.org/jira/browse/HIVE-4972
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4972.D11907.1.patch


 HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. 
 queryplan.thrift has been updated, but the code generated by thrift should 
 also be updated.



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-31 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726023#comment-13726023
 ] 

Yin Huai commented on HIVE-2206:


I opened https://issues.apache.org/jira/browse/HIVE-4972 to update the code 
generated by thrift.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, 
 HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, 
 HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, it generates 
 a separate MapReduce job for every operation that may need to shuffle the 
 data (e.g. join and aggregation operations). However, those operations may 
 involve the correlations explained below and thus can be executed in a single 
 MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
 its child nodes if it has the same partition key as that child node.
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized 
 if it satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and it can 
 leverage the existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 There is more work that can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self joins. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc



[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-07-31 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726038#comment-13726038
 ] 

Mark Wagner commented on HIVE-2482:
---

I'm aware of .q files and have used them before, but I figured that UDFs are 
nice and isolated so a unit test is more appropriate. I didn't realize all the 
other UDFs had their own .q tests. I'll update with a .q test.

 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Attachments: HIVE-2482.1.patch


 HIVE-2380 introduced the binary data type in Hive. It would be good to have 
 the following UDFs to make it more useful:
 * UDFs to convert to/from hex string
 * UDFs to convert to/from string using a specific encoding
 * UDFs to convert to/from base64 string
 * UDFs to convert to/from non-string types using a particular serde



[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726048#comment-13726048
 ] 

Hive QA commented on HIVE-4794:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595252/HIVE-4794.3-vectorization.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 3490 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table_json
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_column
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/267/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/267/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, 
 HIVE-4794.3-vectorization.patch, hive-4794.patch






[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-31 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4843:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, 
 HIVE-4843.4.patch, HIVE-4843.5.patch


 Currently, there are static APIs in multiple locations in ExecDriver and 
 MapRedTask that could be leveraged if moved into the existing utility class 
 in the exec package. This would help make the code more maintainable, 
 readable, and re-usable by other run-time infrastructure such as Tez.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-31 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726058#comment-13726058
 ] 

Gunther Hagleitner commented on HIVE-4843:
--

Committed to trunk. Thanks Vikram!

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, 
 HIVE-4843.4.patch, HIVE-4843.5.patch


 Currently, there are static APIs in multiple locations in ExecDriver and 
 MapRedTask that could be leveraged if moved into the existing utility class 
 in the exec package. This would help make the code more maintainable, 
 readable, and re-usable by other run-time infrastructure such as Tez.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly

2013-07-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726060#comment-13726060
 ] 

Hive QA commented on HIVE-4968:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595301/HIVE-4968.D11901.2.patch

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/268/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/268/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 When deduplicating multiple SelectOperators, we should update RowResolver 
 accordingly
 

 Key: HIVE-4968
 URL: https://issues.apache.org/jira/browse/HIVE-4968
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4968.D11901.1.patch, HIVE-4968.D11901.2.patch


 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT key, value
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code:sql}
 SELECT tmp3.key, tmp3.value, tmp3.count
 FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
   FROM (SELECT *
 FROM src) tmp1
   JOIN (SELECT count(*) as count
 FROM src) tmp2
   ) tmp3;
 {code}
 The plan is executable.
 {code}
 SELECT tmp4.key, tmp4.value, tmp4.count
 FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
   FROM (SELECT *
 FROM (SELECT key, value
   FROM src) tmp1 ) tmp2
   JOIN (SELECT count(*) as count
 FROM src) tmp3
   ) tmp4;
 {code}
 The plan is not executable.
 The plan related to the MapJoin is
 {code}
  Stage: Stage-5
 Map Reduce Local Work
   Alias - Map Local Tables:
 tmp4:tmp2:tmp1:src 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 tmp4:tmp2:tmp1:src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: _col0, _col1
   HashTable Sink Operator
 condition expressions:
   0 
   1 {_col0}
 handleSkewJoin: false
 keys:
   0 []
   1 []
 Position of Big Table: 1
   Stage: Stage-4
 Map Reduce
   Alias - Map Operator Tree:
 $INTNAME 
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 
 1 {_col0}
   handleSkewJoin: false
   keys:
 0 []
 1 []
   outputColumnNames: _col2
   Position of Big Table: 1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: string
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Local Work:
 Map Reduce Local Work
 {code}
 The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, 
 _col1, _col2'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4973) Compiler should capture UDFs as part of read entities

2013-07-31 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-4973:
-

 Summary: Compiler should capture UDFs as part of read entities
 Key: HIVE-4973
 URL: https://issues.apache.org/jira/browse/HIVE-4973
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar


The compiler doesn't capture the UDFs accessed by a query in its read/write 
entities. This information would be useful to external plugin hooks.
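If UDFs were recorded among the read entities, a plugin hook could enumerate them for auditing or authorization. The sketch below is hypothetical: it models only the *shape* of Hive's entity/hook API with minimal stand-in types (`EntityType`, `ReadEntity`, and `udfsRead` are illustrative names, not the real `org.apache.hadoop.hive.ql.hooks` classes):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-ins for Hive's entity model, for illustration only.
public class UdfAuditSketch {
    enum EntityType { TABLE, PARTITION, UDF }

    static final class ReadEntity {
        final EntityType type;
        final String name;
        ReadEntity(EntityType type, String name) { this.type = type; this.name = name; }
    }

    // What a plugin hook might do: list every UDF the compiler recorded as read.
    static Set<String> udfsRead(Set<ReadEntity> inputs) {
        Set<String> udfs = new LinkedHashSet<>();
        for (ReadEntity e : inputs) {
            if (e.type == EntityType.UDF) {
                udfs.add(e.name);
            }
        }
        return udfs;
    }

    public static void main(String[] args) {
        Set<ReadEntity> inputs = new LinkedHashSet<>();
        inputs.add(new ReadEntity(EntityType.TABLE, "default.src"));
        inputs.add(new ReadEntity(EntityType.UDF, "upper"));
        System.out.println(udfsRead(inputs)); // prints [upper]
    }
}
```

A hook like this only works once the compiler actually adds UDF entries to the read-entity set, which is exactly the change this issue requests.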

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4827) Merge a Map-only task to its child task

2013-07-31 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726065#comment-13726065
 ] 

Gunther Hagleitner commented on HIVE-4827:
--

Committed to trunk. Thanks Yin!

 Merge a Map-only task to its child task
 ---

 Key: HIVE-4827
 URL: https://issues.apache.org/jira/browse/HIVE-4827
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, 
 HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch, HIVE-4827.7.patch, 
 HIVE-4827.8.patch


 When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a 
 Map-only job (MapJoin) to the MapReduce job that follows it. But this merge 
 only happens when the MapReduce job has a single input. With the Correlation 
 Optimizer (HIVE-2206), the MapReduce job can have multiple inputs (one per 
 operation path). CommonJoinResolver should be improved to merge a Map-only 
 job into the corresponding Map task of the MapReduce job.
 Example:
 {code:sql}
 set hive.optimize.correlation=true;
 set hive.auto.convert.join=true;
 set hive.optimize.mapjoin.mapreduce=true;
 SELECT tmp1.key, count(*)
 FROM (SELECT x1.key1 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   GROUP BY x1.key1) tmp1
 JOIN (SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2)
   GROUP BY x2.key2) tmp2
 ON (tmp1.key = tmp2.key)
 GROUP BY tmp1.key;
 {code}
 In this query, the join operations inside tmp1 and tmp2 will be converted to 
 two MapJoins. With the Correlation Optimizer, the aggregations in tmp1 and 
 tmp2, the join of tmp1 and tmp2, and the final aggregation will all be 
 executed in the same MapReduce job (on the Reduce side). Since this MapReduce 
 job has two inputs, CommonJoinResolver currently cannot attach the two 
 MapJoins to its Map side.
 Another example:
 {code:sql}
 SELECT tmp1.key
 FROM (SELECT x1.key2 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   UNION ALL
   SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
 {code}
 For this case, we will have three Map-only jobs (two for the MapJoins and one 
 for the Union). It would be good to execute this query with a single Map-only 
 job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

