[jira] [Created] (HIVE-4966) Introduce Collect_Map UDAF
Harish Butani created HIVE-4966: --- Summary: Introduce Collect_Map UDAF Key: HIVE-4966 URL: https://issues.apache.org/jira/browse/HIVE-4966 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Similar to Collect_Set. For example, on a Txn table {noformat} Txn(customer, product, amt) select customer, collect_map(product, amt) from txn group by customer {noformat} This would give you an activity map for each customer. Other thoughts: - have explode do the inverse on maps just as it does for sets today. - introduce a table function that outputs each value as a column, so in the example above you get an activity matrix instead of a map. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
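A minimal sketch of the proposed collect_map semantics in plain Java, assuming one group's rows arrive as (product, amt) pairs; the class and method names are illustrative and this is not Hive's actual GenericUDAF interface:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of collect_map: fold one group's (key, value) pairs into a map.
public class CollectMapSketch {
    // rows: each element is a two-element {product, amt} pair for one customer.
    static Map<String, Double> collectMap(List<String[]> rows) {
        Map<String, Double> m = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Duplicate-key policy is an open design question; here: last value wins.
            m.put(row[0], Double.parseDouble(row[1]));
        }
        return m;
    }
}
```

In a real UDAF the merge step would combine partial maps the same way, and the duplicate-key policy (last wins, first wins, or sum) would need to be decided explicitly.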
[jira] [Commented] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724921#comment-13724921 ] Hive QA commented on HIVE-2564: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595112/HIVE-2564.2.patch {color:green}SUCCESS:{color} +1 2750 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/257/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/257/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Labels: patch Attachments: HIVE-2564.1.patch, HIVE-2564.2.patch, hive-2564.patch The current Hive implementation ignores the database name in the JDBC URL, though we can set it by executing a use DBNAME statement. I think it is better to also allow specifying a database name in the JDBC URL or database properties. Therefore, I'll attach the patch.
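For illustration, a sketch of what "set dbname at JDBC URL" implies on the driver side: pulling the database name out of a URL such as jdbc:hive://host:10000/mydb. The parsing below is an assumption made for this example, not the patch's actual code:

```java
import java.net.URI;

// Hypothetical helper: extract the database name from a Hive JDBC URL,
// falling back to "default" when no path component is present.
public class DbNameFromUrl {
    static String dbName(String jdbcUrl) {
        // Strip the "jdbc:" prefix so java.net.URI can parse the remainder.
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        String path = uri.getPath();                       // e.g. "/mydb", or "" if absent
        return (path == null || path.length() <= 1) ? "default" : path.substring(1);
    }
}
```

A driver could then issue the equivalent of `use <dbname>` right after connecting, which is what the ticket asks to automate.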
[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
[ https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724939#comment-13724939 ] Chris Drome commented on HIVE-4574: --- Thanks for your thoughts. We are concerned with 0.10 at this point, but your point is taken. Looking forward to HIVE-1511! XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck -- Key: HIVE-4574 URL: https://issues.apache.org/jira/browse/HIVE-4574 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4574.1.patch In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe because it uses a static WeakHashMap that can get used from multiple threads. See - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46 Concurrent access to HashMap implementations that are not thread safe can sometimes result in infinite loops and other problems. If jdk7 is in use, it makes sense to synchronize calls to XMLEncoder.writeObject.
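A sketch of the synchronization the issue proposes, assuming a single global lock around XMLEncoder.writeObject; the wrapper class and lock object are illustrative, not the attached patch:

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Serialize through XMLEncoder under one process-wide lock so concurrent
// threads never race inside MethodFinder's shared static WeakHashMap.
public class SafeXmlEncode {
    private static final Object ENCODER_LOCK = new Object();

    static String toXml(Object o) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (ENCODER_LOCK) {          // one writer at a time on jdk7
            try (XMLEncoder enc = new XMLEncoder(out)) {
                enc.writeObject(o);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The lock serializes all encoders, which costs throughput but avoids the infinite-loop failure mode of a racing non-thread-safe HashMap.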
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724946#comment-13724946 ] Alexey Zotov commented on HIVE-3442: Yep, I can add this info to the https://cwiki.apache.org/confluence/display/Hive/AvroSerDe page. But the proposed approach has a defect: if the DataNode (some_datanode_address:50075) is down, you won't be able to query the data from Hive. I'm working on an improvement to this approach. AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table --- Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 After creating a table and loading data into it, I could check that the table was created successfully, and the data is inside: DROP TABLE IF EXISTS ml_items; CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items; select * from ml_items ORDER BY id ASC; However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working: DROP TABLE IF EXISTS ml_items_as_avro; CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:${system:test.tmp.dir}/hive-ml-items'; describe ml_items_as_avro; INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items; ml_items_as_avro is not created with expected schema, as shown in the describe ml_items_as_avro output. The output is below: PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro PREHOOK: type: DROPTABLE POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro POSTHOOK: type: DROPTABLE PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' PREHOOK: type: CREATETABLE POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' POSTHOOK: type: CREATETABLE POSTHOOK: Output: default@ml_items_as_avro PREHOOK: query: describe ml_items_as_avro PREHOOK: type: DESCTABLE POSTHOOK: query: describe ml_items_as_avro POSTHOOK: type: DESCTABLE error_error_error_error_error_error_error string from deserializer cannot_determine_schema string from deserializer check string from deserializer schema string from deserializer url string 
from deserializer and string from deserializer literal string from deserializer FAILED:
[jira] [Commented] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync parallelism
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725042#comment-13725042 ] Hudson commented on HIVE-4920: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #38 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/38/]) HIVE-4920 PTest2 handle Spot Price increases gracefully and improve rsync parallelism (Brock Noland via egc) Submitted by: Brock Noland Reviewed by: Edward Capriolo (ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508707) * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/client/PTestClient.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/request/TestStartRequest.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/ExecutionController.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/TestExecutor.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/CleanupPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Constants.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Drone.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ExecutionPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutor.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutorBuilder.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JIRAService.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JUnitReportParser.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/LogDirectoryCleaner.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PTest.java * 
/hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Phase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PrepPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ReportingPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/ExecutionContextConfiguration.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/QFileTestBatch.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestConfiguration.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudComputeService.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudExecutionContextProvider.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/AbstractSSHCommand.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/RSyncCommandExecutor.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/SSHCommandExecutor.java * /hive/trunk/testutils/ptest2/src/main/resources/batch-exec.vm * /hive/trunk/testutils/ptest2/src/main/resources/log4j.properties * /hive/trunk/testutils/ptest2/src/main/resources/source-prep.vm * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/api/server/TestTestExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/AbstractTestPhase.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockLocalCommandFactory.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockRSyncCommandExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockSSHCommandExecutor.java * 
/hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.testExecute.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingQFile.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingUnitTest.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingQFileTest.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingUnitTest.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestHostExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestPhase.java *
[jira] [Commented] (HIVE-4962) fix eclipse template broken by HIVE-3256
[ https://issues.apache.org/jira/browse/HIVE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725075#comment-13725075 ] Hudson commented on HIVE-4962: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #110 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/110/]) Hive-4962 Fix eclipse templates broken from ASM changes (Yin Huai via egc) Submitted by: Yin Huai Reviewed by: Edward Capriolo (ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508706) * /hive/trunk/eclipse-templates/.classpath * /hive/trunk/eclipse-templates/.classpath._hbase fix eclipse template broken by HIVE-3256 Key: HIVE-4962 URL: https://issues.apache.org/jira/browse/HIVE-4962 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Priority: Trivial Fix For: 0.12.0 Attachments: HIVE-4962.txt
[jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF
[ https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725298#comment-13725298 ] Carter Shanklin commented on HIVE-4966: --- Hi Harish, I recently found a need for a collect_array UDF that would maintain ordering and duplicates. I actually just changed a few things out of collect_set. Do you think that a collect_array would be generally useful? If so, would it make sense to combine these into one UDAF to minimize code duplication? Introduce Collect_Map UDAF -- Key: HIVE-4966 URL: https://issues.apache.org/jira/browse/HIVE-4966 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani
Re: TestCliDriver Failed Test
Try doing a very-clean. I think you just have an old version of DN in your ivy cache and, based on my experience with ivy, it cannot handle that. On Wed, Jul 31, 2013 at 9:37 AM, nikolaus.st...@researchgate.net wrote: Hi, When running the following command: ant test -Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true on a clean hive-trunk checkout, I get the following failed test: test: [echo] Project: ql [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/Users/niko/Repos/hive-trunk/build/ivy/lib/hadoop0.20S.shim/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Hive history file=/Users/niko/Repos/hive-trunk/build/ql/tmp/hive_job_log_604cbdc7-f546-4a74-bba2-43f7c2885811_1343059998.txt [junit] 2013-07-31 07:19:49.366 java[15847:1203] Unable to load realm info from SCDynamicStore [junit] Exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient [junit] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:875) [junit] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:851) [junit] at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:513) [junit] at org.apache.hadoop.hive.cli.TestCliDriver.<clinit>(TestCliDriver.java:48) [junit] at java.lang.Class.forName0(Native Method) [junit] at java.lang.Class.forName(Class.java:171) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:373) [junit] at org.apache.tools.ant.taskdefs.optional.junit.
JUnitTestRunner.launch(JUnitTestRunner.java:1052) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906) [junit] Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient [junit] at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1212) [junit] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51) [junit] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61) [junit] at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2357) [junit] at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2368) [junit] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:869) [junit] ... 8 more [junit] Caused by: java.lang.reflect.InvocationTargetException [junit] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [junit] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) [junit] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) [junit] at java.lang.reflect.Constructor.newInstance(Constructor.java:513) [junit] at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1210) [junit] ... 13 more [junit] Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
[junit] NestedThrowables: [junit] java.lang.reflect.InvocationTargetException [junit] at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193) [junit] at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) [junit] at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) [junit] at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:266) [junit] at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:295) [junit] at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:228) [junit] at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:203) [junit] at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) [junit] at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) [junit] at org.apache.hadoop.hive.metastore.RetryingRawStore.
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725332#comment-13725332 ] Hudson commented on HIVE-4525: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow storing timestamps earlier than 1970 or later than a certain point in 2038.
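The "certain point in 2038" follows directly from the 31-bit encoding: the largest unsigned 31-bit seconds-since-epoch value is 2^31 - 1 = 2147483647, which lands on 2038-01-19T03:14:07Z. A small Java check of that boundary:

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Demonstrates why 31 bits of seconds-since-epoch runs out in 2038.
public class TimestampRange {
    static int maxYear() {
        long maxSeconds = Integer.MAX_VALUE;   // 2^31 - 1 = 2147483647
        return Instant.ofEpochSecond(maxSeconds).atZone(ZoneOffset.UTC).getYear();
    }
}
```

Values before 1970 cannot be represented at all, since the encoding has no room for negative seconds.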
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725328#comment-13725328 ] Hudson commented on HIVE-3256: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-3256: Update asm version in Hive (Ashutosh Chauhan via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508506) * /hive/trunk/ivy/libraries.properties * /hive/trunk/metastore/ivy.xml Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Zhenxiao Luo Assignee: Ashutosh Chauhan Fix For: 0.12.0 Attachments: HIVE-3256.patch Hive trunk is currently using ASM version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop?
[jira] [Commented] (HIVE-3264) Add support for binary datatype to AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725330#comment-13725330 ] Hudson commented on HIVE-3264: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-3264 : Add support for binary datatype to AvroSerde (Eli Reisman and Mark Wagner via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508528) * /hive/trunk/data/files/csv.txt * /hive/trunk/ql/src/test/queries/clientpositive/avro_nullable_fields.q * /hive/trunk/ql/src/test/results/clientpositive/avro_nullable_fields.q.out * /hive/trunk/ql/src/test/results/clientpositive/avro_schema_literal.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java Add support for binary datatype to AvroSerde --- Key: HIVE-3264 URL: https://issues.apache.org/jira/browse/HIVE-3264 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Jakob Homan Assignee: Eli Reisman Labels: patch Fix For: 0.12.0 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch When the AvroSerde was written, Hive didn't have a binary type, so Avro's byte array type is converted to an array of small ints. Now that HIVE-2380 is in, this step isn't necessary and we can convert both Avro's bytes type and probably fixed type to Hive's binary type.
[jira] [Commented] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync parallelism
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725327#comment-13725327 ] Hudson commented on HIVE-4920: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-4920 PTest2 handle Spot Price increases gracefully and improve rsync parallelism (Brock Noland via egc) Submitted by: Brock Noland Reviewed by: Edward Capriolo (ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508707) * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/client/PTestClient.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/request/TestStartRequest.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/ExecutionController.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/server/TestExecutor.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/CleanupPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Constants.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Drone.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ExecutionPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutor.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/HostExecutorBuilder.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JIRAService.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/JUnitReportParser.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/LogDirectoryCleaner.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PTest.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/Phase.java * 
/hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PrepPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ReportingPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/ExecutionContextConfiguration.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/QFileTestBatch.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestConfiguration.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudComputeService.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudExecutionContextProvider.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/AbstractSSHCommand.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/RSyncCommandExecutor.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/ssh/SSHCommandExecutor.java * /hive/trunk/testutils/ptest2/src/main/resources/batch-exec.vm * /hive/trunk/testutils/ptest2/src/main/resources/log4j.properties * /hive/trunk/testutils/ptest2/src/main/resources/source-prep.vm * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/api/server/TestTestExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/AbstractTestPhase.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockLocalCommandFactory.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockRSyncCommandExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/MockSSHCommandExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.java * 
/hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.testExecute.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingQFile.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testFailingUnitTest.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingQFileTest.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.testPassingUnitTest.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestHostExecutor.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestPhase.java *
[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause
[ https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725329#comment-13725329 ] Hudson commented on HIVE-4928: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-4928 : Date literals do not work properly in partition spec clause (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508534) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_date2.q * /hive/trunk/ql/src/test/results/clientpositive/partition_date2.q.out Date literals do not work properly in partition spec clause --- Key: HIVE-4928 URL: https://issues.apache.org/jira/browse/HIVE-4928 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch The partition spec parsing doesn't do any actual evaluation of the values in the partition spec, instead just taking the text value of the ASTNode representing the partition value. This works fine for string/numeric literals (expression tree below): (TOK_PARTVAL region 99) But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression tree below): (TOK_DATELITERAL '1999-12-31') In this case the parser/analyzer uses TOK_DATELITERAL as the partition column value, when it should really get the value of the child of the DATELITERAL token.
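The idea of the fix can be sketched with a toy token class (a stand-in for illustration, not Hive's actual ASTNode): when the partition value token is a DATE literal, read its child's text and strip the quotes instead of using the token name itself:

```java
// Toy model of extracting a partition value from a parsed token.
public class PartValFromToken {
    static class Token {
        final String text;
        final Token child;
        Token(String text, Token child) { this.text = text; this.child = child; }
    }

    static String partitionValue(Token t) {
        if ("TOK_DATELITERAL".equals(t.text)) {
            // Take the quoted child value, e.g. '1999-12-31' -> 1999-12-31,
            // rather than the literal token name "TOK_DATELITERAL".
            return t.child.text.replace("'", "");
        }
        return t.text;  // string/numeric literals already carry the value
    }
}
```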
[jira] [Commented] (HIVE-4962) fix eclipse template broken by HIVE-3256
[ https://issues.apache.org/jira/browse/HIVE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725333#comment-13725333 ] Hudson commented on HIVE-4962: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) Hive-4962 Fix eclipse templates broken from ASM changes (Yin Huai via egc) Submitted by: Yin Huai Reviewed by: Edward Capriolo (ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508706) * /hive/trunk/eclipse-templates/.classpath * /hive/trunk/eclipse-templates/.classpath._hbase fix eclipse template broken by HIVE-3256 Key: HIVE-4962 URL: https://issues.apache.org/jira/browse/HIVE-4962 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Priority: Trivial Fix For: 0.12.0 Attachments: HIVE-4962.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725331#comment-13725331 ] Hudson commented on HIVE-2702: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-2702 : Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508539) * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality --- Key: HIVE-2702 URL: https://issues.apache.org/jira/browse/HIVE-2702 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: Aniket Mokashi Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, HIVE-2702.patch, HIVE-2702-v0.patch listPartitionsByFilter supports only non-string partitions. This is because its explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java. //Can only support partitions whose types are string if( ! table.getPartitionKeys().get(partitionColumnIndex). 
getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) { throw new MetaException("Filtering is supported only on partition keys of type string"); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
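The relaxation this issue implements can be sketched roughly like this; the helper and the set of type names below are illustrative stand-ins, not the actual ExpressionTree code from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of relaxing the string-only guard quoted above so
// the metastore filter also accepts integral partition-key types.
public class FilterTypes {
    private static final Set<String> FILTERABLE = new HashSet<>(
            Arrays.asList("string", "tinyint", "smallint", "int", "bigint"));

    static boolean isFilterable(String partitionKeyType) {
        return FILTERABLE.contains(partitionKeyType);
    }

    public static void main(String[] args) {
        System.out.println(isFilterable("string"));  // true
        System.out.println(isFilterable("int"));     // true: newly allowed
        System.out.println(isFilterable("double"));  // false: still rejected
    }
}
```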
[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4952: -- Attachment: HIVE-4952.D11889.2.patch yhuai updated the revision HIVE-4952 [jira] When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results. - Merge remote-tracking branch 'origin/trunk' into HIVE-4952 - Merge branch 'trunk' of https://github.com/apache/hive into HIVE-4952 - update comments Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11889 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11889?vs=36531&id=36657#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java ql/src/test/queries/clientpositive/correlationoptimizer15.q ql/src/test/results/clientpositive/correlationoptimizer15.q.out To: JIRA, yhuai When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {code} After Correlation Optimizer, the operator tree in the reducer will be {code}
    JOIN2
      |
      |
     MUX
    /   \
   /     \
 GBY      |
  |       |
JOIN1     |
   \     /
    \   /
    DEMUX
{code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even if it has not got any row from the left table. The logic related to hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag.
But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer.
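The tag-ordering assumption can be illustrated with a toy join. This is not Hive's JoinOperator; it only models the behavior of emitting for the streamed side as soon as its rows arrive, which is effectively what a very small hive.join.emit.interval produces.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Hive code) of the tag-ordering assumption: tag-0 rows are
// buffered, and each tag-1 row is joined and emitted immediately, on the
// assumption that every tag-0 row has already been seen.
public class TagOrderJoin {
    // rows: {tag, value} pairs in arrival order
    static List<String> join(List<String[]> rows) {
        List<String> buffered = new ArrayList<>();  // tag-0 rows seen so far
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            if (r[0].equals("0")) {
                buffered.add(r[1]);                 // left side: buffer
            } else {
                for (String l : buffered) {         // right side: emit now
                    out.add(l + "|" + r[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> ordered = new ArrayList<>();
        ordered.add(new String[]{"0", "a"});
        ordered.add(new String[]{"1", "x"});
        System.out.println(join(ordered));          // [a|x]: correct

        List<String[]> reversed = new ArrayList<>();
        reversed.add(new String[]{"1", "x"});       // right side arrives first
        reversed.add(new String[]{"0", "a"});
        System.out.println(join(reversed));         // []: the match is lost
    }
}
```

When inputs really are ordered by tag the toy join is correct; once the optimizer lets a higher tag arrive first, the premature emit silently drops matches, which is exactly the wrong-result scenario described above.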
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Thanks Nitin There aren't too many connections in close_wait state, only 1 or two when we run into this. Most likely it's because of a dropped connection. I could not find any read or write timeouts we can set for the thrift server which will tell thrift to hold on to the client connection. See this https://issues.apache.org/jira/browse/HIVE-2006 but it doesn't seem to have been implemented yet. We do have a client connection timeout set but cannot find an equivalent setting for the server. We have a suspicion that this happens when we run two client processes which modify two distinct partitions of the same hive table. We put in a workaround so that the two hive client processes never run together and so far things look ok but we will keep monitoring. Could it be because hive metastore server is not thread safe, would running two alter table statements on two distinct partitions of the same table using two client connections cause problems like these, where hive metastore server closes or drops a wrong client connection and leaves the other hanging? Agateaaa On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.com wrote: The mentioned flow is called when you have unsecured mode of thrift metastore client-server connection. So one way to avoid this is to have a secure way. {code} public boolean process(final TProtocol in, final TProtocol out) throws TException { setIpAddress(in); ... ... ... @Override protected void setIpAddress(final TProtocol in) { TUGIContainingTransport ugiTrans = (TUGIContainingTransport)in.getTransport(); Socket socket = ugiTrans.getSocket(); if (socket != null) { setIpAddress(socket); {code} From the above code snippet, it looks like the null pointer exception is not handled if getSocket returns null. Can you check what's the ulimit setting on the server? If it's set to default can you set it to unlimited and restart hcat server. (This is just a wild guess).
also the getSocket method suggests "If the underlying TTransport is an instance of TSocket, it returns the Socket object which it contains. Otherwise it returns null." so one of the Thrift gurus needs to tell us what's happening. I have no knowledge at this depth; maybe Ashutosh or Thejas will be able to help on this. From the netstat close_wait, it looks like the hive metastore server has not closed the connection (do not know why yet), maybe the hive dev guys can help. Are there too many connections in close_wait state? On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote: Looking at the hive metastore server logs we see errors like these: 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message. java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) around the same time as we see timeout or connection reset errors. Don't know if this is the cause or the side effect of the connection timeout/connection reset errors. Does anybody have any pointers or suggestions? Thanks On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote: Thanks Nitin! We have a similar setup (identical hcatalog and hive server versions) on another production environment and don't see any errors (it's been running ok for a few months) Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon.
I did see that the last time we ran into this problem, doing a netstat -ntp | grep :1 we saw that the server was holding on to one socket connection in CLOSE_WAIT state for a long time (hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any hive configuration settings we can tweak or networking tools/tips we can use to narrow this down? Thanks Agateaaa On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Is there any chance you can do an update on a test environment with hcat-0.5 and hive-0(11 or 10) and see if you can reproduce the issue? We used to see this error when there was load on the hcat server or some network issue connecting to the server (the second one was a rare occurrence) On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote: Hi All: We are running into frequent problem using HCatalog 0.4.1
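For reference, the null guard Nitin suggests for the getSocket() result might look like the sketch below; the class and method names are simplified stand-ins, not the actual TUGIBasedProcessor code.

```java
import java.net.Socket;

// Sketch of guarding the nullable socket: getSocket() may return null when
// the underlying TTransport is not a TSocket, so dereferencing it unguarded
// produces exactly the NullPointerException seen in the stack trace above.
public class IpAddressGuard {
    static String ipAddressOf(Socket socket) {
        if (socket == null || socket.getInetAddress() == null) {
            return "unknown";  // skip recording the address instead of throwing
        }
        return socket.getInetAddress().getHostAddress();
    }

    public static void main(String[] args) {
        System.out.println(ipAddressOf(null)); // unknown
    }
}
```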
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: (was: HIVE-4870.patch) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: HIVE-4870.patch Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Status: Open (was: Patch Available) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Status: Open (was: Patch Available) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.1.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Status: Patch Available (was: Open) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.1.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Attachment: HIVE-4950.1.patch Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.1.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Status: Open (was: Patch Available) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.1.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran resolved HIVE-4950. -- Resolution: Not A Problem Suspending child is supported already using --debug:childSuspend Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Attachment: (was: HIVE-4950.1.patch) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Attachment: (was: HIVE-4950.patch) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725486#comment-13725486 ] Sergey Shelukhin commented on HIVE-4051: {quote}Has anyone tested this patch on Derby, PostgreSQL, or Oracle? Until it's verified to work on these DBs I think this new code should be disabled by default.{quote} I tested on Derby and MySQL so far. Note that full fallback is there, so it could have a 3-position switch or two settings - current on/off being the same, and the on, but turn off [for some grace period?] on first error-setting. The latter could be the default, so in case if it fails it goes back to DN and doesn't introduce a lot of extra load. What do you think? Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch Hive's query client takes a long time to initialize start planning queries because of delays in creating all the MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client. 
The following 12 queries were repeated for each partition, generating a storm of SQL queries {code} 4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` =4871 AND `STRING_LIST_ID_KID` IS NOT NULL 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON 
`A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL) {code} This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same hive client. The queries are automatically generated by JDO/DataNucleus, which makes it nearly impossible to rewrite them into a single denormalized join operation and process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
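One way to see the batching idea (which the attached patches pursue via direct SQL rather than JDO) is a single IN-clause query per object class instead of one query per partition. The helper below is purely illustrative, reusing the table and column names from the query log above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of collapsing the 1+N pattern: fetch storage-descriptor rows for a
// whole set of partitions in one statement instead of one SELECT per PART_ID.
public class BatchFetch {
    static String partitionSdQuery(List<Long> partIds) {
        String ids = partIds.stream().map(String::valueOf)
                            .collect(Collectors.joining(","));
        return "SELECT `A0`.`PART_ID`, `A0`.`SD_ID`, `B0`.`LOCATION` "
             + "FROM `PARTITIONS` `A0` "
             + "LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` "
             + "WHERE `A0`.`PART_ID` IN (" + ids + ")";
    }

    public static void main(String[] args) {
        // one round trip for three partitions instead of three separate queries
        System.out.println(partitionSdQuery(Arrays.asList(3945L, 3946L, 3947L)));
    }
}
```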
[jira] [Created] (HIVE-4967) Don't serialize unnecessary fields in query plan
Ashutosh Chauhan created HIVE-4967: -- Summary: Don't serialize unnecessary fields in query plan Key: HIVE-4967 URL: https://issues.apache.org/jira/browse/HIVE-4967 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan There are quite a few fields which need not be serialized since they are initialized anyway in the backend. We need not serialize them in our plan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4967) Don't serialize unnecessary fields in query plan
[ https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4967: --- Attachment: HIVE-4967.patch Patch which adds transient keyword to all such fields. Don't serialize unnecessary fields in query plan Key: HIVE-4967 URL: https://issues.apache.org/jira/browse/HIVE-4967 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4967.patch There are quite a few fields which need not to be serialized since they are initialized anyways in backend. We need not to serialize them in our plan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
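A minimal demonstration of the mechanism the patch relies on: Java serialization skips transient fields, so backend-initialized state marked transient simply doesn't travel with the plan. The class below is a toy, not one of Hive's actual plan classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class PlanFragment implements Serializable {
        String query = "select 1";                    // needed on the backend: serialized
        transient Object runtimeCache = new Object(); // re-initialized there: skipped
    }

    // Serialize and deserialize the fragment, as shipping a plan would.
    static PlanFragment roundTrip(PlanFragment p) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(p);
        oos.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (PlanFragment) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        PlanFragment copy = roundTrip(new PlanFragment());
        System.out.println(copy.query);                // select 1
        System.out.println(copy.runtimeCache == null); // true: not serialized
    }
}
```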
[jira] [Updated] (HIVE-4967) Don't serialize unnecessary fields in query plan
[ https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4967: --- Status: Patch Available (was: Open) Ready for review. Already ran through full test suite. All tests passed. Don't serialize unnecessary fields in query plan Key: HIVE-4967 URL: https://issues.apache.org/jira/browse/HIVE-4967 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4967.patch There are quite a few fields which need not to be serialized since they are initialized anyways in backend. We need not to serialize them in our plan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Status: Open (was: Patch Available) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Hadoop Flags: Reviewed Status: Patch Available (was: Open) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.1.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: HIVE-4870.1.patch Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.1.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: (was: HIVE-4870.patch) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.1.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725513#comment-13725513 ] Hive QA commented on HIVE-4952: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595206/HIVE-4952.D11889.2.patch {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/259/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/259/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {\code} After Correlation Optimizer, the operator tree in the reducer will be {code} JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX {\code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output the results even it has not got any row from the left table. The logic related hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
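To illustrate the ordering assumption described above, here is a toy Python model (not Hive's actual JoinOperator; all names are hypothetical) of a streaming join that emits early once emit_interval right-side rows are buffered. When right-table rows arrive before any left-table rows, early emission joins them against an empty left buffer and silently drops them, mirroring the wrong results the issue describes.

```python
def streaming_join(rows, emit_interval):
    """Toy model of a reduce-side join with early emission.

    rows: list of (tag, value); tag 0 = left input, tag 1 = right input.
    Correct only if every tag-0 row arrives before any tag-1 row.
    """
    left, right, out = [], [], []
    for tag, value in rows:
        (left if tag == 0 else right).append(value)
        if tag == 1 and len(right) >= emit_interval:
            # Early emission: join the buffered right rows against
            # whatever part of the left side has been seen so far,
            # then discard the right buffer.
            out.extend((l, r) for l in left for r in right)
            right.clear()
    # Final flush for any remaining buffered right rows.
    out.extend((l, r) for l in left for r in right)
    return out

# Tag order respected: correct even with emit_interval = 1.
ok = streaming_join([(0, "a"), (1, "x")], emit_interval=1)
# Tag order violated (right row first, as can happen after the
# Correlation Optimizer): the right row is joined against an empty
# left buffer and lost.
bad = streaming_join([(1, "x"), (0, "a")], emit_interval=1)
```

With a large emit_interval the right rows would still be buffered when the left rows arrive, which is why the bug only surfaces when the interval is small.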
[jira] [Created] (HIVE-4968) Broken plan in MapJoin
Yin Huai created HIVE-4968: -- Summary: Broken plan in MapJoin Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; {code} The plan is not executable. The plan related to the MapJoin is {code} Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work {code} The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
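The broken plan above keeps only '_col2' after two back-to-back Select operators are merged. A minimal sketch (hypothetical names, not Hive's actual NonBlockingOpDeDupProc) of what a correct merge must do: resolve every output column of the outer Select through the inner Select's mapping, instead of dropping entries.

```python
def merge_selects(inner, outer):
    """Merge two back-to-back identity-projection Select operators.

    inner and outer map output column name -> input column name.
    The merged operator must keep every outer output column,
    resolved through the inner mapping.
    """
    return {out_col: inner[in_col] for out_col, in_col in outer.items()}

inner = {"_col0": "key", "_col1": "value"}    # SELECT key, value FROM src
outer = {"_col0": "_col0", "_col1": "_col1"}  # SELECT * over the inner query
merged = merge_selects(inner, outer)
# merged maps both _col0 and _col1; losing one of them is the
# '_col2'-only symptom seen in the plan above.
```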
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4388: --- Attachment: HIVE-4388.patch Uploading to get a full test run. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4968) Broken plan in MapJoin
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725551#comment-13725551 ] Yin Huai commented on HIVE-4968: I'm having difficulty summarizing this problem in a concise and precise way... I will update the summary once I find a good one. Broken plan in MapJoin -- Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; {code} The plan is not executable. 
The plan related to the MapJoin is {code} Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work {code} The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4789) FetchOperator fails on partitioned Avro data
[ https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4789: --- Attachment: HIVE-4789.2.patch.txt I applied Sean's patch and then ran the tests with overwrite turned on. Attaching here to get another test run. FetchOperator fails on partitioned Avro data Key: HIVE-4789 URL: https://issues.apache.org/jira/browse/HIVE-4789 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Blocker Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt HIVE-3953 fixed using partitioned avro tables for anything that used the MapOperator, but those that rely on FetchOperator still fail with the same error. e.g. {code} SELECT * FROM partitioned_avro LIMIT 5; SELECT * FROM partitioned_avro WHERE partition_col=value; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4789) FetchOperator fails on partitioned Avro data
[ https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4789: --- Status: Patch Available (was: Open) FetchOperator fails on partitioned Avro data Key: HIVE-4789 URL: https://issues.apache.org/jira/browse/HIVE-4789 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Blocker Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt HIVE-3953 fixed using partitioned avro tables for anything that used the MapOperator, but those that rely on FetchOperator still fail with the same error. e.g. {code} SELECT * FROM partitioned_avro LIMIT 5; SELECT * FROM partitioned_avro WHERE partition_col=value; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4954: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Thank you for your contribution! I have committed this to trunk. PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.12.0 Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt
{code}
protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>();
static {
  RANKING_FUNCS.add("rank");
  RANKING_FUNCS.add("dense_rank");
  RANKING_FUNCS.add("percent_rank");
  RANKING_FUNCS.add("cume_dist");
};
{code}
Move this logic to annotations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
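The suggested move to annotations can be sketched in Python, with a decorator standing in for a Java annotation (all names here are hypothetical, not Hive's actual classes): each ranking function registers itself where it is defined, so no central, manually maintained list is needed.

```python
RANKING_FUNCS = set()

def ranking(name):
    """Decorator standing in for a hypothetical @RankingFunction
    Java annotation: registers the function name at definition time."""
    def register(cls):
        RANKING_FUNCS.add(name)
        return cls
    return register

@ranking("rank")
class RankUDAF: pass

@ranking("dense_rank")
class DenseRankUDAF: pass

@ranking("percent_rank")
class PercentRankUDAF: pass

@ranking("cume_dist")
class CumeDistUDAF: pass

# RANKING_FUNCS is now populated without a hardcoded list in the translator.
```

The translator then only consults the registry, so adding a new ranking function no longer requires touching PTFTranslator.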
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725609#comment-13725609 ] Hive QA commented on HIVE-4870: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595226/HIVE-4870.1.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 2748 tests executed *Failed tests:* {noformat} org.apache.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union22 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/260/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/260/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.1.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork) already does this. The patch adds Partition Description info to the Fetch Task. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Murphy updated HIVE-4794: -- Attachment: HIVE-4794.3.patch Updated comments. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725623#comment-13725623 ] Tony Murphy commented on HIVE-4794: --- Ran the tests now that all dependent patches are in and the merge with trunk is complete. Tests pass 100%. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 13021: Vectorization Tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13021/ --- (Updated July 31, 2013, 7:26 p.m.) Review request for hive, Eric Hanson, Jitendra Pandey, Remus Rusanu, and Sarvesh Sakalanaga. Changes --- updated comments Bugs: HIVE-4794 https://issues.apache.org/jira/browse/HIVE-4794 Repository: hive-git Description --- These tests cover all types, aggregates, and operators currently supported for vectorization. The queries are executed over a specially crafted data set which covers all the interesting classes of batch for each type: all nulls, repeating value, no nulls, and random values, to fully exercise the vectorization stack. The queries were stabilized against a text test oracle in order to validate results. This patch depends on: HIVE-4525 HIVE-4922 HIVE-4931 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java 97436c5 ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 79390a9 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/AllVectorTypesRecord.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/OrcFileGenerator.java PRE-CREATION ql/src/test/queries/clientpositive/vectorization_0.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_1.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_10.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_11.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_12.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_13.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_14.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_15.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_16.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_2.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_3.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_4.q PRE-CREATION 
ql/src/test/queries/clientpositive/vectorization_5.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_6.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_7.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_8.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_9.q PRE-CREATION ql/src/test/results/clientpositive/vectorization_0.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_1.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_10.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_11.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_12.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_13.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_14.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_15.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_16.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_2.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_3.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_4.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_5.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_6.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_7.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_8.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_9.q.out PRE-CREATION Diff: https://reviews.apache.org/r/13021/diff/ Testing --- Thanks, tony murphy
[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Murphy updated HIVE-4794: -- Status: Patch Available (was: Open) Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725632#comment-13725632 ] Tony Murphy commented on HIVE-4794: --- I ran the tests in the vectorization branch, which just pulled from trunk. As far as I know, we don't have precommit testing for branches yet. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Murphy updated HIVE-4794: -- Attachment: HIVE-4794.3-vectorization.patch Fixed the patch format so the precommit build can apply it. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, HIVE-4794.3-vectorization.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725677#comment-13725677 ] Yin Huai commented on HIVE-4968: The summary has been updated. When deduplicating multiple SelectOperators, we should update RowResolver accordingly -- Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; {code} The plan is not executable. 
The plan related to the MapJoin is {code} Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work {code} The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4968: --- Summary: When deduplicating multiple SelectOperators, we should update RowResolver accordingly (was: Broken plan in MapJoin) When deduplicating multiple SelectOperators, we should update RowResolver accordingly -- Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; {code} The plan is not executable. 
The plan related to the MapJoin is {code} Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work {code} The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: TestCliDriver Failed Test
Thanks, Noland. I tried that but now I'm getting more errors (see below). It seems that the Java compiler isn't recognizing the package for this test. Here's the relevant output, after running the same test as before with the very-clean option (i.e. ant very-clean test -Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true):
set-test-classpath:
compile-test:
 [echo] Project: ql
 [javac] Compiling 105 source files to /Users/niko/Repos/hive-trunk/build/ql/test/classes
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:21: package org.apache.hadoop.hive.metastore does not exist
 [javac] import static org.apache.hadoop.hive.metastore.MetaStoreUtils.DEFAULT_DATABASE_NAME;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:21: static import only from classes and interfaces
 [javac] import static org.apache.hadoop.hive.metastore.MetaStoreUtils.DEFAULT_DATABASE_NAME;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:55: package org.apache.hadoop.hive.cli does not exist
 [javac] import org.apache.hadoop.hive.cli.CliDriver;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:56: package org.apache.hadoop.hive.cli does not exist
 [javac] import org.apache.hadoop.hive.cli.CliSessionState;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:57: package org.apache.hadoop.hive.common.io does not exist
 [javac] import org.apache.hadoop.hive.common.io.CachingPrintStream;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:58: package org.apache.hadoop.hive.conf does not exist
 [javac] import org.apache.hadoop.hive.conf.HiveConf;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:59: package org.apache.hadoop.hive.metastore does not exist
 [javac] import org.apache.hadoop.hive.metastore.MetaStoreUtils;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:60: package org.apache.hadoop.hive.metastore.api does not exist
 [javac] import org.apache.hadoop.hive.metastore.api.Index;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:76: package org.apache.hadoop.hive.serde does not exist
 [javac] import org.apache.hadoop.hive.serde.serdeConstants;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:77: package org.apache.hadoop.hive.serde2.thrift does not exist
 [javac] import org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:78: package org.apache.hadoop.hive.serde2.thrift.test does not exist
 [javac] import org.apache.hadoop.hive.serde2.thrift.test.Complex;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:79: package org.apache.hadoop.hive.shims does not exist
 [javac] import org.apache.hadoop.hive.shims.HadoopShims;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:80: package org.apache.hadoop.hive.shims does not exist
 [javac] import org.apache.hadoop.hive.shims.ShimLoader;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:112: cannot find symbol
 [javac] symbol : class HiveConf
 [javac] location: class org.apache.hadoop.hive.ql.QTestUtil
 [javac] protected HiveConf conf;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:117: cannot find symbol
 [javac] symbol : class CliDriver
 [javac] location: class org.apache.hadoop.hive.ql.QTestUtil
 [javac] private CliDriver cliDriver;
 [javac] ^
 [javac] /Users/niko/Repos/hive-trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:118: package HadoopShims does not exist
 [javac] private HadoopShims.MiniMrShim mr = null;
 [javac] ^
 [javac]
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725695#comment-13725695 ] Xuefu Zhang commented on HIVE-4844: --- [~jdere]: 1. I'm not sure which way is better, but adding additional columns seems cleaner to me. 2. I may have been off-topic on inheritance. What I was trying to say is that some types, for instance string, CHAR, and VARCHAR, are very similar and may share a lot of implementation. This would apply to DECIMAL and DECIMAL(p,s) as well. However, I haven't figured out the implications yet. Please share your insights. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725702#comment-13725702 ] Edward Capriolo commented on HIVE-4844: --- Ideally, a field declared without parameters would be the default, so that no additional data has to be stored in the metastore. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
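For concreteness, the SQL-compliant semantics under discussion can be sketched roughly as follows (assumed semantics for illustration only, not Hive's eventual implementation; in particular, whether a too-long VARCHAR value is truncated or rejected was still an open design question at this point):

```python
def char_equals(a, b):
    """SQL CHAR comparison semantics: the shorter value is notionally
    blank-padded to the longer length, so trailing spaces do not
    affect equality."""
    return a.rstrip(" ") == b.rstrip(" ")

def varchar_write(value, max_length):
    """One possible max-length policy for VARCHAR(n): truncate on
    write. An alternative policy would raise an error instead."""
    return value[:max_length]
```

Under these semantics, 'ab ' and 'ab' compare equal as CHAR values but would remain distinct as plain strings, which is the behavioral gap the new types are meant to close.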
[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4968: -- Attachment: HIVE-4968.D11901.1.patch yhuai requested code review of HIVE-4968 [jira] When deduplicating multiple SelectOperators, we should update RowResolver accordingly. Reviewers: JIRA Merge remote-tracking branch 'origin/trunk' into HIVE-4968 SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; The plan is executable. SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; The plan is executable. SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; The plan is not executable. 
The plan related to the MapJoin is Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, _col2' TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11901 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/28407/ To: JIRA, yhuai When deduplicate multiple SelectOperators, we should update RowResolver accordinly -- Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch
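The failure described above can be sketched abstractly: deduplicating two chained SelectOperators means composing their projections, and the surviving operator must expose the outer select's column names so that downstream operators (the MapJoin in this plan) can resolve all of them. A toy Python model follows; the names are hypothetical and this is not Hive's actual NonBlockingOpDeDupProc code.

```python
# Each "select" is modeled as a map from output column name to input
# column name; merging two chained selects composes the maps. The merged
# operator must register the OUTER select's names in the row resolver,
# otherwise downstream lookups silently drop columns (the '_col2'-only
# outputColumnNames seen in the MapJoin plan above).

def compose_selects(inner, outer):
    """Merge two chained projections: apply inner, then outer."""
    return {out_name: inner[in_name] for out_name, in_name in outer.items()}

inner = {"_col0": "key", "_col1": "value"}   # SELECT key, value FROM src
outer = {"key": "_col0", "value": "_col1"}   # SELECT * over the inner select
merged = compose_selects(inner, outer)
# merged maps "key" -> "key" and "value" -> "value"; after the merge the
# resolver must answer for "key"/"value", not the removed "_col0"/"_col1".
```
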
[jira] [Updated] (HIVE-4968) When deduplicate multiple SelectOperators, we should update RowResolver accordinly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4968: --- Status: Patch Available (was: Open) When deduplicate multiple SelectOperators, we should update RowResolver accordinly -- Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch
[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordinly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4968: --- Summary: When deduplicating multiple SelectOperators, we should update RowResolver accordinly (was: When deduplicate multiple SelectOperators, we should update RowResolver accordinly) When deduplicating multiple SelectOperators, we should update RowResolver accordinly Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725754#comment-13725754 ] Hive QA commented on HIVE-4870: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595226/HIVE-4870.1.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 2749 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union22 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/262/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/262/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.1.patch Explain extended does not include partition information for the Fetch Task (FetchWork). The Map Reduce Task (MapredWork) already does this. The patch adds partition description info to the Fetch Task.
[jira] [Commented] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordinly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725783#comment-13725783 ] Phabricator commented on HIVE-4968: --- ashutoshc has accepted the revision HIVE-4968 [jira] When deduplicate multiple SelectOperators, we should update RowResolver accordinly. Looks good. Some minor comments. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java:399 You are not using this method. Lets not add this. ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java:104 Can you add a comment saying something like we need to set row resolver of parent from the child which is in parse context to preserve column mappings. Feel free to improve on the wording here. REVISION DETAIL https://reviews.facebook.net/D11901 BRANCH HIVE-4968 ARCANIST PROJECT hive To: JIRA, ashutoshc, yhuai When deduplicating multiple SelectOperators, we should update RowResolver accordinly Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch
[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordinly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4968: --- Status: Open (was: Patch Available) When deduplicating multiple SelectOperators, we should update RowResolver accordinly Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch
[jira] [Commented] (HIVE-4960) lastAlias in CommonJoinOperator is not used
[ https://issues.apache.org/jira/browse/HIVE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725787#comment-13725787 ] Phabricator commented on HIVE-4960: --- ashutoshc has accepted the revision HIVE-4960 [jira] lastAlias in CommonJoinOperator is not used. +1 REVISION DETAIL https://reviews.facebook.net/D11895 BRANCH HIVE-4960 ARCANIST PROJECT hive To: JIRA, ashutoshc, yhuai lastAlias in CommonJoinOperator is not used --- Key: HIVE-4960 URL: https://issues.apache.org/jira/browse/HIVE-4960 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-4960.D11895.1.patch In CommonJoinOperator, there is an object called lastAlias. The initial value of this object is 'null'. After tracing the usage of this object, I found that there is no place that changes its value. Also, it is only used in processOp in JoinOperator and MapJoinOperator as {code} if ((lastAlias == null) || (!lastAlias.equals(alias))) { nextSz = joinEmitInterval; } {code} Since lastAlias will always be null, we will assign joinEmitInterval to nextSz every time we get a row. Later in processOp, we have {code} nextSz = getNextSize(nextSz); {code} Because we reset the value of nextSz to joinEmitInterval every time we get a row, it seems that getNextSize will not be used as expected.
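The dead-code path described in this report can be reproduced with a minimal model; the Python below uses hypothetical stand-ins for CommonJoinOperator's fields, not the real Hive classes.

```python
# Because lastAlias is never reassigned, the reset branch is taken on
# every row, so nextSz is wound back to joinEmitInterval before each
# getNextSize call and the intended growth never accumulates.

JOIN_EMIT_INTERVAL = 1000

def get_next_size(sz):
    return sz * 2  # stand-in for CommonJoinOperator.getNextSize

def process_rows(aliases):
    last_alias = None  # never reassigned, mirroring the bug
    next_sz = JOIN_EMIT_INTERVAL
    sizes = []
    for alias in aliases:
        if last_alias is None or last_alias != alias:
            next_sz = JOIN_EMIT_INTERVAL  # always taken: last_alias stays None
        next_sz = get_next_size(next_sz)
        sizes.append(next_sz)
    return sizes

# Every row sees the same threshold, so getNextSize is effectively unused:
# process_rows(["a", "a", "b"]) -> [2000, 2000, 2000]
```
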
[jira] [Created] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
Venki Korukanti created HIVE-4969: - Summary: HCatalog HBaseHCatStorageHandler is not returning all the data Key: HIVE-4969 URL: https://issues.apache.org/jira/browse/HIVE-4969 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Venki Korukanti Priority: Critical Repro steps: 1) Create an HCatalog table mapped to an HBase table. hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float) STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler' TBLPROPERTIES('hbase.table.name' ='studentHBase', 'hbase.columns.mapping' = ':key,onecf:name,twocf:age,threecf:gpa'); 2) Load the following data from Pig. cat student_data 1^Asarah laertes^A23^A2.40 2^Atom allen^A72^A1.57 3^Abob ovid^A61^A2.67 4^Aethan nixon^A38^A2.15 5^Acalvin robinson^A28^A2.53 6^Airene ovid^A65^A2.56 7^Ayuri garcia^A36^A1.65 8^Acalvin nixon^A41^A1.04 9^Ajessica davidson^A48^A2.11 10^Akatie king^A39^A1.05 grunt A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float); grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer(); 3) Now from HBase do a scan on the studentHBase table hbase(main):026:0 scan 'studentPig', {LIMIT => 5} 4) From Pig, access the data in the table grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader(); grunt STORE A INTO '/user/root/studentPig'; 5) Verify the output written in StudentPig hadoop fs -cat /user/root/studentPig/part-r-0 1 23 2 72 3 61 4 38 5 28 6 65 7 36 8 41 9 48 10 39 The data returned has only two fields (rownum and age). Problem: While reading the data from the HBase table, HbaseSnapshotRecordReader gets a data row in a Result (org.apache.hadoop.hbase.client.Result) object and processes the KeyValue fields in it. After processing, it creates another Result object out of the processed KeyValue array. The problem here is that the KeyValue array is not sorted, while the Result object expects the input KeyValue array to have sorted elements. When we call Result.getValue() it returns no value for some of the fields, as it does a binary search on the unsorted array.
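The binary-search failure described above is easy to demonstrate in isolation. The sketch below is plain Python with `bisect`, not the HBase API: a lookup over an unsorted array of key/value pairs silently misses entries, and sorting the array first, as the attached patch does for the KeyValue array, restores correct lookups.

```python
import bisect

def get_value(kvs, key):
    """Binary-search a list of (key, value) pairs, analogous to how
    Result.getValue searches the KeyValue array."""
    i = bisect.bisect_left(kvs, (key,))
    if i < len(kvs) and kvs[i][0] == key:
        return kvs[i][1]
    return None  # key "not found" even if it is present but out of order

# Column qualifiers in the order the record reader produced them:
unsorted = [("twocf:age", 23), ("onecf:name", "sarah"), ("threecf:gpa", 2.4)]
assert get_value(unsorted, "onecf:name") is None           # lookup silently fails
assert get_value(sorted(unsorted), "onecf:name") == "sarah"  # sorting fixes it
```
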
[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
[ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-4969: -- Description: Repro steps: 1) Create an HCatalog table mapped to HBase table. hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float) STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler' TBLPROPERTIES('hbase.table.name' ='studentHBase', 'hbase.columns.mapping' = ':key,onecf:name,twocf:age,threecf:gpa'); 2) Load the following data from Pig. cat student_data 1^Asarah laertes^A23^A2.40 2^Atom allen^A72^A1.57 3^Abob ovid^A61^A2.67 4^Aethan nixon^A38^A2.15 5^Acalvin robinson^A28^A2.53 6^Airene ovid^A65^A2.56 7^Ayuri garcia^A36^A1.65 8^Acalvin nixon^A41^A1.04 9^Ajessica davidson^A48^A2.11 10^Akatie king^A39^A1.05 grunt A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float); grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer(); 3) Now from HBase do a scan on the studentHBase table hbase(main):026:0 scan 'studentPig', {LIMIT = 5} 4) From pig access the data in table grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader(); grunt STORE A INTO '/user/root/studentPig'; 5) Verify the output written in StudentPig hadoop fs -cat /user/root/studentPig/part-r-0 1 23 2 72 3 61 4 38 5 28 6 65 7 36 8 41 9 48 10 39 The data returned has only two fields (rownum and age). Problem: While reading the data from HBase table, HbaseSnapshotRecordReader gets data row in Result (org.apache.hadoop.hbase.client.Result) object and processes the KeyValue fields in it. After processing, it creates another Result object out of the processed KeyValue array. Problem here is KeyValue array is not sorted. Result object expects the input KeyValue array to have sorted elements. When we call the Result.getValue() it returns no value for some of the fields as it does a binary search on un-ordered array.
[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
[ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-4969: -- Attachment: HIVE-4969-1.patch HCatalog HBaseHCatStorageHandler is not returning all the data -- Key: HIVE-4969 URL: https://issues.apache.org/jira/browse/HIVE-4969 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Venki Korukanti Priority: Critical Fix For: 0.11.1, 0.12.0 Attachments: HIVE-4969-1.patch
[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
[ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-4969: -- Attachment: (was: HIVE-4969-1.patch) HCatalog HBaseHCatStorageHandler is not returning all the data -- Key: HIVE-4969 URL: https://issues.apache.org/jira/browse/HIVE-4969 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Venki Korukanti Priority: Critical Fix For: 0.11.1, 0.12.0
[jira] [Commented] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
[ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725807#comment-13725807 ] Venki Korukanti commented on HIVE-4969: --- attached a patch to sort KeyValue array before creating HBase Result object. HCatalog HBaseHCatStorageHandler is not returning all the data -- Key: HIVE-4969 URL: https://issues.apache.org/jira/browse/HIVE-4969 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Venki Korukanti Priority: Critical Fix For: 0.11.1, 0.12.0 Attachments: HIVE-4969-1.patch
[jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
[ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-4969: -- Attachment: HIVE-4969-1.patch HCatalog HBaseHCatStorageHandler is not returning all the data -- Key: HIVE-4969 URL: https://issues.apache.org/jira/browse/HIVE-4969 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Venki Korukanti Priority: Critical Fix For: 0.11.1, 0.12.0 Attachments: HIVE-4969-1.patch Repro steps: 1) Create an HCatalog table mapped to HBase table. hcat -e CREATE TABLE studentHCat(rownum int, name string, age int, gpa float) STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler' TBLPROPERTIES('hbase.table.name' ='studentHBase', 'hbase.columns.mapping' = ':key,onecf:name,twocf:age,threecf:gpa'); 2) Load the following data from Pig. cat student_data 1^Asarah laertes^A23^A2.40 2^Atom allen^A72^A1.57 3^Abob ovid^A61^A2.67 4^Aethan nixon^A38^A2.15 5^Acalvin robinson^A28^A2.53 6^Airene ovid^A65^A2.56 7^Ayuri garcia^A36^A1.65 8^Acalvin nixon^A41^A1.04 9^Ajessica davidson^A48^A2.11 10^Akatie king^A39^A1.05 grunt A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float); grunt STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer(); 3) Now from HBase do a scan on the studentHBase table hbase(main):026:0 scan 'studentPig', {LIMIT = 5} 4) From pig access the data in table grunt A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader(); grunt STORE A INTO '/user/root/studentPig'; 5) Verify the output written in StudentPig hadoop fs -cat /user/root/studentPig/part-r-0 1 23 2 72 3 61 4 38 5 28 6 65 7 36 8 41 9 48 10 39 The data returned has only two fields (rownum and age). Problem: While reading the data from HBase table, HbaseSnapshotRecordReader gets data row in Result (org.apache.hadoop.hbase.client.Result) object and processes the KeyValue fields in it. 
After processing, it creates another Result object out of the processed KeyValue array. The problem is that this KeyValue array is not sorted, while Result expects its input KeyValue array to be sorted. When Result.getValue() is called, it returns no value for some of the fields because it does a binary search on the unordered array. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4967) Don't serialize unnecessary fields in query plan
[ https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725845#comment-13725845 ] Hive QA commented on HIVE-4967: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595224/HIVE-4967.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2749 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/263/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/263/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Don't serialize unnecessary fields in query plan Key: HIVE-4967 URL: https://issues.apache.org/jira/browse/HIVE-4967 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4967.patch There are quite a few fields that need not be serialized since they are initialized anyway in the backend, so we should not serialize them in our plan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4968: -- Attachment: HIVE-4968.D11901.2.patch yhuai updated the revision HIVE-4968 [jira] When deduplicating multiple SelectOperators, we should update RowResolver accordingly. addressed Ashutosh's comments Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D11901 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11901?vs=36669&id=36693#toc BRANCH HIVE-4968 ARCANIST PROJECT hive AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out To: JIRA, ashutoshc, yhuai When deduplicating multiple SelectOperators, we should update RowResolver accordingly Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch, HIVE-4968.D11901.2.patch {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; {code} The plan is not executable. 
The plan related to the MapJoin is {code} Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work {code} The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4968: --- Status: Patch Available (was: Open) addressed Ashutosh's comments When deduplicating multiple SelectOperators, we should update RowResolver accordingly Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch, HIVE-4968.D11901.2.patch {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT key, value FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp3.key, tmp3.value, tmp3.count FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count FROM (SELECT * FROM src) tmp1 JOIN (SELECT count(*) as count FROM src) tmp2 ) tmp3; {code} The plan is executable. {code:sql} SELECT tmp4.key, tmp4.value, tmp4.count FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count FROM (SELECT * FROM (SELECT key, value FROM src) tmp1 ) tmp2 JOIN (SELECT count(*) as count FROM src) tmp3 ) tmp4; {code} The plan is not executable. 
The plan related to the MapJoin is {code} Stage: Stage-5 Map Reduce Local Work Alias - Map Local Tables: tmp4:tmp2:tmp1:src Fetch Operator limit: -1 Alias - Map Local Operator Tree: tmp4:tmp2:tmp1:src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: _col0, _col1 HashTable Sink Operator condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 1 Stage: Stage-4 Map Reduce Alias - Map Operator Tree: $INTNAME Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col2 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work {code} The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF
[ https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725882#comment-13725882 ] Harish Butani commented on HIVE-4966: - For my understanding, are you adding a new function collect_array or are you enhancing collect_set to have a dedup=true/false option. The signatures of collect_map and collect_set/array are different. So we have to expose them as separate fns. But open to sharing a single implementation. Makes sense. What specifically do you have in mind? Introduce Collect_Map UDAF -- Key: HIVE-4966 URL: https://issues.apache.org/jira/browse/HIVE-4966 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Similar to Collect_Set. For e.g. on a Txn table {noformat} Txn(customer, product, amt) select customer, collect_map(product, amt) from txn group by customer {noformat} Would give you an activity map for each customer. Other thoughts: - have explode do the inverse on maps just as it does for sets today. - introduce a table function that outputs each value as a column. So in the e.g. above you get an activity matrix instead of a map. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
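The proposed collect_map semantics on the Txn example can be sketched in a few lines of Python (an illustration of the intended behavior, not Hive code; the last-write-wins policy for duplicate keys is an assumption, since the ticket does not specify one):

```python
from collections import defaultdict

def collect_map(rows):
    """Sketch of the proposed collect_map UDAF: for each group key
    (customer), build a map from the first argument (product) to the
    second (amt)."""
    result = defaultdict(dict)
    for customer, product, amt in rows:
        result[customer][product] = amt  # assumed: last write wins on duplicates
    return dict(result)

# Txn(customer, product, amt)
txn = [("c1", "book", 10.0), ("c1", "pen", 2.0), ("c2", "book", 12.0)]
activity = collect_map(txn)
# {'c1': {'book': 10.0, 'pen': 2.0}, 'c2': {'book': 12.0}}
```

This also makes the signature difference concrete: collect_set/collect_array take one expression per row, while collect_map takes a key expression and a value expression, hence the need for separate functions even if they share an implementation.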
[jira] [Updated] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated HIVE-2482: -- Attachment: HIVE-2482.1.patch I've implemented the hex, encoding, and base64 UDFs along with unit tests. I've also changed Unhex to return a binary instead of wrapping its output as a string. This is an incompatible change, but I think it's ultimately the right thing to do. Convenience UDFs for binary data type - Key: HIVE-2482 URL: https://issues.apache.org/jira/browse/HIVE-2482 Project: Hive Issue Type: New Feature Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Mark Wagner Attachments: HIVE-2482.1.patch HIVE-2380 introduced the binary data type in Hive. It will be good to have the following UDFs to make it more useful: * UDFs to convert to/from hex string * UDFs to convert to/from string using a specific encoding * UDFs to convert to/from base64 string * UDFs to convert to/from non-string types using a particular serde -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4970) BinaryConverter does not respect nulls
Mark Wagner created HIVE-4970: - Summary: BinaryConverter does not respect nulls Key: HIVE-4970 URL: https://issues.apache.org/jira/browse/HIVE-4970 Project: Hive Issue Type: Bug Reporter: Mark Wagner Assignee: Mark Wagner Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not handle null values the same as the other converters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4970) BinaryConverter does not respect nulls
[ https://issues.apache.org/jira/browse/HIVE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated HIVE-4970: -- Attachment: HIVE-4970.1.patch This patch makes BinaryConverter match the other primitive converters BinaryConverter does not respect nulls -- Key: HIVE-4970 URL: https://issues.apache.org/jira/browse/HIVE-4970 Project: Hive Issue Type: Bug Reporter: Mark Wagner Assignee: Mark Wagner Attachments: HIVE-4970.1.patch Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not handle null values the same as the other converters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
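The converter contract this patch restores (null in, null out, matching the other primitive converters) can be shown with a short illustrative Python stand-in; this is not the actual PrimitiveObjectInspectorConverter code, just a model of the expected behavior:

```python
def convert_binary(value):
    """Model of the fixed converter contract: a null input must
    convert to null, like the other primitive converters, instead of
    being passed to the byte-conversion path."""
    if value is None:
        return None  # respect nulls
    return bytes(value)  # normal conversion path for non-null input

print(convert_binary(None))        # None
print(convert_binary([104, 105]))  # b'hi'
```

Without the null check, the conversion path receives None and fails, which is the behavior divergence the issue describes.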
[jira] [Updated] (HIVE-4879) Window functions that imply order can only be registered at compile time
[ https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4879: -- Attachment: HIVE-4879.4.patch.txt Window functions that imply order can only be registered at compile time Key: HIVE-4879 URL: https://issues.apache.org/jira/browse/HIVE-4879 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.12.0 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, HIVE-4879.3.patch.txt, HIVE-4879.4.patch.txt Adding an annotation for impliesOrder -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4966) Introduce Collect_Map UDAF
[ https://issues.apache.org/jira/browse/HIVE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725943#comment-13725943 ] Edward Capriolo commented on HIVE-4966: --- I have a working collect here https://github.com/edwardcapriolo/hive-collect/blob/master/src/main/java/com/jointhegrid/udf/collect/GenericUDAFCollect.java I was going to add it to hive but you can if you would like. Introduce Collect_Map UDAF -- Key: HIVE-4966 URL: https://issues.apache.org/jira/browse/HIVE-4966 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Similar to Collect_Set. For e.g. on a Txn table {noformat} Txn(customer, product, amt) select customer, collect_map(product, amt) from txn group by customer {noformat} Would give you an activity map for each customer. Other thoughts: - have explode do the inverse on maps just as it does for sets today. - introduce a table function that outputs each value as a column. So in the e.g. above you get an activity matrix instead of a map. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4970) BinaryConverter does not respect nulls
[ https://issues.apache.org/jira/browse/HIVE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725944#comment-13725944 ] Edward Capriolo commented on HIVE-4970: --- Ideally we would like a q test to exercise the code and possibly a standard unit test as well. BinaryConverter does not respect nulls -- Key: HIVE-4970 URL: https://issues.apache.org/jira/browse/HIVE-4970 Project: Hive Issue Type: Bug Reporter: Mark Wagner Assignee: Mark Wagner Attachments: HIVE-4970.1.patch Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not handle null values the same way as the other converters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4971) Unit test failure in TestVectorTimestampExpressions
[ https://issues.apache.org/jira/browse/HIVE-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4971: --- Description: Unit test testVectorUDFUnixTimeStampLong is failing in TestVectorTimestampExpressions. (was: Unit test is failing in TestVectorTimestampExpressions, failure message=expected:<-2> but was:<-1> type=junit.framework.AssertionFailedError junit.framework.AssertionFailedError: expected:<-2> but was:<-1> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:282) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:136) at junit.framework.Assert.assertEquals(Assert.java:142) at org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.compareToUDFUnixTimeStampLong(TestVectorTimestampExpressions.java:495) at org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.verifyUDFUnixTimeStampLong(TestVectorTimestampExpressions.java:513) at org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong(TestVectorTimestampExpressions.java:546)) Unit test failure in TestVectorTimestampExpressions --- Key: HIVE-4971 URL: https://issues.apache.org/jira/browse/HIVE-4971 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Gopal V Unit test testVectorUDFUnixTimeStampLong is failing in TestVectorTimestampExpressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725947#comment-13725947 ] Edward Capriolo commented on HIVE-2482: --- It is nice that you have written traditional junit tests, which can stay, but we normally do this with a q file. You can look at the developer guide on the wiki to understand how to write these. I can help you along as well because...I do not want your UDFs ending up like mine :) https://issues.apache.org/jira/browse/HIVE-1262 Convenience UDFs for binary data type - Key: HIVE-2482 URL: https://issues.apache.org/jira/browse/HIVE-2482 Project: Hive Issue Type: New Feature Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Mark Wagner Attachments: HIVE-2482.1.patch HIVE-2380 introduced the binary data type in Hive. It will be good to have the following UDFs to make it more useful: * UDFs to convert to/from hex string * UDFs to convert to/from string using a specific encoding * UDFs to convert to/from base64 string * UDFs to convert to/from non-string types using a particular serde -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4541) Run check-style on the branch and fix style issues.
[ https://issues.apache.org/jira/browse/HIVE-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4541: --- Attachment: HIVE-4541.2.patch Attached patch also fixes many issues in the templates. Run check-style on the branch and fix style issues. --- Key: HIVE-4541 URL: https://issues.apache.org/jira/browse/HIVE-4541 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4541.1.patch, HIVE-4541.2.patch We should run check style on the entire branch and fix issues before the branch is merged back to the trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725951#comment-13725951 ] Hive QA commented on HIVE-4388: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595231/HIVE-4388.patch {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/264/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/264/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4914) filtering via partition name should be done inside metastore server
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-4914: -- Assignee: Sergey Shelukhin filtering via partition name should be done inside metastore server --- Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Currently, if the filter pushdown is impossible (which is most cases), the client gets all partition names from metastore, filters them, and asks for partitions by names for the filtered set. Metastore server code should do that instead; it should check if pushdown is possible and do it if so; otherwise it should do name-based filtering. Saves the roundtrip with all partition names from the server to client, and also removes the need to have pushdown viability checking on both sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data
[ https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725986#comment-13725986 ] Hive QA commented on HIVE-4789: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595235/HIVE-4789.2.patch.txt {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/265/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/265/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. FetchOperator fails on partitioned Avro data Key: HIVE-4789 URL: https://issues.apache.org/jira/browse/HIVE-4789 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Blocker Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt HIVE-3953 fixed using partitioned avro tables for anything that used the MapOperator, but those that rely on FetchOperator still fail with the same error. e.g. {code} SELECT * FROM partitioned_avro LIMIT 5; SELECT * FROM partitioned_avro WHERE partition_col=value; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 10698: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10698/ --- (Updated Aug. 1, 2013, 3:23 a.m.) Review request for hive and Carl Steinbach. Changes --- rebased patch, added more test cases for set and dfs commands Bugs: HIVE-4395 https://issues.apache.org/jira/browse/HIVE-4395 Repository: hive-git Description --- Support fetch-from-start for hiveserver2 fetch operations. - Handle new fetch orientation for various HS2 operations. - Added support to reset the read position in Hive driver - Enabled scroll cursors with support for positioning cursor to start of resultset Diffs (updated) - jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 00f4351 jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 61985d1 jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 982ceb8 jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java 1042125 ql/src/java/org/apache/hadoop/hive/ql/Context.java 2a3ee24 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 2a6b944 ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java df2ccf1 ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c service/src/java/org/apache/hive/service/cli/operation/DfsOperation.java a8b8ed4 service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java 581e69c service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java af87a90 service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java 0fe01c0 service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java bafe40c service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java eaf867e service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java d9d0e9c service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java 2daa9cd service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java 0a8825e 
service/src/java/org/apache/hive/service/cli/operation/Operation.java 6f4b8dc service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 976a1ef Diff: https://reviews.apache.org/r/10698/diff/ Testing --- Added new JDBC test cases. Thanks, Prasad Mujumdar
[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults
[ https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4395: -- Status: Patch Available (was: Open) rebased the patch Support TFetchOrientation.FIRST for HiveServer2 FetchResults Key: HIVE-4395 URL: https://issues.apache.org/jira/browse/HIVE-4395 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch Currently HiveServer2 only supports fetching the next row (TFetchOrientation.NEXT). This ticket is to implement support for TFetchOrientation.FIRST, which resets the fetch position to the beginning of the resultset. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults
[ https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4395: -- Attachment: HIVE-4395.2.patch Support TFetchOrientation.FIRST for HiveServer2 FetchResults Key: HIVE-4395 URL: https://issues.apache.org/jira/browse/HIVE-4395 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch Currently HiveServer2 only supports fetching the next row (TFetchOrientation.NEXT). This ticket is to implement support for TFetchOrientation.FIRST, which resets the fetch position to the beginning of the resultset. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
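The orientation semantics being added can be modeled with a toy cursor (illustrative Python, not the HiveServer2 implementation; the class and parameter names here are made up): NEXT advances the read position by one batch, FIRST rewinds it to the start of the result set before fetching.

```python
class ResultCursor:
    """Toy model of HS2 fetch orientations: NEXT advances the read
    position, FIRST resets it to the beginning of the result set."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def fetch(self, orientation="NEXT", size=2):
        if orientation == "FIRST":
            self.pos = 0  # reset the read position, as HIVE-4395 adds
        batch = self.rows[self.pos:self.pos + size]
        self.pos += len(batch)
        return batch

cur = ResultCursor([1, 2, 3, 4])
print(cur.fetch())         # [1, 2]
print(cur.fetch())         # [3, 4]
print(cur.fetch("FIRST"))  # [1, 2]  (rewound to the start)
```

The JDBC-visible consequence is scrollable behavior limited to positioning at the start: ResultSet.beforeFirst() can be honored, while arbitrary absolute positioning still cannot.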
[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726015#comment-13726015 ] Hive QA commented on HIVE-4794: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595252/HIVE-4794.3-vectorization.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 3490 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table_json org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_column org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2 org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/266/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/266/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed 
with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, HIVE-4794.3-vectorization.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726019#comment-13726019 ] Yin Huai commented on HIVE-2206: thanks [~sershe]. I will make the change add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: He Yongqiang Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, testQueries.2.q, YSmartPatchForHive.patch This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). 
The paper and slides of YSmart are linked at the bottom. Since Hive translates queries sentence by sentence, it generates a MapReduce job for every operation that may need to shuffle data (e.g. join and aggregation operations). However, such operations may involve the correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.
The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions.
# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and it can leverage the existing components for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations, which I think is better than adding individual optimizers. Several pieces of work can be done in the future to improve this optimizer. Here are three examples.
# Support queries that only involve TC;
# Support queries in which input tables of correlated MR jobs involve intermediate tables; and
# Optimize queries involving self joins.
References: Paper and presentation of YSmart. Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf Slides: http://sdrv.ms/UpwJJc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
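To make the job flow correlation case above concrete, here is an illustrative HiveQL sketch; the tables orders/customers and their columns are hypothetical, not taken from this issue. The reduce-side join and the aggregation both partition on the same key, which is exactly the pattern the optimizer targets:

{code:sql}
-- Hypothetical schema: orders(cust_id INT, amt DOUBLE),
--                      customers(cust_id INT, name STRING)
-- The reduce-side join and the GROUP BY both shuffle on cust_id, so the
-- aggregation job has job flow correlation with its parent join job and
-- the two shuffles can be served by a single MR job.
SELECT o.cust_id, sum(o.amt) AS total
FROM orders o
JOIN customers c ON (o.cust_id = c.cust_id)
GROUP BY o.cust_id;
{code}

Without the optimizer, this query would run the join in one MR job and the aggregation in a second one, re-shuffling on the same key.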
[jira] [Created] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator
Yin Huai created HIVE-4972: -- Summary: update code generated by thrift for DemuxOperator and MuxOperator Key: HIVE-4972 URL: https://issues.apache.org/jira/browse/HIVE-4972 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. queryplan.thrift has been updated, but the code generated by Thrift should also be updated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator
[ https://issues.apache.org/jira/browse/HIVE-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4972: -- Attachment: HIVE-4972.D11907.1.patch yhuai requested code review of HIVE-4972 [jira] update code generated by thrift for DemuxOperator and MuxOperator.
Reviewers: JIRA
initial commit
HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. queryplan.thrift has been updated, but the code generated by Thrift should also be updated.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D11907
AFFECTED FILES
ql/src/gen/thrift/gen-cpp/queryplan_types.cpp
ql/src/gen/thrift/gen-cpp/queryplan_types.h
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
ql/src/gen/thrift/gen-php/Types.php
ql/src/gen/thrift/gen-py/queryplan/ttypes.py
ql/src/gen/thrift/gen-rb/queryplan_types.rb
MANAGE HERALD RULES
https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/28455/
To: JIRA, yhuai
update code generated by thrift for DemuxOperator and MuxOperator - Key: HIVE-4972 URL: https://issues.apache.org/jira/browse/HIVE-4972 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4972.D11907.1.patch HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. queryplan.thrift has been updated, but the code generated by Thrift should also be updated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator
[ https://issues.apache.org/jira/browse/HIVE-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4972: --- Status: Patch Available (was: Open) update code generated by thrift for DemuxOperator and MuxOperator - Key: HIVE-4972 URL: https://issues.apache.org/jira/browse/HIVE-4972 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4972.D11907.1.patch HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. queryplan.thrift has been updated, but the code generated by Thrift should also be updated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4972) update code generated by thrift for DemuxOperator and MuxOperator
[ https://issues.apache.org/jira/browse/HIVE-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4972: --- Affects Version/s: 0.12.0 update code generated by thrift for DemuxOperator and MuxOperator - Key: HIVE-4972 URL: https://issues.apache.org/jira/browse/HIVE-4972 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4972.D11907.1.patch HIVE-2206 introduces two new operators, DemuxOperator and MuxOperator. queryplan.thrift has been updated, but the code generated by Thrift should also be updated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726023#comment-13726023 ] Yin Huai commented on HIVE-2206: i opened https://issues.apache.org/jira/browse/HIVE-4972 to update code generated by thrift add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: He Yongqiang Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, testQueries.2.q, YSmartPatchForHive.patch This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. 
The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom. Since Hive translates queries sentence by sentence, it generates a MapReduce job for every operation that may need to shuffle data (e.g. join and aggregation operations). However, such operations may involve the correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.
The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions.
# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and it can leverage the existing components for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations, which I think is better than adding individual optimizers. Several pieces of work can be done in the future to improve this optimizer. Here are three examples.
# Support queries that only involve TC;
# Support queries in which input tables of correlated MR jobs involve intermediate tables; and
# Optimize queries involving self joins.
References: Paper and presentation of YSmart. Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf Slides: http://sdrv.ms/UpwJJc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726038#comment-13726038 ] Mark Wagner commented on HIVE-2482: --- I'm aware of .q files and have used them before, but I figured that UDFs are nicely isolated, so a unit test seemed more appropriate. I didn't realize all the other UDFs had their own .q tests. I'll update with a .q test. Convenience UDFs for binary data type - Key: HIVE-2482 URL: https://issues.apache.org/jira/browse/HIVE-2482 Project: Hive Issue Type: New Feature Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Mark Wagner Attachments: HIVE-2482.1.patch HIVE-2380 introduced the binary data type in Hive. It would be good to have the following UDFs to make it more useful:
* UDFs to convert to/from hex strings
* UDFs to convert to/from strings using a specific encoding
* UDFs to convert to/from base64 strings
* UDFs to convert to/from non-string types using a particular serde
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
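For illustration, usage of the proposed conveniences might look like the sketch below. The function names (hex, unhex, base64, decode) and the table t are assumptions mirroring the bullet list above, not an API this patch has committed:

{code:sql}
-- Hypothetical usage, assuming a table t with a column b of type BINARY:
SELECT hex(b),             -- binary to hex string
       unhex(hex(b)),      -- hex string back to binary (round trip)
       base64(b),          -- binary to base64 string
       decode(b, 'UTF-8')  -- binary to string using a specific encoding
FROM t;
{code}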
[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726048#comment-13726048 ] Hive QA commented on HIVE-4794: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595252/HIVE-4794.3-vectorization.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 3490 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table_json org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_column org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2 org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTimestampExpressions.testVectorUDFUnixTimeStampLong org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/267/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/267/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed 
with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, HIVE-4794.3-vectorization.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4843: - Resolution: Fixed Status: Resolved (was: Patch Available) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, HIVE-4843.4.patch, HIVE-4843.5.patch Currently, there are static apis in multiple locations in ExecDriver and MapRedTask that can be leveraged if put in the already existing utility class in the exec package. This would help making the code more maintainable, readable and also re-usable by other run-time infra such as tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726058#comment-13726058 ] Gunther Hagleitner commented on HIVE-4843: -- Committed to trunk. Thanks Vikram! Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, HIVE-4843.4.patch, HIVE-4843.5.patch Currently, there are static apis in multiple locations in ExecDriver and MapRedTask that can be leveraged if put in the already existing utility class in the exec package. This would help making the code more maintainable, readable and also re-usable by other run-time infra such as tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4968) When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726060#comment-13726060 ] Hive QA commented on HIVE-4968: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595301/HIVE-4968.D11901.2.patch {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/268/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/268/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. When deduplicating multiple SelectOperators, we should update RowResolver accordingly Key: HIVE-4968 URL: https://issues.apache.org/jira/browse/HIVE-4968 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4968.D11901.1.patch, HIVE-4968.D11901.2.patch
{code:sql}
SELECT tmp3.key, tmp3.value, tmp3.count
FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
      FROM (SELECT key, value FROM src) tmp1
      JOIN (SELECT count(*) as count FROM src) tmp2) tmp3;
{code}
The plan is executable.
{code:sql}
SELECT tmp3.key, tmp3.value, tmp3.count
FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
      FROM (SELECT * FROM src) tmp1
      JOIN (SELECT count(*) as count FROM src) tmp2) tmp3;
{code}
The plan is executable.
{code:sql}
SELECT tmp4.key, tmp4.value, tmp4.count
FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
      FROM (SELECT * FROM (SELECT key, value FROM src) tmp1) tmp2
      JOIN (SELECT count(*) as count FROM src) tmp3) tmp4;
{code}
The plan is not executable.
The plan related to the MapJoin is
{code}
Stage: Stage-5
  Map Reduce Local Work
    Alias -> Map Local Tables:
      tmp4:tmp2:tmp1:src
        Fetch Operator
          limit: -1
    Alias -> Map Local Operator Tree:
      tmp4:tmp2:tmp1:src
        TableScan
          alias: src
          Select Operator
            expressions:
                  expr: key
                  type: string
                  expr: value
                  type: string
            outputColumnNames: _col0, _col1
            HashTable Sink Operator
              condition expressions:
                0
                1 {_col0}
              handleSkewJoin: false
              keys:
                0 []
                1 []
              Position of Big Table: 1

Stage: Stage-4
  Map Reduce
    Alias -> Map Operator Tree:
      $INTNAME
        Map Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0
            1 {_col0}
          handleSkewJoin: false
          keys:
            0 []
            1 []
          outputColumnNames: _col2
          Position of Big Table: 1
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: string
                  expr: _col2
                  type: bigint
            outputColumnNames: _col0, _col1, _col2
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Local Work:
      Map Reduce Local Work
{code}
The outputColumnNames of the MapJoin is '_col2', but it should be '_col0, _col1, _col2'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4973) Compiler should capture UDFs as part of read entities
Prasad Mujumdar created HIVE-4973: - Summary: Compiler should capture UDFs as part of read entities Key: HIVE-4973 URL: https://issues.apache.org/jira/browse/HIVE-4973 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar The compiler doesn't capture the UDFs accessed by a query in the read/write entities. This would be useful information for external plugin hooks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4827) Merge a Map-only task to its child task
[ https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726065#comment-13726065 ] Gunther Hagleitner commented on HIVE-4827: -- Committed to trunk. Thanks Yin! Merge a Map-only task to its child task --- Key: HIVE-4827 URL: https://issues.apache.org/jira/browse/HIVE-4827 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch, HIVE-4827.7.patch, HIVE-4827.8.patch When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a Map-only job (MapJoin) to its following MapReduce job, but this merge only happens when the MapReduce job has a single input. With the Correlation Optimizer (HIVE-2206), it is possible for the MapReduce job to have multiple inputs (for multiple operation paths). It is desirable to improve CommonJoinResolver to merge a Map-only job into the corresponding Map task of the MapReduce job. Example:
{code:sql}
set hive.optimize.correlation=true;
set hive.auto.convert.join=true;
set hive.optimize.mapjoin.mapreduce=true;

SELECT tmp1.key, count(*)
FROM (SELECT x1.key1 AS key
      FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
      GROUP BY x1.key1) tmp1
JOIN (SELECT x2.key2 AS key
      FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2)
      GROUP BY x2.key2) tmp2
ON (tmp1.key = tmp2.key)
GROUP BY tmp1.key;
{code}
In this query, the join operations inside tmp1 and tmp2 will be converted to two MapJoins. With the Correlation Optimizer, the aggregations in tmp1 and tmp2, the join of tmp1 and tmp2, and the last aggregation will all be executed in the same MapReduce job (on the reduce side). Since this MapReduce job has two inputs, CommonJoinResolver currently cannot attach the two MapJoins to the Map side of that job.
Another example:
{code:sql}
SELECT tmp1.key
FROM (SELECT x1.key2 AS key
      FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
      UNION ALL
      SELECT x2.key2 AS key
      FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
{code}
For this case, we will have three Map-only jobs (two for the MapJoins and one for the Union). It would be good to use a single Map-only job to execute this query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira