[jira] [Updated] (HIVE-4502) NPE - subquery smb joins fails
[ https://issues.apache.org/jira/browse/HIVE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4502: - Attachment: HIVE-4502-1.patch Hi [~navis] My attached patch actually retains the SMB join, and I feel it is a better plan overall than converting all of the joins to reduce-side joins. It would be great if you could take a look and let me know your opinion. All existing unit tests pass with this patch. Thanks Vikram. NPE - subquery smb joins fails -- Key: HIVE-4502 URL: https://issues.apache.org/jira/browse/HIVE-4502 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Navis Attachments: HIVE-4502-1.patch, HIVE-4502.D10695.1.patch, smb_mapjoin_25.q Found this issue while running some SMB joins. Attaching a test case that causes this error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655815#comment-13655815 ] Jin Adachi commented on HIVE-2564: -- I hope this issue is resolved soon, too. If this patch is too big, I'd be happy to send a new, smaller patch. Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Attachments: hive-2564.patch The current Hive implementation ignores the database name in the JDBC URL, though we can set it by executing a use DBNAME statement. I think it would be better to also allow specifying a database name in the JDBC URL or database properties. Therefore, I'll attach a patch.
[jira] [Created] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation
Prasad Mujumdar created HIVE-4546: - Summary: Hive CLI leaves behind the per session resource directory on non-interactive invocation Key: HIVE-4546 URL: https://issues.apache.org/jira/browse/HIVE-4546 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar As part of HIVE-4505, the resource directory is set to /tmp/${hive.session.id}_resources and is supposed to be removed at the end of the session. The CLI fails to remove it when invoked using -f or -e (non-interactive mode).
[jira] [Updated] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation
[ https://issues.apache.org/jira/browse/HIVE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4546: -- Attachment: HIVE-4546-1.patch Hive CLI leaves behind the per session resource directory on non-interactive invocation --- Key: HIVE-4546 URL: https://issues.apache.org/jira/browse/HIVE-4546 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4546-1.patch As part of HIVE-4505, the resource directory is set to /tmp/${hive.session.id}_resources and is supposed to be removed at the end of the session. The CLI fails to remove it when invoked using -f or -e (non-interactive mode).
Review Request: HIVE-4546: Hive CLI leaves behind the per session resource directory on non-interactive invocation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11083/ --- Review request for hive, Owen O'Malley and Gunther Hagleitner. Description --- Hive CLI leaves behind the per session resource directory on non-interactive invocation. The patch executes session state close() at the end of a non-interactive invocation. It also changes the session id format to a UUID, which avoids possible resource directory path conflicts when there are multiple HiveServer2 sessions from the same user at the same time. This addresses bug HIVE-4546. https://issues.apache.org/jira/browse/HIVE-4546 Diffs - cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 4239392 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 8e6e24a Diff: https://reviews.apache.org/r/11083/diff/ Testing --- Thanks, Prasad Mujumdar
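The UUID-based session id described in the review request can be sketched as follows. This is a minimal illustration with hypothetical class and method names, not the actual CliDriver/SessionState code: the point is that a random UUID keeps per-session resource directories distinct even when several sessions start for the same user at the same instant, which a timestamp-derived id cannot guarantee.

```java
import java.util.UUID;

// Hypothetical sketch: derive the per-session resource directory from a
// random UUID so concurrent sessions from the same user never collide.
public class SessionIdSketch {
    static String makeResourceDir(String tmpDir) {
        String sessionId = UUID.randomUUID().toString();
        return tmpDir + "/" + sessionId + "_resources";
    }

    public static void main(String[] args) {
        // Two "sessions" created back to back still get distinct directories.
        System.out.println(makeResourceDir("/tmp"));
        System.out.println(makeResourceDir("/tmp"));
    }
}
```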
[jira] [Commented] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation
[ https://issues.apache.org/jira/browse/HIVE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655853#comment-13655853 ] Prasad Mujumdar commented on HIVE-4546: --- Review request on https://reviews.apache.org/r/11083/ Hive CLI leaves behind the per session resource directory on non-interactive invocation --- Key: HIVE-4546 URL: https://issues.apache.org/jira/browse/HIVE-4546 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4546-1.patch As part of HIVE-4505, the resource directory is set to /tmp/${hive.session.id}_resources and is supposed to be removed at the end of the session. The CLI fails to remove it when invoked using -f or -e (non-interactive mode).
[jira] [Created] (HIVE-4547) A complex create view statement fails with new Antlr 3.4
Prasad Mujumdar created HIVE-4547: - Summary: A complex create view statement fails with new Antlr 3.4 Key: HIVE-4547 URL: https://issues.apache.org/jira/browse/HIVE-4547 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 A complex create view statement with a CAST in the join condition fails with an IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade (HIVE-2439). The same statement works fine with Hive 0.9.
[jira] [Updated] (HIVE-4547) A complex create view statement fails with new Antlr 3.4
[ https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4547: -- Attachment: HIVE-4547-repro.tar Attached a repro script. A complex create view statement fails with new Antlr 3.4 Key: HIVE-4547 URL: https://issues.apache.org/jira/browse/HIVE-4547 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4547-repro.tar A complex create view statement with a CAST in the join condition fails with an IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade (HIVE-2439). The same statement works fine with Hive 0.9.
Review Request: HIVE-4547: A complex create view statement fails with new Antlr 3.4
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11084/ --- Review request for hive and Ashutosh Chauhan. Description --- The parser has a translation map where it's possible to replace all the text with the appropriate escaped version in the case of view creation. This map holds all individual translations and where they apply in the view definition. The newer Antlr version is more restrictive and throws an assertion error if there are overlaps in these escape positions. The original patch for the Antlr upgrade added a check to take care of some of the simpler overlap cases found by unit tests. There are a few more scenarios, like the one in the customer case, which are not covered. The patch traverses the list of translations in a loop and looks for all the possible overlaps. This addresses bug HIVE-4547. https://issues.apache.org/jira/browse/HIVE-4547 Diffs - data/files/v1.txt PRE-CREATION data/files/v2.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java ec2c088 ql/src/test/queries/clientpositive/view_cast.q PRE-CREATION ql/src/test/results/clientpositive/view_cast.q.out PRE-CREATION Diff: https://reviews.apache.org/r/11084/diff/ Testing --- Ran the full test suite. Added a new test. Thanks, Prasad Mujumdar
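The overlap scan described in the review request can be illustrated with a short sketch. This is a hypothetical helper, not the actual UnparseTranslator code: treat each translation as a half-open interval over the view text, sort by start offset, and flag any entry that begins before the previous one ends.

```java
import java.util.Arrays;

// Hypothetical illustration of overlap detection between translation spans:
// each span covers [start, end) in the original text; after sorting by
// start offset, any span that begins before the previous span ends
// overlaps it.
public class TranslationOverlap {
    static boolean hasOverlap(int[][] spans) {
        int[][] sorted = spans.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i][0] < sorted[i - 1][1]) {
                return true;  // starts inside the previous span
            }
        }
        return false;
    }
}
```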
[jira] [Updated] (HIVE-4547) A complex create view statement fails with new Antlr 3.4
[ https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4547: -- Attachment: HIVE-4547-1.patch A complex create view statement fails with new Antlr 3.4 Key: HIVE-4547 URL: https://issues.apache.org/jira/browse/HIVE-4547 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4547-1.patch, HIVE-4547-repro.tar A complex create view statement with a CAST in the join condition fails with an IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade (HIVE-2439). The same statement works fine with Hive 0.9.
[jira] [Updated] (HIVE-4547) A complex create view statement fails with new Antlr 3.4
[ https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4547: -- Status: Patch Available (was: Open) Review request on https://reviews.apache.org/r/11084/ A complex create view statement fails with new Antlr 3.4 Key: HIVE-4547 URL: https://issues.apache.org/jira/browse/HIVE-4547 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4547-1.patch, HIVE-4547-repro.tar A complex create view statement with a CAST in the join condition fails with an IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade (HIVE-2439). The same statement works fine with Hive 0.9.
[jira] [Updated] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation
[ https://issues.apache.org/jira/browse/HIVE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4546: -- Status: Patch Available (was: Open) Review request on https://reviews.apache.org/r/11083/ Hive CLI leaves behind the per session resource directory on non-interactive invocation --- Key: HIVE-4546 URL: https://issues.apache.org/jira/browse/HIVE-4546 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4546-1.patch As part of HIVE-4505, the resource directory is set to /tmp/${hive.session.id}_resources and is supposed to be removed at the end of the session. The CLI fails to remove it when invoked using -f or -e (non-interactive mode).
Request for subscribing to mailing list
Hi, Please add me to the mailing list Thanks, Nabhajit Ray
Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #144
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/144/ -- [...truncated 6524 lines...] ivy-retrieve-hadoop-shim: [echo] Project: shims [javac] Compiling 13 source files to /x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/shims/classes [javac] Note: /x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: /x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java uses unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [echo] Building shims 0.23 build-shims: [echo] Project: shims [echo] Compiling /x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/0.23/java against hadoop 2.0.0-alpha (/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/hadoopcore/hadoop-2.0.0-alpha) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/ivy/ivysettings.xml [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.0.0-alpha/hadoop-common-2.0.0-alpha-tests.jar ... [ivy:resolve] ... (1073kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.0.0-alpha!hadoop-common.jar(tests) (246ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.0.0-alpha/hadoop-common-2.0.0-alpha.jar ... [ivy:resolve] . (2051kB) [ivy:resolve] .. 
(0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.0.0-alpha!hadoop-common.jar (292ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.0.0-alpha/hadoop-mapreduce-client-core-2.0.0-alpha.jar ... [ivy:resolve] .. (1314kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-core;2.0.0-alpha!hadoop-mapreduce-client-core.jar (218ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-archives/2.0.0-alpha/hadoop-archives-2.0.0-alpha.jar ... [ivy:resolve] ... (20kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-archives;2.0.0-alpha!hadoop-archives.jar (124ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.0.0-alpha/hadoop-hdfs-2.0.0-alpha.jar ... [ivy:resolve] ... (3790kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-hdfs;2.0.0-alpha!hadoop-hdfs.jar (260ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.0.0-alpha/hadoop-hdfs-2.0.0-alpha-tests.jar ... [ivy:resolve] . (1365kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-hdfs;2.0.0-alpha!hadoop-hdfs.jar(tests) (214ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.0.0-alpha/hadoop-mapreduce-client-jobclient-2.0.0-alpha.jar ... [ivy:resolve] (33kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.0.0-alpha!hadoop-mapreduce-client-jobclient.jar (115ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.0.0-alpha/hadoop-mapreduce-client-jobclient-2.0.0-alpha-tests.jar ... [ivy:resolve]
Hive-trunk-h0.21 - Build # 2101 - Still Failing
Changes for Build #2074 [namit] HIVE-4371 some issue with merging join trees (Navis via namit) [hashutosh] HIVE-4333 : most windowing tests fail on hadoop 2 (Harish Butani via Ashutosh Chauhan) [namit] HIVE-4342 NPE for query involving UNION ALL with nested JOIN and UNION ALL (Navis via namit) [hashutosh] HIVE-4364 : beeline always exits with 0 status, should exit with non-zero status on error (Rob Weltman via Ashutosh Chauhan) [hashutosh] HIVE-4130 : Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs (Harish Butani via Ashutosh Chauhan) Changes for Build #2075 [hashutosh] HIVE-2379 : Hive/HBase integration could be improved (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4295 : Lateral view makes invalid result if CP is disabled (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4365 : wrong result in left semi join (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3861 : Upgrade hbase dependency to 0.94 (Gunther Hagleitner via Ashutosh Chauhan) Changes for Build #2076 [hashutosh] HIVE-3891 : physical optimizer changes for auto sort-merge join (Namit Jain via Ashutosh Chauhan) [namit] HIVE-4393 Make the deleteData flag accessable from DropTable/Partition events (Morgan Philips via namit) [hashutosh] HIVE-4394 : test leadlag.q fails (Ashutosh Chauhan) [namit] HIVE-4018 MapJoin failing with Distributed Cache error (Amareshwari Sriramadasu via Namit Jain) Changes for Build #2077 [namit] HIVE-4300 ant thriftif generated code that is checkedin is not up-to-date (Roshan Naik via namit) Changes for Build #2078 [namit] HIVE-4409 Prevent incompatible column type changes (Dilip Joseph via namit) [namit] HIVE-4095 Add exchange partition in Hive (Dheeraj Kumar Singh via namit) [namit] HIVE-4005 Column truncation (Kevin Wilfong via namit) [namit] HIVE-3952 merge map-job followed by map-reduce job (Vinod Kumar Vavilapalli via namit) [hashutosh] HIVE-4412 : PTFDesc tries serialize transient fields like OIs, etc. 
(Navis via Ashutosh Chauhan) [khorgath] HIVE-4419 : webhcat - support ${WEBHCAT_PREFIX}/conf/ as config directory (Thejas M Nair via Sushanth Sowmyan) [namit] HIVE-4181 Star argument without table alias for UDTF is not working (Navis via namit) [hashutosh] HIVE-4407 : TestHCatStorer.testStoreFuncAllSimpleTypes fails because of null case difference (Thejas Nair via Ashutosh Chauhan) [hashutosh] HIVE-4369 : Many new failures on hadoop 2 (Vikram Dixit via Ashutosh Chauhan) Changes for Build #2079 [namit] HIVE-4424 MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409 (Namit Jain) [hashutosh] HIVE-4358 : Check for Map side processing in PTFOp is no longer valid (Harish Butani via Ashutosh Chauhan) Changes for Build #2080 [navis] HIVE-4068 Size of aggregation buffer which uses non-primitive type is not estimated correctly (Navis) [khorgath] HIVE-4420 : HCatalog unit tests stop after a failure (Alan Gates via Sushanth Sowmyan) [hashutosh] HIVE-3708 : Add mapreduce workflow information to job configuration (Billie Rinaldi via Ashutosh Chauhan) Changes for Build #2081 Changes for Build #2082 [hashutosh] HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) [hashutosh] HIVE-4398 : HS2 Resource leak: operation handles not cleaned when originating session is closed (Ashish Vaidya via Ashutosh Chauhan) [hashutosh] HIVE-4019 : Ability to create and drop temporary partition function (Brock Noland via Ashutosh Chauhan) Changes for Build #2083 [navis] HIVE-4437 Missing file on HIVE-4068 (Navis) Changes for Build #2084 Changes for Build #2085 Changes for Build #2086 [hashutosh] HIVE-4350 : support AS keyword for table alias (Matthew Weaver via Ashutosh Chauhan) [hashutosh] HIVE-4439 : Remove unused join configuration parameter: hive.mapjoin.cache.numrows (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4438 : Remove unused join configuration parameter: hive.mapjoin.size.key (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-3682 : when 
output hive table to file,users should could have a separator of their own choice (Sushanth Sowmyan via Ashutosh Chauhan) [hashutosh] HIVE-4373 : Hive Version returned by HiveDatabaseMetaData.getDatabaseProductVersion is incorrect (Thejas Nair via Ashutosh Chauhan) Changes for Build #2087 Changes for Build #2088 [gates] HIVE-4465 webhcat e2e tests succeed regardless of exitvalue Changes for Build #2089 [cws] HIVE-3957. Add pseudo-BNF grammar for RCFile to Javadoc (Mark Grover via cws) [cws] HIVE-4497. beeline module tests don't get run by default (Thejas Nair via cws) [gangtimliu] HIVE-4474: Column access not tracked properly for partitioned tables. Samuel Yuan via Gang Tim Liu [hashutosh] HIVE-4455 : HCatalog build directories get included in tar file produced by ant tar (Alan Gates via Ashutosh Chauhan) Changes for Build #2090 Changes for Build #2091 [hashutosh] HIVE-4392 : Illogical InvalidObjectException throwed when use mulit aggregate functions with star
Re: Review Request: HIVE-4546: Hive CLI leaves behind the per session resource directory on non-interactive invocation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11083/#review20491 --- I think we should refactor CliDriver.run into the setup and the part that actually runs the commands. If we pull out the part from where the cli object is created down, we can isolate all of the multiple exits to that routine and make the ss.close handling more future-proof. In terms of changing the session id to a UUID, I think it is better to have a human-readable string than a random identifier. Since the current session id will be unique up to a process, maybe we could add a static counter that keeps track of how many session ids this process has created and add that as a suffix. - Owen O'Malley On May 13, 2013, 8:50 a.m., Prasad Mujumdar wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11083/ --- (Updated May 13, 2013, 8:50 a.m.) Review request for hive, Owen O'Malley and Gunther Hagleitner. Description --- Hive CLI leaves behind the per session resource directory on non-interactive invocation. The patch executes session state close() at the end of a non-interactive invocation. It also changes the session id format to a UUID, which avoids possible resource directory path conflicts when there are multiple HiveServer2 sessions from the same user at the same time. This addresses bug HIVE-4546. https://issues.apache.org/jira/browse/HIVE-4546 Diffs - cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 4239392 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 8e6e24a Diff: https://reviews.apache.org/r/11083/diff/ Testing --- Thanks, Prasad Mujumdar
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656141#comment-13656141 ] Mikhail Bautin commented on HIVE-4525: -- Correction to the design of this feature (I can't edit comments because of permissions, so adding another comment): in case the seconds field needs more than 31 bits, the first VInt is {{-1-reversedDecimal}} regardless of whether {{reversedDecimal}} is zero or not. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow storing timestamps earlier than 1970 or later than a certain point in 2038.
Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2
In Saturday's email I didn't include the Maven staging URLs: Hive: https://repository.apache.org/content/repositories/orgapachehive-013/ HCatalog: https://repository.apache.org/content/repositories/orgapachehcatalog-014/ Thanks, Owen On Sat, May 11, 2013 at 10:33 AM, Owen O'Malley omal...@apache.org wrote: Based on feedback from everyone, I have respun the release candidate, RC2. Please take a look. We've fixed 7 problems with the previous RC: * Release notes were incorrect * HIVE-4018 - MapJoin failing with Distributed Cache error * HIVE-4421 - Improve memory usage by ORC dictionaries * HIVE-4500 - Ensure that HiveServer 2 closes log files. * HIVE-4494 - ORC map columns get class cast exception in some contexts * HIVE-4498 - Fix TestBeeLineWithArgs failure * HIVE-4505 - Hive can't load transforms with remote scripts * HIVE-4527 - Fix the eclipse template Source tag for RC2 is at: https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2 Source tar ball and convenience binary artifacts can be found at: http://people.apache.org/~omalley/hive-0.11.0rc2/ This release has many goodies including HiveServer2, integrated HCatalog, windowing and analytical functions, the decimal data type, better query planning, performance enhancements, and various bug fixes. In total, we resolved more than 350 issues. The full list of fixed issues can be found at: http://s.apache.org/8Fr Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Owen
[jira] [Created] (HIVE-4548) Speed up vectorized LIKE filter for special cases abc%, %abc and %abc%
Eric Hanson created HIVE-4548: - Summary: Speed up vectorized LIKE filter for special cases abc%, %abc and %abc% Key: HIVE-4548 URL: https://issues.apache.org/jira/browse/HIVE-4548 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Teddy Choi Priority: Minor Fix For: vectorization-branch Speed up vectorized LIKE filter evaluation for the abc%, %abc, and %abc% pattern special cases (here, abc is just a placeholder for some fixed string). Problem: The current vectorized LIKE implementation always calls the standard LIKE function code in UDFLike.java. But this is pretty expensive. It calls multiple functions and allocates at least one new object per call. Probably 80% of uses of LIKE are for the simple patterns abc%, %abc, and %abc%. These can be implemented much more efficiently. Start by speeding up the case for Column LIKE abc%. The goal is to minimize expense in the inner loop: don't use new() in the inner loop, and write a static function that checks whether the prefix of the string matches the LIKE pattern as efficiently as possible, operating directly on the byte array holding UTF-8-encoded string data and avoiding unnecessary additional function calls and if/else logic. Call that in the inner loop. If feasible, consider using a template-driven approach, with an instance of the template expanded for each of the three cases. Start doing the abc% (prefix match) case by hand, then consider templatizing for the other two cases. The code is in the vectorization branch of the main Hive repo. Start by checking, in the constructor of FilterStringColLikeStringScalar.java, whether the pattern is one of the simple special cases. If so, record that, and have the evaluate() method call a special-case function for each case, i.e. the general case and each of the 3 special cases. All the dynamic decision-making would be done once per vector, not once per element.
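The classify-once-then-dispatch idea described in HIVE-4548 can be sketched roughly like this. Class and method names here are hypothetical (the real code lives in FilterStringColLikeStringScalar on the vectorization branch): the pattern kind is decided once per vector, and the inner loop calls a tight static matcher that works directly on UTF-8 bytes with no per-element allocation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the per-vector dispatch: classify the LIKE pattern
// once, then have the inner loop call an allocation-free byte matcher.
public class LikeSpecialCases {
    enum Kind { PREFIX, SUFFIX, MIDDLE, GENERAL }

    // Classify "abc%", "%abc", "%abc%"; anything else falls back to GENERAL.
    static Kind classify(String pattern) {
        if (pattern.length() < 2) return Kind.GENERAL;
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.endsWith("%");
        String inner = pattern.substring(leading ? 1 : 0,
                pattern.length() - (trailing ? 1 : 0));
        if (inner.contains("%") || inner.contains("_")) return Kind.GENERAL;
        if (!leading && trailing) return Kind.PREFIX;   // abc%
        if (leading && !trailing) return Kind.SUFFIX;   // %abc
        if (leading && trailing) return Kind.MIDDLE;    // %abc%
        return Kind.GENERAL;
    }

    // Prefix check on raw UTF-8 bytes: no object allocation, no branching
    // beyond the byte comparison itself.
    static boolean startsWith(byte[] value, int start, int len, byte[] prefix) {
        if (len < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (value[start + i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "abcdef".getBytes(StandardCharsets.UTF_8);
        byte[] prefix = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(classify("abc%") + " " + startsWith(data, 0, data.length, prefix));
    }
}
```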
Hive-trunk-hadoop2 - Build # 195 - Still Failing
Changes for Build #169 [hashutosh] HIVE-4333 : most windowing tests fail on hadoop 2 (Harish Butani via Ashutosh Chauhan) [namit] HIVE-4342 NPE for query involving UNION ALL with nested JOIN and UNION ALL (Navis via namit) [hashutosh] HIVE-4364 : beeline always exits with 0 status, should exit with non-zero status on error (Rob Weltman via Ashutosh Chauhan) [hashutosh] HIVE-4130 : Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs (Harish Butani via Ashutosh Chauhan) Changes for Build #170 [hashutosh] HIVE-4295 : Lateral view makes invalid result if CP is disabled (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4365 : wrong result in left semi join (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3861 : Upgrade hbase dependency to 0.94 (Gunther Hagleitner via Ashutosh Chauhan) [namit] HIVE-4371 some issue with merging join trees (Navis via namit) Changes for Build #171 [hashutosh] HIVE-2379 : Hive/HBase integration could be improved (Navis via Ashutosh Chauhan) Changes for Build #172 [hashutosh] HIVE-4394 : test leadlag.q fails (Ashutosh Chauhan) [namit] HIVE-4018 MapJoin failing with Distributed Cache error (Amareshwari Sriramadasu via Namit Jain) Changes for Build #173 [namit] HIVE-4300 ant thriftif generated code that is checkedin is not up-to-date (Roshan Naik via namit) [hashutosh] HIVE-3891 : physical optimizer changes for auto sort-merge join (Namit Jain via Ashutosh Chauhan) [namit] HIVE-4393 Make the deleteData flag accessable from DropTable/Partition events (Morgan Philips via namit) Changes for Build #174 [khorgath] HIVE-4419 : webhcat - support ${WEBHCAT_PREFIX}/conf/ as config directory (Thejas M Nair via Sushanth Sowmyan) [namit] HIVE-4181 Star argument without table alias for UDTF is not working (Navis via namit) [hashutosh] HIVE-4407 : TestHCatStorer.testStoreFuncAllSimpleTypes fails because of null case difference (Thejas Nair via Ashutosh Chauhan) [hashutosh] HIVE-4369 : Many new failures on hadoop 2 (Vikram Dixit via Ashutosh Chauhan) 
Changes for Build #175 [hashutosh] HIVE-4358 : Check for Map side processing in PTFOp is no longer valid (Harish Butani via Ashutosh Chauhan) [namit] HIVE-4409 Prevent incompatible column type changes (Dilip Joseph via namit) [namit] HIVE-4095 Add exchange partition in Hive (Dheeraj Kumar Singh via namit) [namit] HIVE-4005 Column truncation (Kevin Wilfong via namit) [namit] HIVE-3952 merge map-job followed by map-reduce job (Vinod Kumar Vavilapalli via namit) [hashutosh] HIVE-4412 : PTFDesc tries serialize transient fields like OIs, etc. (Navis via Ashutosh Chauhan) Changes for Build #176 [hashutosh] HIVE-3708 : Add mapreduce workflow information to job configuration (Billie Rinaldi via Ashutosh Chauhan) [namit] HIVE-4424 MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409 (Namit Jain) Changes for Build #177 [navis] HIVE-4068 Size of aggregation buffer which uses non-primitive type is not estimated correctly (Navis) [khorgath] HIVE-4420 : HCatalog unit tests stop after a failure (Alan Gates via Sushanth Sowmyan) Changes for Build #178 Changes for Build #179 [hashutosh] HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) [hashutosh] HIVE-4398 : HS2 Resource leak: operation handles not cleaned when originating session is closed (Ashish Vaidya via Ashutosh Chauhan) [hashutosh] HIVE-4019 : Ability to create and drop temporary partition function (Brock Noland via Ashutosh Chauhan) Changes for Build #180 [navis] HIVE-4437 Missing file on HIVE-4068 (Navis) Changes for Build #181 Changes for Build #182 Changes for Build #183 [hashutosh] HIVE-4350 : support AS keyword for table alias (Matthew Weaver via Ashutosh Chauhan) [hashutosh] HIVE-4439 : Remove unused join configuration parameter: hive.mapjoin.cache.numrows (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4438 : Remove unused join configuration parameter: hive.mapjoin.size.key (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-3682 : when output hive table to file,users 
should have a separator of their own choice (Sushanth Sowmyan via Ashutosh Chauhan) [hashutosh] HIVE-4373 : Hive Version returned by HiveDatabaseMetaData.getDatabaseProductVersion is incorrect (Thejas Nair via Ashutosh Chauhan) Changes for Build #184 Changes for Build #185 Changes for Build #186 Changes for Build #187 Changes for Build #188 [hashutosh] HIVE-4466 : Fix continue.on.failure in unit tests to -well- continue on failure in unit tests (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4471 : Build fails with hcatalog checkstyle error (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4392 : Illogical InvalidObjectException thrown when using multiple aggregate functions with star columns (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4421 : Improve memory usage by ORC dictionaries (Owen O'Malley via Ashutosh Chauhan) [mithun] HCATALOG-627 - Adding thread-safety to
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656226#comment-13656226 ] Eric Hanson commented on HIVE-4525: --- For vectorized query execution (HIVE-4160), we are going to represent a timestamp value internally as a vector of 64-bit integers representing the number of nanos since the epoch (in 1970). Given your proposal to also support time values before 1970, I'd propose that for vectorized QE we extend this so a negative number of nanos is used to represent a value before 1970. This gives a range of 292 years before or after 1970, good enough for practical purposes. Data outside that range might initially not be supported for vectorized QE, and later might be supported by reverting to a slower code path. We may want to consider having the storage layer (say ORC) store timestamps simply as a long, so it is not as expensive to flow this data into vectorized query execution. With compression, these long values will compress pretty well, so the storage layout becomes less of a concern and query execution speed becomes the more pressing issue. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow storing timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
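The 292-year figure in the comment above can be checked directly: a signed 64-bit count of nanoseconds reaches about 292 years on either side of the 1970 epoch. A minimal sketch (plain Java, not Hive source; class and method names are illustrative):

```java
// Sketch: how far a signed 64-bit nanosecond counter reaches from the 1970 epoch.
public class NanoTimestampRange {
    static final long NANOS_PER_SECOND = 1_000_000_000L;
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 3600;

    // Years representable on either side of 1970 with long nanos.
    static double rangeYears() {
        return Long.MAX_VALUE / (double) NANOS_PER_SECOND / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        // Prints roughly 292.27, matching the estimate in the comment above.
        System.out.println(rangeYears());
    }
}
```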
[jira] [Updated] (HIVE-4160) Vectorized Query Execution in Hive
[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4160: -- Attachment: Hive-Vectorized-Query-Execution-Design-rev7.docx Added discussion of timestamp values before the epoch (in 1970) related to HIVE-4525. Vectorized Query Execution in Hive -- Key: HIVE-4160 URL: https://issues.apache.org/jira/browse/HIVE-4160 Project: Hive Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Hive-Vectorized-Query-Execution-Design.docx, Hive-Vectorized-Query-Execution-Design-rev2.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.pdf, Hive-Vectorized-Query-Execution-Design-rev4.docx, Hive-Vectorized-Query-Execution-Design-rev4.pdf, Hive-Vectorized-Query-Execution-Design-rev5.docx, Hive-Vectorized-Query-Execution-Design-rev5.pdf, Hive-Vectorized-Query-Execution-Design-rev6.docx, Hive-Vectorized-Query-Execution-Design-rev6.pdf, Hive-Vectorized-Query-Execution-Design-rev7.docx The Hive query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that this yields very low instructions per cycle [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization and data columns go through a layer of object inspectors that identify column type, deserialize data and determine appropriate expression routines in the inner loop. These layers of virtual method calls further slow down the processing. This work will add support for vectorized query execution to Hive, where, instead of individual rows, batches of about a thousand rows at a time are processed. Each column in the batch is represented as a vector of a primitive data type. 
The inner loop of execution scans these vectors very fast, avoiding method calls, deserialization, unnecessary if-then-else, etc. This substantially reduces CPU time used, and gives excellent instructions per cycle (i.e. improved processor pipeline utilization). See the attached design specification for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
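The batch-of-column-vectors model described above can be sketched in plain Java (names are hypothetical, not Hive's actual vectorization API): a batch carries a column as a primitive array plus an optional selection vector from an earlier filter, and the inner loop runs over the array with no per-row virtual calls or deserialization.

```java
// Illustrative sketch of a vectorized inner loop over one column of a batch.
public class VectorizedSumSketch {
    // vector: the column's values; selected: indices of rows that passed a filter.
    static long sumColumn(long[] vector, int[] selected, int size, boolean useSelected) {
        long sum = 0;
        if (useSelected) {
            for (int i = 0; i < size; i++) sum += vector[selected[i]]; // only surviving rows
        } else {
            for (int i = 0; i < size; i++) sum += vector[i];           // whole batch
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] col = {10, 20, 30, 40};
        int[] sel = {1, 3};                                  // rows kept by an earlier filter
        System.out.println(sumColumn(col, sel, 2, true));    // 60
        System.out.println(sumColumn(col, null, 4, false));  // 100
    }
}
```

The tight loops over primitive arrays are what give the improved instructions per cycle the description refers to.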
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656271#comment-13656271 ] Mikhail Bautin commented on HIVE-4525: -- [~ehans]: switching to long nanosecond timestamps would definitely be a much nicer solution, but don't you think it would break backward-compatibility for timestamps serialized using the old format? Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow storing timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4510) HS2 doesn't nest exceptions properly (fun debug times)
[ https://issues.apache.org/jira/browse/HIVE-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656302#comment-13656302 ] Thejas M Nair commented on HIVE-4510: - I am running the full hive unit test suite on this patch. I will update when it is done. HS2 doesn't nest exceptions properly (fun debug times) -- Key: HIVE-4510 URL: https://issues.apache.org/jira/browse/HIVE-4510 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Gunther Hagleitner Assignee: Thejas M Nair Attachments: HIVE-4510.1.patch, HIVE-4510.2.patch In SQLOperation.java lines 97 + 113 for instance, we catch errors and throw a new HiveSQLException, but we don't wrap the original exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
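The bug class HIVE-4510 describes is easy to see in miniature. A sketch with a stand-in exception type (not Hive's HiveSQLException): rethrowing without passing the cause discards the original stack trace, while wrapping preserves it for debugging.

```java
// Stand-in for the pattern in SQLOperation: catch, then throw a new exception.
public class ExceptionWrappingSketch {
    static class WrappedException extends Exception {
        WrappedException(String msg) { super(msg); }
        WrappedException(String msg, Throwable cause) { super(msg, cause); }
    }

    // The buggy shape: the original exception is dropped on the floor.
    static WrappedException lossy(Exception original) {
        return new WrappedException("Error running query");
    }

    // The fixed shape: the original exception rides along as the cause.
    static WrappedException wrapped(Exception original) {
        return new WrappedException("Error running query", original);
    }

    public static void main(String[] args) {
        Exception root = new IllegalStateException("root cause");
        System.out.println(lossy(root).getCause());   // null: nothing to debug with
        System.out.println(wrapped(root).getCause()); // the original exception survives
    }
}
```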
[jira] [Updated] (HIVE-4540) JOIN-GRP BY-DISTINCT fails with NPE when mapjoin.mapreduce=true
[ https://issues.apache.org/jira/browse/HIVE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4540: - Status: Patch Available (was: Open) JOIN-GRP BY-DISTINCT fails with NPE when mapjoin.mapreduce=true --- Key: HIVE-4540 URL: https://issues.apache.org/jira/browse/HIVE-4540 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4540.1.patch If the mapjoin.mapreduce optimization kicks in on a query of this form: {noformat} select count(distinct a.v) from a join b on (a.k = b.k) group by a.g {noformat} The planner will NPE in the metadata-only optimizer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-4513 - disable hivehistory logs by default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11029/ --- (Updated May 13, 2013, 8:13 p.m.) Review request for hive. Summary (updated) - HIVE-4513 - disable hivehistory logs by default Description --- HIVE-4513 This addresses bug HIVE-4513. https://issues.apache.org/jira/browse/HIVE-4513 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 conf/hive-default.xml.template 3a7d1dc data/conf/hive-site.xml 544ba35 ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java fdd56db ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 Diff: https://reviews.apache.org/r/11029/diff/ Testing --- Thanks, Thejas Nair
[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4531: - Attachment: HIVE-4531-4.patch Adding documentation. [WebHCat] Collecting task logs to hdfs -- Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4549) JDBC compliance change
Johndee Burks created HIVE-4549: --- Summary: JDBC compliance change Key: HIVE-4549 URL: https://issues.apache.org/jira/browse/HIVE-4549 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.10.0 Environment: Hive 0.10 Reporter: Johndee Burks Priority: Trivial The ResultSet returned by HiveDatabaseMetaData.getTables has the metadata columns TABLE_CAT, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, REMARKS. The second column name is not compliant with the JDBC standard (http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getSchemas()): the column name should be TABLE_SCHEM instead of TABLE_SCHEMA. Suggested fix in Hive (org.apache.hive.service.cli.operation.GetTablesOperation.java) change from private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") .addStringColumn("TABLE_SCHEMA", "Schema name.") .addStringColumn("TABLE_NAME", "Table name.") .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", etc.") .addStringColumn("REMARKS", "Comments about the table."); to private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") .addStringColumn("TABLE_SCHEM", "Schema name.") .addStringColumn("TABLE_NAME", "Table name.") .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", etc.") .addStringColumn("REMARKS", "Comments about the table."); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
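Until the server-side fix lands, a client can tolerate both spellings. A hypothetical helper (not part of Hive) that resolves the schema column index whether the server emits the JDBC-compliant TABLE_SCHEM or the non-compliant TABLE_SCHEMA:

```java
import java.util.List;

// Hypothetical client-side workaround: find the schema column among
// ResultSetMetaData labels regardless of which spelling the server uses.
public class SchemaColumnResolver {
    // Returns the 1-based column index (JDBC convention), or -1 if absent.
    static int schemaColumnIndex(List<String> labels) {
        for (int i = 0; i < labels.size(); i++) {
            String l = labels.get(i);
            if (l.equalsIgnoreCase("TABLE_SCHEM") || l.equalsIgnoreCase("TABLE_SCHEMA")) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Non-compliant Hive labels and compliant labels both resolve to column 2.
        System.out.println(schemaColumnIndex(
                List.of("TABLE_CAT", "TABLE_SCHEMA", "TABLE_NAME"))); // 2
        System.out.println(schemaColumnIndex(
                List.of("TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME")));  // 2
    }
}
```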
[jira] [Updated] (HIVE-4549) JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM
[ https://issues.apache.org/jira/browse/HIVE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johndee Burks updated HIVE-4549: Summary: JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM (was: JDBC compliance change) JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM -- Key: HIVE-4549 URL: https://issues.apache.org/jira/browse/HIVE-4549 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.10.0 Environment: Hive 0.10 Reporter: Johndee Burks Priority: Trivial Labels: newbie -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions
Gunther Hagleitner created HIVE-4550: Summary: local_mapred_error_cache fails on some hadoop versions Key: HIVE-4550 URL: https://issues.apache.org/jira/browse/HIVE-4550 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor I've tested it manually on the upcoming 1.3 version (branch 1). We do mask job_* ids, but not job_local* ids. The fix is to extend this to both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
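The masking described above can be sketched as a regex pass over test output (the pattern is illustrative, not the exact one in Hive's test harness): both `job_<numeric>` and `job_local*` identifiers are replaced with a stable token so golden-file diffs do not depend on the Hadoop version.

```java
import java.util.regex.Pattern;

// Sketch of masking both cluster-style and local-runner job ids in qfile output.
public class JobIdMasker {
    // Matches job_201304260026_0001 as well as job_local_0003.
    private static final Pattern JOB_ID = Pattern.compile("job_(?:local)?[0-9_]+");

    static String mask(String line) {
        return JOB_ID.matcher(line).replaceAll("job_#ID#");
    }

    public static void main(String[] args) {
        System.out.println(mask("Ended Job = job_201304260026_0001"));
        System.out.println(mask("Ended Job = job_local_0003 with errors"));
    }
}
```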
[jira] [Updated] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions
[ https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4550: - Attachment: HIVE-4550.1.patch local_mapred_error_cache fails on some hadoop versions -- Key: HIVE-4550 URL: https://issues.apache.org/jira/browse/HIVE-4550 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4550.1.patch I've tested it manually on the upcoming 1.3 version (branch 1). We do mask job_* ids, but not job_local* ids. The fix is to extend this to both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
Sushanth Sowmyan created HIVE-4551: -- Summary: ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int Key: HIVE-4551 URL: https://issues.apache.org/jira/browse/HIVE-4551 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan This was initially reported from an e2e test run, with the following E2E test: {code} { 'name' = 'Hadoop_ORC_Write', 'tests' = [ { 'num' = 1 ,'hcat_prep'=q\ drop table if exists hadoop_orc; create table hadoop_orc ( t tinyint, si smallint, i int, b bigint, f float, d double, s string) stored as orc;\ ,'hadoop' = q\ jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\, ,'result_table' = 'hadoop_orc' ,'sql' = q\select * from all100k;\ ,'floatpostprocess' = 1 ,'delimiter' = ' ' }, ], }, {code} This fails with the following error: {code} 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195) at 
org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45) at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290) at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192) at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53) at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97) at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203) at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63) ... 12 more 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656384#comment-13656384 ] Sushanth Sowmyan commented on HIVE-4551: The problem here is that the raw data encapsulated by HCatRecord and HCatSchema are out of synch, which was one of my worries back in HCATALOG-425 : https://issues.apache.org/jira/browse/HCATALOG-425?focusedCommentId=13439652page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13439652 Basically, the raw data contained in the smallint/tinyint columns are raw shorts and bytes, and we try to read it as an Int. In the case of rcfile, the underlying raw data is also stored as an IntWritable in the cases of smallint and tinyint, but not so in the case of orc. This leads to the following kind of calls in the rcfile case, and in the orc case: RCFILE: {noformat} 13/05/11 02:56:10 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean} == org.apache.hadoop.hive.serde2.lazy.LazyInteger:-3 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int == org.apache.hadoop.hive.serde2.lazy.LazyInteger:9001 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int == org.apache.hadoop.hive.serde2.lazy.LazyInteger:86400 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int == org.apache.hadoop.hive.serde2.lazy.LazyLong:4294967297 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint == org.apache.hadoop.hive.serde2.lazy.LazyFloat:34.532 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float == 
org.apache.hadoop.hive.serde2.lazy.LazyDouble:2.184239842983489E15 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double == org.apache.hadoop.hive.serde2.lazy.LazyBoolean:true == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean == org.apache.hadoop.hive.serde2.lazy.LazyInteger:0 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int == org.apache.hadoop.hive.serde2.lazy.LazyInteger:0 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int == org.apache.hadoop.hive.serde2.lazy.LazyInteger:0 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int == org.apache.hadoop.hive.serde2.lazy.LazyLong:0 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint == org.apache.hadoop.hive.serde2.lazy.LazyFloat:0.0 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float == org.apache.hadoop.hive.serde2.lazy.LazyDouble:0.0 == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double == org.apache.hadoop.hive.serde2.lazy.LazyBoolean:false == org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean {noformat} ORC: {noformat} 13/05/11 02:56:16 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.ql.io.orc.OrcSerde with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean} == org.apache.hadoop.hive.serde2.io.ByteWritable:-3 == org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector:int 13/05/11 02:56:16 WARN mapred.LocalJobRunner: job_local_0003 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at 
org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable at
[jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656388#comment-13656388 ] Sushanth Sowmyan commented on HIVE-4551: I'm attaching a patch for this, by doing the following: a) Removing promotion logic from HCatSchema, keeping that pure so it reflects the table type. b) Doing the conversion to appropriate Pig types inside PigHCatUTil. This breaks Travis' original intent of having HCatRecord/HCatSchema do promotions for all M/R programs, but given that there was a bug in that conversion anyway, this breakage is not a backward-incompatible breakage. c) If we intend to add back that support, then the correct way to do that, imo, is to add that promotion to HCatRecord's accessors, but leave HCatSchema alone. d) I've also added a new test case to mimic the e2e test that failed, so we can build on that from now on. I've also refactored more Loader/Storer tests to run against orc as well. ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int --- Key: HIVE-4551 URL: https://issues.apache.org/jira/browse/HIVE-4551 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
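The conversion described in (b) above amounts to widening tinyint/smallint values before handing them to Pig. A hypothetical sketch with plain Java types standing in for ByteWritable/ShortWritable/IntWritable (this is not the patch's actual code): widen instead of casting blindly, which is what produced the ClassCastException.

```java
// Sketch: widen byte/short fields read from ORC to Integer for Pig,
// rather than casting them to IntWritable and hitting a ClassCastException.
public class NumericPromotionSketch {
    static Object promoteForPig(Object field) {
        if (field instanceof Byte)  return ((Byte) field).intValue();  // tinyint  -> int
        if (field instanceof Short) return ((Short) field).intValue(); // smallint -> int
        return field;                                                  // other types unchanged
    }

    public static void main(String[] args) {
        System.out.println(promoteForPig((byte) -3));    // -3, now an Integer
        System.out.println(promoteForPig((short) 9001)); // 9001, now an Integer
        System.out.println(promoteForPig(86400));        // already an int, passed through
    }
}
```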
[jira] [Updated] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4551: --- Attachment: 4551.patch (patch attached) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int --- Key: HIVE-4551 URL: https://issues.apache.org/jira/browse/HIVE-4551 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: 4551.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656393#comment-13656393 ] Sushanth Sowmyan commented on HIVE-4551: [~traviscrawford], could you please have a look at this? ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int --- Key: HIVE-4551 URL: https://issues.apache.org/jira/browse/HIVE-4551 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: 4551.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-4513 - disable hivehistory logs by default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11029/ --- (Updated May 13, 2013, 9:51 p.m.) Review request for hive. Changes --- Changes in new patch: - Add @Override to interface functions being implemented in HiveHistoryImpl. - Remove javadoc duplication in HiveHistoryImpl; it will automatically inherit the documentation from the interface. - Log the exception in code unrelated to the patch, to partly address Brock's concern. Since that code is not part of the patch, I don't want to increase the scope to address the concern fully. Description --- HIVE-4513 This addresses bug HIVE-4513. https://issues.apache.org/jira/browse/HIVE-4513 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 conf/hive-default.xml.template 3a7d1dc data/conf/hive-site.xml 544ba35 ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java fdd56db ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 Diff: https://reviews.apache.org/r/11029/diff/ Testing --- Thanks, Thejas Nair
[jira] [Updated] (HIVE-4513) disable hivehistory logs by default
[ https://issues.apache.org/jira/browse/HIVE-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4513: Description: HiveHistory log files (hive_job_log_hive_*.txt files) store information about a hive query, such as the query string, plan, counters and MR job progress information. There is no mechanism to delete these files, and as a result they accumulate over time, using up a lot of disk space. I don't think this is used by most people, so I think it would be better to turn this off by default. Jobtracker logs already capture most of this information, though it is not as structured as history logs. was: HiveHistory log files (hive_job_log_hive_*.txt files) store information about a hive query, such as the query string, plan, counters and MR job progress information. There is no mechanism to delete these files, and as a result they accumulate over time, using up a lot of disk space. I don't think this is used by most people, so I think it would be better to turn this off by default. Jobtracker logs already capture most of this information, though it is not as structured as history logs. HIVE-4500 is introducing a new config parameter to turn this off; we should use that to turn this off by default. disable hivehistory logs by default --- Key: HIVE-4513 URL: https://issues.apache.org/jira/browse/HIVE-4513 Project: Hive Issue Type: Bug Components: Configuration, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4513.1.patch, HIVE-4513.2.patch, HIVE-4513.3.patch, HIVE-4513.4.patch HiveHistory log files (hive_job_log_hive_*.txt files) store information about a hive query, such as the query string, plan, counters and MR job progress information. There is no mechanism to delete these files, and as a result they accumulate over time, using up a lot of disk space. I don't think this is used by most people, so I think it would be better to turn this off by default. Jobtracker logs already capture most of this information, though it is not as structured as history logs.
Re: Review Request: HIVE-4513 - disable hivehistory logs by default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11029/ --- (Updated May 13, 2013, 10:12 p.m.) Review request for hive. Changes --- Updating review with background of the changes. Description (updated) --- HiveHistory log files (hive_job_log_hive_*.txt files) store information about a hive query, such as the query string, plan, counters and MR job progress information. There is no mechanism to delete these files, and as a result they accumulate over time, using up a lot of disk space. I don't think this is used by most people, so I think it would be better to turn this off by default. Jobtracker logs already capture most of this information, though it is not as structured as history logs. The change: A new config parameter hive.session.history.enabled controls whether the history log is enabled. By default it is set to false. SessionState initializes the HiveHistory object. When this config is set to false, it creates a Proxy object that does not do anything. I did this instead of having SessionState return null, because that would add null checks in too many places. This keeps the code cleaner and avoids the possibility of an NPE. As the proxy only works against interfaces, I created a HiveHistory interface and moved the implementation to HiveHistoryImpl; static functions were moved to HiveHistoryUtil. This addresses bug HIVE-4513.
https://issues.apache.org/jira/browse/HIVE-4513 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 conf/hive-default.xml.template 3a7d1dc data/conf/hive-site.xml 544ba35 ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java fdd56db ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 Diff: https://reviews.apache.org/r/11029/diff/ Testing --- Thanks, Thejas Nair
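The no-op Proxy approach described in this review can be sketched with java.lang.reflect.Proxy. This is an illustrative stand-in, not the actual HiveHistoryProxyHandler code: the `History` interface and method names below are hypothetical placeholders for Hive's HiveHistory interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical minimal interface standing in for Hive's HiveHistory.
interface History {
    void logQuery(String queryString);
    void close();
}

public class NoOpHistoryDemo {
    // Returns a proxy whose methods all do nothing, so callers never need
    // null checks when history logging is disabled.
    public static History noOpHistory() {
        InvocationHandler doNothing = (proxy, method, args) -> null;
        return (History) Proxy.newProxyInstance(
                History.class.getClassLoader(),
                new Class<?>[] { History.class },
                doNothing);
    }

    public static void main(String[] args) {
        History h = noOpHistory();
        h.logQuery("SELECT 1"); // silently ignored
        h.close();              // also a no-op
    }
}
```

Because java.lang.reflect.Proxy can only implement interfaces, this design is what forces the split into a HiveHistory interface plus a HiveHistoryImpl class.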
Re: Review Request: HIVE-4513 - disable hivehistory logs by default
On May 9, 2013, 4:37 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java, lines 71-73 https://reviews.apache.org/r/11029/diff/1/?file=289274#file289274line71 This is bad... I know it's not related to your change but can we fix this? I have made things slightly better by logging the error. I looked at throwing an exception, but that would need changes in other classes to handle the exception correctly (such as the Hive web interface classes). Since this code is unrelated to the patch, and it is not a one- or two-line change, I think we should address it separately. - Thejas --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11029/#review20380 ---
[jira] [Updated] (HIVE-4551) HCatLoader smallint/tinyint promotions to Int have issues with ORC integration
[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4551: --- Summary: HCatLoader smallint/tinyint promotions to Int have issues with ORC integration (was: ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int) HCatLoader smallint/tinyint promotions to Int have issues with ORC integration -- Key: HIVE-4551 URL: https://issues.apache.org/jira/browse/HIVE-4551 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: 4551.patch
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656479#comment-13656479 ] Eric Hanson commented on HIVE-4525: --- Yes, so you'd have to support both, at least for an extended period of time. It would be a performance enhancement, and you'd need to maintain backward compatibility for older data. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow storing timestamps earlier than 1970 or later than a certain point in 2038.
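The 2038 limit follows directly from storing seconds-since-epoch in 31 non-negative bits. A quick check of the largest representable instant (a generic illustration of the limit, not TimestampWritable's actual serialization code):

```java
import java.time.Instant;

public class ThirtyOneBitLimit {
    public static void main(String[] args) {
        // Largest value that fits in 31 bits: 2^31 - 1 = 2147483647 seconds.
        long maxSeconds = Integer.MAX_VALUE;
        // Prints 2038-01-19T03:14:07Z, the familiar "year 2038" cutoff.
        System.out.println(Instant.ofEpochSecond(maxSeconds));
        // A non-negative 31-bit field also cannot reach before the epoch,
        // hence no timestamps earlier than 1970.
        System.out.println(Instant.ofEpochSecond(0)); // 1970-01-01T00:00:00Z
    }
}
```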
[jira] [Commented] (HIVE-4551) HCatLoader smallint/tinyint promotions to Int have issues with ORC integration
[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656493#comment-13656493 ] Sushanth Sowmyan commented on HIVE-4551: Also, a few more notes: a) With my patch that fixes this bug, HCatRecordSerDe is still doing the promotion, so HCatRecord does have the promoted data when reading off it, and promotion is still configurable in the current way. I intend to refactor this out in a new patch (details below). b) Only the HCatSchema has been made pure, in that it reflects the underlying data. -- My eventual goal, post-bugfix, to clean this up is as follows: a) HCatRecord and HCatSchema reflect the underlying raw data and do no promotions. b) Introduce a ConversionImpl, which defines various datatype conversion functions that all default to returning the input, with a config that allows a user to choose which conversions are applied. c) Introduce a PromotedHCatRecord/PromotedHCatSchema that wrap HCatRecord/HCatSchema and use a ConversionImpl. d) Implement a PigLoaderConversionImpl/PigStorerConversionImpl in hcat-pig-adapter, which implements the following: Byte-Int promotion, Short-Int promotion, Boolean-Int promotion. e) Have HCatLoader/HCatStorer use the promoted versions of HCatRecord/HCatSchema, which use the PigConversionImpl. f) Remove the current HCatContext promotion parameters and make them HCatLoader/HCatStorer parameters.
HCatLoader smallint/tinyint promotions to Int have issues with ORC integration -- Key: HIVE-4551 URL: https://issues.apache.org/jira/browse/HIVE-4551 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: 4551.patch
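The refactor sketched in the comment above — conversion functions that default to identity, with loader-specific overrides doing the promotions — might look roughly like this. All names here are hypothetical, taken from the proposal rather than from any existing HCatalog class:

```java
// Hypothetical conversion hook: every conversion defaults to identity,
// so raw HCatRecord data passes through unchanged unless overridden.
interface Conversion {
    default Object convertByte(byte v)       { return v; }
    default Object convertShort(short v)     { return v; }
    default Object convertBoolean(boolean v) { return v; }
}

// A Pig-oriented implementation promoting the narrow types to Integer,
// mirroring the Byte-Int / Short-Int / Boolean-Int promotions in the plan.
class PigLoaderConversion implements Conversion {
    @Override public Object convertByte(byte v)       { return (int) v; }
    @Override public Object convertShort(short v)     { return (int) v; }
    @Override public Object convertBoolean(boolean v) { return v ? 1 : 0; }
}

public class ConversionDemo {
    public static void main(String[] args) {
        Conversion raw = new Conversion() {};       // identity: no promotion
        Conversion pig = new PigLoaderConversion(); // promoted view for Pig
        System.out.println(raw.convertByte((byte) 7).getClass().getSimpleName());
        System.out.println(pig.convertByte((byte) 7).getClass().getSimpleName());
    }
}
```

The point of the design is that a "pure" record can serve multiple front-ends: each loader or storer picks its own Conversion, instead of the promotion being baked into HCatRecordSerDe.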
[jira] [Updated] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions
[ https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4550: - Status: Patch Available (was: Open) local_mapred_error_cache fails on some hadoop versions -- Key: HIVE-4550 URL: https://issues.apache.org/jira/browse/HIVE-4550 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4550.1.patch I've tested it manually on the upcoming 1.3 version (branch 1). We do mask job_* ids, but not job_local* ids. The fix is to extend this to both.
[jira] [Commented] (HIVE-4475) Switch RCFile default to LazyBinaryColumnarSerDe
[ https://issues.apache.org/jira/browse/HIVE-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656512#comment-13656512 ] Gunther Hagleitner commented on HIVE-4475: -- review: https://reviews.facebook.net/D10785 Switch RCFile default to LazyBinaryColumnarSerDe Key: HIVE-4475 URL: https://issues.apache.org/jira/browse/HIVE-4475 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4475.1.patch For most workloads it seems LazyBinaryColumnarSerDe (binary) will perform better than ColumnarSerDe (text). Not sure why ColumnarSerDe is the default, but my guess is that's for historical reasons. I suggest switching the default.
[jira] [Updated] (HIVE-4542) TestJdbcDriver2.testMetaDataGetSchemas fails because of unexpected database
[ https://issues.apache.org/jira/browse/HIVE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4542: Attachment: HIVE-4542.1.patch HIVE-4542.1.patch - needs HIVE-4171 (HIVE-4171.4.patch) to be applied first. TestJdbcDriver2.testMetaDataGetSchemas fails because of unexpected database --- Key: HIVE-4542 URL: https://issues.apache.org/jira/browse/HIVE-4542 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4542.1.patch The check for the database name in TestJdbcDriver2.testMetaDataGetSchemas fails with the error - {code} junit.framework.ComparisonFailure: expected:...efault but was:...bname {code} i.e., a database called dbname is found, which the test does not expect. Whether this failure happens depends on the order in which the function gets the databases; if the default database is the first one, the test succeeds.
[jira] [Updated] (HIVE-4535) hive build fails with hadoop 0.20
[ https://issues.apache.org/jira/browse/HIVE-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4535: Assignee: Thejas M Nair hive build fails with hadoop 0.20 - Key: HIVE-4535 URL: https://issues.apache.org/jira/browse/HIVE-4535 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4535.1.patch, HIVE-4535.2.patch ant package -Dhadoop.mr.rev=20 leads to - {code} [javac] /Users/thejas/hive_thejas_git/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java:382: cannot find symbol [javac] symbol : method join(java.lang.String,java.util.List<java.lang.String>) [javac] location: class org.apache.hadoop.util.StringUtils [javac] StringUtils.join(",", incompatibleCols) {code}
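The compile error above is the join(String, List&lt;String&gt;) overload of org.apache.hadoop.util.StringUtils not existing in Hadoop 0.20. A version-independent local helper is one obvious workaround; this is an illustrative sketch, not necessarily how HIVE-4535 was actually fixed:

```java
import java.util.Arrays;
import java.util.List;

public class JoinCompat {
    // Joins the elements with the separator, avoiding the
    // StringUtils.join(String, List<String>) overload that is
    // missing in Hadoop 0.20.
    public static String join(String sep, List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append(sep);
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> incompatibleCols = Arrays.asList("colA", "colB");
        System.out.println(join(",", incompatibleCols)); // prints colA,colB
    }
}
```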
Re: Review Request: Add Vectorized Substr
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11106/#review20512 --- ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42276 please add a javadoc comment for the purpose of the class ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42278 put a comment explaining why this is here ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42280 explain more clearly what this function does ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42281 if you use a negative start index -n, the existing Hive code (non-vectorized) seems to take the tail-end n characters, e.g. substr('foo', -2) is 'oo'. If you use -n and n is greater than the string length, the output is the empty string. Please handle this case with the same behavior as non-vectorized Hive. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42282 please run ant checkstyle and follow its suggestions, e.g. there is no blank between if and ( ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42283 also set the output value to the empty string if the output is null. outV.noNulls needs to be set for every case and doesn't get set here ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42289 len[0] - offset could be negative. Do you need to use len[0] - (start[0] - offset)? Make sure you have unit tests for the case where start != 0.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java https://reviews.apache.org/r/11106/#comment42296 I've heard that if the common case is the first case, things can run faster. You could reverse these and make the test for offset -1. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java https://reviews.apache.org/r/11106/#comment42305 need to handle the negative start index case. It appears your code could go out of array bounds in that case. Also, for substrLength = 0, the result should be the empty string ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java https://reviews.apache.org/r/11106/#comment42308 should set isRepeating to false for the default case ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java https://reviews.apache.org/r/11106/#comment42306 need to check noNulls first. If noNulls is set then you can't look into the isNull array or you could see invalid data ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java https://reviews.apache.org/r/11106/#comment42309 Is this output supposed to be null or the empty string? My ad-hoc tests seemed to show it was the empty string. You should double-check. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java https://reviews.apache.org/r/11106/#comment42310 need to set outV.isNull[i] to true or false always, unless you are setting outV.noNulls to true.
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java https://reviews.apache.org/r/11106/#comment42311 the second argument to the VectorizedRowBatch constructor should not be used (it defaults to the correct value) ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java https://reviews.apache.org/r/11106/#comment42314 the argument is not needed -- use the default constructor with no args ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java https://reviews.apache.org/r/11106/#comment42312 you need to test some data with multi-byte characters. There is an example of that someplace else in the tests. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java https://reviews.apache.org/r/11106/#comment42315 need to test the case where the data start position is not 0 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java https://reviews.apache.org/r/11106/#comment42313 need to verify that the other rows besides 0 are always set to not null. The isNull entries for them could have been true by chance from a previous use of the batch - Eric Hanson
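The negative-start semantics called out in the review (substr('foo', -2) is 'oo'; the empty string when the magnitude exceeds the string length) can be captured in a small reference function. This is an illustrative model of the behavior the reviewer describes, not Hive's actual (vectorized or non-vectorized) implementation:

```java
public class SubstrSemantics {
    // Models Hive's 1-based substr(str, start): a negative start counts
    // back from the end of the string; out-of-range starts yield "".
    public static String substr(String s, int start) {
        int len = s.length();
        int begin;
        if (start < 0) {
            begin = len + start;          // e.g. -2 on "foo" -> index 1
            if (begin < 0) return "";     // |start| > length -> empty
        } else {
            begin = (start == 0) ? 0 : start - 1; // Hive treats 0 like 1
            if (begin >= len) return "";
        }
        return s.substring(begin);
    }

    public static void main(String[] args) {
        System.out.println(substr("foo", -2)); // oo
        System.out.println(substr("foo", -5)); // (empty string)
        System.out.println(substr("foo", 2));  // oo
    }
}
```

A vectorized version must preserve exactly these edge cases per row while operating on byte offsets into the batch, which is where the out-of-bounds risks flagged above come from.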
[jira] [Created] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly
Sarvesh Sakalanaga created HIVE-4552: Summary: Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly Key: HIVE-4552 URL: https://issues.apache.org/jira/browse/HIVE-4552 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga The IsRepeating flag in ColumnVector is being set incorrectly by the ORC RecordReader (RecordReaderImpl.java), and as a result wrong results are being written by VectorFileSinkOperator.
[jira] [Commented] (HIVE-4495) Implement vectorized string substr
[ https://issues.apache.org/jira/browse/HIVE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656578#comment-13656578 ] Eric Hanson commented on HIVE-4495: --- See my comments on the first version of the patch at https://reviews.apache.org/r/11106/ Implement vectorized string substr -- Key: HIVE-4495 URL: https://issues.apache.org/jira/browse/HIVE-4495 Project: Hive Issue Type: Sub-task Reporter: Timothy Chen Assignee: Timothy Chen
Review Request: Column Column, and Column Scalar vectorized execution tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/ --- Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and Remus Rusanu. Description --- This patch adds Column Column, and Column Scalar vectorized execution tests. These tests are generated in parallel with the vectorized expressions. The tests focus on validating the column vector and the vectorized row batch metadata regarding nulls, repeating, and selection. Overview of Changes: CodeGen.java: + joinPath, getCamelCaseType, readFile and writeFile made static for use in TestCodeGen.java. + filter types now specify null as their output type rather than "doesn't matter" to make detection for test generation easier. + support for test generation added. TestCodeGen.java Templates: TestClass.txt TestColumnColumnFilterVectorExpressionEvaluation.txt, TestColumnColumnOperationVectorExpressionEvaluation.txt, TestColumnScalarFilterVectorExpressionEvaluation.txt, TestColumnScalarOperationVectorExpressionEvaluation.txt + This class is mutable and maintains a hashmap of TestSuiteClassName to test cases. The test cases are added over the course of vectorized expression class generation, with the test classes being output at the end. For each column vector (inputs and/or outputs) a matrix of pairwise-covering Booleans is used to generate test cases across the nulls and repeating dimensions. Based on the input column vectors' nulls and repeating states, the state of the output column vector (if there is one) is validated, along with the null vector. For filter operations the selection vector is validated against the generated data. Each template corresponds to a class representing a test suite. VectorizedRowGroupGenUtil.java + added methods generateLongColumnVector and generateDoubleColumnVector for generating the respective column vectors with optional nulls and/or repeating values.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java 53d9a7a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java 8a07567 Diff: https://reviews.apache.org/r/11133/diff/ Testing --- generated tests, and ran them. Thanks, tony murphy
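The generator helpers described in this review — columns with optional nulls and repeating values, whose metadata flags the tests then validate — can be sketched as follows. This is a simplified stand-in for VectorizedRowGroupGenUtil, using a plain struct in place of Hive's actual LongColumnVector; the class and field names mirror the review's terminology but are illustrative:

```java
import java.util.Random;

public class ColumnGenDemo {
    // Simplified stand-in for a LongColumnVector: values plus the
    // noNulls / isRepeating metadata flags and a per-row null mask.
    public static final class LongColumn {
        public final long[] vector;
        public final boolean[] isNull;
        public final boolean noNulls;
        public final boolean isRepeating;
        LongColumn(long[] v, boolean[] n, boolean noNulls, boolean isRepeating) {
            this.vector = v; this.isNull = n;
            this.noNulls = noNulls; this.isRepeating = isRepeating;
        }
    }

    // Generates a column of `size` longs; when repeating, every slot holds
    // the same value; when nulls are requested, every other row is null.
    public static LongColumn generateLongColumn(int size, boolean nulls,
                                                boolean repeating, long seed) {
        Random r = new Random(seed);
        long[] v = new long[size];
        boolean[] isNull = new boolean[size];
        long repeated = r.nextLong();
        for (int i = 0; i < size; i++) {
            v[i] = repeating ? repeated : r.nextLong();
            isNull[i] = nulls && (i % 2 == 0);
        }
        return new LongColumn(v, isNull, !nulls, repeating);
    }

    public static void main(String[] args) {
        LongColumn c = generateLongColumn(8, true, true, 42L);
        System.out.println(c.isRepeating + " " + c.noNulls); // true false
    }
}
```

Each generated test would then feed such columns through an expression and assert that the output vector's noNulls, isRepeating, and null entries are consistent with the inputs, which is exactly the metadata validation the description emphasizes.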
Re: Review Request: Column Column, and Column Scalar vectorized execution tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/ ---

(Updated May 14, 2013, 12:27 a.m.)

Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and Remus Rusanu.

The description is unchanged from the original request; the updated diff covers the same file list.

Diff: https://reviews.apache.org/r/11133/diff/

Testing
---
Generated the tests, and ran them.

Thanks,
tony murphy
[jira] [Created] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests
Tony Murphy created HIVE-4553:
-
Summary: Column Column, and Column Scalar vectorized execution tests
Key: HIVE-4553
URL: https://issues.apache.org/jira/browse/HIVE-4553
Project: Hive
Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
Fix For: vectorization-branch

review board review: https://reviews.apache.org/r/11133/

This patch adds Column Column, and Column Scalar vectorized execution tests, generated in parallel with the vectorized expressions; the full description is the same as in the review request above.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
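The generateLongColumnVector/generateDoubleColumnVector helpers this patch describes can be pictured with a small stand-in. The LongCol class below mimics the relevant fields of Hive's LongColumnVector (vector, isNull, noNulls, isRepeating) but is not the real class, and the generator's signature is hypothetical; it only shows the intended behavior of filling a vector with optional nulls and/or a repeating value.

```java
import java.util.Random;

/** Sketch of a column-vector test-data generator in the spirit of
 *  VectorizedRowGroupGenUtil.generateLongColumnVector. Illustrative only. */
public class ColumnVectorGen {

  /** Minimal stand-in for Hive's LongColumnVector. */
  public static class LongCol {
    public final long[] vector;
    public final boolean[] isNull;
    public boolean noNulls = true;
    public boolean isRepeating;

    public LongCol(int size) {
      vector = new long[size];
      isNull = new boolean[size];
    }
  }

  public static LongCol generate(boolean nulls, boolean repeating, int size, Random rand) {
    LongCol col = new LongCol(size);
    col.isRepeating = repeating;
    col.noNulls = !nulls;
    long repeatValue = rand.nextLong();
    for (int i = 0; i < size; i++) {
      // When repeating, every entry carries the same value so entry 0 is
      // representative of the whole batch.
      col.vector[i] = repeating ? repeatValue : rand.nextLong();
      // Mark roughly half the entries null when nulls are requested.
      col.isNull[i] = nulls && rand.nextBoolean();
    }
    return col;
  }
}
```

A test case would then pick a (nulls, repeating) combination, run the expression, and validate the output vector's metadata against these input states.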
Re: Review Request: Column Column, and Column Scalar vectorized execution tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/ ---

(Updated May 14, 2013, 12:34 a.m.)

Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and Remus Rusanu.

The description and file list are unchanged from the original request.

This addresses bug HIVE-4553.
https://issues.apache.org/jira/browse/HIVE-4553

Diff: https://reviews.apache.org/r/11133/diff/

Testing
---
Generated the tests, and ran them.

Thanks,
tony murphy
[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests
[ https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Murphy updated HIVE-4553:
--
Attachment: HIVE-4553.patch
[jira] [Updated] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly
[ https://issues.apache.org/jira/browse/HIVE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarvesh Sakalanaga updated HIVE-4552:
-
Attachment: Hive.4552.0.patch

Patch uploaded.

Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly
---
Key: HIVE-4552
URL: https://issues.apache.org/jira/browse/HIVE-4552
Project: Hive
Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
Attachments: Hive.4552.0.patch

The IsRepeating flag in ColumnVector is being set incorrectly by the ORC RecordReader (RecordReaderImpl.java), and as a result wrong results are written by VectorFileSinkOperator.
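The invariant at stake here is that a batch may be marked isRepeating only when every row carries the same value and null state, because downstream consumers such as VectorFileSinkOperator then read entry 0 for every row. The standalone checker below illustrates that rule under those assumptions; it is not the actual RecordReaderImpl fix.

```java
/** Illustrative checker for the isRepeating invariant behind HIVE-4552. */
public class RepeatingCheck {

  /** Returns true iff marking this batch isRepeating would be correct,
   *  i.e. all rows share one null state and (if non-null) one value. */
  public static boolean mayMarkRepeating(long[] vector, boolean[] isNull, int size) {
    for (int i = 1; i < size; i++) {
      if (isNull[i] != isNull[0]) {
        return false;               // mixed null/non-null rows never repeat
      }
      if (!isNull[i] && vector[i] != vector[0]) {
        return false;               // differing non-null values
      }
    }
    return true;
  }

  public static void main(String[] args) {
    boolean[] noNulls = new boolean[3];
    System.out.println(mayMarkRepeating(new long[]{7, 7, 7}, noNulls, 3)); // true
    System.out.println(mayMarkRepeating(new long[]{7, 8, 7}, noNulls, 3)); // false
  }
}
```

Setting the flag without this check is exactly how a reader produces a batch whose entry 0 misrepresents the remaining rows.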
[jira] [Updated] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly
[ https://issues.apache.org/jira/browse/HIVE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarvesh Sakalanaga updated HIVE-4552:
-
Status: Patch Available (was: Open)
Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2
Owen, where do I find the public keys you used to sign the files? Putting them in http://apache.org/dist/hive/KEYS seems to be the convention so far. (I found that location from the similar location in the Pig HowToRelease doc: https://cwiki.apache.org/confluence/display/PIG/HowToRelease). Thanks, Thejas

On Mon, May 13, 2013 at 10:19 AM, Owen O'Malley omal...@apache.org wrote:
On Saturday, I didn't include the Maven staging urls:
Hive: https://repository.apache.org/content/repositories/orgapachehive-013/
HCatalog: https://repository.apache.org/content/repositories/orgapachehcatalog-014/
Thanks, Owen

On Sat, May 11, 2013 at 10:33 AM, Owen O'Malley omal...@apache.org wrote:
Based on feedback from everyone, I have respun the release candidate, RC2. Please take a look. We've fixed 7 problems with the previous RC:
* Release notes were incorrect
* HIVE-4018 - MapJoin failing with Distributed Cache error
* HIVE-4421 - Improve memory usage by ORC dictionaries
* HIVE-4500 - Ensure that HiveServer2 closes log files
* HIVE-4494 - ORC map columns get class cast exception in some contexts
* HIVE-4498 - Fix TestBeeLineWithArgs failure
* HIVE-4505 - Hive can't load transforms with remote scripts
* HIVE-4527 - Fix the eclipse template

The source tag for RC2 is at: https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2
The source tar ball and convenience binary artifacts can be found at: http://people.apache.org/~omalley/hive-0.11.0rc2/

This release has many goodies including HiveServer2, integrated HCatalog, windowing and analytical functions, the decimal data type, better query planning, performance enhancements and various bug fixes. In total, we resolved more than 350 issues. The full list of fixed issues can be found at: http://s.apache.org/8Fr

Voting will conclude in 72 hours. Hive PMC members: please test and vote.

Thanks,
Owen
Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2
On Mon, May 13, 2013 at 6:14 PM, Thejas Nair the...@hortonworks.com wrote: Owen, Where do I find the public keys you used to sign the files ? You can get them from: https://people.apache.org/keys/group/hive.asc putting it in http://apache.org/dist/hive/KEYS seems to be the convention so far. Having KEYS files was the way it was done before you could put your public key into id.apache.org. Once a committer has their key uploaded, it is automatically added to each of the groups they are in. (I found that location from similar location in pig howtorelease doc https://cwiki.apache.org/confluence/display/PIG/HowToRelease). We should update the KEYS file to automatically redirect to the dynamic list of keys. -- Owen
[jira] [Created] (HIVE-4554) Failed to create a table from existing file if file path has spaces
Xuefu Zhang created HIVE-4554:
-
Summary: Failed to create a table from existing file if file path has spaces
Key: HIVE-4554
URL: https://issues.apache.org/jira/browse/HIVE-4554
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang

To reproduce the problem:
1. Create a table, say, person_age (name STRING, age INT).
2. Create a file whose name has a space in it, say, "data set.txt".
3. Try to load the data in the file into the table. The following error can be seen in the console:

hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
Loading data to table default.person_age
Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Note: the error message is confusing.
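The underlying failure mode is that a filesystem path containing a space is not a valid URI string, so any code that round-trips paths through URI parsing must percent-encode it first. The sketch below shows the general technique using java.net.URI's multi-argument constructor, which encodes illegal characters; this illustrates the class of fix, not the attached patch itself.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Demonstrates encoding a path with spaces before treating it as a URI. */
public class SpacePathDemo {

  public static String toEncodedUri(String rawPath) throws URISyntaxException {
    // new URI(String) would throw on the space; the multi-argument
    // constructor percent-encodes the path component instead.
    return new URI(null, null, rawPath, null).toString();
  }

  public static void main(String[] args) throws URISyntaxException {
    System.out.println(toEncodedUri("/home/xzhang/temp/data set.txt"));
    // -> /home/xzhang/temp/data%20set.txt
  }
}
```

Without the encoding step, parsing the raw string as a URI fails (or silently truncates at the space in sloppier code), which is consistent with the misleading "Wrong file format" error surfacing from MoveTask.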
[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces
[ https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-4554:
--
Attachment: HIVE-4554.patch

Patch attempting to fix the issue.
[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces
[ https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-4554:
--
Fix Version/s: 0.11.0
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces
[ https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-4554:
--
Status: Open (was: Patch Available)