[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449461#comment-13449461 ]

Navis commented on HIVE-3427:
-----------------------------

@Ashutosh, you are right. The build/ql/test/data/exports directory is used by many tests (exim~, etc.). How about changing the test directory from build/ql/test/data/exports to build/ql/test/data/exports/HIVE-3428 or something?

Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
------------------------------------------------------------------------------------

Key: HIVE-3427
URL: https://issues.apache.org/jira/browse/HIVE-3427
Project: Hive
Issue Type: Test
Affects Versions: 0.10.0
Reporter: Ashutosh Chauhan
Assignee: Navis
Attachments: HIVE-3427.1.patch.txt

I think it's a new test which was added via HIVE-3068.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3438) Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n > 1
Namit Jain created HIVE-3438:
-----------------------------

Summary: Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n > 1
Key: HIVE-3438
URL: https://issues.apache.org/jira/browse/HIVE-3438
Project: Hive
Issue Type: Test
Components: Tests
Reporter: Namit Jain
Assignee: Namit Jain

Once https://issues.apache.org/jira/browse/HIVE-3171 is in, it would be good to add more tests which test the above condition.
[jira] [Commented] (HIVE-3438) Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n > 1
[ https://issues.apache.org/jira/browse/HIVE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449484#comment-13449484 ]

Namit Jain commented on HIVE-3438:
----------------------------------

I have verified that the above scenarios work - it would be good to add those tests.
[jira] [Updated] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3427:
------------------------

Attachment: HIVE-3427.2.patch.txt

To reproduce:

    ant package test -Dtestcase=TestCliDriver -Dqfile=exim_00_nonpart_empty.q,metadata_export_drop.q

Changed the directory and the test passed.
[jira] [Commented] (HIVE-3283) bucket information should be used from the partition instead of the table
[ https://issues.apache.org/jira/browse/HIVE-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449502#comment-13449502 ]

Namit Jain commented on HIVE-3283:
----------------------------------

Once https://issues.apache.org/jira/browse/HIVE-3171 is in, it would be useful to have the partition metadata be used for bucketing information.

bucket information should be used from the partition instead of the table
-------------------------------------------------------------------------

Key: HIVE-3283
URL: https://issues.apache.org/jira/browse/HIVE-3283
Project: Hive
Issue Type: Bug
Reporter: Namit Jain

Currently Hive uses the number of buckets from the table object. Ideally, the number of buckets from the partition should be used.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 3.0.1
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449514#comment-13449514 ]

Sushanth Sowmyan commented on HIVE-2084:
----------------------------------------

@Carl: As an update, I discovered that with the newer DataNucleus, Map types with null values cannot be persisted. This is a problem because we stamp a comment field in the parametersMap irrespective of whether a comment was provided or not, and this causes a failure during index creation. This is the same issue I refer to in HIVE-2800, where thrift has similar issues and the fix is the same.

Upgrade datanucleus from 2.0.3 to 3.0.1
---------------------------------------

Key: HIVE-2084
URL: https://issues.apache.org/jira/browse/HIVE-2084
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Ning Zhang
Assignee: Sushanth Sowmyan
Labels: datanucleus
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2084.D2397.1.patch, HIVE-2084.1.patch.txt, HIVE-2084.2.patch.txt, HIVE-2084.patch

It seems datanucleus 2.2.3 does a better job in caching: getting the same set of partition objects takes about 1/4 of the time it took for the first execution, while with 2.0.3 the second execution took almost the same amount of time. We should retest the test cases mentioned in HIVE-1853 and HIVE-1862.
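The diagnosis above suggests an obvious workaround pattern: strip null values from the parameters map before it ever reaches the persistence layer. A minimal sketch of that pattern, assuming nothing about Hive's actual metastore classes (the class and helper names here are illustrative, not Hive's API):

```java
import java.util.HashMap;
import java.util.Map;

public class ParamsMapExample {
    // Hypothetical helper: copy a parameters map, dropping entries whose
    // value is null, so a persistence layer that rejects null map values
    // never sees them.
    static Map<String, String> withoutNulls(Map<String, String> params) {
        Map<String, String> cleaned = new HashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getValue() != null) {
                cleaned.put(e.getKey(), e.getValue());
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("comment", null);   // stamped even when no comment was given
        params.put("owner", "hive");
        Map<String, String> safe = withoutNulls(params);
        System.out.println(safe.keySet());
    }
}
```

The alternative fix implied by the comment is to not stamp the comment key at all when no comment was provided, which avoids the null value at the source.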
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 3.0.1
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449528#comment-13449528 ]

Andy Jefferson commented on HIVE-2084:
--------------------------------------

Obviously DataNucleus has test cases that persist Maps with null values, and they work (since all tests pass with every release), so clearly this is down to your map and how you're doing things.
[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449636#comment-13449636 ]

Edward Capriolo commented on HIVE-3427:
---------------------------------------

As a follow-up, the economic tests should clean themselves up, since that is the real issue here.
[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449637#comment-13449637 ]

Edward Capriolo commented on HIVE-3427:
---------------------------------------

*exim tests
[jira] [Resolved] (HIVE-3436) Difference in exception string from native method causes script_pipe.q to fail on windows
[ https://issues.apache.org/jira/browse/HIVE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-3436.
------------------------------------

Resolution: Fixed
Fix Version/s: 0.10.0
Assignee: Thejas M Nair

Committed to trunk. Thanks, Thejas!

Difference in exception string from native method causes script_pipe.q to fail on windows
-----------------------------------------------------------------------------------------

Key: HIVE-3436
URL: https://issues.apache.org/jira/browse/HIVE-3436
Project: Hive
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.10.0
Attachments: HIVE-3436.1.patch
[jira] [Resolved] (HIVE-2999) Offline build is not working
[ https://issues.apache.org/jira/browse/HIVE-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-2999.
------------------------------------

Resolution: Fixed
Fix Version/s: 0.10.0

Committed to trunk. Thanks, Navis!

Offline build is not working
----------------------------

Key: HIVE-2999
URL: https://issues.apache.org/jira/browse/HIVE-2999
Project: Hive
Issue Type: Bug
Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Fix For: 0.10.0
Attachments: HIVE-2999.1.patch.txt, HIVE-2999.2.patch.txt

It's fine without the -Doffline=true option. But with the offline option (ant -Doffline=true clean package), it fails with an error message like this:

{noformat}
ivy-retrieve:
[echo] Project: common
[ivy:retrieve] :: loading settings :: file = /home/navis/apache/oss-hive/ivy/ivysettings.xml
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve] WARNINGS
[ivy:retrieve] module not found: org.apache.hadoop#hadoop-common;0.20.2
[ivy:retrieve]   local: tried
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/ivys/ivy.xml
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/jars/hadoop-common.jar
[ivy:retrieve]   apache-snapshot: tried
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   maven2: tried
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   datanucleus-repo: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://www.datanucleus.org/downloads/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   hadoop-source: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   hadoop-source2: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://archive.cloudera.com/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve] module not found: org.apache.hadoop#hadoop-auth;0.20.2
[ivy:retrieve]   local: tried
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/ivys/ivy.xml
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/jars/hadoop-auth.jar
[ivy:retrieve]   apache-snapshot: tried
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar
[ivy:retrieve]   maven2: tried
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar
[ivy:retrieve]   datanucleus-repo: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     http://www.datanucleus.org/downloads/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar
[ivy:retrieve]   hadoop-source: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]
[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449751#comment-13449751 ]

Ashutosh Chauhan commented on HIVE-3427:
----------------------------------------

Navis, the current patch fixes the problem. +1, will commit if tests pass. Thanks for your time on this one. Ed, totally agree. Mind creating a new jira for it?
Re: Review Request: HIVE-3323 ThriftSerde: Enable enum to string conversions
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6915/#review11101
-----------------------------------------------------------

Status update: the CI job timed out (after 8 hours!!) so I'm looking into increasing the global job runtime limit and rerunning the tests. When I've verified the tests pass, I'll post this patch in the jira.

- Travis Crawford

On Sept. 6, 2012, 12:12 a.m., Travis Crawford wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6915/
-----------------------------------------------------------

(Updated Sept. 6, 2012, 12:12 a.m.)

Review request for hive and Ashutosh Chauhan.

Description
-----------

ThriftSerde: Enable enum to string conversions

This addresses bug HIVE-3323.
https://issues.apache.org/jira/browse/HIVE-3323

Diffs
-----

ql/src/test/queries/clientpositive/convert_enum_to_string.q PRE-CREATION
ql/src/test/results/clientpositive/convert_enum_to_string.q.out PRE-CREATION
serde/if/test/megastruct.thrift PRE-CREATION
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java PRE-CREATION
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java PRE-CREATION
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MyEnum.java PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java b21755e
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaStringObjectInspector.java 921ce2b

Diff: https://reviews.apache.org/r/6915/diff/

Testing
-------

Running CI now after rebasing to master and changing the default to enabled. Some preliminary feedback would be great though: https://travis.ci.cloudbees.com/job/HIVE-3323_enum_to_string/10/

To test, I added a new struct that contains an enum field; we check that its schema is correctly described, and that this property can be enabled/disabled at runtime. Something I'm not clear on with Hive is how to write more comprehensive tests that involve more than just ql commands.

For example, take a look at: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatHiveThriftCompatibility.java?view=markup

Here we see an example junit test I wrote that creates a file containing thrift structs, creates the table, checks its schema, and ensures the query returns expected output. With the Hive test suite, all I add here are ql commands that check the schema, since I'm not sure how to do the test setup. I'm more than happy to add a more comprehensive test but would appreciate some guidance to do that correctly.

Thanks,
Travis Crawford
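The conversion under review can be pictured with a small sketch. The enum, method, and flag below are illustrative stand-ins, not Hive's actual ObjectInspector API or configuration property:

```java
public class EnumToStringSketch {
    // Illustrative enum standing in for a thrift-generated enum type.
    enum Status { ACTIVE, DELETED }

    // When conversion is enabled, expose the enum field as its string name;
    // otherwise fall back to its numeric value (thrift enums are integers on
    // the wire). The boolean flag is a stand-in for a serde property.
    static Object inspect(Status value, boolean convertEnumToString) {
        return convertEnumToString ? value.name() : value.ordinal();
    }

    public static void main(String[] args) {
        System.out.println(inspect(Status.DELETED, true));
        System.out.println(inspect(Status.DELETED, false));
    }
}
```

The runtime toggle in the patch presumably works along these lines: the same underlying field is described either with a string schema or a numeric one depending on the property.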
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #128
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/

[...truncated 10256 lines...]
[echo] Project: odbc
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/odbc/src/conf does not exist.
ivy-resolve-test:
[echo] Project: odbc
ivy-retrieve-test:
[echo] Project: odbc
compile-test:
[echo] Project: odbc
create-dirs:
[echo] Project: serde
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/serde/src/test/resources does not exist.
init:
[echo] Project: serde
ivy-init-settings:
[echo] Project: serde
ivy-resolve:
[echo] Project: serde
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-serde-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/report/org.apache.hive-hive-serde-default.html
ivy-retrieve:
[echo] Project: serde
dynamic-serde:
compile:
[echo] Project: serde
ivy-resolve-test:
[echo] Project: serde
ivy-retrieve-test:
[echo] Project: serde
compile-test:
[echo] Project: serde
[javac] Compiling 26 source files to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/serde/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
create-dirs:
[echo] Project: service
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/service/src/test/resources does not exist.
init:
[echo] Project: service
ivy-init-settings:
[echo] Project: service
ivy-resolve:
[echo] Project: service
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/report/org.apache.hive-hive-service-default.html
ivy-retrieve:
[echo] Project: service
compile:
[echo] Project: service
ivy-resolve-test:
[echo] Project: service
ivy-retrieve-test:
[echo] Project: service
compile-test:
[echo] Project: service
[javac] Compiling 2 source files to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/service/test/classes
test:
[echo] Project: hive
test-shims:
[echo] Project: hive
test-conditions:
[echo] Project: shims
gen-test:
[echo] Project: shims
create-dirs:
[echo] Project: shims
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/test/resources does not exist.
init:
[echo] Project: shims
ivy-init-settings:
[echo] Project: shims
ivy-resolve:
[echo] Project: shims
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html
ivy-retrieve:
[echo] Project: shims
compile:
[echo] Project: shims
[echo] Building shims 0.20
build_shims:
[echo] Project: shims
[echo] Compiling https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java against hadoop 0.20.2 (https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/hadoopcore/hadoop-0.20.2)
ivy-init-settings:
[echo] Project: shims
ivy-resolve-hadoop-shim:
[echo] Project: shims
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
ivy-retrieve-hadoop-shim:
[echo] Project: shims
[echo] Building shims 0.20S
build_shims:
[echo] Project: shims
[echo] Compiling
[jira] [Updated] (HIVE-3306) SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
[ https://issues.apache.org/jira/browse/HIVE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3306:
-----------------------------

Resolution: Fixed
Fix Version/s: 0.10.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed. Thanks Navis

SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
--------------------------------------------------------------------------------------------------------------

Key: HIVE-3306
URL: https://issues.apache.org/jira/browse/HIVE-3306
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
Fix For: 0.10.0
Attachments: HIVE-3306.1.patch.txt

CREATE TABLE bucket_small (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_small;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_small;

CREATE TABLE bucket_big (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big;

select count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;
select /* + MAPJOIN(a) */ count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;

Both return 116 (the same). But with BucketMapJoin or SMBJoin, the query returns 61. This should not be allowed because hash(a.key) != hash(a.key + a.key). The bucket context should be utilized only when the join expression exactly matches the sort/cluster key.
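The hash inequality at the core of this bug is easy to see concretely. A minimal sketch (illustrative only, not Hive's actual bucketing code) of how a row's bucket is derived from the bucketing column, and why a join keyed on the expression a.key + a.key cannot rely on bucket co-location:

```java
public class BucketMismatch {
    // Bucket assignment in the usual style: non-negative hash modulo the
    // bucket count. This mirrors the common convention, not a specific
    // Hive method.
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        int key = 3;
        // Rows were physically bucketed by hash(key), but the join compares
        // on key + key, which hashes to a different bucket.
        int bucketOfKey = bucketFor(Integer.hashCode(key), numBuckets);
        int bucketOfExpr = bucketFor(Integer.hashCode(key + key), numBuckets);
        System.out.println(bucketOfKey + " vs " + bucketOfExpr);
    }
}
```

Since the two bucket numbers differ, a bucketed join that matches bucket i of one table only against bucket i of the other silently drops rows, which is why the SMB/BucketMap variants returned 61 instead of 116.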
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Lu updated HIVE-3421:
--------------------------

Attachment: HIVE-3421.patch.4.txt

Column Level Top K Values Statistics
------------------------------------

Key: HIVE-3421
URL: https://issues.apache.org/jira/browse/HIVE-3421
Project: Hive
Issue Type: New Feature
Reporter: Feng Lu
Assignee: Feng Lu
Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt

Compute (estimate) the top k values for each column, and put the most skewed column into skewed info if the user hasn't specified skew. This feature depends on ListBucketing (create table skewed on): https://cwiki.apache.org/Hive/listbucketing.html. All column top-k values can be added to skewed info if, in the future, skewed info supports multiple independent columns. The top-k algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
[jira] [Commented] (HIVE-3422) Support partial partition specifications when enabling/disabling protections in Hive
[ https://issues.apache.org/jira/browse/HIVE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449923#comment-13449923 ]

Jean Xu commented on HIVE-3422:
-------------------------------

Phabricator diff: https://reviews.facebook.net/D5241

Support partial partition specifications when enabling/disabling protections in Hive
------------------------------------------------------------------------------------

Key: HIVE-3422
URL: https://issues.apache.org/jira/browse/HIVE-3422
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Jean Xu
Priority: Minor

Currently, if you have a table t with partition columns c1 and c2, the following command works:

ALTER TABLE t PARTITION (c1 = 'x', c2 = 'y') ENABLE NO_DROP;

The following does not:

ALTER TABLE t PARTITION (c1 = 'x') ENABLE NO_DROP;

We would like all existing partitions for which c1 = 'x' to have NO_DROP enabled when a user runs the above command.
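The requested behavior amounts to a subset match over partition key/value pairs: a spec that fixes only some partition columns should select every partition whose values agree on those columns. A minimal sketch of that matching rule, under the assumption that partitions are represented as column-to-value maps (illustrative types, not the metastore's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartialSpecMatch {
    // Return the partitions whose values agree with every (column, value)
    // pair in the partial spec; columns absent from the spec are wildcards.
    static List<Map<String, String>> matching(
            List<Map<String, String>> partitions, Map<String, String> spec) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> p : partitions) {
            boolean ok = true;
            for (Map.Entry<String, String> e : spec.entrySet()) {
                if (!e.getValue().equals(p.get(e.getKey()))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                out.add(p);
            }
        }
        return out;
    }
}
```

With partitions (c1=x, c2=y), (c1=x, c2=z), and (c1=w, c2=y), the spec {c1=x} selects the first two, which is exactly the set the ALTER TABLE command above should act on.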
[jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-3098:
---------------------------------------

Status: Open (was: Patch Available)

Posting updated patch for unsecure-Hadoop.

Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
---------------------------------------------------------------------------------------------

Key: HIVE-3098
URL: https://issues.apache.org/jira/browse/HIVE-3098
Project: Hive
Issue Type: Bug
Components: Shims
Affects Versions: 0.9.0
Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on.
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
Attachments: Hive-3098_(FS_closeAllForUGI()).patch, Hive_3098.patch

The problem manifested while stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads, in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained memory consumed the entire heap.

It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for equality (==), not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, and causes a new FileSystem instance to be created and cached. UGI.equals() is so implemented, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified.

The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this; I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested.
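The caching failure mode described here generalizes: whenever a cache key type compares by identity (==) rather than equivalence, every logically equal key creates a fresh cached instance. A minimal sketch of that mechanism (class names are illustrative, not Hadoop's; the real UGI case arises from its Subject being compared with ==):

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityKeyLeak {
    // A key with no equals()/hashCode() override: Object identity semantics,
    // analogous to UGI instances that are equivalent but never compare equal.
    static final class IdentityKey {
        final String user;
        IdentityKey(String user) {
            this.user = user;
        }
    }

    public static void main(String[] args) {
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            // Same logical user each time, but a new key object each time,
            // so computeIfAbsent never finds a hit and the cache grows.
            cache.computeIfAbsent(new IdentityKey("hcat"), k -> new Object());
        }
        System.out.println(cache.size());
    }
}
```

With identity keys the cache ends up with 100 entries for one logical user, which is exactly the FileSystem.CACHE growth seen in the heap dump; caching and reusing a single equivalent key (here, one UGI per logical user) collapses that to one entry.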
[jira] [Commented] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449933#comment-13449933 ]

Matt Kleiderman commented on HIVE-2095:
---------------------------------------

I think I'm hitting this issue with an 0.7.1 installation. Can you provide information about how big the tables need to be in order to trigger the NullPointerException?

auto convert map join bug
-------------------------

Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
Fix For: 0.8.0
Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch

1) When considering a table as the big-table candidate for a map join: if, at compile time, Hive can determine that the total known size of all other tables (excluding the big table under consideration) is bigger than a configured value, this big-table candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may cost more time.

2) Added a null check for backup tasks; otherwise a NullPointerException will occur.

3) CommonJoinResolver needs to know the full mapping of pathToAliases; otherwise it will make a wrong decision.

4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size known at compile time, via inputSummary), and the intermediate dir path. The logic is: go over all the pathToAliases, and for each path, if it is from the intermediate dir path, add this path's size to all its aliases. Finally, choose the big table based on the size information and other data such as aliasToTask.

5) The conditional task's children contained wrong options, which may cause the join to fail or produce incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used; only tables in this whitelist can be considered as the big table. Here is the logic:

Get a list of big-table candidates; only the tables in the returned set can be used as the big table in the join operation. Scan the join condition array from left to right. If we see an inner join and bigTableCandidates is empty, add both sides of the inner join to the candidates. If we see a left outer join and bigTableCandidates is empty, add the left side; if it is not empty, do nothing (the candidates already come from the left side). If we see a right outer join, clear bigTableCandidates and add the right side (the right side of a right outer join always wins). If we see a full outer join, return null immediately (no table can be the big table; a map join is not possible).
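The candidate-selection rules in item 5 can be sketched directly from the description above. The types here are illustrative stand-ins; Hive's actual implementation lives in its join resolvers:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BigTableCandidates {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // One entry of the join condition array: the table positions joined and
    // the join type (illustrative stand-in for Hive's join descriptor).
    static final class JoinCond {
        final int left, right;
        final JoinType type;
        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    // Scan conditions left to right, applying the rules quoted above.
    // Returns null when a full outer join rules out any map join.
    static Set<Integer> candidates(JoinCond[] conds) {
        Set<Integer> result = new LinkedHashSet<>();
        for (JoinCond c : conds) {
            switch (c.type) {
                case INNER:
                    if (result.isEmpty()) {
                        result.add(c.left);
                        result.add(c.right);
                    }
                    break;
                case LEFT_OUTER:
                    if (result.isEmpty()) {
                        result.add(c.left);
                    }
                    // otherwise the candidates already come from the left side
                    break;
                case RIGHT_OUTER:
                    result.clear();
                    result.add(c.right); // right side of a right outer join always wins
                    break;
                case FULL_OUTER:
                    return null; // no table can be the big table; no map join
            }
        }
        return result;
    }
}
```

For example, an inner join of tables 0 and 1 yields candidates {0, 1}; a left outer join followed by a right outer join leaves only the right side of the right outer join; any full outer join yields null.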
[jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3098: --- Attachment: hive-3098.patch Updated patch that fixes the leak in TUGIBasedProcessor (alongside the fix in HadoopThriftAuthBridge20S.) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.) - Key: HIVE-3098 URL: https://issues.apache.org/jira/browse/HIVE-3098 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.9.0 Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on. Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: Hive-3098_(FS_closeAllForUGI()).patch, hive-3098.patch, Hive_3098.patch The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads, in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained memory consumed the entire heap. It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for identity (==), and not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, and causes a new FileSystem instance to be created and cached. UGI.equals() is implemented this way, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified. The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this. I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested.
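The identity-vs-equivalence problem behind this leak can be shown with a self-contained Java sketch. These are stand-in classes, not Hadoop's UGI or FileSystem.CACHE: a key whose equals() compares by reference defeats the cache, while interning keys by equivalence (the essence of the UGI-cache fix described above) keeps it bounded:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in demo of the leak mechanism described above; IdentityKey mimics
// UGI's reference-based equals(), and the maps mimic FileSystem.CACHE.
public class CacheLeakDemo {
    static final class IdentityKey {
        final String user;
        IdentityKey(String user) { this.user = user; }
        // identity, not equivalence: two equal users never compare equal
        @Override public boolean equals(Object o) { return this == o; }
        @Override public int hashCode() { return System.identityHashCode(this); }
    }

    // Each request builds a fresh, logically equivalent key: n cache entries.
    static int leakyCount(int n) {
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < n; i++) {
            cache.computeIfAbsent(new IdentityKey("hcat"), k -> new Object());
        }
        return cache.size();
    }

    // Intern keys by equivalence before the cache lookup: one entry total.
    static int internedCount(int n) {
        Map<String, IdentityKey> interned = new HashMap<>();
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < n; i++) {
            IdentityKey key = interned.computeIfAbsent("hcat", IdentityKey::new);
            cache.computeIfAbsent(key, k -> new Object());
        }
        return cache.size();
    }
}
```

The interning map plays the role of the UGI cache in the shims: lookups go through an equivalence-based key first, so the identity-based cache underneath only ever sees one instance per logical user.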
[jira] [Commented] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449937#comment-13449937 ] Mithun Radhakrishnan commented on HIVE-3098: Thanks, Ashutosh and Alan. The new patch looks good.
[jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3098: --- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-3439) PARTITIONED BY clause in CREATE TABLE is order-dependent
Jonathan Natkins created HIVE-3439: -- Summary: PARTITIONED BY clause in CREATE TABLE is order-dependent Key: HIVE-3439 URL: https://issues.apache.org/jira/browse/HIVE-3439 Project: Hive Issue Type: Bug Reporter: Jonathan Natkins
hive> create external table foo (a int) location '/user/natty/foo' partitioned by (b int);
FAILED: Parse Error: line 1:61 mismatched input 'partitioned' expecting EOF near ''/user/natty/foo''
hive> create external table foo (a int) partitioned by (b int) location '/user/natty/foo';
OK
Time taken: 0.051 seconds
[jira] [Commented] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed
[ https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449978#comment-13449978 ] Brian Bloniarz commented on HIVE-1898: -- I think Luke is right -- maybe the bug title should be changed to simply say data with newlines won't work in Text/LazySimpleSerDe tables? I haven't tested it, but would STORED AS SEQUENCEFILE tables be immune to this problem? The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed --- Key: HIVE-1898 URL: https://issues.apache.org/jira/browse/HIVE-1898 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.5.0 Reporter: Josh Patterson Priority: Minor If I want to preserve data in columns which contain a newline (webcrawling, for instance), I cannot set the ESCAPED BY clause to escape these out (other characters such as commas escape fine, however). This may be because the line terminators, which are locked to be newlines, are picked up first, and the fields are processed afterwards. This seems to be related to: SerDe should escape some special characters https://issues.apache.org/jira/browse/HIVE-136 and Implement LINES TERMINATED BY https://issues.apache.org/jira/browse/HIVE-302 where at comment: https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435 This is not fixable currently because the line terminator is determined by LineRecordReader.LineReader, which is in Hadoop land.
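The point about why escaping cannot rescue embedded newlines in text tables can be shown with a tiny Java sketch. This is illustrative only, not Hadoop's LineRecordReader: a line-oriented reader splits records on raw '\n' before any SerDe-level unescaping runs, so an escaped newline has already broken the row by the time the SerDe sees it:

```java
// Illustrative sketch, not Hadoop's LineRecordReader: a line-oriented
// reader splits records on raw newlines before SerDe-level unescaping.
public class NewlineSplitDemo {
    // Number of "lines" a line-based reader would hand to the SerDe.
    static int lineCount(String record) {
        return record.split("\n", -1).length;
    }

    public static void main(String[] args) {
        // One logical row; second column holds a backslash-escaped newline.
        // \u0001 is the default Hive field delimiter.
        String record = "id1\u0001col with\\\nembedded newline";
        // The reader still sees two lines - the escape never gets a chance.
        System.out.println(lineCount(record)); // prints 2
    }
}
```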
[jira] [Commented] (HIVE-3436) Difference in exception string from native method causes script_pipe.q to fail on windows
[ https://issues.apache.org/jira/browse/HIVE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450010#comment-13450010 ] Hudson commented on HIVE-3436: -- Integrated in Hive-trunk-h0.21 #1651 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1651/]) HIVE-3436 : Difference in exception string from native method causes script_pipe.q to fail on windows (Thejas Nair via Ashutosh Chauhan) (Revision 1381597) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1381597 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java Difference in exception string from native method causes script_pipe.q to fail on windows -- Key: HIVE-3436 URL: https://issues.apache.org/jira/browse/HIVE-3436 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.10.0 Attachments: HIVE-3436.1.patch
Hive-trunk-h0.21 - Build # 1651 - Still Failing
Changes for Build #1638 [namit] HIVE-3393 get_json_object and json_tuple should use Jackson library (Kevin Wilfong via namit) Changes for Build #1639 Changes for Build #1640 [ecapriolo] HIVE-3068 Export table metadata as JSON on table drop (Andrew Chalfant via egc) Changes for Build #1641 Changes for Build #1642 [hashutosh] HIVE-3338 : Archives broken for hadoop 1.0 (Vikram Dixit via Ashutosh Chauhan) Changes for Build #1643 Changes for Build #1644 Changes for Build #1645 [cws] HIVE-3413. Fix pdk.PluginTest on hadoop23 (Zhenxiao Luo via cws) Changes for Build #1646 [cws] HIVE-3056. Ability to bulk update location field in Db/Table/Partition records (Shreepadma Venugopalan via cws) [cws] HIVE-3416 [jira] Fix TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS when running Hive on hadoop23 (Zhenxiao Luo via Carl Steinbach) Summary: HIVE-3416: Fix TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS when running Hive on hadoop23 TestAvroSerdeUtils determinSchemaCanReadSchemaFromHDFS is failing when running hive on hadoop23: $ant very-clean package -Dhadoop.version=0.23.1 -Dhadoop-0.23.version=0.23.1 -Dhadoop.mr.rev=23 $ant test -Dhadoop.version=0.23.1 -Dhadoop-0.23.version=0.23.1 -Dhadoop.mr.rev=23 -Dtestcase=TestAvroSerdeUtils testcase classname=org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils name=determineSchemaCanReadSchemaFromHDFS time=0.21 error message=org/apache/hadoop/net/StaticMapping type=java.lang.NoClassDefFoundErrorjava.lang.NoClassDefFoundError: org/apache/hadoop/net/StaticMapping at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:534) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:489) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:360) at org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS(TestAvroSerdeUtils.java:187) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.net.StaticMapping at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at 
java.lang.ClassLoader.loadClass(ClassLoader.java:266) ... 25 more /error /testcase Test Plan: EMPTY Reviewers: JIRA Differential Revision: https://reviews.facebook.net/D5025 [cws] HIVE-3424. Error by upgrading a Hive 0.7.0 database to 0.8.0 (008-HIVE-2246.mysql.sql) (Alexander Alten-Lorenz via cws) [cws] HIVE-3412. Fix TestCliDriver.repair on Hadoop 0.23.3, 3.0.0, and 2.2.0-alpha (Zhenxiao Luo via cws) Changes for Build #1647 Changes for Build #1648 [namit] HIVE-3429 Bucket map join involving table with more than 1 partition column causes FileNotFoundException (Kevin Wilfong via namit) Changes for Build #1649 [hashutosh] HIVE-3075 : Improve HiveMetaStore logging (Travis Crawford via Ashutosh Chauhan) Changes for Build #1650 [hashutosh] HIVE-3340 : shims unit test failures fails further test progress (Giridharan Kesavan via Ashutosh Chauhan) Changes for Build #1651 [hashutosh]
[jira] [Created] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
Zhenxiao Luo created HIVE-3440: -- Summary: Fix pdk PluginTest failing on trunk-h0.21 Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Get the failure when running on hadoop21, triggered directly from pdk(when triggered from builtin, pdk test is passed). Here is the execution log: 2012-09-06 13:46:05,646 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001 java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 
10 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 13 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.clinit(GenericUDTFJSONTuple.java:54) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:545) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:539) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.clinit(FunctionRegistry.java:472) at org.apache.hadoop.hive.ql.exec.DefaultUDFMethodResolver.getEvalMethod(DefaultUDFMethodResolver.java:59) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:154) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:98) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:137) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:898) at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:924) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:434) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:390) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:166) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:441) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by:
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Attachment: HIVE-967.5.patch.txt Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt Compute (estimate) top k values for each column, and put the most skewed column into skewed info, if the user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
[jira] [Commented] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450041#comment-13450041 ] Zhenxiao Luo commented on HIVE-3440: This is NOT specific to hadoop23; even on the current trunk, building with: $ant test -Dtest.continue.on.failure=false reproduces the error. Also found that it happens here: https://builds.apache.org/job/Hive-trunk-h0.21/1651/consoleFull Seems like the jackson mapper library is missing.
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #128
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/ -- [...truncated 36447 lines...] [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-09-06_14-08-39_828_6118424854506059128/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_1954180276.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] Copying file: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2012-09-06_14-08-44_161_4340349784355273135/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-09-06_14-08-44_161_4340349784355273135/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_1344012563.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table 
testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_505766291.txt [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_321993572.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] Copying
[jira] [Commented] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450056#comment-13450056 ] Zhenxiao Luo commented on HIVE-3440: Found that this is due to missing jackson mapper library. From the build log, we could see it start failing after HIVE-3393 is commited: Build#1637 is passed: https://builds.apache.org/job/Hive-trunk-h0.21/1637/consoleFull Build#1638 is failing: https://builds.apache.org/job/Hive-trunk-h0.21/1638/consoleFull I think to fix it. We need to put jackson-mapper dependency into ql, so that when pdk is running GenericUDTFJSONTuple.java, ObjectMapper initialization, no such NoClassDefFoundError. Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Get the failure when running on hadoop21, triggered directly from pdk(when triggered from builtin, pdk test is passed). 
Here is the execution log: 2012-09-06 13:46:05,646 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001 java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 10 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
13 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.clinit(GenericUDTFJSONTuple.java:54) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:545) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:539) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.clinit(FunctionRegistry.java:472) at org.apache.hadoop.hive.ql.exec.DefaultUDFMethodResolver.getEvalMethod(DefaultUDFMethodResolver.java:59) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:154) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:98) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:137) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:898) at
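The fix described in the comment above amounts to declaring the Jackson mapper artifact in ql's ivy.xml. A minimal sketch of such a dependency entry follows; the revision number and conf mapping shown here are illustrative assumptions, not necessarily what the actual patch uses:

```xml
<!-- Hypothetical ivy.xml fragment for the ql module: pulls in the
     org.codehaus.jackson mapper so GenericUDTFJSONTuple can load
     ObjectMapper at runtime. rev and conf are illustrative. -->
<dependency org="org.codehaus.jackson" name="jackson-mapper-asl"
            rev="1.8.8" conf="default->master"/>
```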
[jira] [Commented] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450061#comment-13450061 ] Zhenxiao Luo commented on HIVE-3440: Review Request submitted at: https://reviews.facebook.net/D5265 With this patch, the pdk PluginTest passes when triggered from both builtin and pdk: $ant test -Dtest.continue.on.failure=false Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Attachments: HIVE-3440.1.patch.txt
[jira] [Updated] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-3440: --- Attachment: HIVE-3440.1.patch.txt Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Attachments: HIVE-3440.1.patch.txt
[jira] [Updated] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-3440: --- Status: Patch Available (was: Open) Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Attachments: HIVE-3440.1.patch.txt
[jira] [Created] (HIVE-3441) testcases escape1,escape2 fail on windows
Thejas M Nair created HIVE-3441: --- Summary: testcases escape1,escape2 fail on windows Key: HIVE-3441 URL: https://issues.apache.org/jira/browse/HIVE-3441 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.10.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3441) testcases escape1,escape2 fail on windows
[ https://issues.apache.org/jira/browse/HIVE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450121#comment-13450121 ] Thejas M Nair commented on HIVE-3441: - The tests fail because the inserted partitions have partition column values containing characters that are not accepted in file names on Windows. testcases escape1,escape2 fail on windows - Key: HIVE-3441 URL: https://issues.apache.org/jira/browse/HIVE-3441 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.10.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
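Partition values become directory names on disk, which is why these tests pass on Linux but fail on Windows: Windows reserves a set of characters that are illegal in file names, while Linux only forbids '/' and NUL. A minimal sketch of the check involved (class and method names are illustrative, not Hive's actual code):

```java
// Illustrative check of which partition column values can be
// materialized as directory names on Windows, which forbids
// <>:"/\|?* and control characters in file names.
public class WindowsNameCheck {
    private static final String RESERVED = "<>:\"/\\|?*";

    static boolean isValidWindowsFileName(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 32 || RESERVED.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidWindowsFileName("part=2012-09-06")); // true
        System.out.println(isValidWindowsFileName("part=a|b"));        // false
    }
}
```

Values like the escaped strings exercised by escape1/escape2 trip this restriction, so the partitions cannot be written at all on Windows.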
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Description: Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf was: Compute (estimate) top k values for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Description: Compute (estimate) top k values statistic for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf was: Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Description: Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf was: Compute (estimate) top k values statistic for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
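The UCSB tech report cited in the HIVE-3421 description describes the Space-Saving algorithm for estimating frequent and top-k elements in a stream with a fixed number of counters. A minimal sketch of that idea follows; the class and method names are illustrative, and Hive's actual implementation in the attached patches may differ:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Space-Saving top-k idea from the cited paper: keep at
// most 'capacity' counters; when a new item arrives and all counters
// are taken, evict the minimum counter and let the newcomer inherit
// its count + 1 (counts are overestimates, never underestimates).
public class SpaceSavingSketch {
    private final int capacity;
    private final Map<String, Integer> counts = new HashMap<>();

    public SpaceSavingSketch(int capacity) {
        this.capacity = capacity;
    }

    public void offer(String item) {
        Integer current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            counts.put(item, 1);
        } else {
            // Evict the item with the minimum count.
            String minItem = null;
            int minCount = Integer.MAX_VALUE;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() < minCount) {
                    minCount = e.getValue();
                    minItem = e.getKey();
                }
            }
            counts.remove(minItem);
            counts.put(item, minCount + 1); // inherited, overestimated count
        }
    }

    // Estimated counts for the currently tracked (candidate top-k) items.
    public Map<String, Integer> estimates() {
        return new HashMap<>(counts);
    }

    public static void main(String[] args) {
        SpaceSavingSketch sketch = new SpaceSavingSketch(2);
        // A skewed column: "a" dominates, so it survives all evictions.
        for (String v : new String[] {"a", "a", "a", "b", "c", "a", "d", "a"}) {
            sketch.offer(v);
        }
        System.out.println(sketch.estimates().containsKey("a")); // true
    }
}
```

The skew detection sketched in the issue would then compare the top estimate against the total row count and, if a column's most frequent value dominates, record that column in the table's skewed info.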
[jira] [Created] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
Zhenxiao Luo created HIVE-3442: -- Summary: AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 After creating a table and loading data into it, I could check that the table was created successfully and the data is inside: DROP TABLE IF EXISTS ml_items; CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items; select * from ml_items ORDER BY id ASC; However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working: DROP TABLE IF EXISTS ml_items_as_avro; CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:${system:test.tmp.dir}/hive-ml-items'; describe ml_items_as_avro; INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items; ml_items_as_avro is not created with the expected schema, as shown in the describe 
ml_items_as_avro output. The output is below: PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro PREHOOK: type: DROPTABLE POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro POSTHOOK: type: DROPTABLE PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' PREHOOK: type: CREATETABLE POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' POSTHOOK: type: CREATETABLE POSTHOOK: Output: default@ml_items_as_avro PREHOOK: query: describe ml_items_as_avro PREHOOK: type: DESCTABLE POSTHOOK: query: describe ml_items_as_avro POSTHOOK: type: DESCTABLE error_error_error_error_error_error_error string from deserializer cannot_determine_schema string from deserializer check string from deserializer schema string from deserializer url string from deserializer and string from deserializer literal string from deserializer FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
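The placeholder columns in the DESCRIBE output (error_error_error... cannot_determine_schema check schema url and literal) read as a sentence: AvroSerDe is signalling that it could not determine a schema from the schema.url or schema.literal properties. For orientation, a schema file such as the avro_items_schema.avsc referenced above would be an Avro record declaration along these lines; the field list here is abbreviated and illustrative, since the actual file is not shown in the report:

```json
{
  "type": "record",
  "name": "ml_items",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "title", "type": "string"},
    {"name": "imdb_url", "type": "string"}
  ]
}
```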
[jira] [Commented] (HIVE-2999) Offline build is not working
[ https://issues.apache.org/jira/browse/HIVE-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450183#comment-13450183 ] Hudson commented on HIVE-2999: -- Integrated in Hive-trunk-h0.21 #1652 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1652/]) HIVE-2999 : Offline build is not working (Navis via Ashutosh Chauhan) (Revision 1381643) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381643 Files : * /hive/trunk/builtins/ivy.xml * /hive/trunk/common/ivy.xml * /hive/trunk/ql/ivy.xml * /hive/trunk/serde/ivy.xml Offline build is not working Key: HIVE-2999 URL: https://issues.apache.org/jira/browse/HIVE-2999 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-2999.1.patch.txt, HIVE-2999.2.patch.txt It's fine without the -Doffline=true option, but with the offline option (ant -Doffline=true clean package) it fails with an error message like this. 
{noformat} ivy-retrieve: [echo] Project: common [ivy:retrieve] :: loading settings :: file = /home/navis/apache/oss-hive/ivy/ivysettings.xml [ivy:retrieve] [ivy:retrieve] :: problems summary :: [ivy:retrieve] WARNINGS [ivy:retrieve]module not found: org.apache.hadoop#hadoop-common;0.20.2 [ivy:retrieve] local: tried [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/ivys/ivy.xml [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/jars/hadoop-common.jar [ivy:retrieve] apache-snapshot: tried [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] maven2: tried [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] datanucleus-repo: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] http://www.datanucleus.org/downloads/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] hadoop-source: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] hadoop-source2: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] 
http://archive.cloudera.com/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve]module not found: org.apache.hadoop#hadoop-auth;0.20.2 [ivy:retrieve] local: tried [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/ivys/ivy.xml [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar: [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/jars/hadoop-auth.jar [ivy:retrieve] apache-snapshot: tried [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar: [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar [ivy:retrieve] maven2: tried [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar: [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar [ivy:retrieve] datanucleus-repo: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[jira] [Commented] (HIVE-3306) SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
[ https://issues.apache.org/jira/browse/HIVE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450184#comment-13450184 ] Hudson commented on HIVE-3306:
--
Integrated in Hive-trunk-h0.21 #1652 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1652/])
HIVE-3306 SMBJoin/BucketMapJoin should be allowed only when the join key expression exactly matches the sort/cluster key (Navis via namit) (Revision 1381669)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381669
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java
* /hive/trunk/ql/src/test/queries/clientpositive/bucket_map_join_1.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucket_map_join_2.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucketmapjoin_negative3.q
* /hive/trunk/ql/src/test/results/clientpositive/bucket_map_join_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucket_map_join_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out

SMBJoin/BucketMapJoin should be allowed only when the join key expression exactly matches the sort/cluster key
--
Key: HIVE-3306
URL: https://issues.apache.org/jira/browse/HIVE-3306
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
Fix For: 0.10.0
Attachments: HIVE-3306.1.patch.txt

CREATE TABLE bucket_small (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_small;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_small;

CREATE TABLE bucket_big (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big;

select count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;
select /* + MAPJOIN(a) */ count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;

Both return 116 (the same result). But with BucketMapJoin or SMBJoin, the query returns 61. This should not be allowed, because hash(a.key) != hash(a.key + a.key); the bucket context should be utilized only when the join expression exactly matches the sort/cluster key.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
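Navis's point that hash(a.key) != hash(a.key + a.key) can be sketched outside Hive. The following is a minimal Python model assuming a generic hash-mod bucketing scheme; `bucket_of` is an illustration, not Hive's actual bucketing hash.

```python
# Why a bucketed join must use the exact sort/cluster key: rows are
# assigned to buckets by hashing that key, so a derived join expression
# such as a.key + a.key can point at a different bucket than the one
# the matching row actually lives in. Toy model only.

def bucket_of(key, num_buckets):
    return hash(key) % num_buckets

NUM_BUCKETS = 4
keys = [0, 1, 2, 3, 5, 8]

# Joining on "a.key = b.key" is safe: equal keys always land in the
# same bucket, so pairing bucket i of a with bucket i of b is exact.
# Joining on "a.key + a.key = b.key" is not: the b row matching key k
# lives in bucket_of(2 * k), which usually differs from bucket_of(k),
# so a bucket-wise join silently drops those matches.
mismatched = [k for k in keys
              if bucket_of(k, NUM_BUCKETS) != bucket_of(2 * k, NUM_BUCKETS)]
print(mismatched)  # keys whose matching rows sit in a different bucket
```

This is exactly why the repro above returns 61 instead of 116 once the bucket context is (incorrectly) applied: some buckets that hold matches are never probed.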
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450191#comment-13450191 ] Zhenxiao Luo commented on HIVE-3442:
--
CC'ing Jakob, so that if there is any AvroSerDe usage error, Jakob's comments and suggestions are always welcome.

AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
---
Key: HIVE-3442
URL: https://issues.apache.org/jira/browse/HIVE-3442
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
Fix For: 0.10.0

After creating a table and loading data into it, I can verify that the table is created successfully and the data is inside:

DROP TABLE IF EXISTS ml_items;
CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
select * from ml_items ORDER BY id ASC;

However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
describe ml_items_as_avro;
INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items;

ml_items_as_avro is not created with the expected schema, as shown in the describe ml_items_as_avro output below:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string   from deserializer
cannot_determine_schema   string   from deserializer
check   string   from deserializer
schema   string   from deserializer
url   string   from deserializer
and   string   from deserializer
literal   string   from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.
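A side note on the error message itself: the "7 columns" in the SemanticException match the placeholder columns in the describe output above, which spell out an error message ("cannot determine schema: check schema url and literal"). Counting them in plain Python, just restating what the output already shows:

```python
# Columns reported by "describe ml_items_as_avro" in the output above.
# AvroSerDe could not determine a usable schema, so it exposed this
# sentinel schema instead; the INSERT then fails because the target
# has 7 columns while the SELECT produces 22.
error_schema_columns = [
    "error_error_error_error_error_error_error",
    "cannot_determine_schema",
    "check",
    "schema",
    "url",
    "and",
    "literal",
]
selected_columns = 22  # id, title, imdb_url, plus the 19 genre flags
print(len(error_schema_columns), selected_columns)  # 7 22
```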
[jira] [Commented] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450192#comment-13450192 ] Siying Dong commented on HIVE-3388:
---
+1

Improve Performance of UDF PERCENTILE_APPROX()
--
Key: HIVE-3388
URL: https://issues.apache.org/jira/browse/HIVE-3388
Project: Hive
Issue Type: Task
Reporter: Rongrong Zhong
Assignee: Rongrong Zhong
Priority: Minor
Attachments: HIVE-3388.1.patch.txt
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450198#comment-13450198 ] Jakob Homan commented on HIVE-3442:
---
The docs are out of date (my fault). schema.url and schema.literal got changed to avro.schema.url and avro.schema.literal during the move to Apache, to be more specific to Avro. Try with those. I'll update the wiki.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450203#comment-13450203 ] Zhenxiao Luo commented on HIVE-3442:
--
@Jakob: Thanks a lot. I tried avro.schema.url; it seems to still not be working:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string   from deserializer
cannot_determine_schema   string   from deserializer
check   string   from deserializer
schema   string   from deserializer
url   string   from deserializer
and   string   from deserializer
literal   string   from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450202#comment-13450202 ] Jakob Homan commented on HIVE-3442:
---
Updated the wiki.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450206#comment-13450206 ] Zhenxiao Luo commented on HIVE-3442:
--
Also tried avro.schema.literal; it seems not to be working either:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string   from deserializer
cannot_determine_schema   string   from deserializer
check   string   from deserializer
schema   string   from deserializer
url   string   from deserializer
and   string   from deserializer
literal   string   from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.

@Jakob: I will trace the code to see what is wrong. Any comments are appreciated.
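One likely reason the avro.schema.literal attempt above fails: if the SerDe treats the literal property as the Avro schema text itself (Avro schemas are JSON documents) rather than as a location, then a filesystem path can never parse as a schema. A sketch in plain Python, with the json module standing in for the schema parser; the toy schema below is hypothetical, not from this ticket:

```python
import json

# A bare filesystem path handed to a "literal" property is not JSON,
# so schema parsing fails before any file is ever read.
path_as_literal = "/home/cloudera/Code/hive/data/files/avro_items_schema.avsc"
try:
    json.loads(path_as_literal)
    parsed = True
except ValueError:
    parsed = False
print(parsed)  # False: a path is not a JSON schema

# An actual (toy) schema document parses fine as JSON.
toy_schema = '{"type": "record", "name": "item", "fields": [{"name": "id", "type": "int"}]}'
print(json.loads(toy_schema)["type"])  # record
```

This is consistent with Jakob's follow-up question: the url variant needs a valid, reachable URL, while the literal variant needs the schema inline.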
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450209#comment-13450209 ] Jakob Homan commented on HIVE-3442:
---
bq. 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc'
Is this a valid URL? Is it accessible from the metastore?
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450215#comment-13450215 ] Zhenxiao Luo commented on HIVE-3442:
--
@Jakob: Thanks a lot. Got it working with the following valid URL:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='file:${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
describe ml_items_as_avro;
INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items;

How about I resolve this as Not A Bug?
[jira] [Updated] (HIVE-3411) Filter predicates on outer join overlapped on single alias is not handled properly
[ https://issues.apache.org/jira/browse/HIVE-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3411:
Issue Type: Bug (was: Sub-task)
Parent: (was: HIVE-3381)

Filter predicates on outer join overlapped on single alias is not handled properly
--
Key: HIVE-3411
URL: https://issues.apache.org/jira/browse/HIVE-3411
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Environment: ubuntu 10.10
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-3411.1.patch.txt

Currently, join predicates on an outer join are evaluated in the join operator (or in the HashSink for a MapJoin), and the result is tagged onto the end of each value (as a boolean), which is used when joining values. But when predicates overlap on a single alias, all of them are evaluated with an AND conjunction, which produces an invalid result. For example, with table a containing the values:

{noformat}
100	40
100	50
100	60
{noformat}

the query below has overlapped predicates on alias b, so every value of b is tagged with true (filtered):

{noformat}
select * from a right outer join a b on (a.key=b.key AND a.value=50 AND b.value=50)
         left outer join a c on (b.key=c.key AND b.value=60 AND c.value=60);

NULL	NULL	100	40	NULL	NULL
NULL	NULL	100	50	NULL	NULL
NULL	NULL	100	60	NULL	NULL

-- Join predicate
Join Operator
  condition map:
    Right Outer Join0 to 1
    Left Outer Join1 to 2
  condition expressions:
    0 {VALUE._col0} {VALUE._col1}
    1 {VALUE._col0} {VALUE._col1}
    2 {VALUE._col0} {VALUE._col1}
  filter predicates:
    0
    1 {(VALUE._col1 = 50)} {(VALUE._col1 = 60)}
    2
{noformat}

but the result should be:

{noformat}
NULL	NULL	100	40	NULL	NULL
100	50	100	50	NULL	NULL
NULL	NULL	100	60	100	60
{noformat}
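The behavior Navis describes can be modeled outside Hive: if all residual predicates on one alias are folded into a single AND-conjoined tag, no row of b can satisfy both b.value=50 and b.value=60 at once, so every b row is treated as filtered and both outer joins pad with NULLs. A minimal Python sketch of the tagging logic only (an illustration, not Hive's join operator):

```python
# Rows of table a, used as alias b in the query. A tag of True means
# "this b row passes the filter for that join and may match"; False
# means the row is treated as filtered (the other side gets NULLs).
rows_b = [(100, 40), (100, 50), (100, 60)]

def pred_join0(key, value):  # b.value = 50, from the right outer join with a
    return value == 50

def pred_join1(key, value):  # b.value = 60, from the left outer join with c
    return value == 60

# Buggy behavior from the report: overlapped predicates on one alias
# are AND-conjoined into a single tag. No row has value 50 and 60 at
# once, so every b row is tagged filtered and only NULL-padded rows
# come out of both joins.
buggy_tags = [pred_join0(*r) and pred_join1(*r) for r in rows_b]
print(buggy_tags)   # [False, False, False]

# Correct behavior: one independent tag per join condition, so
# (100, 50) matches in the first join and (100, 60) in the second.
correct_tags = [(pred_join0(*r), pred_join1(*r)) for r in rows_b]
print(correct_tags)  # [(False, False), (True, False), (False, True)]
```

The per-join tags line up with the expected output: row (100, 50) survives the first join's filter and row (100, 60) survives the second's.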
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450217#comment-13450217 ] Jakob Homan commented on HIVE-3442: --- Sounds good.

AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
--

Key: HIVE-3442
URL: https://issues.apache.org/jira/browse/HIVE-3442
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
Fix For: 0.10.0

After creating a table and loading data into it, I could check that the table was created successfully and that the data is inside:

DROP TABLE IF EXISTS ml_items;

CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING,
  imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT,
  children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT,
  fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT,
  romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;

select * from ml_items ORDER BY id ASC;

However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working:

DROP TABLE IF EXISTS ml_items_as_avro;

CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';

describe ml_items_as_avro;

INSERT OVERWRITE TABLE ml_items_as_avro
SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children,
  comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery,
  romance, sci_fi, thriller, war, western FROM ml_items;

ml_items_as_avro is not created with the expected schema, as shown in the describe ml_items_as_avro output below:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error	string	from deserializer
cannot_determine_schema	string	from deserializer
check	string	from deserializer
schema	string	from deserializer
url	string	from deserializer
and	string	from deserializer
literal	string	from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450216#comment-13450216 ] Chris Drome commented on HIVE-3437: --- I'm actively working on this JIRA, but was not able to assign it to myself. 0.23 compatibility: fix unit tests when building against 0.23 - Key: HIVE-3437 URL: https://issues.apache.org/jira/browse/HIVE-3437 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.9.1 Reporter: Chris Drome Many unit tests fail as a result of building the code against hadoop 0.23. Initial focus will be to fix 0.9. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo resolved HIVE-3442. Resolution: Not A Problem. Got help from Jakob: it is actually an invalid use of AvroSerDe, not a bug.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3443) Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key
Shreepadma Venugopalan created HIVE-3443: Summary: Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key Key: HIVE-3443 URL: https://issues.apache.org/jira/browse/HIVE-3443 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Critical Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key. In the past, the avro.schema.url key was called schema.url. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3443) Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key
[ https://issues.apache.org/jira/browse/HIVE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450242#comment-13450242 ] Shreepadma Venugopalan commented on HIVE-3443: -- Support for Hive MetaTool was added in HIVE-3056. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3437: --- Assignee: Chris Drome -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450295#comment-13450295 ] Ashutosh Chauhan commented on HIVE-3437: Assigned to Chris. Chris, I also added you to the contributors list, so you can assign yourself any other jiras. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450302#comment-13450302 ] Ashutosh Chauhan commented on HIVE-3098: +1 will commit if tests pass. Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.) - Key: HIVE-3098 URL: https://issues.apache.org/jira/browse/HIVE-3098 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.9.0 Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on. Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: Hive-3098_(FS_closeAllForUGI()).patch, hive-3098.patch, Hive_3098.patch The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained memory consumed the entire heap. It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for identity (==), not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, so a new FileSystem instance is created and cached for each one. UGI.equals() is implemented that way, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified. The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this; I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
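The HIVE-3098 failure mode and the proposed fix can be sketched generically (plain Python with hypothetical names; the real patch lives in Hive's Java shims and uses Hadoop's actual UGI and FileSystem classes):

```python
class UGI:
    """Stand-in for Hadoop's UserGroupInformation: default (identity) equality,
    so two UGIs for the same user never compare equal."""
    def __init__(self, user):
        self.user = user

fs_cache = {}   # mimics FileSystem.CACHE: one FileSystem per distinct key

def filesystem_for(ugi):
    # Each UGI that is not already a key creates and caches a new "FileSystem".
    return fs_cache.setdefault(ugi, object())

# Without the fix: 100 equivalent UGIs -> 100 cached FileSystem instances.
for _ in range(100):
    filesystem_for(UGI("alice"))
leaked = len(fs_cache)

# The fix's idea: intern UGIs by an equivalence key before the lookup,
# so equivalent UGIs map to one canonical instance.
fs_cache.clear()
ugi_cache = {}

def interned(ugi):
    return ugi_cache.setdefault(ugi.user, ugi)

for _ in range(100):
    filesystem_for(interned(UGI("alice")))
bounded = len(fs_cache)
```

The design point is that the identity-based UGI.equals() cannot be changed (HADOOP-6670), so the cache key is canonicalized on the caller's side instead.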
[jira] [Updated] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3427: --- Resolution: Fixed Fix Version/s: 0.10.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk Key: HIVE-3427 URL: https://issues.apache.org/jira/browse/HIVE-3427 Project: Hive Issue Type: Test Affects Versions: 0.10.0 Reporter: Ashutosh Chauhan Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3427.1.patch.txt, HIVE-3427.2.patch.txt I think it's a new test which was added via HIVE-3068 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3086: - Status: Open (was: Patch Available) comments from Kevin Skewed Join Optimization Key: HIVE-3086 URL: https://issues.apache.org/jira/browse/HIVE-3086 Project: Hive Issue Type: New Feature Reporter: Nadeem Moidu Assignee: Namit Jain Attachments: hive.3086.1.patch During a join operation, if one of the columns has a skewed key, it can cause that particular reducer to become the bottleneck. The following feature will address it: https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
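As background for why a skewed key makes one reducer the bottleneck, a toy sketch (plain Python with a made-up deterministic hash; not Hive's implementation, which is described at the wiki link in the issue above):

```python
from collections import Counter

def bucket(key, n):
    # Deterministic stand-in for the join key's hash partitioner.
    return sum(map(ord, key)) % n

def reducer_loads(rows, num_reducers, skewed_keys=()):
    """Count rows per reducer; rows for declared skewed keys are diverted
    to a separate (parallelizable) path, as the optimization proposes."""
    loads, diverted = Counter(), 0
    for key, _ in rows:
        if key in skewed_keys:
            diverted += 1
        else:
            loads[bucket(key, num_reducers)] += 1
    return loads, diverted

# 98 of 100 rows share one join key: a heavily skewed distribution.
rows = [("hot", i) for i in range(98)] + [("k1", 0), ("k2", 0)]
```

Without special handling, all "hot" rows hash to the same reducer; diverting them to a separate path leaves the remaining reducers evenly loaded.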
[jira] [Created] (HIVE-3444) Support reading columns containing line separator
Navis created HIVE-3444: --- Summary: Support reading columns containing line separator Key: HIVE-3444 URL: https://issues.apache.org/jira/browse/HIVE-3444 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Currently, LazySimpleSerde cannot handle columns containing the newline character, because Hadoop splits rows on newlines. If the overhead of counting fields by a full scan and merging partial lines is tolerable, a multi-lined column can be reconstructed at runtime. But with this method, a multi-lined column must not be the last column of the row. This is just an idea for HIVE-1898. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3444) Support reading columns containing line separator
[ https://issues.apache.org/jira/browse/HIVE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3444: Status: Patch Available (was: Open) https://reviews.facebook.net/D5277 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3444) Support reading columns containing line separator
[ https://issues.apache.org/jira/browse/HIVE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3444: Attachment: HIVE-3444.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
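The field-counting idea described in HIVE-3444 above can be sketched as follows (plain Python with a hypothetical helper, not the actual patch; assumes column values never contain the field delimiter itself):

```python
def merge_rows(lines, delim, num_fields):
    """Merge physical lines until a logical row has num_fields fields,
    reconstructing column values that contain embedded newlines."""
    rows, buf = [], None
    for line in lines:
        buf = line if buf is None else buf + "\n" + line
        # A complete row has num_fields - 1 delimiters.
        if buf.count(delim) >= num_fields - 1:
            rows.append(buf.split(delim, num_fields - 1))
            buf = None
    # Caveat from the issue: if the multi-lined column were the LAST column,
    # its continuation lines add no delimiters and the row boundary is
    # ambiguous, so such a column must not be last.
    return rows
```

For example, a 3-field row whose second column spans two physical lines is reassembled, while an ordinary single-line row passes through unchanged.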
[jira] [Updated] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3086: - Status: Patch Available (was: Open) addressed comments -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3086: - Attachment: hive.3086.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450345#comment-13450345 ] Uma Maheswara Rao G commented on HIVE-2604: --- Hi Yongqiang, Any reason for holding this off from commit? Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies Key: HIVE-2604 URL: https://issues.apache.org/jira/browse/HIVE-2604 Project: Hive Issue Type: Sub-task Components: Contrib Reporter: Krishna Kumar Assignee: Krishna Kumar Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2604.D1011.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2604.D1011.2.patch, HIVE-2604.v0.patch, HIVE-2604.v1.patch, HIVE-2604.v2.patch The strategies supported are 1. using a specified codec on the column 2. using a specific codec on the column which is serialized via a specific serde 3. using a specific TypeSpecificCompressor instance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
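Strategy 1 from the HIVE-2604 description (a specified codec per column) can be sketched minimally in plain Python with stdlib codecs; this is an assumed illustration, not the contrib UberCompressor's actual API, which the issue does not show:

```python
import bz2
import zlib

# Hypothetical per-column compression strategy: map each column to its codec.
column_codecs = {"id": zlib, "title": bz2}

def compress_row(row):
    # Compress each column's value with that column's own codec.
    return {col: column_codecs[col].compress(val.encode("utf-8"))
            for col, val in row.items()}

def decompress_row(blobs):
    # Reverse the mapping on read.
    return {col: column_codecs[col].decompress(blob).decode("utf-8")
            for col, blob in blobs.items()}
```

Strategies 2 and 3 from the description would extend the same mapping to carry a serde or a TypeSpecificCompressor instance per column instead of a plain codec.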