[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949554#comment-13949554 ]

Alan Gates commented on HIVE-6670:
----------------------------------

Ran tests locally, all looks good.

ClassNotFound with Serde
------------------------

                Key: HIVE-6670
                URL: https://issues.apache.org/jira/browse/HIVE-6670
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.12.0
           Reporter: Abin Shahab
           Assignee: Abin Shahab
        Attachments: HIVE-6670-branch-0.12.patch, HIVE-6670.1.patch, HIVE-6670.patch

We are finding a ClassNotFound exception when we use CSVSerde (https://github.com/ogrodnek/csv-serde) to create a table.
This is happening because MapredLocalTask does not pass the local added jars to ExecDriver when that is launched. ExecDriver's classpath does not include the added jars. Therefore, when the plan is deserialized, it throws a ClassNotFoundException in the deserialization code, and results in a TableDesc object with a null DeserializerClass. This results in an NPE during Fetch.

Steps to reproduce:

wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar into somewhere local, e.g. /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.

Place some sample CSV files in HDFS as follows:

    hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
    hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
    hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
    hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/

Create the tables in Hive:

    ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

    create external table sampleCSV (md5hash string, filepath string)
    row format serde 'com.bizo.hive.serde.csv.CSVSerde'
    stored as textfile
    location '/user/soam/HiveSerdeIssue/sampleCSV/'
    ;

    create external table sampleJoinTarget
    (md5hash string, filepath string, datestamp string, nblines string, nberrors string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
    ;

Now, try the following JOIN:

    ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

    SELECT sampleCSV.md5hash, sampleCSV.filepath
    FROM sampleCSV
    JOIN sampleJoinTarget
    ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
    ;

This will fail with the error:

    Execution log at: /tmp/soam/.log
    java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
    Continuing ...
    2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
    Execution failed with exit status: 2
    Obtaining error information
    Task failed!
    Task ID:
      Stage-4
    Logs:
    /var/log/hive/soam/hive.log
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Try the following LEFT JOIN. This will work:

    SELECT sampleCSV.md5hash, sampleCSV.filepath
    FROM sampleCSV
    LEFT JOIN sampleJoinTarget
    ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
    ;

--
This message was sent by Atlassian JIRA
(v6.2#6252)
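The failure mode in the description (a class lookup fails in the child JVM, the error is logged, and a null class reference is carried forward until something dereferences it) can be illustrated with a minimal sketch. This is not Hive's code: the class `MissingSerdeSketch` and the helper `resolveOrNull` are hypothetical names chosen for illustration, and only the SerDe class name is taken from the report.

```java
final class MissingSerdeSketch {

    // Hypothetical helper, not a Hive API: looks up a class by name and
    // returns null (after logging) when the class is not visible to the loader.
    static Class<?> resolveOrNull(String className, ClassLoader loader) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            System.err.println("java.lang.ClassNotFoundException: " + className);
            System.err.println("Continuing ...");
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = MissingSerdeSketch.class.getClassLoader();

        // Unless the csv-serde jar is on this JVM's classpath, the lookup fails
        // and the caller is left holding null.
        Class<?> serde = resolveOrNull("com.bizo.hive.serde.csv.CSVSerde", loader);
        if (serde == null) {
            System.out.println("SerDe class is missing; any later use would hit a null reference.");
        } else {
            System.out.println("SerDe class resolved: " + serde.getName());
        }
    }
}
```

Running the sketch with the csv-serde jar added to the classpath takes the second branch instead, which mirrors why a child process only works when the added jars are passed along to it.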
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949968#comment-13949968 ]

Hive QA commented on HIVE-6670:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12637036/HIVE-6670.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5492 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1987/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1987/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12637036
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947964#comment-13947964 ]

Hive QA commented on HIVE-6670:
-------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12636760/HIVE-6670.patch

{color:green}SUCCESS:{color} +1 5457 tests passed

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1964/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1964/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12636760
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948620#comment-13948620 ]

Abin Shahab commented on HIVE-6670:
-----------------------------------

Thanks for rolling it forward!
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948677#comment-13948677 ]

Jason Dere commented on HIVE-6670:
----------------------------------

+1
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947077#comment-13947077 ]

Abin Shahab commented on HIVE-6670:
-----------------------------------

[~hashutosh] I can write a test case. Is there a similar test case that I can look at?
I'm not sure how to create a ReviewBoard entry. It'd be great if you could do that once I upload the test.
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947287#comment-13947287 ]

Ashutosh Chauhan commented on HIVE-6670:
----------------------------------------

The query you posted in the description is a good test case. Just add it as a .q file in ql/src/test/queries/clientpositive/, where all the other test queries are. More info at [wiki site | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ]
You can create a review request on [review board | https://reviews.apache.org/r/new/]
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947359#comment-13947359 ]

Ashutosh Chauhan commented on HIVE-6670:
----------------------------------------

I tested manually and I am able to repro. Also, with the patch it succeeds. That's good. However, I think that instead of passing the jars on the command line, it is better to pass them via the Conf object using the {{hive.added.jars.path}} variable, the way it's done in MapRedTask. That way it's consistent across the two types of task.
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947420#comment-13947420 ]

Abin Shahab commented on HIVE-6670:
-----------------------------------

But I don't want to overwrite existing added jars.
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947432#comment-13947432 ]

Ashutosh Chauhan commented on HIVE-6670:
----------------------------------------

You don't need to. You can append.
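The "append, don't overwrite" suggestion for {{hive.added.jars.path}} might look roughly like the sketch below. It is an illustration only, not the committed patch: it uses `java.util.Properties` as a stand-in for Hive's Conf object, assumes the property holds a comma-separated list of jar paths, and the class `AddedJarsConfSketch` and method `appendAddedJars` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

final class AddedJarsConfSketch {

    // Property name taken from the comment thread.
    static final String ADDED_JARS_KEY = "hive.added.jars.path";

    // Hypothetical helper: merges newJars into the existing value instead of
    // replacing it. Assumes a comma-separated list; keeps first-seen order
    // and drops duplicates and empty entries.
    static void appendAddedJars(Properties conf, String newJars) {
        String existing = conf.getProperty(ADDED_JARS_KEY, "");
        String incoming = (newJars == null) ? "" : newJars;

        Set<String> merged = new LinkedHashSet<>();
        for (String list : new String[] {existing, incoming}) {
            for (String jar : list.split(",")) {
                String trimmed = jar.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        conf.setProperty(ADDED_JARS_KEY, String.join(",", merged));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Placeholder path standing in for a jar that was already added.
        conf.setProperty(ADDED_JARS_KEY, "/opt/existing.jar");

        appendAddedJars(conf, "/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar");
        System.out.println(conf.getProperty(ADDED_JARS_KEY));
    }
}
```

Merging through an ordered set keeps jars that were already registered and avoids listing the same jar twice when a statement such as ADD JAR is repeated in a session.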