[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948620#comment-13948620 ]

Abin Shahab commented on HIVE-6670:
-----------------------------------

Thanks for rolling it forward!

> ClassNotFound with Serde
> ------------------------
>
>                 Key: HIVE-6670
>                 URL: https://issues.apache.org/jira/browse/HIVE-6670
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Abin Shahab
>            Assignee: Abin Shahab
>         Attachments: HIVE-6670-branch-0.12.patch, HIVE-6670.1.patch, HIVE-6670.patch
>
> We are seeing a ClassNotFoundException when we use CSVSerde (https://github.com/ogrodnek/csv-serde) to create a table. This happens because MapredLocalTask does not pass the locally added jars to ExecDriver when it launches it, so ExecDriver's classpath does not include the added jars. When the plan is deserialized, the deserialization code throws a ClassNotFoundException and produces a TableDesc object with a null deserializer class, which in turn causes an NPE during fetch.
>
> Steps to reproduce:
>
> Download the serde jar to a local path, e.g. /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar:
>
>   wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar
>
> Place some sample CSV files in HDFS:
>
>   hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
>   hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
>   hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
>   hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/
>
> Create the tables in Hive:
>
>   ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
>
>   CREATE EXTERNAL TABLE sampleCSV (md5hash string, filepath string)
>   ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
>   STORED AS TEXTFILE
>   LOCATION '/user/soam/HiveSerdeIssue/sampleCSV/';
>
>   CREATE EXTERNAL TABLE sampleJoinTarget (md5hash string, filepath string, datestamp string, nblines string, nberrors string)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
>   STORED AS TEXTFILE
>   LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/';
>
> Now, try the following JOIN:
>
>   ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
>
>   SELECT sampleCSV.md5hash, sampleCSV.filepath
>   FROM sampleCSV
>   JOIN sampleJoinTarget ON (sampleCSV.md5hash = sampleJoinTarget.md5hash);
>
> This will fail with the error:
>
>   Execution log at: /tmp/soam/.log
>   java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
>   Continuing ...
>   2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
>   Execution failed with exit status: 2
>   Obtaining error information
>   Task failed!
>   Task ID: Stage-4
>   Logs: /var/log/hive/soam/hive.log
>   FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
>
> The following LEFT JOIN, however, will work:
>
>   SELECT sampleCSV.md5hash, sampleCSV.filepath
>   FROM sampleCSV
>   LEFT JOIN sampleJoinTarget ON (sampleCSV.md5hash = sampleJoinTarget.md5hash);

--
This message was sent by Atlassian JIRA (v6.2#6252)
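The failure chain described in the issue (class resolution fails at plan deserialization, leaving a null deserializer that only blows up later during fetch) can be illustrated with a small Python sketch. This is a hypothetical analogue, not Hive's actual code; the class names are taken from the report, the registry mechanism is invented for illustration.

```python
# Hypothetical analogue (not Hive's code) of HIVE-6670's failure chain: a
# serde class that cannot be resolved while deserializing the plan leaves a
# null field behind, and the error surfaces only later, at fetch time.

class TableDesc:
    def __init__(self, serde_class_name, class_lookup):
        # If the jar is missing from the child process's classpath, the class
        # lookup fails and the deserializer field is silently left unset.
        self.deserializer_class = class_lookup.get(serde_class_name)

class CsvDeserializer:
    def deserialize(self, row):
        return row.split(",")

def fetch_rows(table_desc, raw_rows):
    # The fetch path assumes the class was resolved; instantiating None here
    # is the Python analogue of the NPE Hive hits during Fetch.
    deserializer = table_desc.deserializer_class()
    return [deserializer.deserialize(r) for r in raw_rows]

REGISTRY = {"com.bizo.hive.serde.csv.CSVSerde": CsvDeserializer}

# Child process launched WITH the added jar on its classpath: works.
ok = TableDesc("com.bizo.hive.serde.csv.CSVSerde", REGISTRY)
print(fetch_rows(ok, ["a,b"]))

# Child process launched WITHOUT it (the bug): resolution silently fails at
# plan-deserialization time, and fetch fails afterwards.
broken = TableDesc("com.bizo.hive.serde.csv.CSVSerde", {})
try:
    fetch_rows(broken, ["a,b"])
except TypeError:
    print("fetch failed: deserializer class was never resolved")
```

The point of the sketch is the two-phase failure: the lookup error is swallowed where it happens and re-surfaces far away, which matches the ClassNotFoundException-then-NPE pattern in the report.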
[jira] [Updated] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-6670:
------------------------------
    Attachment: HIVE-6670.patch
                HIVE-6670-branch-0.12.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-6670:
------------------------------
    Status: Patch Available (was: Open)

Added patches for both trunk and branch.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947077#comment-13947077 ]

Abin Shahab commented on HIVE-6670:
-----------------------------------

[~hashutosh] I can write a test case. Is there a similar test case I can look at? I'm not sure how to create a ReviewBoard entry; it would be great if you could do that once I upload the test.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947420#comment-13947420 ]

Abin Shahab commented on HIVE-6670:
-----------------------------------

But I don't want to overwrite existing added jars.

--
This message was sent by Atlassian JIRA (v6.2#6252)
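The concern in the comment above (forward the session's added jars to the child ExecDriver process without clobbering what is already on its classpath) can be sketched as an append-with-deduplication. This is an illustrative Python sketch, not the actual HIVE-6670 patch; the jar paths are placeholders.

```python
# Hedged sketch (not the real HIVE-6670 patch): merge session-added jars into
# an inherited classpath string, appending rather than overwriting, and
# skipping jars that are already present.
import os

def merged_classpath(existing, added_jars, sep=os.pathsep):
    """Return `existing` with `added_jars` appended; never drops or
    duplicates an existing entry."""
    entries = existing.split(sep) if existing else []
    for jar in added_jars:
        if jar not in entries:
            entries.append(jar)
    return sep.join(entries)

base = "/opt/hive/lib/hive-exec.jar"                      # placeholder path
added = ["/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar",
         "/opt/hive/lib/hive-exec.jar"]                   # already present
print(merged_classpath(base, added))
```

Appending rather than replacing is exactly what the comment asks for: jars distributed with Hive stay visible, and only the session's extras are layered on top.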
[jira] [Created] (HIVE-6670) ClassNotFound with Serde
Abin Shahab created HIVE-6670:
------------------------------

             Summary: ClassNotFound with Serde
                 Key: HIVE-6670
                 URL: https://issues.apache.org/jira/browse/HIVE-6670
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0
            Reporter: Abin Shahab

We are seeing a ClassNotFoundException when we use CSVSerde (https://github.com/ogrodnek/csv-serde) to create a table. This happens because MapredLocalTask does not pass the locally added jars to ExecDriver when it launches it, so ExecDriver's classpath does not include the added jars. When the plan is deserialized, the deserialization code throws a ClassNotFoundException and produces a TableDesc object with a null deserializer class, which in turn causes an NPE during fetch.

Steps to reproduce:

Download the serde jar to a local path, e.g. /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar:

  wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar

Place the sample files attached to this ticket in HDFS as follows:

  hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
  hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
  hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
  hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/

Create the tables in Hive (this might cause a problem in dogfood since I've already created tables with those names, so you'll have to change the table names or delete mine):

  ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

  CREATE EXTERNAL TABLE sampleCSV (md5hash string, filepath string)
  ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
  STORED AS TEXTFILE
  LOCATION '/user/soam/HiveSerdeIssue/sampleCSV/';

  CREATE EXTERNAL TABLE sampleJoinTarget (md5hash string, filepath string, datestamp string, nblines string, nberrors string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
  STORED AS TEXTFILE
  LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/';

Now, try the following JOIN:

  ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

  SELECT sampleCSV.md5hash, sampleCSV.filepath
  FROM sampleCSV
  JOIN sampleJoinTarget ON (sampleCSV.md5hash = sampleJoinTarget.md5hash);

This will fail with the error:

  Execution log at: /tmp/soam/.log
  java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
  Continuing ...
  2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
  Execution failed with exit status: 2
  Obtaining error information
  Task failed!
  Task ID: Stage-4
  Logs: /var/log/hive/soam/hive.log
  FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

The following LEFT JOIN, however, will work:

  SELECT sampleCSV.md5hash, sampleCSV.filepath
  FROM sampleCSV
  LEFT JOIN sampleJoinTarget ON (sampleCSV.md5hash = sampleJoinTarget.md5hash);

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6670) ClassNotFound with Serde
[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-6670:
------------------------------
    Description: revised the steps to reproduce. The sample CSV files are no longer described as ticket attachments, and the note about pre-existing table names in the dogfood environment was dropped; the revised text is otherwise the description quoted in the notifications above.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913481#comment-13913481 ]

Abin Shahab commented on HIVE-5112:
-----------------------------------

Steven, you can try compiling the ORC-related files with the correct version of protobuf (2.5.0).

> Upgrade protobuf to 2.5 from 2.4
> --------------------------------
>
>                 Key: HIVE-5112
>                 URL: https://issues.apache.org/jira/browse/HIVE-5112
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Assignee: Owen O'Malley
>             Fix For: 0.13.0
>         Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch
>
> Hadoop and HBase have both upgraded protobuf. We should as well.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6468) HS2 out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-6468:
------------------------------
    Summary: HS2 out of memory error when curl sends a get request (was: HS2 out of memory error with Beeline)
    Description: We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary)

  curl localhost:1

  Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

  (was: the same report, but triggered by `beeline -u jdbc:hive2://localhost:1 -n user1 -d org.apache.hive.jdbc.HiveDriver -e create table test1 (id) int;` with an identical stack trace)

> HS2 out of memory error when curl sends a get request
> ------------------------------------------------------
>
>                 Key: HIVE-6468
>                 URL: https://issues.apache.org/jira/browse/HIVE-6468
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>         Environment: Centos 6.3, hive 12, hadoop-2.2
>            Reporter: Abin Shahab

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
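One plausible reading of the stack trace above (an assumption about the mechanism, not a confirmed diagnosis): the thrift transport on HS2's binary port reads the first bytes of a new connection as a message-length header, and a plain HTTP request begins with the ASCII bytes "GET ", which interpreted as a big-endian 32-bit integer is roughly 1.1 GB, so the server attempts one enormous allocation. The arithmetic can be checked in a few lines of Python:

```python
# Sketch of the suspected mechanism: non-thrift bytes on the binary port get
# misread as a frame length. The first four bytes of an HTTP GET request,
# taken as a big-endian signed 32-bit integer, are about 1.1 GB.
import struct

first_bytes = b"GET "                          # start of any HTTP GET request
(frame_length,) = struct.unpack(">i", first_bytes)
print(frame_length)                            # 1195725856 bytes ~ 1.1 GB
```

This would explain why an innocuous curl probe, not a real workload, is enough to trigger the OutOfMemoryError in receiveSaslMessage.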
[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905687#comment-13905687 ]

Abin Shahab commented on HIVE-4501:
-----------------------------------

What is the progress on this issue?

> HS2 memory leak - FileSystem objects in FileSystem.CACHE
> ---------------------------------------------------------
>
>                 Key: HIVE-4501
>                 URL: https://issues.apache.org/jira/browse/HIVE-4501
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.11.0
>            Reporter: Thejas M Nair
>            Assignee: Vaibhav Gumashta
>             Fix For: 0.13.0
>         Attachments: HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.trunk.patch
>
> org.apache.hadoop.fs.FileSystem objects are accumulating in FileSystem.CACHE when HS2 runs in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true, but users should not have to bother with this extra configuration. Alternatively, disable impersonation by setting hive.server2.enable.doAs to false.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
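The accumulation pattern described in HIVE-4501 can be sketched in Python. This is a simplified stand-in, not Hadoop's actual FileSystem.CACHE implementation: the essential point is that the cache key includes the user, so with impersonation (doAs) enabled every distinct user adds a new cached entry that is never evicted.

```python
# Simplified stand-in for the FileSystem.CACHE leak: a per-(scheme, authority,
# user) cache with no eviction grows by one entry per impersonated user.

class FileSystemCache:
    def __init__(self):
        self._cache = {}

    def get(self, scheme, authority, user):
        key = (scheme, authority, user)        # user is part of the key
        if key not in self._cache:
            self._cache[key] = object()        # stand-in for a FileSystem
        return self._cache[key]

    def __len__(self):
        return len(self._cache)

cache = FileSystemCache()
for i in range(1000):                          # 1000 impersonated users
    cache.get("hdfs", "namenode:8020", "user%d" % i)
print(len(cache))                              # 1000 entries, none evicted
```

The workarounds in the issue map directly onto the sketch: disabling the cache removes the dictionary entirely, and disabling doAs collapses every request onto a single user key.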
[jira] [Created] (HIVE-6468) HS2 out of memory error with Beeline
Abin Shahab created HIVE-6468:
------------------------------

             Summary: HS2 out of memory error with Beeline
                 Key: HIVE-6468
                 URL: https://issues.apache.org/jira/browse/HIVE-6468
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0
         Environment: Centos 6.3, hive 12, hadoop-2.2
            Reporter: Abin Shahab

We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary)

  beeline -u jdbc:hive2://localhost:1 -n user1 -d org.apache.hive.jdbc.HiveDriver -e create table test1 (id) int;

The stack trace is the same java.lang.OutOfMemoryError in org.apache.thrift.transport.TSaslTransport.receiveSaslMessage quoted in the HIVE-6468 notifications above.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6468) HS2 out of memory error with Beeline
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906417#comment-13906417 ]

Abin Shahab commented on HIVE-6468:
-----------------------------------

I am suspecting that too. But which component versions must match?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6468) HS2 out of memory error with Beeline
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906510#comment-13906510 ]

Abin Shahab commented on HIVE-6468:
-----------------------------------

Hmm, I'm seeing libfb303-0.9.0.jar, libthrift-0.9.0.jar, and hive-service-0.12.0.jar. Are these not correct?
[jira] [Commented] (HIVE-6468) HS2 out of memory error with Beeline
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906649#comment-13906649 ]

Abin Shahab commented on HIVE-6468:
-----------------------------------

This is how we build Hive:

export HADOOP_VERSION=2.2.0
ant clean package tar -Dhadoop.version=${HADOOP_VERSION} \
    -Dhadoop-0.23.version=${HADOOP_VERSION} -Dhadoop.mr.rev=23 \
    -Dmvn.hadoop.profile=hadoop23 -Dhadoop23.version=${HADOOP_VERSION}
[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890348#comment-13890348 ]

Abin Shahab commented on HIVE-5112:
-----------------------------------

Hi all, I notice the following exception when we try to use the hive-0.12 ORC file format with hadoop-2.2. It goes away when we use hadoop-2.0.5. My hunch is that this is caused by the protobuf-2.4.1 code in hive-0.12. Should the bug be reopened?

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
	at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSerializedSize(OrcProto.java:7281)
	at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
	at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.getSerializedSize(OrcProto.java:9054)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.writeTo(OrcProto.java:9007)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeFooter(WriterImpl.java:1804)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1869)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)

Upgrade protobuf to 2.5 from 2.4
--------------------------------

                 Key: HIVE-5112
                 URL: https://issues.apache.org/jira/browse/HIVE-5112
             Project: Hive
          Issue Type: Improvement
            Reporter: Brock Noland
            Assignee: Owen O'Malley
             Fix For: 0.13.0
         Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch

Hadoop and HBase have both upgraded protobuf. We should as well.
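The "This is supposed to be overridden by subclasses" failure is the signature of a protobuf version skew: classes generated by an older protoc do not override getUnknownFields(), while a newer protobuf-java runtime calls it from the serialized-size path, hitting the throwing base-class stub. This toy model (class names are illustrative, not protobuf's actual API) reproduces the shape of the failure:

```python
class GeneratedMessage:
    """Stand-in for protobuf-java's base class, whose getUnknownFields()
    throws unless the generated subclass overrides it."""
    def get_unknown_fields(self):
        raise NotImplementedError("This is supposed to be overridden by subclasses.")

class TypeFromOldCodegen(GeneratedMessage):
    # Older codegen did not emit the override the newer runtime expects.
    def get_serialized_size(self):
        return len(self.get_unknown_fields())  # newer runtime consults unknown fields

class TypeFromNewCodegen(GeneratedMessage):
    def get_unknown_fields(self):
        return {}  # newer codegen provides the override
    def get_serialized_size(self):
        return len(self.get_unknown_fields())

try:
    TypeFromOldCodegen().get_serialized_size()
except NotImplementedError as e:
    print(e)  # same message as in the ORC writer stack trace above
```

This matches the comment's hunch: hive-0.12's OrcProto classes were generated against protobuf 2.4.1, while hadoop-2.2 puts protobuf-java 2.5.0 on the classpath.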
[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-5016:
------------------------------
    Reproduced In: 0.10.0
           Status: Patch Available  (was: Open)

The patch changes one line in JobSubmitter, submitting the full classpath to the DistributedCache instead of the truncated path.

Local mode FileNotFoundException: File does not exist
-----------------------------------------------------

                 Key: HIVE-5016
                 URL: https://issues.apache.org/jira/browse/HIVE-5016
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.10.0
         Environment: CentOS 6.3 (final), Hadoop 2.0.2-alpha, Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Hive libs (ls -1 lib/): antlr-2.7.7.jar antlr-runtime-3.0.1.jar avro-1.7.1.jar avro-mapred-1.7.1.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-dbcp-1.4.jar commons-lang-2.4.jar commons-logging-1.0.4.jar commons-logging-api-1.0.4.jar commons-pool-1.5.4.jar datanucleus-connectionpool-2.0.3.jar datanucleus-core-2.0.3.jar datanucleus-enhancer-2.0.3.jar datanucleus-rdbms-2.0.3.jar derby-10.4.2.0.jar guava-r09.jar hbase-0.92.0.jar hbase-0.92.0-tests.jar hive-builtins-0.10.0.jar hive-cli-0.10.0.jar hive-common-0.10.0.jar hive-contrib-0.10.0.jar hive-exec-0.10.0.jar hive-hbase-handler-0.10.0.jar hive-hwi-0.10.0.jar hive-hwi-0.10.0.war hive-jdbc-0.10.0.jar hive-metastore-0.10.0.jar hive-pdk-0.10.0.jar hive-serde-0.10.0.jar hive-service-0.10.0.jar hive-shims-0.10.0.jar jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.8.jar jackson-mapper-asl-1.8.8.jar jackson-xc-1.8.8.jar JavaEWAH-0.3.2.jar javolution-5.5.1.jar jdo2-api-2.3-ec.jar jetty-6.1.26.jar jetty-util-6.1.26.jar jline-0.9.94.jar json-20090211.jar libfb303-0.9.0.jar libthrift-0.9.0.jar log4j-1.2.16.jar php py servlet-api-2.5-20081211.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar sqlline-1_0_2.jar stringtemplate-3.1-b1.jar xz-1.0.jar zookeeper-3.4.3.jar
            Reporter: Abin Shahab
            Priority: Critical
         Attachments: HIVE-5016.patch

Hive jobs in local mode fail with the error posted below. The jar file that's not being found exists and has the following access:

ls -l hive-0.10.0/lib/hive-builtins-0.10.0.jar
-rw-rw-r-- 1 ashahab ashahab 3914 Dec 18 2012 hive-0.10.0/lib/hive-builtins-0.10.0.jar

Steps to reproduce:

hive> set hive.exec.mode.local.auto=true;
hive> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=true
hive> select count(*) from abin_test_table;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers: set hive.exec.reducers.max=number
In order to set a constant number of reducers: set mapred.reduce.tasks=number
13/08/06 21:37:11 WARN conf.Configuration: file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
13/08/06 21:37:11 WARN conf.Configuration: file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/ashahab/ashahab_20130806213737_7d26b796-5f55-44ca-a755-8898153d963b.log
java.io.FileNotFoundException: File does not exist: /home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-5016:
------------------------------
    Attachment: HIVE-5016.patch

Changes the classpaths submitted by the JobSubmitter to the full path.
[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-5016:
------------------------------
        Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-5016:
------------------------------
    Release Note: This change will allow Hive to operate in local mode.
          Status: Patch Available  (was: Open)

This changes the classpath the JobSubmitter hands to the DistributedCache. The old path did not include the protocol; the new path does, which allows the classpath to be resolved in local mode.
[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-5016:
------------------------------
    Attachment: HIVE-5016.patch
[jira] [Commented] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741219#comment-13741219 ] Abin Shahab commented on HIVE-5016: --- Root cause of this issue is: The classpath for jars in local mode points to a real file on disk. However, the JobSubmitter was cutting off the protocol part of the path. By default DistributedCache assumes that a protocol-less file is from HDFS, and that was causing the FileNotFound exception. The solution is to the entire path to the DistributedCache, which allowed DistributedCache to find it in the file system. Local mode FileNotFoundException: File does not exist - Key: HIVE-5016 URL: https://issues.apache.org/jira/browse/HIVE-5016 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: Centos 6.3 (final) Hadoop 2.0.2-alpha Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Hive libs: ls -1 lib/ antlr-2.7.7.jar antlr-runtime-3.0.1.jar avro-1.7.1.jar avro-mapred-1.7.1.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-dbcp-1.4.jar commons-lang-2.4.jar commons-logging-1.0.4.jar commons-logging-api-1.0.4.jar commons-pool-1.5.4.jar datanucleus-connectionpool-2.0.3.jar datanucleus-core-2.0.3.jar datanucleus-enhancer-2.0.3.jar datanucleus-rdbms-2.0.3.jar derby-10.4.2.0.jar guava-r09.jar hbase-0.92.0.jar hbase-0.92.0-tests.jar hive-builtins-0.10.0.jar hive-cli-0.10.0.jar hive-common-0.10.0.jar hive-contrib-0.10.0.jar hive-exec-0.10.0.jar hive-hbase-handler-0.10.0.jar hive-hwi-0.10.0.jar hive-hwi-0.10.0.war hive-jdbc-0.10.0.jar hive-metastore-0.10.0.jar hive-pdk-0.10.0.jar hive-serde-0.10.0.jar hive-service-0.10.0.jar hive-shims-0.10.0.jar jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.8.jar jackson-mapper-asl-1.8.8.jar jackson-xc-1.8.8.jar JavaEWAH-0.3.2.jar javolution-5.5.1.jar jdo2-api-2.3-ec.jar jetty-6.1.26.jar jetty-util-6.1.26.jar jline-0.9.94.jar 
json-20090211.jar libfb303-0.9.0.jar libthrift-0.9.0.jar log4j-1.2.16.jar php py servlet-api-2.5-20081211.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar sqlline-1_0_2.jar stringtemplate-3.1-b1.jar xz-1.0.jar zookeeper-3.4.3.jar Reporter: Abin Shahab Priority: Critical Hive jobs in local mode fail with the error posted below. The jar file that's not being found exists and has the following access: ls -l hive-0.10.0/lib/hive-builtins-0.10.0.jar rw-rw-r-- 1 ashahab ashahab 3914 Dec 18 2012 hive-0.10.0/lib/hive-builtins-0.10.0.jar Steps to reproduce: hive set hive.exec.mode.local.auto=true; hive set hive.exec.mode.local.auto; hive.exec.mode.local.auto=true hive select count(*) from abin_test_table; Automatically selecting local only mode for query Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number 13/08/06 21:37:11 WARN conf.Configuration: file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 13/08/06 21:37:11 WARN conf.Configuration: file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 
Execution log at: /tmp/ashahab/ashahab_20130806213737_7d26b796-5f55-44ca-a755-8898153d963b.log
java.io.FileNotFoundException: File does not exist: /home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
    at
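The scheme-defaulting behaviour described in the comment above can be sketched as follows. This is a simplified, hypothetical Python illustration of the idea (a protocol-less path is assumed to live on the default file system, HDFS), not Hadoop's actual implementation:

```python
from urllib.parse import urlparse

def qualify_path(path, default_scheme="hdfs"):
    """Mimic the described DistributedCache behaviour: a path with no
    scheme is assumed to live on the default file system (HDFS)."""
    if urlparse(path).scheme:
        return path                      # scheme present: resolved as given
    return f"{default_scheme}://{path}"  # protocol-less: assumed to be on HDFS

# Stripping "file://" makes a local jar look like an HDFS path, so the
# HDFS lookup fails with FileNotFoundException:
stripped = qualify_path("/home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar")
# -> "hdfs:///home/ashahab/..."

# Passing the entire path, protocol included (the fix), keeps the local scheme:
fixed = qualify_path("file:///home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar")
# -> unchanged "file://..." URI, found on the local file system
```

The names here are stand-ins; the actual fix lives in Hive's job-submission path, not in a helper like `qualify_path`.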
[jira] [Commented] (HIVE-5016) Local mode FileNotFoundException: File does not exist
[ https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733660#comment-13733660 ] Abin Shahab commented on HIVE-5016:
---
Root cause of this issue: when hive.exec.mode.local.auto=true, mapreduce.framework.name gets set to 'local' instead of 'yarn', so the YarnRunner is not picked as the job runner. This results in the LocalJobRunner being used as the JobRunner, which messes up the path creation.

Local mode FileNotFoundException: File does not exist
-
Key: HIVE-5016
URL: https://issues.apache.org/jira/browse/HIVE-5016
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Abin Shahab
Priority: Critical
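The misconfiguration described in this comment (the framework name silently switching to 'local') can be illustrated with a toy dispatch. The function below is a hypothetical sketch of how runner selection keys off mapreduce.framework.name, not Hadoop's actual code:

```python
def pick_job_runner(conf):
    """Toy runner selection: the framework name decides which runner
    submits the job, so 'local' selects the LocalJobRunner even on a
    cluster that has YARN available."""
    framework = conf.get("mapreduce.framework.name", "local")
    return "YARNRunner" if framework == "yarn" else "LocalJobRunner"

conf = {"mapreduce.framework.name": "yarn"}   # cluster default
conf["mapreduce.framework.name"] = "local"    # effect described for hive.exec.mode.local.auto=true
print(pick_job_runner(conf))                  # LocalJobRunner -> local path handling, FileNotFoundException
```

With the setting left at 'yarn', the same dispatch would return "YARNRunner" and the paths would be created for the cluster file system.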
[jira] [Commented] (HIVE-4881) hive local mode: java.io.FileNotFoundException: emptyFile
[ https://issues.apache.org/jira/browse/HIVE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733895#comment-13733895 ] Abin Shahab commented on HIVE-4881:
---
I find that this bug only happens when you have an empty table.

hive local mode: java.io.FileNotFoundException: emptyFile
-
Key: HIVE-4881
URL: https://issues.apache.org/jira/browse/HIVE-4881
Project: Hive
Issue Type: Bug
Environment: hive 0.9.0+158-1.cdh4.1.3.p0.23~squeeze-cdh4.1.3
Reporter: Bartosz Cisek
Priority: Critical

Our hive jobs fail due to the strange error pasted below. Strace showed that the process created this file, accessed it a few times, and then threw an exception that it couldn't find the file it had just accessed. In the next step it unlinked it. Yay. A very similar problem was reported [in an already closed task|https://issues.apache.org/jira/browse/HIVE-1633?focusedCommentId=13598983page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13598983] and left unresolved on the [mailing lists|http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3c94f02eb368b740ebbcd94df4d5d1d...@amxpr03mb054.eurprd03.prod.outlook.com%3E]. I'll be happy to provide any required additional details.
{code:title=Stack trace}
2013-07-18 12:49:46,109 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException as:username (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /tmp/username/hive_2013-07-18_12-49-45_218_605775464480014480/-mr-1/1/emptyFile
2013-07-18 12:49:46,113 ERROR exec.ExecDriver (SessionState.java:printError(403)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/username/hive_2013-07-18_12-49-45_218_605775464480014480/-mr-1/1/emptyFile)'
java.io.FileNotFoundException: File does not exist: /tmp/username/hive_2013-07-18_12-49-45_218_605775464480014480/-mr-1/1/emptyFile
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1040)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1032)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:895)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:895)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:869)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
{code}

{code:title=strace with grep emptyFile}
7385 14:48:02.808096 stat("/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile", {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
7385 14:48:02.808201 stat("/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile", {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
7385 14:48:02.808277 stat("/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile", {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
7385 14:48:02.808348 stat("/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile", {st_mode=S_IFREG|0755,
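The strace output shows the emptyFile exists on the local file system (stat succeeds) while DistributedFileSystem.getFileStatus throws FileNotFoundException, which suggests the same path is being resolved against two different file systems. A minimal, hypothetical sketch of that mismatch, using a second temp directory in place of the HDFS namespace:

```python
import os
import tempfile

# A placeholder "emptyFile" exists on the LOCAL file system...
local_dir = tempfile.mkdtemp()
empty_file = os.path.join(local_dir, "emptyFile")
open(empty_file, "w").close()
assert os.path.exists(empty_file)      # local stat() succeeds, as in the strace

# ...but the split computation asks a DIFFERENT file system (HDFS) for the
# same path, where the file was never created, hence FileNotFoundException.
hdfs_root = tempfile.mkdtemp()         # stands in for the HDFS namespace
hdfs_view = os.path.join(hdfs_root, empty_file.lstrip(os.sep))
print(os.path.exists(hdfs_view))       # False: "File does not exist"
```

This is only an illustration of the symptom; the actual bug is in which FileSystem instance Hive's local mode hands the path to.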
[jira] [Created] (HIVE-5016) Local mode FileNotFoundException: File does not exist
Abin Shahab created HIVE-5016:
-
Summary: Local mode FileNotFoundException: File does not exist
Key: HIVE-5016
URL: https://issues.apache.org/jira/browse/HIVE-5016
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Environment: Centos 6.3 (final), Hadoop 2.0.2-alpha, Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Reporter: Abin Shahab
Priority: Critical

Hive jobs in local mode fail with the error posted below. The jar file that's not being found exists and has the following access:

ls -l hive-0.10.0/lib/hive-builtins-0.10.0.jar
-rw-rw-r-- 1 ashahab ashahab 3914 Dec 18 2012 hive-0.10.0/lib/hive-builtins-0.10.0.jar
Execution log at: /tmp/ashahab/ashahab_20130806213737_7d26b796-5f55-44ca-a755-8898153d963b.log
java.io.FileNotFoundException: File does not exist: /home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)