[jira] [Commented] (HIVE-6670) ClassNotFound with Serde

2014-03-26 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948620#comment-13948620
 ] 

Abin Shahab commented on HIVE-6670:
---

Thanks for rolling it forward!



 ClassNotFound with Serde
 

 Key: HIVE-6670
 URL: https://issues.apache.org/jira/browse/HIVE-6670
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Abin Shahab
Assignee: Abin Shahab
 Attachments: HIVE-6670-branch-0.12.patch, HIVE-6670.1.patch, 
 HIVE-6670.patch


 We are seeing a ClassNotFoundException when we use the CSVSerde 
 (https://github.com/ogrodnek/csv-serde) to create a table.
 This happens because MapredLocalTask does not pass the locally added jars 
 to ExecDriver when it launches it, so ExecDriver's classpath does not 
 include them. When the plan is deserialized, the deserialization code 
 throws a ClassNotFoundException, leaving a TableDesc object with a null 
 DeserializerClass, which in turn causes an NPE during fetch.
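The failure mode above comes down to the child JVM's classpath deciding what plan deserialization can load. As a hypothetical illustration of the fix idea (not the actual HIVE-6670 patch), one way to make session-added jars visible to a child process launched via the hadoop script is to append them to HADOOP_CLASSPATH before spawning it:

```shell
# Hypothetical sketch, not the actual patch: append the session's ADD JAR
# paths to HADOOP_CLASSPATH so a child JVM launched through the hadoop
# launcher can load the SerDe class during plan deserialization.
ADDED_JARS=/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar
HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$ADDED_JARS"
export HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"
```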
 Steps to reproduce:
 wget 
 https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar
  to a local path, e.g. 
 /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.
 Place some sample CSV files in HDFS as follows:
 hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
 hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
 hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
 hdfs dfs -put /home/soam/sampleJoinTarget.csv 
 /user/soam/HiveSerdeIssue/sampleJoinTarget/
 
 Create the tables in Hive:
 ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
 create external table sampleCSV (md5hash string, filepath string)
 row format serde 'com.bizo.hive.serde.csv.CSVSerde'
 stored as textfile
 location '/user/soam/HiveSerdeIssue/sampleCSV/'
 ;
 create external table sampleJoinTarget (md5hash string, filepath string, 
 datestamp string, nblines string, nberrors string)
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' 
 LINES TERMINATED BY '\n'
 STORED AS TEXTFILE
 LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
 ;
 ===
 Now, try the following JOIN:
 ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
 SELECT 
 sampleCSV.md5hash, 
 sampleCSV.filepath 
 FROM sampleCSV
 JOIN sampleJoinTarget
 ON (sampleCSV.md5hash = sampleJoinTarget.md5hash) 
 ;
 —
 This will fail with the error:
 Execution log at: /tmp/soam/.log
 java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
 Continuing ...
 2014-03-11 10:35:03 Starting to launch local task to process map join; 
 maximum memory = 238551040
 Execution failed with exit status: 2
 Obtaining error information
 Task failed!
 Task ID:
 Stage-4
 Logs:
 /var/log/hive/soam/hive.log
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
 Try the following LEFT JOIN. This will work:
 SELECT 
 sampleCSV.md5hash, 
 sampleCSV.filepath 
 FROM sampleCSV
 LEFT JOIN sampleJoinTarget
 ON (sampleCSV.md5hash = sampleJoinTarget.md5hash) 
 ;
 ==
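Until a patch lands, one possible workaround (an assumption based on the "local task to process map join" log line above, not something confirmed in this thread) is to disable the map-join conversion so the failing MapredLocalTask is never launched:

```sql
-- Hedged workaround sketch: with auto map-join conversion off, Hive uses a
-- plain reduce-side join, so no local map-join task runs.
SET hive.auto.convert.join=false;
```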



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6670) ClassNotFound with Serde

2014-03-25 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-6670:
--

Attachment: HIVE-6670.patch
HIVE-6670-branch-0.12.patch



[jira] [Updated] (HIVE-6670) ClassNotFound with Serde

2014-03-25 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-6670:
--

Status: Patch Available  (was: Open)

Added patches for both trunk and branch.



[jira] [Commented] (HIVE-6670) ClassNotFound with Serde

2014-03-25 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947077#comment-13947077
 ] 

Abin Shahab commented on HIVE-6670:
---

[~hashutosh] I can write a test case. Is there a similar test case that I can 
look at?
I'm not sure how to create a ReviewBoard entry. It'd be great if you could do 
that once I upload the test.




[jira] [Commented] (HIVE-6670) ClassNotFound with Serde

2014-03-25 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947420#comment-13947420
 ] 

Abin Shahab commented on HIVE-6670:
---

But I don't want to overwrite existing added jars.





[jira] [Created] (HIVE-6670) ClassNotFound with Serde

2014-03-14 Thread Abin Shahab (JIRA)
Abin Shahab created HIVE-6670:
-

 Summary: ClassNotFound with Serde
 Key: HIVE-6670
 URL: https://issues.apache.org/jira/browse/HIVE-6670
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Abin Shahab




[jira] [Updated] (HIVE-6670) ClassNotFound with Serde

2014-03-14 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-6670:
--


[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4

2014-02-26 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913481#comment-13913481
 ] 

Abin Shahab commented on HIVE-5112:
---

Steven, you can try compiling the ORC-related files with the correct version 
of protobuf (2.5.0).

 Upgrade protobuf to 2.5 from 2.4
 

 Key: HIVE-5112
 URL: https://issues.apache.org/jira/browse/HIVE-5112
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Owen O'Malley
 Fix For: 0.13.0

 Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch


 Hadoop and Hbase have both upgraded protobuf. We should as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6468) HS2 out of memory error when curl sends a get request

2014-02-21 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-6468:
--

Description: 
We see an out of memory error when we run simple beeline calls.
(The hive.server2.transport.mode is binary)

curl localhost:1

Exception in thread "pool-2-thread-8" java.lang.OutOfMemoryError: Java heap 
space
at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
at 
org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
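One hedged reading of this trace (an assumption from the stack frames, not stated in the thread): the SASL transport treats the first bytes on the socket as a message header containing a 4-byte length, so the ASCII of a plain HTTP request decodes to an enormous buffer size. For example, the bytes "GET " interpreted as a 32-bit big-endian integer:

```shell
# The four ASCII bytes of "GET " (0x47 0x45 0x54 0x20) read as a 32-bit
# big-endian length request about 1.1 GB, enough to trigger OutOfMemoryError
# when the server allocates the receive buffer.
printf '%d\n' 0x47455420   # prints 1195725856
```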


Summary: HS2 out of memory error when curl sends a get request  (was: 
HS2 out of memory error with Beeline)

 HS2 out of memory error when curl sends a get request
 -

 Key: HIVE-6468
 URL: https://issues.apache.org/jira/browse/HIVE-6468
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Centos 6.3, hive 12, hadoop-2.2
Reporter: Abin Shahab






[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE

2014-02-19 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905687#comment-13905687
 ] 

Abin Shahab commented on HIVE-4501:
---

What is the progress on this issue?

 HS2 memory leak - FileSystem objects in FileSystem.CACHE
 

 Key: HIVE-4501
 URL: https://issues.apache.org/jira/browse/HIVE-4501
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Vaibhav Gumashta
 Fix For: 0.13.0

 Attachments: HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.1.patch, 
 HIVE-4501.trunk.patch


 org.apache.hadoop.fs.FileSystem objects are getting accumulated in 
 FileSystem.CACHE, with HS2 in unsecure mode.
 As a workaround, it is possible to set fs.hdfs.impl.disable.cache and 
 fs.file.impl.disable.cache to true; users should not have to bother with 
 this extra configuration.
 Alternatively, disable impersonation by setting hive.server2.enable.doAs to 
 false.
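As a configuration fragment, the cache-disabling workaround above would look like this (the property names are the ones quoted in the comment; placing them in core-site.xml or hive-site.xml is an assumption):

```xml
<!-- Hedged sketch of the workaround described above. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
<property>
  <name>fs.file.impl.disable.cache</name>
  <value>true</value>
</property>
```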





[jira] [Created] (HIVE-6468) HS2 out of memory error with Beeline

2014-02-19 Thread Abin Shahab (JIRA)
Abin Shahab created HIVE-6468:
-

 Summary: HS2 out of memory error with Beeline
 Key: HIVE-6468
 URL: https://issues.apache.org/jira/browse/HIVE-6468
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Centos 6.3, hive 12, hadoop-2.2
Reporter: Abin Shahab


We see an out of memory error when we run simple beeline calls.
(The hive.server2.transport.mode is binary)
beeline -u jdbc:hive2://localhost:1 -n user1 -d org.apache.hive.jdbc.HiveDriver -e "create table test1 (id int);"

Exception in thread "pool-2-thread-8" java.lang.OutOfMemoryError: Java heap 
space
at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
at 
org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)





[jira] [Commented] (HIVE-6468) HS2 out of memory error with Beeline

2014-02-19 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906417#comment-13906417
 ] 

Abin Shahab commented on HIVE-6468:
---

I am suspecting that too. But which component versions must match?





[jira] [Commented] (HIVE-6468) HS2 out of memory error with Beeline

2014-02-19 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906510#comment-13906510
 ] 

Abin Shahab commented on HIVE-6468:
---

Hmm, I'm seeing libfb303-0.9.0.jar, libthrift-0.9.0.jar, and hive-service-0.12.0.jar.
Are these not correct?
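
This kind of jar audit can be mechanized: group the jar file names by artifact and flag any artifact that shows up in more than one version. A minimal sketch; the sample jar list and the version-parsing regex are illustrative assumptions:

```python
import re
from collections import defaultdict

def find_version_conflicts(jar_names):
    """Group jar names like 'libthrift-0.9.0.jar' by artifact and report
    artifacts present in more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        m = re.match(r"(.+?)-(\d[\w.]*)\.jar$", name)
        if m:
            versions[m.group(1)].add(m.group(2))
    return {artifact: sorted(v) for artifact, v in versions.items() if len(v) > 1}

jars = ["libthrift-0.9.0.jar", "libthrift-0.7.0.jar",
        "libfb303-0.9.0.jar", "hive-service-0.12.0.jar"]
print(find_version_conflicts(jars))  # → {'libthrift': ['0.7.0', '0.9.0']}
```

Running it over the contents of both the Hive and Hadoop lib/ directories would surface exactly the sort of duplicate-library mismatch being chased here.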







[jira] [Commented] (HIVE-6468) HS2 out of memory error with Beeline

2014-02-19 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906649#comment-13906649
 ] 

Abin Shahab commented on HIVE-6468:
---

This is how we build Hive:
export HADOOP_VERSION=2.2.0
ant clean package tar -Dhadoop.version=${HADOOP_VERSION} \
  -Dhadoop-0.23.version=${HADOOP_VERSION} -Dhadoop.mr.rev=23 \
  -Dmvn.hadoop.profile=hadoop23 -Dhadoop23.version=${HADOOP_VERSION}



[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4

2014-02-03 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890348#comment-13890348
 ] 

Abin Shahab commented on HIVE-5112:
---

Hi All,
I notice the following exception when we try to use the hive-0.12 ORC file format with hadoop-2.2. It goes away when we use hadoop-2.0.5. My hunch is that this is caused by the protobuf-2.4.1 code in hive-0.12.
Should the bug be reopened?
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.UnsupportedOperationException: This is supposed to be 
overridden by subclasses.
at 
com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSerializedSize(OrcProto.java:7281)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.getSerializedSize(OrcProto.java:9054)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.writeTo(OrcProto.java:9007)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeFooter(WriterImpl.java:1804)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1869)
at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
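
The "UnsupportedOperationException: This is supposed to be overridden by subclasses" pattern in the trace above is characteristic of code generated against one protobuf version running on a different runtime: the runtime's base class supplies a throwing default that newer generated subclasses are expected to override, and older generated code never does. A language-agnostic sketch of that mechanism; the class and method names are stand-ins, not the real protobuf API:

```python
class GeneratedMessage:
    """Stand-in for a runtime base class after an API change."""
    def get_unknown_fields(self):
        # The newer runtime expects generated subclasses to override this.
        raise NotImplementedError(
            "This is supposed to be overridden by subclasses.")

class OrcFooter(GeneratedMessage):
    """Stand-in for code generated by an OLDER generator: it predates the
    new method, so it never overrides it."""
    pass

try:
    # The serialization path calls this internally, as in the trace above.
    OrcFooter().get_unknown_fields()
except NotImplementedError as exc:
    print(exc)
```

Regenerating the classes with the same protobuf version as the runtime on the classpath removes the mismatch.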


 Upgrade protobuf to 2.5 from 2.4
 

 Key: HIVE-5112
 URL: https://issues.apache.org/jira/browse/HIVE-5112
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Owen O'Malley
 Fix For: 0.13.0

 Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch


 Hadoop and Hbase have both upgraded protobuf. We should as well.





[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-19 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-5016:
--

Reproduced In: 0.10.0
   Status: Patch Available  (was: Open)

The patch changes one line in JobSubmitter, submitting the full classpath to the DistributedCache instead of the truncated path.

 Local mode FileNotFoundException: File does not exist
 -

 Key: HIVE-5016
 URL: https://issues.apache.org/jira/browse/HIVE-5016
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
 Environment: Centos 6.3 (final)
 Hadoop 2.0.2-alpha
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Hive libs:
 ls -1 lib/
 antlr-2.7.7.jar
 antlr-runtime-3.0.1.jar
 avro-1.7.1.jar
 avro-mapred-1.7.1.jar
 commons-cli-1.2.jar
 commons-codec-1.4.jar
 commons-collections-3.2.1.jar
 commons-compress-1.4.1.jar
 commons-configuration-1.6.jar
 commons-dbcp-1.4.jar
 commons-lang-2.4.jar
 commons-logging-1.0.4.jar
 commons-logging-api-1.0.4.jar
 commons-pool-1.5.4.jar
 datanucleus-connectionpool-2.0.3.jar
 datanucleus-core-2.0.3.jar
 datanucleus-enhancer-2.0.3.jar
 datanucleus-rdbms-2.0.3.jar
 derby-10.4.2.0.jar
 guava-r09.jar
 hbase-0.92.0.jar
 hbase-0.92.0-tests.jar
 hive-builtins-0.10.0.jar
 hive-cli-0.10.0.jar
 hive-common-0.10.0.jar
 hive-contrib-0.10.0.jar
 hive-exec-0.10.0.jar
 hive-hbase-handler-0.10.0.jar
 hive-hwi-0.10.0.jar
 hive-hwi-0.10.0.war
 hive-jdbc-0.10.0.jar
 hive-metastore-0.10.0.jar
 hive-pdk-0.10.0.jar
 hive-serde-0.10.0.jar
 hive-service-0.10.0.jar
 hive-shims-0.10.0.jar
 jackson-core-asl-1.8.8.jar
 jackson-jaxrs-1.8.8.jar
 jackson-mapper-asl-1.8.8.jar
 jackson-xc-1.8.8.jar
 JavaEWAH-0.3.2.jar
 javolution-5.5.1.jar
 jdo2-api-2.3-ec.jar
 jetty-6.1.26.jar
 jetty-util-6.1.26.jar
 jline-0.9.94.jar
 json-20090211.jar
 libfb303-0.9.0.jar
 libthrift-0.9.0.jar
 log4j-1.2.16.jar
 php
 py
 servlet-api-2.5-20081211.jar
 slf4j-api-1.6.1.jar
 slf4j-log4j12-1.6.1.jar
 sqlline-1_0_2.jar
 stringtemplate-3.1-b1.jar
 xz-1.0.jar
 zookeeper-3.4.3.jar
Reporter: Abin Shahab
Priority: Critical
 Attachments: HIVE-5016.patch


 Hive jobs in local mode fail with the error posted below. The jar file that's 
 not being found exists and has the following access:
  ls -l hive-0.10.0/lib/hive-builtins-0.10.0.jar
 rw-rw-r-- 1 ashahab ashahab 3914 Dec 18  2012 
 hive-0.10.0/lib/hive-builtins-0.10.0.jar
 Steps to reproduce:
 hive set hive.exec.mode.local.auto=true;
 hive set hive.exec.mode.local.auto;
 hive.exec.mode.local.auto=true
 hive select count(*) from abin_test_table;
 Automatically selecting local only mode for query
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 13/08/06 21:37:11 WARN conf.Configuration: 
 file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an
  attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 13/08/06 21:37:11 WARN conf.Configuration: 
 file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an
  attempt to override final parameter: 
 mapreduce.job.end-notification.max.attempts;  Ignoring.
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
 Execution log at: 
 /tmp/ashahab/ashahab_20130806213737_7d26b796-5f55-44ca-a755-8898153d963b.log
 java.io.FileNotFoundException: File does not exist: 
 /home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
   at 
 org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
   at 
 org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 

[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-19 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-5016:
--

Attachment: HIVE-5016.patch

Changes the classpaths submitted by the JobSubmitter to full paths.


[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-19 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-5016:
--

Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-19 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-5016:
--

Release Note: This change will allow Hive to operate in local mode.
  Status: Patch Available  (was: Open)

This changes the classpath that the JobSubmitter submits to the DistributedCache: the old path omitted the protocol (scheme), while the new path includes it, which allows the classpath to be resolved correctly in local mode.


[jira] [Updated] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-19 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HIVE-5016:
--

Attachment: HIVE-5016.patch


[jira] [Commented] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-15 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741219#comment-13741219
 ] 

Abin Shahab commented on HIVE-5016:
---

Root cause of this issue is:
The classpath for jars in local mode points to a real file on disk. However, 
the JobSubmitter was cutting off the protocol part of the path. By default, 
DistributedCache assumes that a protocol-less file lives on HDFS, and that was 
causing the FileNotFoundException.
The solution is to pass the entire path (with its protocol) to the 
DistributedCache, which allows DistributedCache to find the file in the 
correct file system.
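The default-filesystem behaviour described above can be shown with a toy sketch (illustrative Python, not Hadoop's actual resolution code): a path with no scheme inherits the default filesystem, so a stripped local jar path is looked up on HDFS and not found.

```python
from urllib.parse import urlparse

def effective_scheme(path, default_fs="hdfs"):
    # A scheme-less path inherits the default filesystem (HDFS here),
    # which is why the local jar "disappeared" once the protocol part
    # was cut off before the path reached the DistributedCache.
    return urlparse(path).scheme or default_fs

print(effective_scheme("/home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar"))       # hdfs
print(effective_scheme("file:///home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar"))  # file
```

Keeping the `file://` prefix intact is exactly what lets the cache look the jar up on the local filesystem.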

 Local mode FileNotFoundException: File does not exist
 -

 Key: HIVE-5016
 URL: https://issues.apache.org/jira/browse/HIVE-5016
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
 Environment: Centos 6.3 (final)
 Hadoop 2.0.2-alpha
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Hive libs:
 ls -1 lib/
 antlr-2.7.7.jar
 antlr-runtime-3.0.1.jar
 avro-1.7.1.jar
 avro-mapred-1.7.1.jar
 commons-cli-1.2.jar
 commons-codec-1.4.jar
 commons-collections-3.2.1.jar
 commons-compress-1.4.1.jar
 commons-configuration-1.6.jar
 commons-dbcp-1.4.jar
 commons-lang-2.4.jar
 commons-logging-1.0.4.jar
 commons-logging-api-1.0.4.jar
 commons-pool-1.5.4.jar
 datanucleus-connectionpool-2.0.3.jar
 datanucleus-core-2.0.3.jar
 datanucleus-enhancer-2.0.3.jar
 datanucleus-rdbms-2.0.3.jar
 derby-10.4.2.0.jar
 guava-r09.jar
 hbase-0.92.0.jar
 hbase-0.92.0-tests.jar
 hive-builtins-0.10.0.jar
 hive-cli-0.10.0.jar
 hive-common-0.10.0.jar
 hive-contrib-0.10.0.jar
 hive-exec-0.10.0.jar
 hive-hbase-handler-0.10.0.jar
 hive-hwi-0.10.0.jar
 hive-hwi-0.10.0.war
 hive-jdbc-0.10.0.jar
 hive-metastore-0.10.0.jar
 hive-pdk-0.10.0.jar
 hive-serde-0.10.0.jar
 hive-service-0.10.0.jar
 hive-shims-0.10.0.jar
 jackson-core-asl-1.8.8.jar
 jackson-jaxrs-1.8.8.jar
 jackson-mapper-asl-1.8.8.jar
 jackson-xc-1.8.8.jar
 JavaEWAH-0.3.2.jar
 javolution-5.5.1.jar
 jdo2-api-2.3-ec.jar
 jetty-6.1.26.jar
 jetty-util-6.1.26.jar
 jline-0.9.94.jar
 json-20090211.jar
 libfb303-0.9.0.jar
 libthrift-0.9.0.jar
 log4j-1.2.16.jar
 php
 py
 servlet-api-2.5-20081211.jar
 slf4j-api-1.6.1.jar
 slf4j-log4j12-1.6.1.jar
 sqlline-1_0_2.jar
 stringtemplate-3.1-b1.jar
 xz-1.0.jar
 zookeeper-3.4.3.jar
Reporter: Abin Shahab
Priority: Critical

 Hive jobs in local mode fail with the error posted below. The jar file that's 
 not being found exists and has the following permissions:
  ls -l hive-0.10.0/lib/hive-builtins-0.10.0.jar
 -rw-rw-r-- 1 ashahab ashahab 3914 Dec 18  2012 
 hive-0.10.0/lib/hive-builtins-0.10.0.jar
 Steps to reproduce:
 hive> set hive.exec.mode.local.auto=true;
 hive> set hive.exec.mode.local.auto;
 hive.exec.mode.local.auto=true
 hive> select count(*) from abin_test_table;
 Automatically selecting local only mode for query
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 13/08/06 21:37:11 WARN conf.Configuration: 
 file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an
  attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 13/08/06 21:37:11 WARN conf.Configuration: 
 file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an
  attempt to override final parameter: 
 mapreduce.job.end-notification.max.attempts;  Ignoring.
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
 Execution log at: 
 /tmp/ashahab/ashahab_20130806213737_7d26b796-5f55-44ca-a755-8898153d963b.log
 java.io.FileNotFoundException: File does not exist: 
 /home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
   at 
 org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
   at 
 org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
   at 

[jira] [Commented] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-08 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733660#comment-13733660
 ] 

Abin Shahab commented on HIVE-5016:
---

Root cause of this issue: when hive.exec.mode.local.auto=true, 
mapreduce.framework.name gets set to 'local' instead of 'yarn', so the 
YarnRunner is not picked as the job runner. The LocalJobRunner is used 
instead, which breaks the path creation. 
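The runner-selection step described above can be sketched as follows (a toy Python model of the decision, not Hadoop's actual Cluster/ClientProtocol code):

```python
def pick_job_runner(conf):
    # When mapreduce.framework.name is 'local', Hadoop uses the
    # LocalJobRunner; it must be 'yarn' for the YARNRunner to be picked.
    name = conf.get("mapreduce.framework.name", "local")
    return "LocalJobRunner" if name == "local" else "YARNRunner"

print(pick_job_runner({"mapreduce.framework.name": "local"}))  # LocalJobRunner
print(pick_job_runner({"mapreduce.framework.name": "yarn"}))   # YARNRunner
```

The bug is that Hive's auto local mode flips this property to 'local', and the path handling downstream only works under the YARN runner.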


[jira] [Commented] (HIVE-4881) hive local mode: java.io.FileNotFoundException: emptyFile

2013-08-08 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733895#comment-13733895
 ] 

Abin Shahab commented on HIVE-4881:
---

I find that this bug only happens when you have an empty table.

 hive local mode: java.io.FileNotFoundException: emptyFile
 -

 Key: HIVE-4881
 URL: https://issues.apache.org/jira/browse/HIVE-4881
 Project: Hive
  Issue Type: Bug
 Environment: hive 0.9.0+158-1.cdh4.1.3.p0.23~squeeze-cdh4.1.3
Reporter: Bartosz Cisek
Priority: Critical

 Our Hive jobs fail due to the strange error pasted below. strace showed that 
 the process created this file, accessed it a few times, and then threw an 
 exception that it couldn't find the file it had just accessed. In the next 
 step it unlinked it. Yay.
 A very similar problem was reported [in an already closed 
 task|https://issues.apache.org/jira/browse/HIVE-1633?focusedCommentId=13598983&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13598983]
  or left unresolved on [mailing 
 lists|http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3c94f02eb368b740ebbcd94df4d5d1d...@amxpr03mb054.eurprd03.prod.outlook.com%3E].
 I'll be happy to provide required additional details. 
 {code:title=Stack trace}
 2013-07-18 12:49:46,109 ERROR security.UserGroupInformation 
 (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException 
 as:username (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not 
 exist: 
 /tmp/username/hive_2013-07-18_12-49-45_218_605775464480014480/-mr-1/1/emptyFile
 2013-07-18 12:49:46,113 ERROR exec.ExecDriver 
 (SessionState.java:printError(403)) - Job Submission failed with exception 
 'java.io.FileNotFoundException(File does not exist: 
 /tmp/username/hive_2013-07-18_12-49-45_218_605775464480014480/-mr-1/1/emptyFile)'
 java.io.FileNotFoundException: File does not exist: 
 /tmp/username/hive_2013-07-18_12-49-45_218_605775464480014480/-mr-1/1/emptyFile
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
 at 
 org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.init(CombineFileInputFormat.java:462)
 at 
 org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
 at 
 org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1040)
 at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1032)
 at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:895)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:895)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:869)
 at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 {code}
 {code:title=strace with grep emptyFile}
 7385  14:48:02.808096 
 stat(/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile,
  {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
 7385  14:48:02.808201 
 stat(/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile,
  {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
 7385  14:48:02.808277 
 stat(/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile,
  {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
 7385  14:48:02.808348 
 stat(/tmp/username/hive_2013-07-18_14-48-00_700_8005967322498387476/-mr-1/1/emptyFile,
  {st_mode=S_IFREG|0755, 

[jira] [Created] (HIVE-5016) Local mode FileNotFoundException: File does not exist

2013-08-06 Thread Abin Shahab (JIRA)
Abin Shahab created HIVE-5016:
-

 Summary: Local mode FileNotFoundException: File does not exist
 Key: HIVE-5016
 URL: https://issues.apache.org/jira/browse/HIVE-5016
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
 Environment: Centos 6.3 (final)
Hadoop 2.0.2-alpha
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)


Reporter: Abin Shahab
Priority: Critical


Hive jobs in local mode fail with the error posted below. The jar file that's 
not being found exists and has the following permissions:
 ls -l hive-0.10.0/lib/hive-builtins-0.10.0.jar
-rw-rw-r-- 1 ashahab ashahab 3914 Dec 18  2012 
hive-0.10.0/lib/hive-builtins-0.10.0.jar

Steps to reproduce:

hive> set hive.exec.mode.local.auto=true;
hive> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=true
hive> select count(*) from abin_test_table;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
13/08/06 21:37:11 WARN conf.Configuration: 
file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
13/08/06 21:37:11 WARN conf.Configuration: 
file:/tmp/ashahab/hive_2013-08-06_21-37-09_046_3263640403676309186/-local-10002/jobconf.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.attempts;  Ignoring.
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: 
/tmp/ashahab/ashahab_20130806213737_7d26b796-5f55-44ca-a755-8898153d963b.log
java.io.FileNotFoundException: File does not exist: 
/home/ashahab/dev/hive-0.10.0/lib/hive-builtins-0.10.0.jar
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)