aha! All's well that ends well then! :)
On Thu, Jun 6, 2013 at 9:49 AM, Sachin Sudarshana <sachin.had...@gmail.com> wrote:

> Hi Stephen,
>
> Thank you for your reply.
>
> But it's the silliest error on my side. It's a typo!
>
> The codec is org.apache.hadoop.io.compress.GzipCodec and not
> org.apache.hadoop.io.compress.GZipCodec.
>
> I regret making that mistake.
>
> Thank you,
> Sachin
>
>
> On Thu, Jun 6, 2013 at 10:07 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>> Hi Sachin,
>>
>> Like you say, it looks like something to do with the GZipCodec all right, and
>> that would make sense given your original problem.
>>
>> Yeah, one would think it'd be in there by default, but for whatever reason
>> it's not finding it. At least the problem is now identified.
>>
>> Now _my guess_ is that maybe your Hadoop core-site.xml file might need to
>> list the codecs available under the property name
>> "io.compression.codecs". Can you chase that up as a possibility and let us
>> know what you find out?
>>
>>
>> On Thu, Jun 6, 2013 at 4:02 AM, Sachin Sudarshana <sachin.had...@gmail.com> wrote:
>>
>>> Hi Stephen,
>>>
>>> hive> show create table facts520_normal_text;
>>> OK
>>> CREATE TABLE facts520_normal_text(
>>>   fact_key bigint,
>>>   products_key int,
>>>   retailers_key int,
>>>   suppliers_key int,
>>>   time_key int,
>>>   units int)
>>> ROW FORMAT DELIMITED
>>>   FIELDS TERMINATED BY ','
>>>   LINES TERMINATED BY '\n'
>>> STORED AS INPUTFORMAT
>>>   'org.apache.hadoop.mapred.TextInputFormat'
>>> OUTPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>>> LOCATION
>>>   'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/facts520_normal_text'
>>> TBLPROPERTIES (
>>>   'numPartitions'='0',
>>>   'numFiles'='1',
>>>   'transient_lastDdlTime'='1369395430',
>>>   'numRows'='0',
>>>   'totalSize'='545216508',
>>>   'rawDataSize'='0')
>>> Time taken: 0.353 seconds
>>>
>>>
>>> The syserror log shows this:
>>>
>>> java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GZipCodec was not found.
>>>   at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:85)
>>>   at org.apache.hadoop.hive.ql.exec.Utilities.getFileExtension(Utilities.java:934)
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:469)
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:543)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
>>>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.GZipCodec not found
>>>   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>>>   at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:82)
>>>   ... 21 more
>>>
>>> java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GZipCodec was not found.
>>>   at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:85)
>>>   at org.apache.hadoop.hive.ql.exec.Utilities.getFileExtension(Utilities.java:934)
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:469)
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:739)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
>>>   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
>>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.GZipCodec not found
>>>   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>>>   at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:82)
>>>   ... 16 more
>>>
>>> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GZipCodec was not found.
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:479)
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:739)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
>>>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
>>>   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
>>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GZipCodec was not found.
>>>   at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:85)
>>>   at org.apache.hadoop.hive.ql.exec.Utilities.getFileExtension(Utilities.java:934)
>>>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:469)
>>>   ... 14 more
>>> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.GZipCodec not found
>>>   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>>>   at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:82)
>>>   ... 16 more
>>>
>>> [The same HiveException stack trace appears twice more in the log.]
>>>
>>> It says that GZipCodec is not found.
>>> Aren't the Snappy, GZip and BZip codecs available on Hadoop by default?
>>>
>>> Thank you,
>>> Sachin
>>>
>>>
>>> On Wed, Jun 5, 2013 at 11:58 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> well... the HiveException has the word "metadata" in it. Maybe
>>>> that's a hint or a red herring. :)  Let's try the following:
>>>>
>>>> 1. show create table facts520_normal_text;
>>>>
>>>> 2. Anything useful at this URL, or is it just the same stack dump?
>>>> http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
>>>>
>>>>
>>>> On Wed, Jun 5, 2013 at 3:17 AM, Sachin Sudarshana <sachin.had...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have Hive 0.10 (+ CDH 4.2.1 patches) installed on my cluster.
>>>>>
>>>>> I have a table facts520_normal_text stored as a textfile. I'm trying
>>>>> to create a compressed table from this table using the GZip codec.
>>>>>
>>>>> hive> SET hive.exec.compress.output=true;
>>>>> hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
>>>>> hive> SET mapred.output.compression.type=BLOCK;
>>>>>
>>>>> hive> Create table facts520_gzip_text
>>>>>     > (fact_key BIGINT,
>>>>>     > products_key INT,
>>>>>     > retailers_key INT,
>>>>>     > suppliers_key INT,
>>>>>     > time_key INT,
>>>>>     > units INT)
>>>>>     > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>>>>>     > LINES TERMINATED BY '\n'
>>>>>     > STORED AS TEXTFILE;
>>>>>
>>>>> hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;
>>>>>
>>>>> When I run the above queries, the MR job fails.
>>>>>
>>>>> The error that the Hive CLI itself shows is the following:
>>>>>
>>>>> Total MapReduce jobs = 3
>>>>> Launching Job 1 out of 3
>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>> Starting Job = job_201306051948_0010, Tracking URL =
>>>>> http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
>>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306051948_0010
>>>>> Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
>>>>> 2013-06-05 21:09:42,281 Stage-1 map = 0%, reduce = 0%
>>>>> 2013-06-05 21:10:11,446 Stage-1 map = 100%, reduce = 100%
>>>>> Ended Job = job_201306051948_0010 with errors
>>>>> Error during job, obtaining debugging information...
>>>>> Job Tracking URL:
>>>>> http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
>>>>> Examining task ID: task_201306051948_0010_m_000004 (and more) from job job_201306051948_0010
>>>>> Examining task ID: task_201306051948_0010_m_000001 (and more) from job job_201306051948_0010
>>>>>
>>>>> Task with the most failures(4):
>>>>> -----
>>>>> Task ID:
>>>>>   task_201306051948_0010_m_000002
>>>>>
>>>>> URL:
>>>>>   http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
>>>>> -----
>>>>> Diagnostic Messages for this Task:
>>>>> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>>>>>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>>>>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>>>>>   at org.apach
>>>>>
>>>>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>>>>> MapReduce Jobs Launched:
>>>>> Job 0: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
>>>>> Total MapReduce CPU Time Spent: 0 msec
>>>>>
>>>>> I'm unable to figure out why this is happening. It looks like the data
>>>>> is not being copied properly.
>>>>> Or is it that the GZip codec is not supported on textfiles?
>>>>>
>>>>> Any help with this issue is greatly appreciated!
>>>>>
>>>>> Thank you,
>>>>> Sachin
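Since the resolution sits at the top of this top-posted thread, here it is in one place: the only change needed is the codec class name. The Hadoop class is GzipCodec, not GZipCodec, and class lookup is case-sensitive. A sketch of the working session, assuming the facts520_gzip_text table from the original mail already exists:

```sql
-- Corrected codec name: GzipCodec, not GZipCodec
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

INSERT OVERWRITE TABLE facts520_gzip_text
SELECT * FROM facts520_normal_text;
```

GZip compression works fine on TEXTFILE tables, so nothing else in the original DDL needed to change.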
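The failure mode generalizes: Hadoop turns the mapred.output.compression.codec string into a class via Configuration.getClassByName, which goes through the classloader, and JVM class names are case-sensitive. A minimal stand-alone illustration of that mechanism (it deliberately uses the JDK's GZIPOutputStream instead of the Hadoop codec so it runs without Hadoop on the classpath; the misspelled name is a made-up example):

```java
public class CodecLookupDemo {
    // Resolve a class name roughly the way Hadoop's
    // Configuration.getClassByName does: through the classloader,
    // which treats names case-sensitively.
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Correct capitalization resolves fine...
        System.out.println(resolves("java.util.zip.GZIPOutputStream")); // prints true
        // ...while a one-letter case difference throws
        // ClassNotFoundException -- the GzipCodec / GZipCodec mistake.
        System.out.println(resolves("java.util.zip.GzipOutputStream")); // prints false
    }
}
```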