We can write custom codecs.

On Sun, Sep 30, 2012 at 11:47 AM, Bejoy KS <bejo...@outlook.com> wrote:

Yes Manish, Zip is not supported in Hadoop. You may have to use gzip instead.

Regards
Bejoy KS

________________________________
Subject: RE: zip file or tar file cosumption
From: manishbh...@rocketmail.com
To: user@hive.apache.org
CC: chuck.conn...@nuance.com
Date: Sun, 30 Sep 2012 20:35:35 +0530

Thanks Bejoy. I have zip file there is sense to convert into gzip again.

Chuck, I got what you are trying to say. So I need to process it outside HDFS and bring the text file into HDFS.

On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:

Hi Manish

Gzip works well if you have the compression codec available in 'io.compression.codecs'. The Gzip codec is present by default.

I don't think untarring would be done by MapReduce jobs. So tar files may not work with Hive; you need to untar the files outside Hadoop and Hive as a prerequisite.

Regards
Bejoy KS

________________________________
To: user@hive.apache.org; keshav.c.sav...@fisglobal.com
Subject: Re: zip file or tar file cosumption
From: manishbh...@rocketmail.com
Date: Sun, 30 Sep 2012 12:32:15 +0000

What about a .gz or tar file? Does it need to be unzipped in HDFS before loading into Hive? How do you resolve it?

Sent from my BlackBerry, pls excuse typo

________________________________
From: "Connell, Chuck" <chuck.conn...@nuance.com>
Date: Sun, 30 Sep 2012 12:24:37 +0000
To: user@hive.apache.org<user@hive.apache.org>; Savant, Keshav<keshav.c.sav...@fisglobal.com>
ReplyTo: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

I have seen that error when I try to overwrite an existing file.

But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.
Chuck

________________________________
From: Manish [manishbh...@rocketmail.com]
Sent: Sunday, September 30, 2012 7:38 AM
To: Savant, Keshav
Cc: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

I am getting below error when loading zip file

Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

My load statement is:

LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`

Table definition:

CREATE external TABLE pageview_zip
(
  C_0 STRING,
  C_1 STRING,
  C_7 MAP<STRING,STRING>,
  C_8 STRING,
  C_13 MAP<STRING,STRING>,
  C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'

Thank You,
Manish

On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:

True Manish.

Keshav C Savant

________________________________
From: Manish.Bhoge [mailto:manish.bh...@target.com]
Sent: Thursday, September 27, 2012 4:26 PM
To: user@hive.apache.org; manishbh...@rocketmail.com
Subject: RE: zip file or tar file cosumption

Thanks Savant. I believe this will hold good for .zip file also.

Thank You,
Manish.
________________________________
From: Savant, Keshav [mailto:keshav.c.sav...@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org; manishbh...@rocketmail.com
Subject: RE: zip file or tar file cosumption

Manish, the table that has been created for zipped text files should be defined as a sequence file, for example:

CREATE TABLE my_table_zip(col1 STRING, col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;

After this you can use the regular load command to load these files, for example:

load data local inpath 'path-to-csv-file.gz' into table my_table_zip;

Hope this helps.

Keshav C Savant

________________________________
From: Manish Bhoge [mailto:manishbh...@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption

Hi Richin,

Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. Now I'll try this option.

Thank You,
Manish.

Sent from my BlackBerry, pls excuse typo

________________________________
From: <richin.j...@nokia.com>
Date: Wed, 26 Sep 2012 14:51:39 +0000
To: <user@hive.apache.org>
ReplyTo: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, seems like you can’t do that; see:
http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E

But you can always compress your files in gzip format and they should be good to go.

Richin

________________________________
From: ext Connell, Chuck [mailto:chuck.conn...@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

But TEXTFILE in Hive always has newline as the record delimiter.
How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?

Chuck Connell
Nuance R&D Data Team
Burlington, MA

________________________________
From: richin.j...@nokia.com [mailto:richin.j...@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org; manishbh...@rocketmail.com
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location /home/manish/zipfile, you can just point your external table to that location, like:

CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have an external table pointing to a certain location, you can load this zip file into your table as:

LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

________________________________
From: ext Manish Bhoge [mailto:manishbh...@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand how to load a zip. Can I directly use a zip file in an external table? Can u pls help to get the load statement?

Sent from my BlackBerry, pls excuse typo

________________________________
From: "Savant, Keshav" <keshav.c.sav...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<user@hive.apache.org>
ReplyTo: user@hive.apache.org
Cc: manish.bh...@target.com<manish.bh...@target.com>; chuck.conn...@nuance.com<chuck.conn...@nuance.com>
Subject: RE: zip file or tar file cosumption

Another solution would be to use a shell script to do the following:

1. unzip the txt files,
2. merge those 50 (or N) text files one by one into one text file,
3. then zip/tar that bigger text file,
4. then that big zip/tar file can be uploaded into Hive.

Keshav C Savant

________________________________
From: Connell, Chuck [mailto:chuck.conn...@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck

________________________________
From: Manish.Bhoge [manish.bh...@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org
Subject: zip file or tar file cosumption

Hivers,

I want to understand whether it would be possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect | TargetDW/BI | +919379850010 (M) Ext: 5691 VOIP: 22165
“Excellence is not a skill, It is an attitude.” MySite

_____________
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
-- Raja Thiruvathuru
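
For anyone who lands on this thread later: the workaround it converges on (unpack the ZIP outside Hadoop, merge the text files, and recompress as gzip, which the default codecs can read) might be sketched roughly as below. This is only an illustration, not something posted in the thread; the function name zip_to_gzip, the ".txt" filter, and the file paths are all assumptions made for the example.

```python
import gzip
import zipfile


def zip_to_gzip(zip_path, gz_path, suffix=".txt"):
    """Merge the text members of a ZIP archive into a single gzip file.

    Hypothetical helper for illustration only. Returns the number of
    archive members that were written to the gzip output.
    """
    count = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        # Sort the member names so the merged output is deterministic.
        for name in sorted(archive.namelist()):
            if not name.endswith(suffix):
                continue  # skip directories and non-text members
            data = archive.read(name)
            out.write(data)
            # TEXTFILE tables treat newline as the record separator, so make
            # sure the last record of one member does not run into the next.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            count += 1
    return count
```

The resulting .gz file could then be copied into HDFS and loaded with an ordinary LOAD DATA statement into a table declared STORED AS TEXTFILE, since Hadoop chooses the gzip codec from the file extension. Keep in mind that a single gzip file is not splittable, so very large inputs may be better written as several .gz files.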