Yes Manish, zip is not supported in Hadoop. You may have to use gzip 
instead.

Regards
Bejoy KS

Subject: RE: zip file or tar file cosumption
From: manishbh...@rocketmail.com
To: user@hive.apache.org
CC: chuck.conn...@nuance.com
Date: Sun, 30 Sep 2012 20:35:35 +0530




  
  


Thanks Bejoy. I have zip file there is sense to convert into gzip again.



Chuck, I got what you are trying to say. So I need to process it outside HDFS 
and bring the text file into HDFS.





On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote: 

    Hi Manish

    

    Gzip works well if you have the compression codec available in 
'io.compression.codecs'. The Gzip codec is present by default.

    

    I don't think untarring would be done by MapReduce jobs. So tar files may 
not work with Hive; you need to untar the files outside Hadoop/Hive as a 
prerequisite.
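
A minimal sketch of the untar step done outside Hadoop, as described above. It is written in Python purely for illustration; the function name `untar_to_gzip` and all paths are hypothetical, and it assumes the tar archive holds plain delimited text files. Each member is rewritten as its own gzip file, since gzip is the format this thread reports as working with Hive.

```python
import gzip
import shutil
import tarfile
from pathlib import Path
from typing import List


def untar_to_gzip(tar_path: str, out_dir: str) -> List[str]:
    """Write each regular file in a tar archive as its own .gz file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Mode "r" lets tarfile detect plain, gzip, bzip2 or xz compressed archives.
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            # Keep only the base name so archive paths cannot escape out_dir.
            # (Assumption: base names are unique within the archive.)
            target = out / (Path(member.name).name + ".gz")
            with archive.extractfile(member) as source, gzip.open(target, "wb") as dest:
                shutil.copyfileobj(source, dest)
            written.append(str(target))
    return written
```

The resulting files would then be copied into HDFS (for example with `hadoop fs -put`) before pointing a table at them or running LOAD DATA.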


    

    



    Regards


    Bejoy KS

    



    



    



    To: user@hive.apache.org; keshav.c.sav...@fisglobal.com

    Subject: Re: zip file or tar file cosumption

    From: manishbh...@rocketmail.com

    Date: Sun, 30 Sep 2012 12:32:15 +0000

    

    What about a .gz or tar file? Do these need to be unzipped in HDFS before 
loading into Hive? How do you resolve it?

    



    Sent from my BlackBerry, pls excuse typo


    




    From: "Connell, Chuck" <chuck.conn...@nuance.com>


    Date: Sun, 30 Sep 2012 12:24:37 +0000


    To: user@hive.apache.org<user@hive.apache.org>; Savant, 
Keshav<keshav.c.sav...@fisglobal.com>


    ReplyTo: user@hive.apache.org


    Subject: RE: zip file or tar file cosumption


    

    



    I have seen that error when I try to overwrite an existing file. 

    

    But, more importantly, Hive cannot understand ZIP files. There was a long 
thread about this just a few days ago. Your table def says "stored as textfile" 
but you are not giving it a text file.
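
One way to check what a file really is before loading it: a small sketch that inspects the file itself rather than trusting its name. The helper name `sniff_format` is hypothetical. It relies on two facts: gzip streams start with the bytes 0x1f 0x8b, and Python's `zipfile.is_zipfile` recognizes ZIP archives.

```python
import zipfile


def sniff_format(path: str) -> str:
    """Guess whether a file is gzip, ZIP, or something else (e.g. plain text)."""
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head == b"\x1f\x8b":        # gzip magic number
        return "gzip"
    if zipfile.is_zipfile(path):   # checks for a valid ZIP structure
        return "zip"
    return "other"
```

A file reported as "zip" would need to be unpacked before a TEXTFILE table could read it as rows.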

    

    Chuck

    

    



    




    From: Manish [manishbh...@rocketmail.com]

    Sent: Sunday, September 30, 2012 7:38 AM

    To: Savant, Keshav

    Cc: user@hive.apache.org

    Subject: RE: zip file or tar file cosumption

    

    



    



    

    I am getting the error below when loading the zip file:
Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

My load statement is:
LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`

Table definition: 
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ' '
  COLLECTION ITEMS TERMINATED BY ';'
  MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE
LOCATION '/user/manish/input/zip'

Thank You,
Manish

    

    

    On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 

    
        True Manish.

        

         

        

        Keshav C Savant 

        

        

         

        

        From: Manish.Bhoge [mailto:manish.bh...@target.com] 

        Sent: Thursday, September 27, 2012 4:26 PM

        To: user@hive.apache.org; manishbh...@rocketmail.com

        Subject: RE: zip file or tar file cosumption

        

        

         

        

        Thanks Savant. I believe this will hold good for .zip file also.

        

         

        

        Thank You,

        

        Manish.

        

         

        

        From: Savant, Keshav [mailto:keshav.c.sav...@fisglobal.com] 

        Sent: Thursday, September 27, 2012 10:19 AM

        To: user@hive.apache.org; manishbh...@rocketmail.com

        Subject: RE: zip file or tar file cosumption

        

        

         

        

        Manish, the table created for zipped text files should be defined as 
a sequence file, for example:

        

         

        

        CREATE TABLE my_table_zip (col1 STRING, col2 STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS SEQUENCEFILE;

        

         

        

        After this you can use regular load command to load these files, for 
example

        

         

        

        load data local inpath 'path-to-csv-file.gz' into table my_table_zip;

        

         

        

        hope this helps

        

         

        

        Keshav C Savant 

        

        

         

        

        From: Manish Bhoge [mailto:manishbh...@rocketmail.com] 

        Sent: Wednesday, September 26, 2012 9:43 PM

        To: user@hive.apache.org

        Subject: Re: zip file or tar file cosumption

        

        

         

        

        Hi Richin,

        

        Thanks! Yes, this is what I wanted to understand: how to load a zip 
file into a Hive table. Now I'll try this option.

        

        Thank You,

        Manish. 

        

        Sent from my BlackBerry, pls excuse typo

        

        

    


    
        


    


    
        From:<richin.j...@nokia.com> 

        

        

        Date:Wed, 26 Sep 2012 14:51:39 +0000

        

        

        To:<user@hive.apache.org>

        

        

        ReplyTo:user@hive.apache.org 

        

        

        Subject:RE: zip file or tar file cosumption

        

        

         

        

        

        You are right Chuck. I thought his question was how to use zip files or 
any compressed files in Hive tables.

        

         

        

        Yeah, seems like you can’t do that; see: 
http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E

        

        But you can always compress your files in gzip format and they should 
be good to go.

        

         

        

        Richin

        

         

        

        From: ext Connell, Chuck [mailto:chuck.conn...@nuance.com] 

        Sent: Wednesday, September 26, 2012 10:44 AM

        To: user@hive.apache.org

        Subject: RE: zip file or tar file cosumption

        

        

         

        

        But TEXTFILE in Hive always has newline as the record delimiter. How 
could this possibly work with a zip/tar file that can contain ASCII 10 
characters at random locations, and certainly does not have ASCII 10 at the end 
of each data record?
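
Chuck's point can be checked with a small experiment. The sketch below (Python, with made-up sample records) splits the raw bytes of a ZIP archive on newline, which is roughly what a line-oriented text reader would do with a file it does not decompress first, and compares that with decompressing a gzip stream before splitting.

```python
import gzip
import io
import zipfile

records = [b"a,1", b"b,2", b"c,3"]   # made-up sample rows
text = b"\n".join(records) + b"\n"

# Build a ZIP archive in memory holding the text as a single member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.txt", text)
raw_zip = buf.getvalue()

# Splitting the undecoded archive on newline does not give back the rows:
# the bytes are archive headers plus compressed data, not the original lines.
naive_rows = raw_zip.split(b"\n")

# Decompressing a gzip stream first restores the newline-delimited rows.
gz_rows = gzip.decompress(gzip.compress(text)).splitlines()

print(naive_rows == records)   # False
print(gz_rows == records)      # True
```

The gzip case works only because the stream is decompressed before any line splitting happens; the ZIP bytes are never decoded here, so the record boundaries are lost.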

        

         

        

        Chuck Connell

        

        Nuance R&D Data Team

        

        Burlington, MA

        

         

        

         

        

        

        From:richin.j...@nokia.com [mailto:richin.j...@nokia.com] 

        Sent: Wednesday, September 26, 2012 10:14 AM

        To: user@hive.apache.org; manishbh...@rocketmail.com

        Subject: RE: zip file or tar file cosumption

        

        

         

        

        Hi Manish,

        

         

        

        If you have your zip file at location /home/manish/zipfile, you can 
just point your external table to that location, like:

        

        CREATE EXTERNAL TABLE manish_test (field1 string, field2 string)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter>
        STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

        

         

        

        OR

        

         

        

        If you already have external table pointing to a certain location you 
can load this zip file into your table as

        

        LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

        

         

        

        Hope this helps.

        

         

        

        Richin

        

         

        

        From: ext Manish Bhoge [mailto:manishbh...@rocketmail.com] 

        Sent: Wednesday, September 26, 2012 9:13 AM

        To: user@hive.apache.org

        Subject: Re: zip file or tar file cosumption

        

        

         

        

        Hi Savant,

        

        Got it. But I still need to understand how to load a zip file. Can I 
use a zip file directly in an external table? Can you please help with the load 
statement?

        

        Sent from my BlackBerry, pls excuse typo

        

        

    


    
        


    


    
        From:"Savant, Keshav" <keshav.c.sav...@fisglobal.com>

        

        

        Date:Wed, 26 Sep 2012 12:25:38 +0000

        

        

        To:user@hive.apache.org<user@hive.apache.org>

        

        

        ReplyTo:user@hive.apache.org

        

        

        Cc:manish.bh...@target.com<manish.bh...@target.com>; 
chuck.conn...@nuance.com<chuck.conn...@nuance.com>

        

        

        Subject:RE: zip file or tar file cosumption

        

        

         

        

        

        Another solution would be

        

         

        

        Using a shell script, do the following:

        

        1.      unzip txt files, 

        

        2.      one by one merge those 50 (or N number of) text files into one 
text file,

        

        3.      then zip/tar that bigger text file,

        

        4.      then that big zip/tar file can be uploaded into hive.
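
A minimal sketch of steps 1 to 3 above, written in Python rather than shell. Two things here are assumptions and not part of the original suggestion: the helper name `zip_to_merged_gzip` is hypothetical, and the merged file is compressed with gzip instead of zip/tar, because the replies in this thread report that gzip is what Hadoop and Hive can read.

```python
import gzip
import zipfile


def zip_to_merged_gzip(zip_path: str, out_path: str) -> int:
    """Merge every *.txt member of a ZIP archive into one gzip text file.

    Returns the number of members merged.
    """
    merged = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(out_path, "wb") as out:
        # Sort the names so the merge order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".txt"):
                continue
            last = b""
            with archive.open(name) as member:
                # Stream in 1 MiB chunks so large members are not held in memory.
                for chunk in iter(lambda: member.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Keep records intact if a file lacks a trailing newline.
            if last and last != b"\n":
                out.write(b"\n")
            merged += 1
    return merged
```

For step 4, the merged .gz file would be copied into HDFS and loaded into a table in the usual way; the file names used with this helper are placeholders.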

        

         

        

        Keshav C Savant 

        

        

         

        

        From: Connell, Chuck [mailto:chuck.conn...@nuance.com] 

        Sent: Wednesday, September 26, 2012 4:04 PM

        To: user@hive.apache.org

        Subject: RE: zip file or tar file cosumption

        

        

         

        

        This could be a problem. Hive uses newline as the record separator. A 
ZIP file will certainly contain newline characters. So I doubt this is possible.

        

        BUT, I would like to hear from anyone who has solved the "newline is 
always a record separator" problem, because we ran into it for another type of 
compressed file.

        

        Chuck

        

    


    
        


    


    
        From: Manish.Bhoge [manish.bh...@target.com]

        Sent: Wednesday, September 26, 2012 3:17 AM

        To: user@hive.apache.org

        Subject: zip file or tar file cosumption

        

        

        Hivers,

        

         

        

        I want to understand whether it would be possible to use zip/tar files 
directly in Hive. All the files have a similar schema (structure). Say 50 *.txt 
files are zipped into a single zip file: can we load data directly from this zip 
file, or do we need to unzip it first?

        

         

        

        Thanks & Regards

        

        Manish Bhoge | Technical Architect | TargetDW/BI | +919379850010 (M) | 
Ext: 5691 | VOIP: 22165 | “Excellence is not a skill, It is an attitude.”

        

         

        

        

        _____________

        The information contained in this message is proprietary and/or 
confidential. If you are not the intended recipient, please: (i) delete the 
message and all copies; (ii) do not disclose, distribute or use the message in 
any manner; and (iii) notify the sender immediately. In addition, please be 
aware that any message addressed to our domain is subject to archiving and 
review by persons other than the intended recipient. Thank you.

        

        


        

    
    

    

    



                                          
