As Alex suggested, please use a row format in your query, like CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' and give it a try.
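For what it's worth, a minimal sketch of what that could look like as a text staging table over the data shown later in the thread (the table name is illustrative; the column names are taken from the original DDL):

```sql
-- Hypothetical staging table for the comma-delimited .gz data in
-- /extract/DBCLOC; Hive decompresses gzip transparently on read.
CREATE EXTERNAL TABLE DBCLOC_staging(
  BLwhse INT,
  BLsdat STRING,
  BLreg_num SMALLINT,
  BLtrn_num INT,
  BLscnr STRING,
  BLareq STRING,
  BLatak STRING,
  BLmsgc STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';
```

One caveat: plain ROW FORMAT DELIMITED does not strip the surrounding double quotes from fields like "2003-02-03" — they stay in the string values. The CSVSerde mentioned earlier in the thread handles quoted fields properly.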
On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <[email protected]> wrote:

> 106,"2003-02-03",20,2,"A","2","2","037"
> 106,"2003-02-03",20,3,"A","2","2","037"
> 106,"2003-02-03",8,2,"A","2","2","037"
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <[email protected]> wrote:
>
>> Jay, can you give the first 3 lines of your gz file?
>>
>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <[email protected]> wrote:
>>
>>> Alex,
>>>
>>> I followed the same steps as mentioned on the site. I created the
>>> table below:
>>>
>>> CREATE TABLE raw (line STRING)
>>> PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>> STORED AS TEXTFILE;
>>>
>>> and loaded it with data:
>>>
>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
>>>
>>> When I run select * from raw it shows all NULL values:
>>>
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>
>>> Why is it not showing the actual data in the file? Will it show once I
>>> load it into the parquet table?
>>>
>>> Please let me know if I am doing anything wrong.
>>>
>>> I appreciate your help.
>>>
>>> Thanks
>>> Jay
>>>
>>> Thank you very much for your help, Alex.
>>>
>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <[email protected]> wrote:
>>>
>>>> 1. Create an external textfile Hive table pointing to /extract/DBCLOC and
>>>> specify CSVSerde.
>>>>
>>>> If using hive-0.14 or newer, use this:
>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>> If hive-0.13 or older, use https://github.com/ogrodnek/csv-serde
>>>>
>>>> You do not even need to gunzip the file. Hive automatically gunzips the
>>>> data on select.
>>>>
>>>> 2. Run a simple query to load the data:
>>>>
>>>> insert overwrite table <orc_table>
>>>> select * from <csv_table>
>>>>
>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <[email protected]> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I have this table:
>>>>>
>>>>> CREATE TABLE DBCLOC(
>>>>>   BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>   BLsdat string COMMENT 'DATE Sales Date',
>>>>>   BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>   BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>   BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>   BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>   BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>   BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>> STORED AS PARQUET;
>>>>>
>>>>> I have to load the data from the hdfs location
>>>>> /extract/DBCLOC/DBCL0301P.csv.gz into the table above.
>>>>>
>>>>> Can anyone tell me the most efficient way of doing it?
>>>>>
>>>>> Thanks
>>>>> Jay
>>>>>
>>>>
>>>
>>
>> --
>> Nitin Pawar
>

--
Nitin Pawar
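Putting Alex's two steps together, the end-to-end flow could look roughly like this (a sketch, assuming Hive 0.14+ with the built-in OpenCSVSerde; the staging table name and the partition values are illustrative, since the thread does not say which year/period the file belongs to):

```sql
-- Step 1: external CSV table over the gzipped file; Hive decompresses
-- .gz transparently on select, so no manual gunzip is needed.
-- OpenCSVSerde reads every column as STRING, hence the casts below.
CREATE EXTERNAL TABLE dbcloc_csv(
  BLwhse STRING, BLsdat STRING, BLreg_num STRING, BLtrn_num STRING,
  BLscnr STRING, BLareq STRING, BLatak STRING, BLmsgc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- Step 2: copy into the partitioned Parquet table, casting the string
-- columns to the target types. Partition values here are made up for
-- illustration; they are not encoded in the file name.
INSERT OVERWRITE TABLE DBCLOC PARTITION (FSCAL_YEAR=2003, FSCAL_PERIOD=2)
SELECT CAST(BLwhse AS INT), BLsdat, CAST(BLreg_num AS SMALLINT),
       CAST(BLtrn_num AS INT), BLscnr, BLareq, BLatak, BLmsgc
FROM dbcloc_csv;
```

The NULLs Jay saw are consistent with a schema/serde mismatch: a plain textfile table with no matching row format (or the wrong column types) cannot parse the quoted CSV fields, so every column comes back NULL.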
