Thanks Szehon. Mine is stored as a SEQUENCEFILE, not TEXTFILE.

Kim

On Mon, Mar 10, 2014 at 1:25 PM, Szehon Ho <sze...@cloudera.com> wrote:

> No, there is no ignoring of key; you can declare a different key column if
> you don't want it to be in your 'value'. Say you want to create a table
> with two fields separated by some separator (say '\t' in your case?), then
> you would do:
>
> CREATE TABLE TEST(key INT, value STRING) ROW FORMAT DELIMITED FIELDS
> TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
>
> Exact DDL details are at:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL.
>
> Hope that helps.
> Szehon
>
> On Mon, Mar 10, 2014 at 12:38 PM, Kim Chew <kchew...@gmail.com> wrote:
>
>> So I have generated my input file in SequenceFile format like this:
>> <Length of value> <value+"\n">
>> <Length of value> is typed IntWritable
>> <value+"\n"> is typed Text
>>
>> For example,
>> 167 1105|11748184969223627771|172.31.2.71|0|sta1|...
>>
>> And I create my table like this,
>>
>> CREATE TABLE if not exists KIM_TEST_SEQ (
>> value string)
>> ROW FORMAT DELIMITED LINES TERMINATED BY '\n'
>> STORED AS SEQUENCEFILE;
>>
>> If I understand correctly, <Length of value> is the key and is ignored
>> when reading, and <value> is the row object. However, <value> is still
>> being truncated. Is my schema correct?
>>
>> Thanks.
>>
>> Kim
>>
>> On Fri, Mar 7, 2014 at 5:53 PM, Szehon Ho <sze...@cloudera.com> wrote:
>>
>>> Hi, did you try specifying row, field delimiter on create table?
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Fri, Mar 7, 2014 at 5:27 PM, Kim Chew <kchew...@gmail.com> wrote:
>>>
>>>> I have an input file in Sequence File format which has the format,
>>>> <length of value><value>
>>>> which has the type <IntWritable><Text>
>>>>
>>>> Then I created a table,
>>>>
>>>> CREATE TABLE if not exists TEST (
>>>> value string)
>>>> STORED AS SEQUENCEFILE;
>>>>
>>>> and then I load the input file to the table.
>>>> However, when I do a query,
>>>>
>>>> select value from TEST;
>>>>
>>>> I found that 'value' is truncated. For example, the value in my input
>>>> file is
>>>>
>>>> 14031|11748184969223627771|172.31.2.71|0|sta1|1365546305|976912181|10.196.121.204|172.26.4.10|HTTP|NURL|0|1|1|-420|1|PST|PDT|
>>>>
>>>> what is returned from the query is,
>>>>
>>>> 031|11748184969223627771|172.31.2.71|0|sta1|13655463
>>>>
>>>> What went wrong?
>>>>
>>>> TIA
>>>>
>>>> Kim
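
To make the truncation in the original message easier to reason about, here is a small sketch (Python, not part of the original thread) that locates the returned string inside the input value and counts what is missing at each end. The two strings are copied from the message above; the helper name `locate_truncation` is made up for illustration, and the check says nothing about the cause.

```python
# Values copied from the thread: the stored input row and what the query returned.
original = (
    "14031|11748184969223627771|172.31.2.71|0|sta1|1365546305|976912181|"
    "10.196.121.204|172.26.4.10|HTTP|NURL|0|1|1|-420|1|PST|PDT|"
)
returned = "031|11748184969223627771|172.31.2.71|0|sta1|13655463"


def locate_truncation(full, partial):
    """Return (offset, kept, lost_tail) for partial inside full, or None if absent.

    offset    -- number of characters missing from the front of full
    kept      -- number of characters that survived
    lost_tail -- number of characters missing from the end of full
    """
    offset = full.find(partial)
    if offset < 0:
        return None
    kept = len(partial)
    lost_tail = len(full) - offset - kept
    return offset, kept, lost_tail


result = locate_truncation(original, returned)
print(result)
```

For the values as quoted in this thread, the check shows characters missing at both the start and the end of the value, so the loss is not limited to the text after some delimiter.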