On 2011-5-7 6:48 AM, "bichonfrise74" <bichonfris...@gmail.com> wrote:
>
> Hi,
>
> I am using this to load the apache log into Hadoop via Hive (my version is 0.4.1).
>
> CREATE TABLE apache_log (
> ...
> logdate STRING,
> ...
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(\\w+\/\\w+\/\\w+)\:(\\d+:\\d+:\\d+) ...
> ...
>
> The date is coming in this format: dd/mmm/yyyy.
> I would like to be able to load the data using this date format: yyyy-mmm-dd.
>
> 1. Has anyone done this before, loading the date in a different format?
> 2. Also, how do you specify in the create table statement above that the partition is the logdate?
> 3. And when I tried to convert the old date into unixtime format via this sql, Hive complains.
>
> hive> select from_unixtime( unix_timestamp( logdate, 'dd/MMM/yyyy')) from apache_log;
> FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch from_unixtime: Looking for UDF "from_unixtime" with parameters [class org.apache.hadoop.io.LongWritable]
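On question 2: in Hive the partition column is declared separately from the data columns and cannot also appear in the regular column list, so one rough sketch (the column names, regex, and file path below are placeholders, and the partition value is supplied by hand at load time) would be:

CREATE TABLE apache_log (
  host STRING,
  request STRING
)
PARTITIONED BY (logdate STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "..."
);

-- load one day's log file into its own partition
LOAD DATA LOCAL INPATH '/path/to/access.log'
INTO TABLE apache_log PARTITION (logdate = '2011-May-07');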
The unix_timestamp function returns a bigint, while from_unixtime only accepts an int as its parameter, so you should use a cast:

from_unixtime(cast(unix_timestamp(logdate, 'dd/MMM/yyyy') as int))

> Has anyone encountered these issues before?
>
> Thanks.
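If the end goal is the yyyy-mmm-dd format, from_unixtime also takes an optional format string (the same SimpleDateFormat patterns that unix_timestamp uses), so something along these lines should work, assuming the two-argument form is available in your Hive version ('yyyy-MMM-dd' is only a guess at the exact target pattern):

-- reformat dd/MMM/yyyy (e.g. 07/May/2011) to yyyy-MMM-dd (e.g. 2011-May-07)
SELECT from_unixtime(
         cast(unix_timestamp(logdate, 'dd/MMM/yyyy') as int),
         'yyyy-MMM-dd'
       ) AS formatted_date
FROM apache_log;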