Ty this: hive > select time, to_date(from_unixtime(unix_timestamp(regexp_extract(time, '^.([^:]*):.*', 1),"dd/MMM/yyyy"))) yyyymmdd from apachelog limit 10;
You can do the same sort of thing to get the hhmmss into a format that is more lexicographically friendly. There’s more info on these functions here: http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF -Loren On Mar 18, 2011, at 3:40 PM, bichonfrise74 wrote: > Hi, > > I am trying to use this: > > add jar ../build/contrib/hive_contrib.jar; > > CREATE TABLE apachelog ( > host STRING, > identity STRING, > user STRING, > time STRING, > request STRING, > status STRING, > size STRING, > referer STRING, > agent STRING) > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' > WITH SERDEPROPERTIES ( > "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ > \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ > \"]*|\".*\"))?", > "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s" > ) > STORED AS TEXTFILE; > > And it works great. My problem is how query the table with respect to the > time column since it still has this format: [01/Mar/2011:13:01:10 -0700] > > So, I do not know how to execute these kinds of queries: > > 1. select time, count(*) from weblog where time < '01/Mar/2010'; > 2. select hours, count(*) from weblog group by hours;
