Re: Apache Web Log Question

Loren Siebert Fri, 18 Mar 2011 17:20:45 -0700

Ty this:
hive > select time, to_date(from_unixtime(unix_timestamp(regexp_extract(time, 
'^.([^:]*):.*', 1),"dd/MMM/yyyy"))) yyyymmdd from apachelog limit 10;


You can do the same sort of thing to get the hhmmss into a format that is more 
lexicographically friendly.

There’s more info on these functions here: 
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF

-Loren

On Mar 18, 2011, at 3:40 PM, bichonfrise74 wrote:

> Hi, 
> 
> I am trying to use this:
> 
> add jar ../build/contrib/hive_contrib.jar;
> 
> CREATE TABLE apachelog (
>   host STRING,
>   identity STRING,
>   user STRING,
>   time STRING,
>   request STRING,
>   status STRING,
>   size STRING,
>   referer STRING,
>   agent STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ 
> \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ 
> \"]*|\".*\"))?",
>   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
> )
> STORED AS TEXTFILE;
> 
> And it works great. My problem is how query the table with respect to the 
> time column since it still has this format: [01/Mar/2011:13:01:10 -0700]  
> 
> So, I do not know how to execute these kinds of queries:
> 
> 1. select time, count(*) from weblog where time < '01/Mar/2010';
> 2. select hours, count(*) from weblog group by hours;

Re: Apache Web Log Question

Reply via email to