Thanks Dudu, both the queries worked like a charm. I personally liked
second query as it is quite easy to remember.

--Manish

On Tue, Sep 20, 2016 at 8:41 PM, Markovitz, Dudu <dmarkov...@paypal.com>
wrote:

> Or
>
>
>
> create view elb_raw_log_detailed
>
> as
>
> select request_date, elbname, requestip, requestport, backendip,
> backendport, requestprocessingtime, backendprocessingtime,
> clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes,
> sentbytes, requestverb, url, parse_url(url, 'QUERY','aid') as aid,
> parse_url(url, 'QUERY','tid') as tid, parse_url(url, 'QUERY','eid') as eid,
> parse_url(url, 'QUERY','did') as did, protocol, useragent, ssl_cipher,
> ssl_protocol
>
> from elblog;
>
>
>
> Dudu
>
>
>
> *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com]
> *Sent:* Tuesday, September 20, 2016 6:06 PM
> *To:* user@hive.apache.org
> *Subject:* RE: ELB Log processing
>
>
>
> create view elb_raw_log_detailed
>
> as
>
> select request_date, elbname, requestip, requestport, backendip,
> backendport, requestprocessingtime, backendprocessingtime,
> clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes,
> sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protocol,
> useragent, ssl_cipher, ssl_protocol
>
> from elblog
>
> LATERAL VIEW 
> parse_url_tuple(url,'QUERY:eid','QUERY:tid','QUERY:aid','QUERY:did')
> u as eid,tid,aid,did
>
> ;
>
>
>
> Dudu
>
>
>
> *From:* Manish Rangari [mailto:linuxtricksfordev...@gmail.com
> <linuxtricksfordev...@gmail.com>]
> *Sent:* Tuesday, September 20, 2016 4:09 PM
> *To:* user@hive.apache.org
> *Subject:* Re: ELB Log processing
>
>
>
> Guys,
>
>
>
> I am struggling to create this view. I am keep getting the error in bold.
> I found that I need to use lateral view but still I am not able to get the
> syntax right.
>
>
>
> hive> create view elb_raw_log_detailed as select request_date, elbname,
> requestip, requestport, backendip, backendport, requestprocessingtime,
> backendprocessingtime, clientresponsetime, elbresponsecode,
> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
> parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid')
> as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url,
> 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from
> elblogz;
>
>
>
> *FAILED: SemanticException [Error 10081]: UDTF's are not supported outside
> the SELECT clause, nor nested in expressions*
>
>
>
> On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <
> linuxtricksfordev...@gmail.com> wrote:
>
> Yes views looks like a way to go
>
>
>
> On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.ca...@gmail.com>
> wrote:
>
> The royal way to do that is a view IMHO.
>
>
>
> 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com
> >:
>
> Thanks for the reply Damien. The suggestion you gave is really useful.
> Currently I am achieving my desired output by performing below steps. But I
> want to achieve the desired result in one step instead of two. Do we have
> any way so that I can get the aid, did etc in create table statement? If
> not I will have to look for the option that you mentioned
>
>
>
> 1.
>
> CREATE TABLE elblog (
>
> Request_date STRING,
>
>       ELBName STRING,
>
>       RequestIP STRING,
>
>       RequestPort INT,
>
>       BackendIP STRING,
>
>       BackendPort INT,
>
>       RequestProcessingTime DOUBLE,
>
>       BackendProcessingTime DOUBLE,
>
>       ClientResponseTime DOUBLE,
>
>       ELBResponseCode STRING,
>
>       BackendResponseCode STRING,
>
>       ReceivedBytes BIGINT,
>
>       SentBytes BIGINT,
>
>       RequestVerb STRING,
>
>       URL STRING,
>
>       Protocol STRING,
>
> Useragent STRING,
>
> ssl_cipher STRING,
>
> ssl_protocol STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>
> WITH SERDEPROPERTIES (
>
>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>
> )
>
> STORED AS TEXTFILE;
>
>
>
> 2.
>
> create table elb_raw_log as select request_date, elbname, requestip,
> requestport, backendip, backendport, requestprocessingtime,
> backendprocessingtime, clientresponsetime, elbresponsecode,
> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
> useragent, ssl_cipher, ssl_protocol from elblog;
>
>
>
> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.ca...@gmail.com>
> wrote:
>
> see the udf *parse_url_tuple*
>
> SELECT b.*
>
> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#
> LanguageManualUDF-parse_url_tuple
>
>
>
> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com
> >:
>
> Guys,
>
>
>
> I want to get the field of elb logs. A sample elb log is given below and I
> am using below create table definition. It is working fine. I am getting
> what I wanted but now I want the bold part as well. For example eid, tid,
> aid. Can anyone help me how can I match them as well.
>
>
>
> NOTE: The position of aid, eid, tid is not fixed and it may change.
>
>
>
> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.000021
> 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek?
> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT
> 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>
>
>
>
>
> CREATE TABLE elblog (
>
> Request_date STRING,
>
>       ELBName STRING,
>
>       RequestIP STRING,
>
>       RequestPort INT,
>
>       BackendIP STRING,
>
>       BackendPort INT,
>
>       RequestProcessingTime DOUBLE,
>
>       BackendProcessingTime DOUBLE,
>
>       ClientResponseTime DOUBLE,
>
>       ELBResponseCode STRING,
>
>       BackendResponseCode STRING,
>
>       ReceivedBytes BIGINT,
>
>       SentBytes BIGINT,
>
>       RequestVerb STRING,
>
>       URL STRING,
>
>       Protocol STRING,
>
> Useragent STRING,
>
> ssl_cipher STRING,
>
> ssl_protocol STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>
> WITH SERDEPROPERTIES (
>
>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>
> )
>
> STORED AS TEXTFILE;
>
>
>
>
>
>
>
>
>
>
>

Reply via email to