Thanks Dudu, both the queries worked like a charm. I personally liked second query as it is quite easy to remember.
--Manish On Tue, Sep 20, 2016 at 8:41 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: > Or > > > > create view elb_raw_log_detailed > > as > > select request_date, elbname, requestip, requestport, backendip, > backendport, requestprocessingtime, backendprocessingtime, > clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, > sentbytes, requestverb, url, parse_url(url, 'QUERY','aid') as aid, > parse_url(url, 'QUERY','tid') as tid, parse_url(url, 'QUERY','eid') as eid, > parse_url(url, 'QUERY','did') as did, protocol, useragent, ssl_cipher, > ssl_protocol > > from elblog; > > > > Dudu > > > > *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com] > *Sent:* Tuesday, September 20, 2016 6:06 PM > *To:* user@hive.apache.org > *Subject:* RE: ELB Log processing > > > > create view elb_raw_log_detailed > > as > > select request_date, elbname, requestip, requestport, backendip, > backendport, requestprocessingtime, backendprocessingtime, > clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, > sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protocol, > useragent, ssl_cipher, ssl_protocol > > from elblog > > LATERAL VIEW > parse_url_tuple(url,'QUERY:eid','QUERY:tid','QUERY:aid','QUERY:did') > u as eid,tid,aid,did > > ; > > > > Dudu > > > > *From:* Manish Rangari [mailto:linuxtricksfordev...@gmail.com > <linuxtricksfordev...@gmail.com>] > *Sent:* Tuesday, September 20, 2016 4:09 PM > *To:* user@hive.apache.org > *Subject:* Re: ELB Log processing > > > > Guys, > > > > I am struggling to create this view. I am keep getting the error in bold. > I found that I need to use lateral view but still I am not able to get the > syntax right. > > > > hive> create view elb_raw_log_detailed as select request_date, elbname, > requestip, requestport, backendip, backendport, requestprocessingtime, > backendprocessingtime, clientresponsetime, elbresponsecode, > backendresponsecode, receivedbytes, sentbytes, requestverb, url, > parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid') > as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url, > 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from > elblogz; > > > > *FAILED: SemanticException [Error 10081]: UDTF's are not supported outside > the SELECT clause, nor nested in expressions* > > > > On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari < > linuxtricksfordev...@gmail.com> wrote: > > Yes views looks like a way to go > > > > On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.ca...@gmail.com> > wrote: > > The royal way to do that is a view IMHO. > > > > 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com > >: > > Thanks for the reply Damien. The suggestion you gave is really useful. > Currently I am achieving my desired output by performing below steps. But I > want to achieve the desired result in one step instead of two. Do we have > any way so that I can get the aid, did etc in create table statement? If > not I will have to look for the option that you mentioned > > > > 1. > > CREATE TABLE elblog ( > > Request_date STRING, > > ELBName STRING, > > RequestIP STRING, > > RequestPort INT, > > BackendIP STRING, > > BackendPort INT, > > RequestProcessingTime DOUBLE, > > BackendProcessingTime DOUBLE, > > ClientResponseTime DOUBLE, > > ELBResponseCode STRING, > > BackendResponseCode STRING, > > ReceivedBytes BIGINT, > > SentBytes BIGINT, > > RequestVerb STRING, > > URL STRING, > > Protocol STRING, > > Useragent STRING, > > ssl_cipher STRING, > > ssl_protocol STRING > > ) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' > > WITH SERDEPROPERTIES ( > > "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ > ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) > ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" > > ) > > STORED AS TEXTFILE; > > > > 2. > > create table elb_raw_log as select request_date, elbname, requestip, > requestport, backendip, backendport, requestprocessingtime, > backendprocessingtime, clientresponsetime, elbresponsecode, > backendresponsecode, receivedbytes, sentbytes, requestverb, url, > regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid, > regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid, > regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid, > regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol, > useragent, ssl_cipher, ssl_protocol from elblog; > > > > On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.ca...@gmail.com> > wrote: > > see the udf *parse_url_tuple* > > SELECT b.* > > FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', > 'QUERY:id') b as host, path, query, query_id LIMIT 1; > > > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF# > LanguageManualUDF-parse_url_tuple > > > > 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com > >: > > Guys, > > > > I want to get the field of elb logs. A sample elb log is given below and I > am using below create table definition. It is working fine. I am getting > what I wanted but now I want the bold part as well. For example eid, tid, > aid. Can anyone help me how can I match them as well. > > > > NOTE: The position of aid, eid, tid is not fixed and it may change. > > > > 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.000021 > 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek? > *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT > 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 > Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 > > > > > > CREATE TABLE elblog ( > > Request_date STRING, > > ELBName STRING, > > RequestIP STRING, > > RequestPort INT, > > BackendIP STRING, > > BackendPort INT, > > RequestProcessingTime DOUBLE, > > BackendProcessingTime DOUBLE, > > ClientResponseTime DOUBLE, > > ELBResponseCode STRING, > > BackendResponseCode STRING, > > ReceivedBytes BIGINT, > > SentBytes BIGINT, > > RequestVerb STRING, > > URL STRING, > > Protocol STRING, > > Useragent STRING, > > ssl_cipher STRING, > > ssl_protocol STRING > > ) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' > > WITH SERDEPROPERTIES ( > > "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ > ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) > ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" > > ) > > STORED AS TEXTFILE; > > > > > > > > > > >