Guys, I am struggling to create this view. I am keep getting the error in bold. I found that I need to use lateral view but still I am not able to get the syntax right.
hive> create view elb_raw_log_detailed as select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid') as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url, 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from elblogz; *FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions* On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari < linuxtricksfordev...@gmail.com> wrote: > Yes views looks like a way to go > > On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.ca...@gmail.com> > wrote: > >> The royal way to do that is a view IMHO. >> >> 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com >> >: >> >>> Thanks for the reply Damien. The suggestion you gave is really useful. >>> Currently I am achieving my desired output by performing below steps. But I >>> want to achieve the desired result in one step instead of two. Do we have >>> any way so that I can get the aid, did etc in create table statement? If >>> not I will have to look for the option that you mentioned >>> >>> 1. >>> CREATE TABLE elblog ( >>> Request_date STRING, >>> ELBName STRING, >>> RequestIP STRING, >>> RequestPort INT, >>> BackendIP STRING, >>> BackendPort INT, >>> RequestProcessingTime DOUBLE, >>> BackendProcessingTime DOUBLE, >>> ClientResponseTime DOUBLE, >>> ELBResponseCode STRING, >>> BackendResponseCode STRING, >>> ReceivedBytes BIGINT, >>> SentBytes BIGINT, >>> RequestVerb STRING, >>> URL STRING, >>> Protocol STRING, >>> Useragent STRING, >>> ssl_cipher STRING, >>> ssl_protocol STRING >>> ) >>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' >>> WITH SERDEPROPERTIES ( >>> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ >>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) >>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" >>> ) >>> STORED AS TEXTFILE; >>> >>> 2. >>> create table elb_raw_log as select request_date, elbname, requestip, >>> requestport, backendip, backendport, requestprocessingtime, >>> backendprocessingtime, clientresponsetime, elbresponsecode, >>> backendresponsecode, receivedbytes, sentbytes, requestverb, url, >>> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid, >>> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid, >>> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid, >>> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol, >>> useragent, ssl_cipher, ssl_protocol from elblog; >>> >>> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.ca...@gmail.com> >>> wrote: >>> >>>> see the udf >>>> *parse_url_tuple* >>>> SELECT b.* >>>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', >>>> 'QUERY:id') b as host, path, query, query_id LIMIT 1; >>>> >>>> >>>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan >>>> ual+UDF#LanguageManualUDF-parse_url_tuple >>>> >>>> 2016-09-20 11:22 GMT+02:00 Manish Rangari < >>>> linuxtricksfordev...@gmail.com>: >>>> >>>>> Guys, >>>>> >>>>> I want to get the field of elb logs. A sample elb log is given below >>>>> and I am using below create table definition. It is working fine. I am >>>>> getting what I wanted but now I want the bold part as well. For example >>>>> eid, tid, aid. Can anyone help me how can I match them as well. >>>>> >>>>> NOTE: The position of aid, eid, tid is not fixed and it may change. >>>>> >>>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 >>>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET >>>>> https://site1.example.com:443/peek? >>>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 >>>>> (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) >>>>> Chrome/45.0.2454.85 >>>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 >>>>> >>>>> >>>>> CREATE TABLE elblog ( >>>>> Request_date STRING, >>>>> ELBName STRING, >>>>> RequestIP STRING, >>>>> RequestPort INT, >>>>> BackendIP STRING, >>>>> BackendPort INT, >>>>> RequestProcessingTime DOUBLE, >>>>> BackendProcessingTime DOUBLE, >>>>> ClientResponseTime DOUBLE, >>>>> ELBResponseCode STRING, >>>>> BackendResponseCode STRING, >>>>> ReceivedBytes BIGINT, >>>>> SentBytes BIGINT, >>>>> RequestVerb STRING, >>>>> URL STRING, >>>>> Protocol STRING, >>>>> Useragent STRING, >>>>> ssl_cipher STRING, >>>>> ssl_protocol STRING >>>>> ) >>>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' >>>>> WITH SERDEPROPERTIES ( >>>>> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ >>>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) >>>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" >>>>> ) >>>>> STORED AS TEXTFILE; >>>>> >>>> >>>> >>> >> >