[jira] [Commented] (DRILL-7364) Timeout reading from Kafka topic using Kafka plugin
[ https://issues.apache.org/jira/browse/DRILL-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923940#comment-16923940 ]

Abhishek Ravi commented on DRILL-7364:
--

In my experience, the "Failed to fetch messages" error on MapR Streams is hit when a "default stream" has not been configured in the plugin config.

To answer your additional questions:
1) Yes, with 20 partitions the query will run with 20 minor fragments.
2) Each minor fragment is equivalent to a Kafka consumer and will be part of a consumer group.

> Timeout reading from Kafka topic using Kafka plugin
> ---
>
> Key: DRILL-7364
> URL: https://issues.apache.org/jira/browse/DRILL-7364
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Kafka
> Affects Versions: 1.15.0, 1.16.0
> Reporter: Aditya Allamraju
> Priority: Major
>
> When we try to query a MapR Streams topic (similar to Apache Kafka) using the Kafka
> plugin, we see the timeout below being thrown.
> {code:java}
> 0: jdbc:drill:drillbit=10.10.75.158:31010>
> 0: jdbc:drill:drillbit=10.10.75.158:31010> select count(*) from
> `/sample-stream:fast-messages` where k<100;
> Error: DATA_READ ERROR: Failed to fetch messages within 200 milliseconds.
> Consider increasing the value of the property : store.kafka.poll.timeout
> DATA_READ ERROR: Failed to fetch messages within 200 milliseconds. Consider
> increasing the value of the property : store.kafka.poll.timeout
> [Error Id: 27112f7b-afd8-43cb-9376-32f4c63ad2d8 ]
> Fragment 0:0
> [Error Id: 27112f7b-afd8-43cb-9376-32f4c63ad2d8 on
> vm75-158.support.mapr.com:31010] (state=,code=0)
> 0: jdbc:drill:drillbit=10.10.75.158:31010> ALTER SYSTEM SET
> `store.kafka.poll.timeout` = 5;
> +-------+-----------------------------------+
> |  ok   |              summary              |
> +-------+-----------------------------------+
> | true  | store.kafka.poll.timeout updated. |
> +-------+-----------------------------------+
> 1 row selected (0.148 seconds)
> 0: jdbc:drill:drillbit=10.10.75.158:31010>
> {code}
> Another interesting behavior:
> 1) Even after increasing the timeout value to 50 secs, the Drill query failed
> after the execution time crossed 50 secs (~51 secs).
> This pattern continued for whatever value we increased it to. For example, after
> increasing the timeout to 100 secs, the query failed with the above error after 101
> secs of execution time.
> The user is using Drill 1.15.
> I tried to reproduce this on my test cluster, but it did not reproduce consistently
> there, whereas in the client's cluster we were able to reproduce this behavior
> consistently. We collected the logs, but they have very little info on what's happening.
> I believe it is now essential to know how (and why) the timeout parameter
> "store.kafka.poll.timeout" is related to the query execution time to
> understand this bug.
> I also have a few more questions for which I couldn't find much documentation.
> _1) Is there a one-to-one mapping between the number of partitions of a topic and the
> minor fragments of a query? For example, if a given topic (t) has, say, 20
> partitions, will the query most likely have 20 minor fragments, other
> parameters being fairly sized?_
> _2) Is each drillbit in the cluster equivalent to a Kafka consumer, and are all
> such drillbits of a cluster treated as part of a consumer group?_

-- This message was sent by Atlassian Jira (v8.3.2#803003)
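The reporter asks how `store.kafka.poll.timeout` relates to query execution time. The sketch below models one plausible mechanism, under the assumption (not confirmed in this thread) that the record reader makes a blocking poll with the configured timeout and treats an empty result as a DATA_READ error rather than as end of data. A `java.util.concurrent.BlockingQueue` stands in for the Kafka consumer, and `PollTimeoutSketch`, `nextMessage` and `FetchTimeoutException` are hypothetical names, not Drill classes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a poll-with-timeout reader; a BlockingQueue stands in for the Kafka consumer. */
public class PollTimeoutSketch {

  /** Hypothetical error type; its message mirrors the DATA_READ error text in the issue. */
  public static class FetchTimeoutException extends RuntimeException {
    public FetchTimeoutException(long timeoutMs) {
      super("Failed to fetch messages within " + timeoutMs + " milliseconds.");
    }
  }

  /**
   * Waits up to timeoutMs for the next message. An empty poll is treated as an
   * error (assumption), so a source with no data fails only after the full timeout.
   */
  public static String nextMessage(BlockingQueue<String> source, long timeoutMs) {
    String msg;
    try {
      msg = source.poll(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException("Interrupted while polling", e);
    }
    if (msg == null) {
      throw new FetchTimeoutException(timeoutMs);
    }
    return msg;
  }

  public static void main(String[] args) {
    BlockingQueue<String> stream = new LinkedBlockingQueue<>();
    stream.add("m1");
    System.out.println(nextMessage(stream, 200)); // returns immediately: m1

    long start = System.nanoTime();
    try {
      nextMessage(stream, 200); // queue is now empty: blocks ~200 ms, then fails
    } catch (FetchTimeoutException e) {
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println(e.getMessage() + " Elapsed: ~" + elapsedMs + " ms");
    }
  }
}
```

In this model, raising the timeout only moves the point of failure when the source never yields data, which would be consistent with queries failing at roughly the configured timeout (~51 s for 50 s, ~101 s for 100 s) and with the comment above attributing the error to a missing "default stream" setting.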
[jira] [Updated] (DRILL-7367) Remove Server details from response headers
[ https://issues.apache.org/jira/browse/DRILL-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi updated DRILL-7367:
---
Labels: ready-to-commit (was: )

> Remove Server details from response headers
> ---
>
> Key: DRILL-7367
> URL: https://issues.apache.org/jira/browse/DRILL-7367
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Drill response headers include Server information which is considered to be a
> vulnerability.
> {noformat}
> curl http://localhost:8047/cluster.json -v -k
> * Trying ::1...
> * TCP_NODELAY set
> * Connected to localhost (::1) port 8047 (#0)
> > GET /cluster.json HTTP/1.1
> > Host: localhost:8047
> > User-Agent: curl/7.54.0
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < Date: Thu, 05 Sep 2019 12:47:53 GMT
> < Content-Type: application/json
> < Content-Length: 436
> < Server: Jetty(9.3.25.v20180904)
> ...
> {noformat}
> https://pentest-tools.com/blog/essential-http-security-headers/
> After the fix headers should be without server information:
> {noformat}
> curl http://localhost:8047/cluster.json -v -k
> * Trying ::1...
> * TCP_NODELAY set
> * Connected to localhost (::1) port 8047 (#0)
> > GET /cluster.json HTTP/1.1
> > Host: localhost:8047
> > User-Agent: curl/7.54.0
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < Date: Thu, 05 Sep 2019 13:55:25 GMT
> < Content-Type: application/json
> < Content-Length: 436
> ...
> {noformat}
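To check the fix without curl, a small Java client can request the same endpoint and look for a `Server` header. This is a hedged sketch: it assumes a Drillbit web UI on `localhost:8047` as in the curl output, and `ServerHeaderCheck` and `serverHeader` are hypothetical names. The patch itself is not shown in this thread; in Jetty the `Server` header is commonly suppressed with `HttpConfiguration.setSendServerVersion(false)`, but treat that as background rather than a description of the actual change.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper that reports whether a response carries a Server header. */
public class ServerHeaderCheck {

  /** Returns the first Server header value, matching the header name case-insensitively. */
  public static Optional<String> serverHeader(Map<String, List<String>> headers) {
    for (Map.Entry<String, List<String>> e : headers.entrySet()) {
      // HttpURLConnection reports the status line under a null key; skip it.
      if (e.getKey() != null && e.getKey().equalsIgnoreCase("Server")
          && e.getValue() != null && !e.getValue().isEmpty()) {
        return Optional.of(e.getValue().get(0));
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    // Same endpoint as the curl example; assumes a Drillbit web UI on localhost:8047.
    HttpURLConnection conn = (HttpURLConnection)
        URI.create("http://localhost:8047/cluster.json").toURL().openConnection();
    try {
      int status = conn.getResponseCode();
      String server = serverHeader(conn.getHeaderFields()).orElse("<absent>");
      System.out.println("HTTP " + status + ", Server: " + server);
    } finally {
      conn.disconnect();
    }
  }
}
```

Before the fix this would report `Jetty(9.3.25.v20180904)`; after it, `<absent>`.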
[jira] [Updated] (DRILL-7362) COUNT(*) on JSON with outer list results in JsonParse error
[ https://issues.apache.org/jira/browse/DRILL-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi updated DRILL-7362:
---
Labels: ready-to-commit (was: )

> COUNT(*) on JSON with outer list results in JsonParse error
> ---
>
> Key: DRILL-7362
> URL: https://issues.apache.org/jira/browse/DRILL-7362
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Oleg Zinoviev
> Assignee: Oleg Zinoviev
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> COUNT(*) from a JSON file with an outer array results in JsonParseException:
> Cannot read from the middle of a record. Current token was START_ARRAY.
> P.S. A simple select from such a file works normally.
[jira] [Updated] (DRILL-7367) Remove Server details from response headers
[ https://issues.apache.org/jira/browse/DRILL-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7367:

Description:
Drill response headers include Server information which is considered to be a vulnerability.
{noformat}
curl http://localhost:8047/cluster.json -v -k
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8047 (#0)
> GET /cluster.json HTTP/1.1
> Host: localhost:8047
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 05 Sep 2019 12:47:53 GMT
< Content-Type: application/json
< Content-Length: 436
< Server: Jetty(9.3.25.v20180904)
...
{noformat}
https://pentest-tools.com/blog/essential-http-security-headers/

After the fix headers should be without server information:
{noformat}
curl http://localhost:8047/cluster.json -v -k
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8047 (#0)
> GET /cluster.json HTTP/1.1
> Host: localhost:8047
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 05 Sep 2019 13:55:25 GMT
< Content-Type: application/json
< Content-Length: 436
...
{noformat}

was:
Drill response headers include Server information which is considered to be a vulnerability.
{noformat}
curl http://localhost:8047/cluster.json -v -k
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8047 (#0)
> GET /cluster.json HTTP/1.1
> Host: localhost:8047
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 05 Sep 2019 12:47:53 GMT
< Content-Type: application/json
< Content-Length: 436
< Server: Jetty(9.3.25.v20180904)
{noformat}
https://pentest-tools.com/blog/essential-http-security-headers/
[jira] [Commented] (DRILL-7367) Remove Server details from response headers
[ https://issues.apache.org/jira/browse/DRILL-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923529#comment-16923529 ]

ASF GitHub Bot commented on DRILL-7367:
---

arina-ielchiieva commented on pull request #1851: DRILL-7367: Remove Server details from response headers
URL: https://github.com/apache/drill/pull/1851

Jira - [DRILL-7367](https://issues.apache.org/jira/browse/DRILL-7367).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Updated] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7343:

Reviewer: Arina Ielchiieva

> Add User-Agent UDFs to Drill
> ---
>
> Key: DRILL-7343
> URL: https://issues.apache.org/jira/browse/DRILL-7343
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.17.0
>
>
> This collection of UDFs adds the ability to parse user agent strings, which is
> useful for security data analysis.
[jira] [Updated] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7343:

Labels: doc-impacting ready-to-commit (was: )
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923482#comment-16923482 ]

ASF GitHub Bot commented on DRILL-7343:
---

arina-ielchiieva commented on issue #1840: DRILL-7343: Add User-Agent UDFs to Drill
URL: https://github.com/apache/drill/pull/1840#issuecomment-528391368

Looks good, +1
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923477#comment-16923477 ]

ASF GitHub Bot commented on DRILL-7343:
---

arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill
URL: https://github.com/apache/drill/pull/1840#discussion_r321296574

 ##
 File path: contrib/udfs/README.md
 ##
 @@ -0,0 +1,56 @@
 +# Drill User Defined Functions
 +
 +This `README` documents functions which users have submitted to Apaceh Drill.
 +
 +## User Agent Functions
 +Drill UDF for parsing User Agent Strings.
 +This function is based on Niels Basjes Java library for parsing user agent strings which is available here: https://github.com/nielsbasjes/yauaa.
 +
 +### Usage
 +Using this function is fairly simple. The function `parse_user_agent()` takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string.
 +```
 +SELECT parse_user_agent( columns[0] ) as ua
 +FROM dfs.`/Users/cgivre/drill-httpd/ua.csv`;
 +```
 +The query above returns:
 +```
 +{
 +  "DeviceClass":"Desktop",
 +  "DeviceName":"Macintosh",
 +  "DeviceBrand":"Apple",
 +  "OperatingSystemClass":"Desktop",
 +  "OperatingSystemName":"Mac OS X",
 +  "OperatingSystemVersion":"10.10.1",
 +  "OperatingSystemNameVersion":"Mac OS X 10.10.1",
 +  "LayoutEngineClass":"Browser",
 +  "LayoutEngineName":"Blink",
 +  "LayoutEngineVersion":"39.0",
 +  "LayoutEngineVersionMajor":"39",
 +  "LayoutEngineNameVersion":"Blink 39.0",
 +  "LayoutEngineNameVersionMajor":"Blink 39",
 +  "AgentClass":"Browser",
 +  "AgentName":"Chrome",
 +  "AgentVersion":"39.0.2171.99",
 +  "AgentVersionMajor":"39",
 +  "AgentNameVersion":"Chrome 39.0.2171.99",
 +  "AgentNameVersionMajor":"Chrome 39",
 +  "DeviceCpu":"Intel"
 +}
 +```
 +The function returns a Drill map, so you can access any of the fields using Drill's table.map.key notation. For example, the query below illustrates how to extract a field from this map and summarize it:
 +
 +```
 +SELECT uadata.ua.AgentNameVersion AS Browser,
 +COUNT( * ) AS BrowserCount
 +FROM (
 +   SELECT parse_user_agent( columns[0] ) AS ua
 +   FROM dfs.drillworkshop.`user-agents.csv`
 +) AS uadata
 +GROUP BY uadata.ua.AgentNameVersion
 +ORDER BY BrowserCount DESC
 +```
 +The function can also be called with an optional field as an argument. IE:

 Review comment:
    @cgivre please fix this one, this is the last one :)
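To make the GROUP BY query in the README concrete, here is a plain-Java sketch of the same aggregation applied to user agents that have already been parsed into field maps. It does not call `parse_user_agent()` or the yauaa library; the class name, the `<missing>` label used where SQL would produce a NULL group for agents lacking `AgentNameVersion`, and the sample rows are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical plain-Java mirror of the README's browser-count query. */
public class BrowserCounts {

  /** Hypothetical label standing in for SQL NULL when AgentNameVersion is absent. */
  public static final String MISSING = "<missing>";

  /**
   * Equivalent of GROUP BY ua.AgentNameVersion with COUNT(*), ORDER BY count DESC,
   * over user agents already parsed into field maps.
   */
  public static Map<String, Long> countByAgentNameVersion(List<Map<String, String>> parsed) {
    Map<String, Long> counts = parsed.stream()
        .collect(Collectors.groupingBy(
            // Not every field is present in every parsed user agent.
            ua -> ua.getOrDefault("AgentNameVersion", MISSING),
            Collectors.counting()));
    // Sort by count, highest first; LinkedHashMap keeps that order for the caller.
    return counts.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .collect(Collectors.toMap(
            Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
  }

  public static void main(String[] args) {
    // Hypothetical sample rows reusing field names and values from the README output.
    List<Map<String, String>> rows = List.of(
        Map.of("AgentName", "Chrome", "AgentNameVersion", "Chrome 39.0.2171.99"),
        Map.of("AgentName", "Chrome", "AgentNameVersion", "Chrome 39.0.2171.99"),
        Map.of("AgentName", "Firefox", "AgentNameVersion", "Firefox 2.0.0.11"),
        Map.of("DeviceClass", "Desktop"));
    System.out.println(countByAgentNameVersion(rows));
  }
}
```

Entries with equal counts have no guaranteed relative order, just as rows tied on `BrowserCount` have none in the SQL version.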
[jira] [Created] (DRILL-7368) Query from Iceberg Metastore fails if filter column contains null
Arina Ielchiieva created DRILL-7368:
---

Summary: Query from Iceberg Metastore fails if filter column contains null
Key: DRILL-7368
URL: https://issues.apache.org/jira/browse/DRILL-7368
Project: Apache Drill
Issue Type: Bug
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Fix For: 1.17.0

When querying data from the Drill Iceberg Metastore, the query fails if the filter column contains null.
The problem is in the Iceberg implementation: https://github.com/apache/incubator-iceberg/pull/443
Fix steps: upgrade to the latest Iceberg commit, which includes the appropriate fix.
[jira] [Created] (DRILL-7367) Remove Server details from response headers
Arina Ielchiieva created DRILL-7367:
---

Summary: Remove Server details from response headers
Key: DRILL-7367
URL: https://issues.apache.org/jira/browse/DRILL-7367
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Fix For: 1.17.0

Drill response headers include Server information which is considered to be a vulnerability.
{noformat}
curl http://localhost:8047/cluster.json -v -k
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8047 (#0)
> GET /cluster.json HTTP/1.1
> Host: localhost:8047
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 05 Sep 2019 12:47:53 GMT
< Content-Type: application/json
< Content-Length: 436
< Server: Jetty(9.3.25.v20180904)
{noformat}
https://pentest-tools.com/blog/essential-http-security-headers/
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923420#comment-16923420 ]

ASF GitHub Bot commented on DRILL-7343:
---

cgivre commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill
URL: https://github.com/apache/drill/pull/1840#discussion_r321247492

 ##
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/UserAgentFunctions.java
 ##
 @@ -0,0 +1,184 @@
 +/*
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements. See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership. The ASF licenses this file
 + * to you under the Apache License, Version 2.0 (the
 + * "License"); you may not use this file except in compliance
 + * with the License. You may obtain a copy of the License at
 + *
 + * http://www.apache.org/licenses/LICENSE-2.0
 + *
 + * Unless required by applicable law or agreed to in writing, software
 + * distributed under the License is distributed on an "AS IS" BASIS,
 + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 + * See the License for the specific language governing permissions and
 + * limitations under the License.
 + */
 +
 +package org.apache.drill.exec.udfs;
 +
 +import io.netty.buffer.DrillBuf;
 +import org.apache.drill.exec.expr.DrillSimpleFunc;
 +import org.apache.drill.exec.expr.annotations.FunctionTemplate;
 +import org.apache.drill.exec.expr.annotations.Output;
 +import org.apache.drill.exec.expr.annotations.Param;
 +import org.apache.drill.exec.expr.annotations.Workspace;
 +import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
 +import org.apache.drill.exec.expr.holders.VarCharHolder;
 +import org.apache.drill.exec.vector.complex.writer.BaseWriter;
 +
 +import javax.inject.Inject;
 +
 +public class UserAgentFunctions {
 +
 +  @FunctionTemplate(name = "parse_user_agent",
 +    scope = FunctionTemplate.FunctionScope.SIMPLE
 +  )
 +  public static class UserAgentFunction implements DrillSimpleFunc {
 +    @Param
 +    VarCharHolder input;
 +
 +    @Output
 +    BaseWriter.ComplexWriter outWriter;
 +
 +    @Inject
 +    DrillBuf outBuffer;
 +
 +    @Workspace
 +    nl.basjes.parse.useragent.UserAgentAnalyzerDirect uaa;
 +
 +    public void setup() {
 +      uaa = nl.basjes.parse.useragent.UserAgentAnalyzerDirect.newBuilder().dropTests().hideMatcherLoadStats().build();
 +      uaa.getAllPossibleFieldNamesSorted();
 +    }
 +
 +    public void eval() {
 +      org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter queryMapWriter = outWriter.rootAsMap();
 +
 +      String userAgentString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(input);
 +
 +      if (userAgentString.isEmpty() || userAgentString.equals("null")) {

 Review comment:
    Fixed
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923421#comment-16923421 ]

ASF GitHub Bot commented on DRILL-7343:
---

cgivre commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill
URL: https://github.com/apache/drill/pull/1840#discussion_r321247553

 ##
 File path: contrib/udfs/README.md
 ##
 @@ -0,0 +1,58 @@
 +# Drill User Defined Functions
 +
 +This `README` documents functions which users have submitted to Apache Drill.
 +
 +## User Agent Functions
 +Drill UDF for parsing User Agent Strings.
 +This function is based on Niels Basjes Java library for parsing user agent strings which is available here: https://github.com/nielsbasjes/yauaa.
 +
 +### Usage
 +The function `parse_user_agent()` takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string.
 +```
 +SELECT parse_user_agent( columns[0] ) as ua
 +FROM dfs.`/tmp/data/drill-httpd/ua.csv`;
 +```
 +The query above returns:
 +```
 +{
 +  "DeviceClass":"Desktop",
 +  "DeviceName":"Macintosh",
 +  "DeviceBrand":"Apple",
 +  "OperatingSystemClass":"Desktop",
 +  "OperatingSystemName":"Mac OS X",
 +  "OperatingSystemVersion":"10.10.1",
 +  "OperatingSystemNameVersion":"Mac OS X 10.10.1",
 +  "LayoutEngineClass":"Browser",
 +  "LayoutEngineName":"Blink",
 +  "LayoutEngineVersion":"39.0",
 +  "LayoutEngineVersionMajor":"39",
 +  "LayoutEngineNameVersion":"Blink 39.0",
 +  "LayoutEngineNameVersionMajor":"Blink 39",
 +  "AgentClass":"Browser",
 +  "AgentName":"Chrome",
 +  "AgentVersion":"39.0.2171.99",
 +  "AgentVersionMajor":"39",
 +  "AgentNameVersion":"Chrome 39.0.2171.99",
 +  "AgentNameVersionMajor":"Chrome 39",
 +  "DeviceCpu":"Intel"
 +}
 +```
 +The function returns a Drill map, so you can access any of the fields using Drill's table.map.key notation. For example, the query below illustrates how to extract a field from this map and summarize it:
 +
 +```
 +SELECT uadata.ua.AgentNameVersion AS Browser,
 +COUNT( * ) AS BrowserCount
 +FROM (
 +   SELECT parse_user_agent( columns[0] ) AS ua
 +   FROM dfs.drillworkshop.`user-agents.csv`
 +) AS uadata
 +GROUP BY uadata.ua.AgentNameVersion
 +ORDER BY BrowserCount DESC
 +```
 +The function can also be called with an optional field as an argument. IE:
 +```
 +SELECT parse_user_agent( `user_agent`, 'AgentName' ) as AgentName ...
 +```
 +which will just return the requested field. If the user agent string is empty, all fields will have the value of `Hacker`.
 +
 +Note: This function does not accept `NULL` as input.

 Review comment:
    Removed and Fixed
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923422#comment-16923422 ]

ASF GitHub Bot commented on DRILL-7343:
---

cgivre commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill
URL: https://github.com/apache/drill/pull/1840#discussion_r321247660

 ##
 File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestUserAgentFunctions.java
 ##
 @@ -0,0 +1,170 @@
 +/*
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements. See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership. The ASF licenses this file
 + * to you under the Apache License, Version 2.0 (the
 + * "License"); you may not use this file except in compliance
 + * with the License. You may obtain a copy of the License at
 + *
 + * http://www.apache.org/licenses/LICENSE-2.0
 + *
 + * Unless required by applicable law or agreed to in writing, software
 + * distributed under the License is distributed on an "AS IS" BASIS,
 + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 + * See the License for the specific language governing permissions and
 + * limitations under the License.
 + */
 +
 +package org.apache.drill.exec.udfs;
 +
 +import org.apache.drill.categories.SqlFunctionTest;
 +import org.apache.drill.categories.UnlikelyTest;
 +import org.apache.drill.test.ClusterFixture;
 +import org.apache.drill.test.ClusterFixtureBuilder;
 +import org.apache.drill.test.ClusterTest;
 +import org.junit.BeforeClass;
 +import org.junit.Test;
 +import org.junit.experimental.categories.Category;
 +
 +import java.util.HashMap;
 +
 +@Category({UnlikelyTest.class, SqlFunctionTest.class})
 +public class TestUserAgentFunctions extends ClusterTest {
 +
 +  @BeforeClass
 +  public static void setup() throws Exception {
 +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
 +    startCluster(builder);
 +  }
 +
 +  @Test
 +  public void testParseUserAgentString() throws Exception {
 +    String query = "SELECT t1.ua.DeviceClass AS DeviceClass,\n" +
 +      "t1.ua.DeviceName AS DeviceName,\n" +
 +      "t1.ua.DeviceBrand AS DeviceBrand,\n" +
 +      "t1.ua.DeviceCpuBits AS DeviceCpuBits,\n" +
 +      "t1.ua.OperatingSystemClass AS OperatingSystemClass,\n" +
 +      "t1.ua.OperatingSystemName AS OperatingSystemName,\n" +
 +      "t1.ua.OperatingSystemVersion AS OperatingSystemVersion,\n" +
 +      "t1.ua.OperatingSystemVersionMajor AS OperatingSystemVersionMajor,\n" +
 +      "t1.ua.OperatingSystemNameVersion AS OperatingSystemNameVersion,\n" +
 +      "t1.ua.OperatingSystemNameVersionMajor AS OperatingSystemNameVersionMajor,\n" +
 +      "t1.ua.LayoutEngineClass AS LayoutEngineClass,\n" +
 +      "t1.ua.LayoutEngineName AS LayoutEngineName,\n" +
 +      "t1.ua.LayoutEngineVersion AS LayoutEngineVersion,\n" +
 +      "t1.ua.LayoutEngineVersionMajor AS LayoutEngineVersionMajor,\n" +
 +      "t1.ua.LayoutEngineNameVersion AS LayoutEngineNameVersion,\n" +
 +      "t1.ua.LayoutEngineBuild AS LayoutEngineBuild,\n" +
 +      "t1.ua.AgentClass AS AgentClass,\n" +
 +      "t1.ua.AgentName AS AgentName,\n" +
 +      "t1.ua.AgentVersion AS AgentVersion,\n" +
 +      "t1.ua.AgentVersionMajor AS AgentVersionMajor,\n" +
 +      "t1.ua.AgentNameVersionMajor AS AgentNameVersionMajor,\n" +
 +      "t1.ua.AgentLanguage AS AgentLanguage,\n" +
 +      "t1.ua.AgentLanguageCode AS AgentLanguageCode,\n" +
 +      "t1.ua.AgentSecurity AS AgentSecurity\n" +
 +      "FROM (SELECT parse_user_agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11') AS ua FROM (values(1))) AS t1";
 +
 +    testBuilder()
 +      .sqlQuery(query)
 +      .unOrdered()
 +      .baselineColumns("DeviceClass", "DeviceName", "DeviceBrand", "DeviceCpuBits", "OperatingSystemClass", "OperatingSystemName", "OperatingSystemVersion", "OperatingSystemVersionMajor", "OperatingSystemNameVersion", "OperatingSystemNameVersionMajor", "LayoutEngineClass", "LayoutEngineName", "LayoutEngineVersion", "LayoutEngineVersionMajor", "LayoutEngineNameVersion", "LayoutEngineBuild", "AgentClass", "AgentName", "AgentVersion", "AgentVersionMajor", "AgentNameVersionMajor", "AgentLanguage", "AgentLanguageCode", "AgentSecurity")
 +      .baselineValues("Desktop", "Desktop", "Unknown", "32", "Desktop", "Windows NT", "XP", "XP", "Windows XP", "Windows XP", "Browser", "Gecko", "1.8.1.11", "1", "Gecko 1.8.1.11", "20071127", "Browser", "Firefox", "2.0.0.11", "2", "Firefox 2", "English (United States)", "en-us", "Strong security")
 +      .go();
 +  }
 +
 +  @Test
 +  public void testGetHostName() throws Exception {
 +    String query = "SELECT parse_user_agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11', 'AgentSecurity') AS agent FROM " +
 +      "(values(1))";
 +    testBuilder()
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923387#comment-16923387 ] ASF GitHub Bot commented on DRILL-7343: --- KazydubB commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321233721 ## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/UserAgentFunctions.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.udfs; + +import io.netty.buffer.DrillBuf; +import org.apache.drill.exec.expr.DrillSimpleFunc; +import org.apache.drill.exec.expr.annotations.FunctionTemplate; +import org.apache.drill.exec.expr.annotations.Output; +import org.apache.drill.exec.expr.annotations.Param; +import org.apache.drill.exec.expr.annotations.Workspace; +import org.apache.drill.exec.expr.holders.NullableVarCharHolder; +import org.apache.drill.exec.expr.holders.VarCharHolder; +import org.apache.drill.exec.vector.complex.writer.BaseWriter; + +import javax.inject.Inject; + +public class UserAgentFunctions { + + @FunctionTemplate(name = "parse_user_agent", +scope = FunctionTemplate.FunctionScope.SIMPLE + ) + public static class UserAgentFunction implements DrillSimpleFunc { +@Param +VarCharHolder input; + +@Output +BaseWriter.ComplexWriter outWriter; + +@Inject +DrillBuf outBuffer; + +@Workspace +nl.basjes.parse.useragent.UserAgentAnalyzerDirect uaa; + +public void setup() { + uaa = nl.basjes.parse.useragent.UserAgentAnalyzerDirect.newBuilder().dropTests().hideMatcherLoadStats().build(); + uaa.getAllPossibleFieldNamesSorted(); +} + +public void eval() { + org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter queryMapWriter = outWriter.rootAsMap(); + + String userAgentString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(input); + + if (userAgentString.isEmpty() || userAgentString.equals("null")) { Review comment: The `userAgentString.isEmpty()` check can be dropped (here and below). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add User-Agent UDFs to Drill > > > Key: DRILL-7343 > URL: https://issues.apache.org/jira/browse/DRILL-7343 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.17.0 > > > This collection of UDFs adds the ability to parse user agent strings which is > useful for security data analysis. -- This message was sent by Atlassian Jira (v8.3.2#803003)
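The `eval()` body in the hunk above does two things. It turns the `VarCharHolder` argument into a Java `String`, then checks for an empty value or the literal text `"null"`. Below is a rough plain-Java sketch of those two steps. It assumes the input bytes are UTF-8, which is how Drill stores VARCHAR data, and it uses a plain `byte[]` with `start`/`end` offsets in place of a `DrillBuf`. The class and helper names (`VarCharDecodeSketch`, `decodeUtf8`, `isMissing`) are hypothetical and are not Drill APIs.

```java
import java.nio.charset.StandardCharsets;

class VarCharDecodeSketch {

  // Hypothetical stand-in for a VarCharHolder's buffer/start/end triple:
  // decode only the bytes in [start, end) as UTF-8.
  static String decodeUtf8(byte[] buffer, int start, int end) {
    return new String(buffer, start, end - start, StandardCharsets.UTF_8);
  }

  // Mirrors the guard in eval(): an empty value or the literal text "null"
  // is treated as "no user agent supplied".
  static boolean isMissing(String userAgent) {
    return userAgent.isEmpty() || userAgent.equals("null");
  }

  public static void main(String[] args) {
    // The value of interest sits between offsets 2 and 13 of a larger buffer.
    byte[] buf = "xxMozilla/5.0yy".getBytes(StandardCharsets.UTF_8);
    String ua = decodeUtf8(buf, 2, 13);
    System.out.println(ua);             // Mozilla/5.0
    System.out.println(isMissing(ua));  // false
    System.out.println(isMissing(""));  // true
  }
}
```

The review asks whether the `isEmpty()` guard is still needed. That depends on how the analyzer treats empty input, which this sketch does not model.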
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923386#comment-16923386 ] ASF GitHub Bot commented on DRILL-7343: --- KazydubB commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321231642 ## File path: contrib/udfs/README.md ## @@ -0,0 +1,58 @@ +# Drill User Defined Functions + +This `README` documents functions which users have submitted to Apache Drill. + +## User Agent Functions +Drill UDF for parsing User Agent Strings. +This function is based on Niels Basjes Java library for parsing user agent strings which is available here: https://github.com/nielsbasjes/yauaa. + +### Usage +The function `parse_user_agent()` takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string. +``` +SELECT parse_user_agent( columns[0] ) as ua +FROM dfs.`/tmp/data/drill-httpd/ua.csv`; +``` +The query above returns: +``` +{ + "DeviceClass":"Desktop", + "DeviceName":"Macintosh", + "DeviceBrand":"Apple", + "OperatingSystemClass":"Desktop", + "OperatingSystemName":"Mac OS X", + "OperatingSystemVersion":"10.10.1", + "OperatingSystemNameVersion":"Mac OS X 10.10.1", + "LayoutEngineClass":"Browser", + "LayoutEngineName":"Blink", + "LayoutEngineVersion":"39.0", + "LayoutEngineVersionMajor":"39", + "LayoutEngineNameVersion":"Blink 39.0", + "LayoutEngineNameVersionMajor":"Blink 39", + "AgentClass":"Browser", + "AgentName":"Chrome", + "AgentVersion":"39.0.2171.99", + "AgentVersionMajor":"39", + "AgentNameVersion":"Chrome 39.0.2171.99", + "AgentNameVersionMajor":"Chrome 39", + "DeviceCpu":"Intel" +} +``` +The function returns a Drill map, so you can access any of the fields using Drill's table.map.key notation. 
For example, the query below illustrates how to extract a field from this map and summarize it: + +``` +SELECT uadata.ua.AgentNameVersion AS Browser, +COUNT( * ) AS BrowserCount +FROM ( + SELECT parse_user_agent( columns[0] ) AS ua + FROM dfs.drillworkshop.`user-agents.csv` +) AS uadata +GROUP BY uadata.ua.AgentNameVersion +ORDER BY BrowserCount DESC +``` +The function can also be called with an optional field name as a second argument, for example: +``` +SELECT parse_user_agent( `user_agent`, 'AgentName' ) as AgentName ... +``` +which will just return the requested field. If the user agent string is empty, all fields will have the value of `Hacker`. + +Note: This function does not accept `NULL` as input. Review comment: I believe this line can be removed. :)
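The `GROUP BY` example above counts rows per browser and sorts the counts in descending order. The minimal Java sketch below performs the same aggregation over results that have already been parsed. It assumes each parsed user agent is available as a `Map<String, String>`, standing in for the Drill map returned by `parse_user_agent()`. The class, the method and the sample values are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BrowserCountSketch {

  // Rough equivalent of:
  //   SELECT ua.AgentNameVersion, COUNT(*) ... GROUP BY ua.AgentNameVersion
  //   ORDER BY COUNT(*) DESC
  static List<Map.Entry<String, Integer>> countByBrowser(List<Map<String, String>> parsed) {
    Map<String, Integer> counts = new HashMap<>();
    for (Map<String, String> ua : parsed) {
      // A map without the field yields a null key; HashMap allows it,
      // so such rows are grouped together, much like SQL NULLs.
      counts.merge(ua.get("AgentNameVersion"), 1, Integer::sum);
    }
    List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
    // ORDER BY BrowserCount DESC
    rows.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    return rows;
  }

  public static void main(String[] args) {
    // Illustrative input only, not real query output.
    List<Map<String, String>> parsed = List.of(
        Map.of("AgentNameVersion", "Chrome 39.0.2171.99"),
        Map.of("AgentNameVersion", "Firefox 2.0.0.11"),
        Map.of("AgentNameVersion", "Chrome 39.0.2171.99"));
    for (Map.Entry<String, Integer> row : countByBrowser(parsed)) {
      System.out.println(row.getKey() + "\t" + row.getValue());
    }
  }
}
```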
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923369#comment-16923369 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321228335 ## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestUserAgentFunctions.java ## @@ -0,0 +1,170 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.udfs; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.categories.UnlikelyTest; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterFixtureBuilder; +import org.apache.drill.test.ClusterTest; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import java.util.HashMap; + +@Category({UnlikelyTest.class, SqlFunctionTest.class}) +public class TestUserAgentFunctions extends ClusterTest { + + @BeforeClass + public static void setup() throws Exception { +ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher); +startCluster(builder); + } + + @Test + public void testParseUserAgentString() throws Exception { +String query = "SELECT t1.ua.DeviceClass AS DeviceClass,\n" + + "t1.ua.DeviceName AS DeviceName,\n" + + "t1.ua.DeviceBrand AS DeviceBrand,\n" + + "t1.ua.DeviceCpuBits AS DeviceCpuBits,\n" + + "t1.ua.OperatingSystemClass AS OperatingSystemClass,\n" + + "t1.ua.OperatingSystemName AS OperatingSystemName,\n" + + "t1.ua.OperatingSystemVersion AS OperatingSystemVersion,\n" + + "t1.ua.OperatingSystemVersionMajor AS OperatingSystemVersionMajor,\n" + + "t1.ua.OperatingSystemNameVersion AS OperatingSystemNameVersion,\n" + + "t1.ua.OperatingSystemNameVersionMajor AS OperatingSystemNameVersionMajor,\n" + + "t1.ua.LayoutEngineClass AS LayoutEngineClass,\n" + + "t1.ua.LayoutEngineName AS LayoutEngineName,\n" + + "t1.ua.LayoutEngineVersion AS LayoutEngineVersion,\n" + + "t1.ua.LayoutEngineVersionMajor AS LayoutEngineVersionMajor,\n" + + "t1.ua.LayoutEngineNameVersion AS LayoutEngineNameVersion,\n" + + "t1.ua.LayoutEngineBuild AS LayoutEngineBuild,\n" + + "t1.ua.AgentClass AS AgentClass,\n" + + "t1.ua.AgentName AS AgentName,\n" + + "t1.ua.AgentVersion AS AgentVersion,\n" + + "t1.ua.AgentVersionMajor AS AgentVersionMajor,\n" + + "t1.ua.AgentNameVersionMajor AS AgentNameVersionMajor,\n" + + "t1.ua.AgentLanguage AS 
AgentLanguage,\n" + + "t1.ua.AgentLanguageCode AS AgentLanguageCode,\n" + + "t1.ua.AgentSecurity AS AgentSecurity\n" + + "FROM (SELECT parse_user_agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11') AS ua FROM (values(1))) AS t1"; + +testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("DeviceClass", "DeviceName", "DeviceBrand", "DeviceCpuBits", "OperatingSystemClass", "OperatingSystemName", "OperatingSystemVersion", "OperatingSystemVersionMajor", "OperatingSystemNameVersion", "OperatingSystemNameVersionMajor", "LayoutEngineClass", "LayoutEngineName", "LayoutEngineVersion", "LayoutEngineVersionMajor", "LayoutEngineNameVersion", "LayoutEngineBuild", "AgentClass", "AgentName", "AgentVersion", "AgentVersionMajor", "AgentNameVersionMajor", "AgentLanguage", "AgentLanguageCode", "AgentSecurity") + .baselineValues("Desktop", "Desktop", "Unknown", "32", "Desktop", "Windows NT", "XP", "XP", "Windows XP", "Windows XP", "Browser", "Gecko", "1.8.1.11", "1", "Gecko 1.8.1.11", "20071127", "Browser", "Firefox", "2.0.0.11", "2", "Firefox 2", "English (United States)", "en-us", "Strong security") + .go(); + } + + @Test + public void testGetHostName() throws Exception { +String query = "SELECT parse_user_agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11', 'AgentSecurity') AS agent FROM " + + "(values(1))"; +testBuilder() +
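The SELECT list in `testParseUserAgentString` repeats the `t1.ua.<Field> AS <Field>` pattern once per field. One possible alternative is to build that list programmatically. This sketch assumes the test keeps a single list of field names. `ProjectionListSketch` and `buildSelect` are hypothetical names and are not part of the PR.

```java
import java.util.List;
import java.util.stream.Collectors;

class ProjectionListSketch {

  // Builds "SELECT t1.ua.<Field> AS <Field>,\n..." for every field, ending in
  // "\n", which matches the shape of the hand-written string in the test.
  static String buildSelect(String alias, String mapColumn, List<String> fields) {
    return fields.stream()
        .map(f -> alias + "." + mapColumn + "." + f + " AS " + f)
        .collect(Collectors.joining(",\n", "SELECT ", "\n"));
  }

  public static void main(String[] args) {
    List<String> fields = List.of("DeviceClass", "DeviceName", "AgentSecurity");
    String query = buildSelect("t1", "ua", fields)
        + "FROM (SELECT parse_user_agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
        + "rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11') AS ua FROM (values(1))) AS t1";
    System.out.println(query);
  }
}
```

The test passes `baselineColumns(...)` a plain list of column-name strings. The same field list could therefore presumably feed it as well, via `fields.toArray(new String[0])`.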
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923367#comment-16923367 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321228229 ## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestUserAgentFunctions.java ##
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923368#comment-16923368 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321226710 ## File path: contrib/udfs/README.md ## @@ -0,0 +1,58 @@ +# Drill User Defined Functions + +This `README` documents functions which users have submitted to Apache Drill. + +## User Agent Functions +Drill UDF for parsing User Agent Strings. +This function is based on Niels Basjes Java library for parsing user agent strings which is available here: https://github.com/nielsbasjes/yauaa. + +### Usage +The function `parse_user_agent()` takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string. Review comment: ```suggestion The function `parse_user_agent()` takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string. ```
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923366#comment-16923366 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321228185 ## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestUserAgentFunctions.java ##
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923370#comment-16923370 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321228379 ## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestUserAgentFunctions.java ##
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923365#comment-16923365 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321226651 ## File path: contrib/udfs/README.md ## @@ -0,0 +1,58 @@ +# Drill User Defined Functions + +This `README` documents functions which users have submitted to Apache Drill. + +## User Agent Functions +Drill UDF for parsing User Agent Strings. +This function is based on Niels Basjes Java library for parsing user agent strings which is available here: https://github.com/nielsbasjes/yauaa. + +### Usage +The function `parse_user_agent()` takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string. +``` +SELECT parse_user_agent( columns[0] ) as ua +FROM dfs.`/tmp/data/drill-httpd/ua.csv`; +``` +The query above returns: +``` +{ + "DeviceClass":"Desktop", + "DeviceName":"Macintosh", + "DeviceBrand":"Apple", + "OperatingSystemClass":"Desktop", + "OperatingSystemName":"Mac OS X", + "OperatingSystemVersion":"10.10.1", + "OperatingSystemNameVersion":"Mac OS X 10.10.1", + "LayoutEngineClass":"Browser", + "LayoutEngineName":"Blink", + "LayoutEngineVersion":"39.0", + "LayoutEngineVersionMajor":"39", + "LayoutEngineNameVersion":"Blink 39.0", + "LayoutEngineNameVersionMajor":"Blink 39", + "AgentClass":"Browser", + "AgentName":"Chrome", + "AgentVersion":"39.0.2171.99", + "AgentVersionMajor":"39", + "AgentNameVersion":"Chrome 39.0.2171.99", + "AgentNameVersionMajor":"Chrome 39", + "DeviceCpu":"Intel" +} +``` +The function returns a Drill map, so you can access any of the fields using Drill's table.map.key notation. 
For example, the query below illustrates how to extract a field from this map and summarize it: + +``` +SELECT uadata.ua.AgentNameVersion AS Browser, +COUNT( * ) AS BrowserCount +FROM ( + SELECT parse_user_agent( columns[0] ) AS ua + FROM dfs.drillworkshop.`user-agents.csv` +) AS uadata +GROUP BY uadata.ua.AgentNameVersion +ORDER BY BrowserCount DESC +``` +The function can also be called with an optional field name as a second argument, for example: +``` +SELECT parse_user_agent( `user_agent`, 'AgentName' ) as AgentName ... +``` +which will just return the requested field. If the user agent string is empty, all fields will have the value of `Hacker`. Review comment: ```suggestion which will just return the requested field. If the user agent string is empty, all fields will have the value of `Hacker`. ```
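The README describes two behaviors of the two-argument form. Only the requested field is returned, and an empty user agent yields `Hacker` for every field. The Java sketch below models just that documented contract. The `parser` function stands in for the real yauaa-based analysis, and `SingleFieldSketch` and `parseField` are hypothetical names. The sketch does not claim to reflect how the UDF or yauaa actually produce the `Hacker` value.

```java
import java.util.Map;
import java.util.function.Function;

class SingleFieldSketch {

  // Models parse_user_agent(ua, 'AgentName'): analyze the string, then keep
  // only one field. `parser` is a placeholder for the real analysis.
  static String parseField(Function<String, Map<String, String>> parser,
                           String userAgent, String field) {
    // Documented behavior: an empty user agent gives "Hacker" for every field.
    if (userAgent.isEmpty()) {
      return "Hacker";
    }
    // Not every field exists for every agent, so this may return null.
    return parser.apply(userAgent).get(field);
  }

  public static void main(String[] args) {
    // Stub parser returning illustrative values only.
    Function<String, Map<String, String>> stub =
        ua -> Map.of("AgentName", "Chrome", "AgentVersionMajor", "39");
    System.out.println(parseField(stub, "Mozilla/5.0", "AgentName")); // Chrome
    System.out.println(parseField(stub, "", "AgentName"));            // Hacker
  }
}
```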
[jira] [Commented] (DRILL-7343) Add User-Agent UDFs to Drill
[ https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923371#comment-16923371 ] ASF GitHub Bot commented on DRILL-7343: --- arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent UDFs to Drill URL: https://github.com/apache/drill/pull/1840#discussion_r321226983 ## File path: contrib/udfs/README.md ## @@ -0,0 +1,58 @@ +# Drill User Defined Functions + +This `README` documents functions which users have submitted to Apache Drill. + +## User Agent Functions +Drill UDF for parsing User Agent Strings. +This function is based on Niels Basjes Java library for parsing user agent strings which is available here: https://github.com/nielsbasjes/yauaa. Review comment: Please make a link: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#links
[jira] [Updated] (DRILL-7251) Read Hive array w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7251: Description: Described in DRILL-3290 design doc. > Read Hive array w/o nulls > - > > Key: DRILL-7251 > URL: https://issues.apache.org/jira/browse/DRILL-7251 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > Described in DRILL-3290 design doc.
[jira] [Updated] (DRILL-7253) Read Hive struct w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7253: Description: Described in DRILL-3290 design doc. > Read Hive struct w/o nulls > -- > > Key: DRILL-7253 > URL: https://issues.apache.org/jira/browse/DRILL-7253 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.17.0 > > > Described in DRILL-3290 design doc.
[jira] [Updated] (DRILL-7252) Read Hive map using canonical Map vector
[ https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7252: Description: Described in DRILL-3290 design doc. > Read Hive map using canonical Map vector > - > > Key: DRILL-7252 > URL: https://issues.apache.org/jira/browse/DRILL-7252 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > Described in DRILL-3290 design doc.