Hi Idoor, 
The regex in the example is the pattern that matches MySQL logs. What you have
to do is write a regex that maps to your data and extracts the fields. Just
eyeballing it, you might have something like:
(\d{2} \w{3} \d{4}) (\d{2}:\d{2}:\d{2})\s(\[.+?\])\s(.+?)\s(\[.+?\])

(Note: you’ll have to add an additional backslash before every backslash when you put this into the JSON config)
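For instance, the format section might look roughly like this (a sketch only — it reuses the logRegex structure and field names from your quoted config, and the pattern is untested against your full log):

```json
"log": {
  "type": "logRegex",
  "regex": "(\\d{2} \\w{3} \\d{4}) (\\d{2}:\\d{2}:\\d{2})\\s(\\[.+?\\])\\s(.+?)\\s(\\[.+?\\])",
  "extension": "log",
  "maxErrors": 10,
  "schema": [
    { "fieldName": "eventDate", "fieldType": "DATE", "format": "dd MMM yyyy" },
    { "fieldName": "eventTime", "fieldType": "TIME", "format": "HH:mm:ss" },
    { "fieldName": "PID" },
    { "fieldName": "action" },
    { "fieldName": "query" }
  ]
}
```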

This regex has 5 capturing groups, so you’ll need to define 5 fields in the
schema section of the format plugin. I would test this out on regexpal.com
or regex101.com and see if it works.
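If you have Python handy, its regex flavor is close enough to Java’s for this pattern that you can also sanity-check it locally. A rough sketch (the sample line is taken from your message, and this runs outside Drill):

```python
import re

# Suggested pattern in single-backslash form (before JSON escaping).
pattern = re.compile(
    r"(\d{2} \w{3} \d{4}) (\d{2}:\d{2}:\d{2})\s(\[.+?\])\s(.+?)\s(\[.+?\])"
)

line = "01 Oct 2018 09:30:32 [ID# ] - Query Request [ Datasource : tydy ]"
m = pattern.match(line)

# Each capturing group maps to one field in the format plugin's schema.
print(m.groups())
# ('01 Oct 2018', '09:30:32', '[ID# ]', '- Query Request', '[ Datasource : tydy ]')
```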
—C


> On Nov 14, 2018, at 11:40, idoor do <[email protected]> wrote:
> 
> Hi Charles,
> 
> Thanks for your help; I now have the MySQL log file working, but I have
> issues with a different log file format, like this:
> 
> 01 Oct 2018 09:30:32 [ID# ] - Query Request [ Datasource : tydy ]
> 
> So the eventDate is    01 Oct 2018
> the eventTime is       09:30:32
> the PID as string is   [ID# ]
> action as string is    - Query Request
> query as string is     [ Datasource : tydy ]
> 
> So how does the log plugin know the boundaries between all the neighboring
> fields? Right now I get
> *Error: PARSE ERROR: Too many errors.  Max error threshold exceeded.*
> in sqlline.log, which says: Unmatched line: 01 Oct 2018 09:30:33 [ID# ] -
> Query Request [ Datasource : tydy ]
> 
> The config I am using is as follows:
> 
> "log": {
>      "type": "logRegex",
>      "regex":
> "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
>      "extension": "log",
>      "maxErrors": 10,
>      "schema": [
>        {
>          "fieldName": "eventDate",
>          "fieldType": "DATE",
>          "format": "dd MMM yyyy"
>        },
>        {
>          "fieldName": "eventTime",
>          "fieldType": "TIME",
>          "format": "HH:mm:ss"
>        },
>        {
>          "fieldName": "PID"
>        },
>        {
>          "fieldName": "action"
>        },
>        {
>          "fieldName": "query"
>        }
>      ]
>    }
> 
> Thanks very much for your help.
> Idoor
> 
> On Wed, Nov 14, 2018 at 11:01 AM Charles Givre <[email protected]> wrote:
> 
>> Hi idoor,
>> For some reason the documentation for this is old and incorrect.
>> Here is a link to the correct documentation:
>> 
>> 
>> https://github.com/cgivre/drill/blob/24556d857cbbe7aa2baa1fc6cbd85fb614b5d975/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
>> 
>> It’s actually a lot easier…
>> — C
>> 
>>> On Nov 14, 2018, at 10:53, idoor do <[email protected]> wrote:
>>> 
>>> Could somebody help me with this issue? I have been stuck on it
>>> for a couple of days.
>>> 
>>> Thanks
>>> 
>>> I installed the drill-logfile-plugin-1.0.0 JAR file in the
>>> <drill_install>/jars/3rdParty/ directory and configured dfs as follows,
>>> but I got the error "Please retry: error (invalid JSON mapping)".
>>> The sqlline.log file shows the error: Unable to find constructor for
>>> storage config named 'log' of type
>>> 'org.apache.drill.exec.store.log.LogFormatPlugin$LogFormatConfig', even
>>> though I double-checked that drill-logfile-plugin-1.0.0.jar is in the
>>> jars/3rdParty folder:
>>> 
>>> My config for dfs with log plugin support is:
>>> {
>>>   "type": "file",
>>>   "connection": "file:///",
>>>   "config": null,
>>>   "workspaces": {
>>>     "root": {
>>>       "location": "/",
>>>       "writable": false,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "test": {
>>>       "location": "/Users/tsd",
>>>       "writable": false,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "tmp": {
>>>       "location": "/tmp",
>>>       "writable": true,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     }
>>>   },
>>>   "formats": {
>>>     "log": {
>>>       "type": "log",
>>>       "extensions": [ "log" ],
>>>       "fieldNames": [ "date", "time", "pid", "action", "query" ],
>>>       "dataTypes": [ "DATE", "TIME", "INT", "VARCHAR", "VARCHAR" ],
>>>       "dateFormat": "yyMMdd",
>>>       "timeFormat": "HH:mm:ss",
>>>       "pattern": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
>>>       "errorOnMismatch": false
>>>     }
>>>   },
>>>   "enabled": true
>>> }
>>> 
>>> If I configure the log section as below to remove some fields, the error
>>> disappears, but those fields are missing, and the query:
>>> 
>>> select * from `mysql.log` limit 10; returns the error: ERROR
>>> o.a.calcite.runtime.CalciteException -
>>> org.apache.calcite.sql.validate.SqlValidatorException: Object 'mysql.log'
>>> not found
>>> 
>>> 
>>> and when I type show files;, it shows that the mysql.log file is in the
>>> /Users/tsd directory:
>>> 
>>> 
>>> "log": {
>>>   "type": "log",
>>>   "extensions": [ "log" ],
>>>   "fieldNames": [ "date", "time", "pid", "action", "query" ],
>>>   "pattern": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)"
>>> }
>> 
>> 
