Hi Idoor,
The regex in the example is the pattern that matches MySQL logs. What you have
to do is write a regex that matches your data and extracts the fields. Just
eyeballing it, you might have something like:
(\d{2} \w{3} \d{4}) (\d{2}:\d{2}:\d{2})\s\[(.+?)\]\s(.+?)\s\[(.+?)\]
(Note: in the storage config JSON you’ll have to add an additional \ before every backslash)
This regex has 5 capturing groups, so you’ll need to define 5 fields in the
schema section of the format plugin. I would test this out on regexpal.com or
regex101.com and see if it works.
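If that pattern works, the format config might look something like the sketch
below. This is untested; I'm just reusing the field names from your config,
and note the doubled backslashes for JSON:

"log": {
  "type": "logRegex",
  "regex": "(\\d{2} \\w{3} \\d{4}) (\\d{2}:\\d{2}:\\d{2})\\s\\[(.+?)\\]\\s(.+?)\\s\\[(.+?)\\]",
  "extension": "log",
  "maxErrors": 10,
  "schema": [
    { "fieldName": "eventDate", "fieldType": "DATE", "format": "dd MMM yyyy" },
    { "fieldName": "eventTime", "fieldType": "TIME", "format": "HH:mm:ss" },
    { "fieldName": "PID" },
    { "fieldName": "action" },
    { "fieldName": "query" }
  ]
}

One thing to note: written this way the brackets themselves are not captured,
so PID comes back as "ID# " rather than "[ID# ]". If you want the brackets
included, move them inside the groups, i.e. (\[.+?\]).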
—C
> On Nov 14, 2018, at 11:40, idoor do <[email protected]> wrote:
>
> Hi Charles,
>
> Thanks for your help. I now have the MySQL log file working, but I have
> issues with a different log file format like this:
>
> 01 Oct 2018 09:30:32 [ID# ] - Query Request [ Datasource : tydy ]
>
> So the eventDate is 01 Oct 2018
> the eventTime is 09:30:32
> the PID as string is [ID# ]
> action as string is - Query Request
> query as string is [ Datasource : tydy ]
>
> so how does the log plugin know the boundaries between the neighboring
> fields? Right now I get
> *Error: PARSE ERROR: Too many errors. Max error threshold exceeded.*
> in sqlline.log, and it says: Unmatached line: 01 Oct 2018 09:30:33 [ID# ] -
> Query Request [ Datasource : tydy ]
>
> The config I am using is as follows:
>
> "log": {
> "type": "logRegex",
> "regex":
> "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
> "extension": "log",
> "maxErrors": 10,
> "schema": [
> {
> "fieldName": "eventDate",
> "fieldType": "DATE",
> "format": "dd MMM yyyy"
> },
> {
> "fieldName": "eventTime",
> "fieldType": "TIME",
> "format": "HH:mm:ss"
> },
> {
> "fieldName": "PID"
> },
> {
> "fieldName": "action"
> },
> {
> "fieldName": "query"
> }
> ]
> }
>
> Thanks very much for your help.
> Idoor
>
> On Wed, Nov 14, 2018 at 11:01 AM Charles Givre <[email protected]> wrote:
>
>> Hi idoor,
>> For some reason the documentation for this is an old and incorrect
>> version. Here is a link to the correct documentation:
>>
>> https://github.com/cgivre/drill/blob/24556d857cbbe7aa2baa1fc6cbd85fb614b5d975/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
>>
>> It’s actually a lot easier…
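>> For the MySQL log it would be something like this (just a sketch, so
>> double-check the field names against the README):
>>
>> "log": {
>>   "type": "logRegex",
>>   "regex": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
>>   "extension": "log",
>>   "maxErrors": 10,
>>   "schema": [
>>     { "fieldName": "eventDate", "fieldType": "DATE", "format": "yyMMdd" },
>>     { "fieldName": "eventTime", "fieldType": "TIME", "format": "HH:mm:ss" },
>>     { "fieldName": "pid", "fieldType": "INT" },
>>     { "fieldName": "action" },
>>     { "fieldName": "query" }
>>   ]
>> }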
>> — C
>>
>>> On Nov 14, 2018, at 10:53, idoor do <[email protected]> wrote:
>>>
>>> Could somebody help me with this issue? I have been stuck on it for a
>>> couple of days.
>>>
>>> Thanks
>>>
>>> I installed the drill-logfile-plugin-1.0.0 JAR file in the
>>> <drill_install>/jars/3rdParty/ directory and configured dfs as shown
>>> below, but I get the error "Please retry: error (invalid JSON mapping)".
>>> The sqlline.log file shows the error: Unable to find constructor for
>>> storage config named 'log' of type
>>> 'org.apache.drill.exec.store.log.LogFormatPlugin$LogFormatConfig', even
>>> though I double-checked that the drill-logfile-plugin-1.0.0.jar file is
>>> in the jars/3rdParty folder.
>>>
>>> My config for dfs with log plugin support is:
>>> {
>>>   "type": "file",
>>>   "connection": "file:///",
>>>   "config": null,
>>>   "workspaces": {
>>>     "root": {
>>>       "location": "/",
>>>       "writable": false,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "test": {
>>>       "location": "/Users/tsd",
>>>       "writable": false,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "tmp": {
>>>       "location": "/tmp",
>>>       "writable": true,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     }
>>>   },
>>>   "formats": {
>>>     "log" : {
>>>       "type" : "log",
>>>       "extensions" : [ "log" ],
>>>       "fieldNames" : [ "date", "time", "pid", "action", "query" ],
>>>       "dataTypes" : [ "DATE", "TIME", "INT", "VARCHAR", "VARCHAR" ],
>>>       "dateFormat" : "yyMMdd",
>>>       "timeFormat" : "HH:mm:ss",
>>>       "pattern" : "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
>>>       "errorOnMismatch" : false
>>>     }
>>>   },
>>>   "enabled": true
>>> }
>>>
>>> If I configure the log section as below to remove some fields, the error
>>> disappears, but those fields are missing, and the query
>>>
>>> select * from `mysql.log` limit 10;
>>>
>>> returns the error: ERROR o.a.calcite.runtime.CalciteException -
>>> org.apache.calcite.sql.validate.SqlValidatorException: Object 'mysql.log'
>>> not found
>>>
>>> And when I run show files;, it shows that the mysql.log file is in the
>>> /Users/tsd directory:
>>>
>>> "log": {
>>> "type": "log",
>>> "extensions": [
>>> "log"
>>> ],
>>> "fieldNames": [
>>> "date",
>>> "time",
>>> "pid",
>>> "action",
>>> "query"
>>> ],
>>> "pattern": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)"
>>> }