Thanks Charles, I got it working now. Thanks again for your great help. IDoor
On Wed, Nov 14, 2018 at 11:49 AM Charles Givre <cgi...@gmail.com> wrote:

> Hi Idoor,
> The regex in the example is the pattern which matches MySQL logs. What you
> have to do is write a regex that maps your data and extracts the fields.
> Just eyeballing it, you might have something like:
>
> (\d{2} \w{3} \d{4}) (\d{2}:\d{2}:\d{2})\s(\[.+?\])\s(.+?)\[(.+?)\]
>
> (Note: you'll have to add an additional \ before every backslash.)
>
> This regex has 5 capturing groups, so you'll need to define 5 fields in
> the schema section of the format plugin. I would test this out on
> regexpal.com or regex101.com and see if it works.
> —C
>
> On Nov 14, 2018, at 11:40, idoor do <idoorla...@gmail.com> wrote:
>
> > Hi Charles,
> >
> > Thanks for your help. I now have the MySQL log file working, but I have
> > issues with a different log file format, like this:
> >
> > 01 Oct 2018 09:30:32 [ID# ] - Query Request [ Datasource : tydy ]
> >
> > So the eventDate is 01 Oct 2018,
> > the eventTime is 09:30:32,
> > the PID as a string is [ID# ],
> > the action as a string is - Query Request,
> > and the query as a string is [ Datasource : tydy ].
> >
> > So how does the log plugin know the boundaries between all the
> > neighboring fields? Right now I get
> >
> > Error: PARSE ERROR: Too many errors. Max error threshold exceeded.
> >
> > and in sqlline.log it says: Unmatched line: 01 Oct 2018 09:30:33 [ID# ]
> > - Query Request [ Datasource : tydy ]
> >
> > The config I am using is as follows:
> >
> > "log": {
> >   "type": "logRegex",
> >   "regex": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
> >   "extension": "log",
> >   "maxErrors": 10,
> >   "schema": [
> >     { "fieldName": "eventDate", "fieldType": "DATE", "format": "dd MMM yyyy" },
> >     { "fieldName": "eventTime", "fieldType": "TIME", "format": "HH:mm:ss" },
> >     { "fieldName": "PID" },
> >     { "fieldName": "action" },
> >     { "fieldName": "query" }
> >   ]
> > }
> >
> > Thanks very much for your help.
> > Idoor
> >
> > On Wed, Nov 14, 2018 at 11:01 AM Charles Givre <cgi...@gmail.com> wrote:
> >
> >> Hi idoor,
> >> For some reason the documentation for this is an old and incorrect
> >> version. Here is a link to the correct documentation:
> >>
> >> https://github.com/cgivre/drill/blob/24556d857cbbe7aa2baa1fc6cbd85fb614b5d975/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
> >>
> >> It's actually a lot easier…
> >> — C
> >>
> >>> On Nov 14, 2018, at 10:53, idoor do <idoorla...@gmail.com> wrote:
> >>>
> >>> Could somebody help me with this issue? I have been stuck on it for a
> >>> couple of days. Thanks.
> >>>
> >>> I installed the drill-logfile-plugin-1.0.0 JAR file to the
> >>> <drill_install>/jars/3rdParty/ directory and configured dfs as
> >>> follows, but I got the error: "Please retry: error (invalid JSON
> >>> mapping)". The sqlline.log file shows the error: Unable to find
> >>> constructor for storage config named 'log' of type
> >>> 'org.apache.drill.exec.store.log.LogFormatPlugin$LogFormatConfig',
> >>> even though I double-checked that drill-logfile-plugin-1.0.0.jar is
> >>> in the jars/3rdParty folder.
> >>>
> >>> My config for dfs with log plugin support is:
> >>>
> >>> {
> >>>   "type": "file",
> >>>   "connection": "file:///",
> >>>   "config": null,
> >>>   "workspaces": {
> >>>     "root": {
> >>>       "location": "/",
> >>>       "writable": false,
> >>>       "defaultInputFormat": null,
> >>>       "allowAccessOutsideWorkspace": false
> >>>     },
> >>>     "test": {
> >>>       "location": "/Users/tsd",
> >>>       "writable": false,
> >>>       "defaultInputFormat": null,
> >>>       "allowAccessOutsideWorkspace": false
> >>>     },
> >>>     "tmp": {
> >>>       "location": "/tmp",
> >>>       "writable": true,
> >>>       "defaultInputFormat": null,
> >>>       "allowAccessOutsideWorkspace": false
> >>>     }
> >>>   },
> >>>   "formats": {
> >>>     "log": {
> >>>       "type": "log",
> >>>       "extensions": [ "log" ],
> >>>       "fieldNames": [ "date", "time", "pid", "action", "query" ],
> >>>       "dataTypes": [ "DATE", "TIME", "INT", "VARCHAR", "VARCHAR" ],
> >>>       "dateFormat": "yyMMdd",
> >>>       "timeFormat": "HH:mm:ss",
> >>>       "pattern": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)",
> >>>       "errorOnMismatch": false
> >>>     }
> >>>   },
> >>>   "enabled": true
> >>> }
> >>>
> >>> If I configure the log section as below to remove some fields, that
> >>> error disappears, but some fields are then missing, and the query
> >>>
> >>> select * from `mysql.log` limit 10;
> >>>
> >>> returns the error: ERROR o.a.calcite.runtime.CalciteException -
> >>> org.apache.calcite.sql.validate.SqlValidatorException: Object
> >>> 'mysql.log' not found
> >>>
> >>> And when I type show files;, it shows the mysql.log file is in the
> >>> /Users/tsd directory:
> >>>
> >>> "log": {
> >>>   "type": "log",
> >>>   "extensions": [ "log" ],
> >>>   "fieldNames": [ "date", "time", "pid", "action", "query" ],
> >>>   "pattern": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)"
> >>> }
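Putting the two halves of the thread together, the logRegex section might end up looking like the sketch below. This is only a sketch: it mirrors the schema idoor already posted and swaps in a pattern shaped like Charles's suggestion, with each backslash doubled for JSON.

```
"log": {
  "type": "logRegex",
  "regex": "(\\d{2} \\w{3} \\d{4})\\s(\\d{2}:\\d{2}:\\d{2})\\s(\\[.+?\\])\\s(.+?)\\s(\\[.+?\\])",
  "extension": "log",
  "maxErrors": 10,
  "schema": [
    { "fieldName": "eventDate", "fieldType": "DATE", "format": "dd MMM yyyy" },
    { "fieldName": "eventTime", "fieldType": "TIME", "format": "HH:mm:ss" },
    { "fieldName": "PID" },
    { "fieldName": "action" },
    { "fieldName": "query" }
  ]
}
```

The separate "Object 'mysql.log' not found" error is a name-resolution issue rather than a regex one: Drill resolves unqualified table names against the current default schema, so a file under /Users/tsd needs either a preceding USE dfs.test; or a fully qualified name, e.g. select * from dfs.test.`mysql.log` limit 10;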
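The patterns discussed in the thread can be sanity-checked outside Drill before they go into the JSON config. Below is a quick check in Python: `mysql_pattern` is the MySQL-log pattern from the posted config, and `new_pattern` is a variant of Charles's suggestion (bracket escapes fixed, and the brackets kept on the PID and query fields, as idoor describes them). Backslashes are single here because the pattern is not yet JSON-escaped.

```python
import re

# Sample line from the log file discussed in the thread
line = "01 Oct 2018 09:30:32 [ID# ] - Query Request [ Datasource : tydy ]"

# The MySQL-log pattern from the posted config: it expects a six-digit
# yyMMdd date, so it cannot match this line, which is exactly why Drill
# reports every row as "Unmatched line".
mysql_pattern = re.compile(r"(\d{6})\s(\d{2}:\d{2}:\d{2})\s+(\d+)\s(\w+)\s+(.+)")
print(mysql_pattern.match(line))  # None

# A variant of the suggested pattern: five capturing groups, one per
# field in the schema, with a \s between action and query so the action
# field carries no trailing space.
new_pattern = re.compile(
    r"(\d{2} \w{3} \d{4})\s(\d{2}:\d{2}:\d{2})\s(\[.+?\])\s(.+?)\s(\[.+?\])"
)
m = new_pattern.match(line)
print(m.groups())
# ('01 Oct 2018', '09:30:32', '[ID# ]', '- Query Request', '[ Datasource : tydy ]')
```

If a pattern matches in a quick script like this (or on regex101.com), the same pattern, with each backslash doubled, is what belongs in the plugin config.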