Hi Vincent,

The log (regex) plugin uses newlines as the delimiter between records, so it 
cannot currently handle newlines within a record. That is, the plugin really 
only works for single-line messages, or for cases in which everything but the 
header line can be ignored.

If you are up for a Java coding effort, you could modify the plugin to accept 
the record delimiter as an additional config parameter. (The text (CSV) plugin 
already does this.) You would need a unique marker that gives a context-free 
record split. With such an enhancement, you could have the plugin look for, 
say, a double newline as the record delimiter. The project would welcome such 
a contribution.
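
To make the idea concrete, here is a rough sketch in Python (not the plugin's 
Java code) of what a configurable record delimiter buys you: split on the 
delimiter first, then apply the regex to each whole record. The names 
RECORD_DELIMITER, RECORD_PATTERN and parse_records are made up for this 
illustration, and the sketch assumes the file uses plain "\n" line endings.

```python
import re

# Hypothetical setting: a double newline separates records.
RECORD_DELIMITER = "\n\n"

# DOTALL lets the last group span any further lines of the message.
RECORD_PATTERN = re.compile(r"\[(.+?)\](.+?)\n(.*)", re.DOTALL)


def parse_records(text):
    """Split text on the record delimiter and match each record."""
    rows = []
    for record in text.split(RECORD_DELIMITER):
        match = RECORD_PATTERN.match(record.strip())
        if match:
            rows.append(match.groups())
    return rows


sample = (
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)\n"
    "External [GLOBAL] macro [@PHASE_INPUT] registered OK\n"
    "\n"
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)\n"
    "Reading Application Definition For [ACTUAL]\n"
)

rows = parse_records(sample)
for row in rows:
    print(row)
```

Each row holds the timestamp, the source field and the message text.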

Recall that Drill works with HDFS files. Each scan operator may be given one 
block of a file. When reading the second or a later block, the reader must 
scan forward to the record delimiter to find the start of the next record. 
That is why the delimiter has to identify a record boundary without any 
knowledge of the preceding text.
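
The scan-forward step might look roughly like the Python sketch below. This is 
only an illustration of the idea, not Drill's reader: records_in_block is a 
made-up name, the whole file is held in memory as bytes, and a block is just 
a (start, end) byte range. Each record is claimed by the block that contains 
its first byte, so a record that runs past the end of a block is still read 
in full by that block's reader.

```python
def records_in_block(data, start, end, delimiter=b"\n\n"):
    """Return the records whose first byte lies in data[start:end]."""
    if start == 0:
        pos = 0
    else:
        # Scan forward to the next delimiter. Begin slightly before `start`
        # so that a delimiter ending exactly at `start` is not missed.
        hit = data.find(delimiter, max(0, start - len(delimiter)))
        if hit == -1:
            return []
        pos = hit + len(delimiter)

    records = []
    while pos < end and pos < len(data):
        nxt = data.find(delimiter, pos)
        stop = len(data) if nxt == -1 else nxt
        # The record may extend beyond `end`; read it to its delimiter.
        records.append(data[pos:stop])
        pos = stop + len(delimiter)
    return records


data = b"rec one\nline 2\n\nrec two\nline 2\n\nrec three\nline 2"
half = len(data) // 2
print(records_in_block(data, 0, half))
print(records_in_block(data, half, len(data)))
```

Together the two blocks return every record exactly once, wherever the block 
boundary falls.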


For now, I'd suggest transforming your file to replace newlines with some other 
character, and replace any existing record delimiter with newline. Then you can 
use the log (regex) plugin.

That is:

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)
External [GLOBAL] macro [@PHASE_INPUT] registered OK

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)
Reading Application Definition For [ACTUAL]

Becomes:

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)|External [GLOBAL] macro [@PHASE_INPUT] registered OK
[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)|Reading Application Definition For [ACTUAL]
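
A small preprocessing script along these lines could do the transformation. 
It is a sketch under a few assumptions: records are separated by blank lines, 
the "|" character does not otherwise occur in the messages, and 
flatten_records is just a name picked for this example. The regex at the end 
is ordinary Python syntax used to check the result, not a tested Drill plugin 
configuration.

```python
import re


def flatten_records(text, field_sep="|"):
    """Join each blank-line-separated record into one line."""
    # Normalize Windows line endings so "\r\n" files split the same way.
    text = text.replace("\r\n", "\n")
    records = []
    # One or more blank lines mark the boundary between records.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in chunk.split("\n") if line.strip()]
        if lines:
            records.append(field_sep.join(lines))
    return "\n".join(records) + "\n"


sample = (
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)\r\n"
    "External [GLOBAL] macro [@PHASE_INPUT] registered OK\r\n"
    "\r\n"
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)\r\n"
    "Reading Application Definition For [ACTUAL]\r\n"
)

flat = flatten_records(sample)
print(flat, end="")

# Every record is now a single line, so a single-line pattern is enough.
pattern = re.compile(r"\[(.+?)\](.+?)\|(.*)")
for line in flat.splitlines():
    print(pattern.match(line).groups())
```

For a real file you would read the input, pass it through flatten_records and 
write the result to a new file for Drill to query.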


Maybe Charles has a better idea?

Thanks,
- Paul

 

    On Tuesday, July 23, 2019, 05:11:26 AM PDT, Vincent BENATIER 
<vbenat...@sp2.fr> wrote:  
 
 Hi all,

I was wondering whether the logfile plugin can handle multiline parsing?

When I try my regex syntax online, it works well, but it seems that the
"\\r\\n" sequences are not recognized when I try to configure a logfile
plugin in Apache Drill.
Or perhaps there is another way to do this, but I could not find anything in
the documentation or in the "Learning Apache Drill" book.

Could someone help?

Vincent

Regex syntaxes I tried
--------------------------
"(\\[.+\\])(.+\\r\\n)(.+)"
"(\\[.+\\])(.+)(\\r\\n.+)"
"(\\[.+\\])(.+) \\r\\n (.+)"

File sample
--------------
[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)
External [GLOBAL] macro [@PHASE_INPUT] registered OK

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)
Reading Application Definition For [ACTUAL]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019009)
Reading Database Definition For [Actual]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019021)
Reading Database Mapping For [ACTUAL]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019010)
Writing Application Definition For [ACTUAL]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019011)
Writing Database Definition For [Actual]

  
