Concatenate adjacent lines with hadoop

Matthieu Labour Tue, 26 Feb 2013 17:36:41 -0800

Hi

Please find below the issue I need to solve. Thank you in advance for your
help/tips.


I have log files where log lines are sometimes split (this happens when
a log line exceeds a specific length):

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split here
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX

Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
job?

In other words, using a MapReduce job, can I concatenate the following 2
adjacent lines (provided that I 'detect' them)

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split here
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX

into

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
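
To make the desired result concrete, here is a minimal Python sketch of the
merge on a single, ordered stream of lines. It is only an illustration under
assumptions that are not part of my real setup: the header regex, the name
merge_split_lines, and the detection rule (a payload that does not start with
the literal LOGTAG is treated as the continuation of the previous line) are
all hypothetical. It also says nothing about how adjacent lines that land in
different input splits would be kept together in Hadoop.

```python
import re

# Assumed layout: "<timestamp> <host> <app>[<proc>] <payload>".
# Both the header regex and the "LOGTAG" marker are illustrative assumptions.
HEADER = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2} \S+ \S+\[[^\]]*\]) (.*)$")
TAG = "LOGTAG"


def merge_split_lines(lines):
    """Yield log lines, appending each continuation line to its predecessor."""
    pending = None  # the line currently being assembled
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        # If the header is not recognized, treat the whole line as payload.
        payload = match.group(2) if match else line
        if pending is not None and not payload.startswith(TAG):
            # Continuation: drop its header and append only the payload.
            pending += payload
        else:
            # A new record starts; emit the one assembled so far.
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending
```

For example, feeding it an open log file, as in
"for line in merge_split_lines(open(path)): ...", yields one line per
record. The continuation payload is appended with no separator, because in the
example above the split falls in the middle of FIELD-N.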

Thank you!
