Re: Concatenate adjacent lines with hadoop

Azuryy Yu Tue, 26 Feb 2013 18:40:23 -0800

That's easy. In your example:

Map output key: FIELD-N; map output value: the original line, unchanged.
In the reducer: if the value contains LOGTAG<TAB>, it is the first part
of the log entry. If not, it is the continuation of a split log entry:
take the substring after the repeated FIELD-N and concatenate it with the
first part.
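
A rough sketch of the idea, simulated locally in plain Python rather than
as a real Hadoop job (in Hadoop the two functions would become the mapper
and the reducer, and the grouping by key is done by the shuffle). It
assumes a few things that are not stated in the thread: the prefix before
the payload is exactly five space-separated tokens, FIELD-N is repeated at
the end of the first fragment and at the start of the continuation as in
the example, and its value is distinctive enough to pair the two. The
function names are made up for illustration.

```python
from collections import defaultdict

LOGTAG = "LOGTAG"  # placeholder tag taken from the example lines


def split_prefix(line):
    """Split a raw line into (prefix, tab-separated payload).

    Assumption: the prefix is exactly five space-separated tokens
    (month, day, time, host, process), as in the example.
    """
    parts = line.split(" ", 5)
    return " ".join(parts[:5]), parts[5]


def is_first_fragment(line):
    """True if the payload starts with the LOGTAG field."""
    return split_prefix(line)[1].split("\t")[0] == LOGTAG


def map_line(line):
    """Map phase: key on the field shared by both fragments (FIELD-N)."""
    fields = split_prefix(line)[1].split("\t")
    # A first fragment ends with FIELD-N; a continuation starts with it.
    key = fields[-1] if is_first_fragment(line) else fields[0]
    return key, line


def reduce_group(key, values):
    """Reduce phase: join one first fragment with one continuation."""
    heads = [v for v in values if is_first_fragment(v)]
    tails = [v for v in values if not is_first_fragment(v)]
    if len(heads) == 1 and len(tails) == 1:
        tail_payload = split_prefix(tails[0])[1]
        # Drop the repeated FIELD-N from the continuation, keep the rest.
        return [heads[0] + tail_payload[len(key):]]
    # Anything else (complete lines, ambiguous groups) passes through as-is.
    return list(values)


def run_job(lines):
    """Local stand-in for map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_line(line)
        groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reduce_group(key, groups[key]))
    return output
```

Note that complete lines are keyed by their last field here, so a group
with more than one candidate is left untouched rather than guessed at. If
FIELD-N values can collide, a more specific key (for example timestamp
plus host plus FIELD-N) is probably safer.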


Did I explain that clearly?



On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <[email protected]> wrote:

> Hi
>
> Please find below the issue I need to solve. Thank you in advance for your
> help/ tips.
>
> I have log files where sometimes log lines are split (this happens when
> the log line exceeds a specific length)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a Hadoop map
> reduce job?
>
> In other words, using a map reduce job, can I concatenate the 2 following
> adjacent lines (provided that I 'detect' them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
>
