That's easy. In your example, the map output key is FIELD-N and the map output value is just the original line. In the reduce: if the value contains LOGTAG<TAB>, it is the first part of the log entry; if not, it is the continuation of a split log entry. Just take a substring of the continuation (everything after the repeated FIELD-N) and concatenate it to the first part.
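Here is a rough sketch of that idea in plain Python, simulating the map, shuffle and reduce steps locally rather than using the Hadoop API. The function names (split_payload, map_line, reduce_group, run) are made up for illustration. It assumes the syslog prefix is exactly five space-separated tokens as in the example, and that the continuation line starts with the same FIELD-N text that ends the first line.

```python
from collections import defaultdict

LOGTAG = "LOGTAG"  # placeholder tag, taken from the example lines


def split_payload(line):
    # Assumption: the prefix is "Mon DD HH:MM:SS host app[proc]",
    # i.e. five tokens separated by single spaces, then the payload.
    parts = line.split(" ", 5)
    return " ".join(parts[:5]), parts[5]


def map_line(line):
    # Key is FIELD-N: the last field of a first part,
    # or the first field of a continuation.
    _, payload = split_payload(line)
    fields = payload.split("\t")
    key = fields[-1] if fields[0] == LOGTAG else fields[0]
    return key, line


def reduce_group(key, values):
    heads = [v for v in values if LOGTAG + "\t" in v]
    tails = [v for v in values if LOGTAG + "\t" not in v]
    if len(heads) == 1 and len(tails) == 1:
        _, tail_payload = split_payload(tails[0])
        # Drop the repeated FIELD-N, keep the rest (starts with a tab).
        return [heads[0] + tail_payload[len(key):]]
    # Complete line, or an ambiguous group: pass through unchanged.
    return list(values)


def run(lines):
    # Local stand-in for the shuffle: group map output by key.
    groups = defaultdict(list)
    for line in lines:
        key, value = map_line(line)
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reduce_group(key, values))
    return out
```

One caveat: keying on FIELD-N alone can put unrelated lines in the same group when two entries share that field value, so in practice you may want to add the timestamp and host to the key.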
Did I explain that clearly?

On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <[email protected]> wrote:

> Hi
>
> Please find below the issue I need to solve. Thank you in advance for your
> help/tips.
>
> I have log files where sometimes log lines are split (this happens when
> the log line exceeds a specific length)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a Hadoop map
> reduce job?
>
> In other words, using a map reduce job, can I concatenate the 2 following
> adjacent lines (provided that I 'detect' them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
