\begin{Bernhard Lüder}
> I need to convert 2 lines of data to one.
> 
> This is the data I can extract from the data base:
> FIXEDdata1,Flexibledata1,Neededdata1
> FIXEDdata1,Flexibledata2,Flexibledata3,Flexibledata4,Flexibledata5,Flexibledata6,Flexibledata7,Neededdata2,Neededdata3,Flexibledata8
> 
> This is the data I want to use in a comma separated form:
> Neededdata1,Neededdata2,Neededdata3
> 
> How can I strip off the FIXEDdata and the Flexibledata so I only keep the
> Neededdata?
> 
> There are always multiple double lines of data with some single lines, like:
> FIXEDdata1,Flexibledata1,Neededdata1
> FIXEDdata1,Flexibledata1,Neededdata1
> FIXEDdata1,Flexibledata1,Neededdata1
> FIXEDdata1,Flexibledata2,Flexibledata3,Flexibledata4,Flexibledata5,Flexibledata6,Flexibledata7,Neededdata2,Neededdata3,Flexibledata8
> FIXEDdata1,Flexibledata1,Neededdata1
> FIXEDdata1,Flexibledata2,Flexibledata3,Flexibledata4,Flexibledata5,Flexibledata6,Flexibledata7,Neededdata2,Neededdata3,Flexibledata8
> FIXEDdata1,Flexibledata1,Neededdata1
> FIXEDdata1,Flexibledata2,Flexibledata3,Flexibledata4,Flexibledata5,Flexibledata6,Flexibledata7,Neededdata2,Neededdata3,Flexibledata8
> .........
> 
> where do I start?

right, first problem is easy: uniq(1)

you could do that yourself in the scripting language, but it's not worth it
unless you're worried about the overhead of exec'ing another process.

(uniq only removes adjacent duplicate lines. "sort | uniq" removes duplicates
wherever they appear, but it also reorders the lines, which would break the
single/double pairing that the joining step relies on.)
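
to see the difference, here is a small sketch. the sample records are made-up
short stand-ins (F for the fixed field, a..h for the flexible ones, N1..N3 for
the needed ones), not the real data:

```shell
# Hypothetical sample input: short stand-ins for the real records.
sample='F,a,N1
F,a,N1
F,a,N1
F,b,c,d,e,f,g,N2,N3,h
F,a,N1
F,b,c,d,e,f,g,N2,N3,h'

# uniq collapses the three adjacent identical lines into one,
# leaving strictly alternating short/long lines.
deduped=$(printf '%s\n' "$sample" | uniq)
printf '%s\n' "$deduped"

# sort | uniq removes duplicates everywhere, but also reorders the
# lines, so the short/long pairing is lost.
sorted=$(printf '%s\n' "$sample" | sort | uniq)
printf '%s\n' "$sorted"
```

the first pipeline leaves four lines (short, long, short, long); the second
leaves only two, one of each kind.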


joining two lines:

sed is a poor fit for multi-line work. you can do it, but it's just silly:

uniq | sed -ne 's/^\([^,]*,\)\{2\}\([^,]*\)$/\2/;h;n;s/^\([^,]*,\)\{7\}\([^,]*,[^,]*\),.*$/,\2/;H;x;s/\n//p'

(the final s/\n// removes the newline that H inserts between the two saved
pieces. you could make this a whole lot more legible by writing the statements
on multiple lines, but that's just rearranging deck chairs)
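
for what it's worth, a sketch of the same sed script spread over multiple
lines with comments. the sample input is made up (short stand-ins for the real
fields), and the newline match is written as \n, which assumes a sed that
accepts \n in a regex (GNU and BSD sed do):

```shell
# Hypothetical sample input: one short line followed by its long line.
sample='F,a,N1
F,b,c,d,e,f,g,N2,N3,h'

joined=$(printf '%s\n' "$sample" | sed -n '
# short line: keep only the third (last) field
s/^\([^,]*,\)\{2\}\([^,]*\)$/\2/
# save it in the hold space
h
# read the following long line
n
# keep fields 8 and 9, with a leading comma
s/^\([^,]*,\)\{7\}\([^,]*,[^,]*\),.*$/,\2/
# append to the hold space (this adds a newline), then swap it back in
H
x
# drop the newline and print the joined record
s/\n//p
')
printf '%s\n' "$joined"
```

more readable, but it is still seven commands to do what is a two-rule job
with real fields.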

as soon as you have delimited fields, you should be thinking awk (or perl,
etc). especially when you need to keep "state" between lines.

in awk, "NR" is the record (line) number and "$1,$2,$3.." are the individual
fields in the current record ($0 is the whole record).

so:

uniq | awk -F, 'BEGIN {OFS=","} NR%2 == 1 { x = $3 } NR%2 == 0 { print x, $8, $9 }'
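
as a quick check, here is that awk program run over a small made-up sample
(short stand-in fields, with N1..N6 in the positions of the needed data). the
program is the one above, only laid out one rule per line:

```shell
# Hypothetical sample input: the first short line appears twice in a row.
sample='F,a,N1
F,a,N1
F,b,c,d,e,f,g,N2,N3,h
F,a,N4
F,b,c,d,e,f,g,N5,N6,h'

result=$(printf '%s\n' "$sample" | uniq | awk -F, '
BEGIN { OFS = "," }              # join output fields with commas
NR % 2 == 1 { x = $3 }           # odd record: remember the third field
NR % 2 == 0 { print x, $8, $9 }  # even record: emit the joined line
')
printf '%s\n' "$result"
```

this prints N1,N2,N3 and N4,N5,N6 on separate lines. note that the odd/even
test only works while short and long lines strictly alternate after uniq.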


just for comparison, perl is fairly similar:

uniq | perl -anl -F, -e '$,=","; if ($.%2) { $x=$F[2] } else { print $x,$F[7],$F[8] }'

or even:

uniq | perl -apl -F, -e 'undef$\;$_=$.%2?"$F[2],":"$F[7],$F[8]\n"'

(who said line noise? ;)

-- 
 - Gus
-- 
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
