Re: help me with this sed expression

2004-01-05 Thread Gautam Gopalakrishnan
On Tue, Jan 06, 2004 at 01:45:04PM +1030, Malcolm Kay wrote:
> On Tue, 6 Jan 2004 12:50, Gautam Gopalakrishnan wrote:
> > On Tue, Jan 06, 2004 at 12:30:42PM +1030, Malcolm Kay wrote:
> > > On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> > > > Hello. I've spent an hour trying to figure out a series of sed commands
> > > > to process some text (without any luck; you know I'm kind of a
> > > > newbie). I really appreciate your help.
> > > >
> > > > The original text file is in this form -- for each line:
> > > > one Chinese word, then one or two English words separated by a space.
> > > >
> > > > I tried things like s/\(.*\)\([a-z]*\)/\2 \1/, but the first
> > > > \(.*\) is too greedy and also swallows the [a-z] characters that follow.
> > >
> > > Well the greedy part is easily fixed with:
> > >   s/\([^a-z]*\)\([a-z]*\)/\2 \1/
> > >
> > > But this will not work for lines with two English words. The
> > > following should:
> > > % sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target
> >
> > I think awk is easier:
> >
> > awk '{print $2 " " $3 " " $1}' original | tr -s > target
> 
> I'm not really very familiar with awk, but I must say this
> is a much simpler and rather magical solution.
> 
> How does awk know which part of the original line goes into $1, $2 and $3?
> (You will notice there is no space between the Chinese and English words.)
> 

It does not. I did not read the earlier mail properly. But there is
an easier way than all those regexes: put a space in front of the
first a-z character on each line, then let awk split the fields.

sed -e 's/\([a-z]\)/ \1/' original | awk '{print $2" "$1} NF==3 {print $3" "$1}' > target
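A minimal sketch of this pipeline run against the three sample lines from the original post. Two things here are assumptions not found in the mail: `LC_ALL=C`, so that `[a-z]` matches only ASCII letters and every byte of a Chinese character counts as a non-letter, and the hypothetical function name `swap_fields`, which only wraps the pipeline so it reads stdin:

```shell
#!/bin/sh
# Assumption: byte-oriented C locale, so [a-z] is ASCII-only and the
# bytes of a Chinese character never match it.
export LC_ALL=C

# Hypothetical helper wrapping the pipeline above.
swap_fields() {
    # 1. sed: insert a space before the first a-z character on each line,
    #    so "一a av" becomes "一 a av".
    # 2. awk: $1 is now the Chinese word; print "$2 $1" for every line and,
    #    when a second English word exists (NF == 3), also print "$3 $1".
    sed -e 's/\([a-z]\)/ \1/' | awk '{print $2" "$1} NF==3 {print $3" "$1}'
}

# Usage with the sample lines from the original post.
printf '%s\n' '一a av' '可歌可泣aaav' '无可奉告aacm' | swap_fields
```

In this sketch the `NF==3` pattern is what repeats the Chinese word for lines carrying two English words; one-word lines only trigger the first action.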

Gautam

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: help me with this sed expression

2004-01-05 Thread Malcolm Kay
On Tue, 6 Jan 2004 12:50, Gautam Gopalakrishnan wrote:
> On Tue, Jan 06, 2004 at 12:30:42PM +1030, Malcolm Kay wrote:
> > On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> > > Hello. I've spent an hour trying to figure out a series of sed commands
> > > to process some text (without any luck; you know I'm kind of a
> > > newbie). I really appreciate your help.
> > >
> > > The original text file is in this form -- for each line:
> > > one Chinese word, then one or two English words separated by a space.
> > >
> > > I tried things like s/\(.*\)\([a-z]*\)/\2 \1/, but the first
> > > \(.*\) is too greedy and also swallows the [a-z] characters that follow.
> >
> > Well the greedy part is easily fixed with:
> >   s/\([^a-z]*\)\([a-z]*\)/\2 \1/
> >
> > But this will not work for lines with two English words. The
> > following should:
> > % sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target
>
> I think awk is easier:
>
> awk '{print $2 " " $3 " " $1}' original | tr -s > target

I'm not really very familiar with awk, but I must say this
is a much simpler and rather magical solution.

How does awk know which part of the original line goes into $1, $2 and $3?
(You will notice there is no space between the Chinese and English words.)

I am also mystified how it generates two lines

  a 
  av 

from the input
  a av

Malcolm Kay
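The short answer to the question above is that awk's default field splitting only breaks on runs of blanks, so the Chinese word and the first English word stay together in one field. A quick way to check this is to print every field awk sees (a sketch; `show_fields` is a hypothetical helper and `LC_ALL=C` an added assumption):

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-oriented locale, as in the other sketches

# Hypothetical helper: print each field awk finds, with its number.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

printf '%s\n' '一a av' | show_fields
# $1=[一a]
# $2=[av]

# Gautam's print statement on the same line: $3 is empty, hence two spaces.
printf '%s\n' '一a av' | awk '{print $2 " " $3 " " $1}'
```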


Re: help me with this sed expression

2004-01-05 Thread Gautam Gopalakrishnan
On Tue, Jan 06, 2004 at 01:24:38PM +1100, Gautam Gopalakrishnan wrote:
> On Tue, Jan 06, 2004 at 01:20:52PM +1100, Gautam Gopalakrishnan wrote:
> > I think awk is easier:
> > 
> > awk '{print $2 " " $3 " " $1}' original | tr -s > target
> 
> Sorry, that must read:
>   awk '{print $2 " " $3 " " $1}' original | tr -s ' ' > target

So stupid of me. Just read the mail again...



Re: help me with this sed expression

2004-01-05 Thread Gautam Gopalakrishnan
On Tue, Jan 06, 2004 at 01:20:52PM +1100, Gautam Gopalakrishnan wrote:
> I think awk is easier:
> 
> awk '{print $2 " " $3 " " $1}' original | tr -s > target

Sorry, that must read:
  awk '{print $2 " " $3 " " $1}' original | tr -s ' ' > target
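`tr -s` squeezes repeated occurrences of the characters in its set, so it needs the `' '` argument to know what to collapse. Running the corrected command on two of the sample lines (a sketch that reads stdin instead of `original`; `LC_ALL=C` and the helper name `awk_tr` are added assumptions) shows the double spaces being squeezed, but also that the Chinese word is still glued to the first English word:

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-oriented locale

# Hypothetical helper: the corrected command, reading stdin.
awk_tr() {
    # awk joins $2, $3 and $1 with spaces; empty fields leave runs of
    # spaces, which tr -s ' ' then collapses to a single space.
    awk '{print $2 " " $3 " " $1}' | tr -s ' '
}

printf '%s\n' '一a av' '可歌可泣aaav' | awk_tr
```

The second output line keeps a single leading space, because on a one-field line both `$2` and `$3` are empty.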


Re: help me with this sed expression

2004-01-05 Thread Gautam Gopalakrishnan
On Tue, Jan 06, 2004 at 12:30:42PM +1030, Malcolm Kay wrote:
> On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> > Hello. I've spent an hour trying to figure out a series of sed commands to process
> > some text (without any luck; you know I'm kind of a newbie). I really
> > appreciate your help.
> >
> > The original text file is in this form -- for each line:
> > one Chinese word, then one or two English words separated by a space.
> >
> > I tried things like s/\(.*\)\([a-z]*\)/\2 \1/, but the first \(.*\) is
> > too greedy and also swallows the [a-z] characters that follow.
> 
> Well the greedy part is easily fixed with:
>   s/\([^a-z]*\)\([a-z]*\)/\2 \1/
> 
> But this will not work for lines with two English words. The following should:
> % sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target


I think awk is easier:

awk '{print $2 " " $3 " " $1}' original | tr -s > target

Gautam



Re: help me with this sed expression

2004-01-05 Thread Malcolm Kay
On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> Hello. I've spent an hour trying to figure out a series of sed commands to process
> some text (without any luck; you know I'm kind of a newbie). I really
> appreciate your help.
>
> The original text file is in this form -- for each line:
> one Chinese word, then one or two English words separated by a space.
>
> I wish to change it to:
> 1) target file: one English word, then a space, then the Chinese word
> corresponding to that English word.
> 2) if in the original file a Chinese word is followed by more than one
> English word on the same line, repeat the Chinese word to satisfy 1).
>
> Define: Chinese word = one or more contiguous bytes of data where each byte
> is greater than 128 in value. (This is true of the GB2312 Chinese charset,
> which this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
>
> Say, for the original file:
> ===
> 一a av
> 可歌可泣aaav
> 无可奉告aacm
> ===
> The target file should be:
> ===
> a 一
> av 一
> aaav 可歌可泣
> aacm 无可奉告
> ===
>
> I tried things like s/\(.*\)\([a-z]*\)/\2 \1/, but the first \(.*\) is
> too greedy and also swallows the [a-z] characters that follow.

Well the greedy part is easily fixed with:
  s/\([^a-z]*\)\([a-z]*\)/\2 \1/
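A quick check of this substitution (a sketch, assuming `LC_ALL=C` so that Chinese bytes never fall in `[a-z]`; `swap_once` is a hypothetical name). `[^a-z]*` stops at the first letter, which fixes the one-word lines, while on a two-word line the second word is simply left in place after the Chinese word:

```shell
#!/bin/sh
export LC_ALL=C   # assumption: bytes >= 0x80 are never in [a-z]

# Hypothetical helper applying the non-greedy substitution once per line.
swap_once() {
    # \1 = leading non-letters (the Chinese word), \2 = the first run of a-z.
    sed 's/\([^a-z]*\)\([a-z]*\)/\2 \1/'
}

printf '%s\n' '可歌可泣aaav' '一a av' | swap_once
```

The first line comes out as wanted; the second becomes `a 一 av`, which is the two-word case addressed next.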

But this will not work for lines with two English words. The following should:
% sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target
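As written, the two `-e` scripts in this command run one after the other on the same pattern space. On a two-word line the second `s` therefore sees the already-rewritten `a 一` and prints ` 一`, and a line with a single English word contains no space, so neither pattern matches and `-n` suppresses it. One possible single-pass variant, offered as a sketch (with `LC_ALL=C` and the helper name `swap_words` as added assumptions), anchors both patterns and emits two lines through a backslash-newline in the replacement, which POSIX sed allows:

```shell
#!/bin/sh
export LC_ALL=C   # assumption: [a-z] is ASCII-only, Chinese bytes are not letters

# Hypothetical helper: one English word -> one line, two words -> two lines.
swap_words() {
    # Script 1 handles "CHINESEeng1 eng2": the replacement contains a
    # backslash-newline, so one input line becomes two output lines.
    # Script 2 handles "CHINESEeng". Both are anchored with ^...$, so
    # script 2 cannot match the two-line result of script 1.
    sed -e 's/^\([^a-z]*\)\([a-z]*\) \([a-z]*\)$/\2 \1\
\3 \1/' -e 's/^\([^a-z]*\)\([a-z]*\)$/\2 \1/'
}

printf '%s\n' '一a av' '可歌可泣aaav' '无可奉告aacm' | swap_words
```

The continuation line `\3 \1/'` must start in column 0; any indentation there would end up in the output.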

Malcolm Kay



Re: help me with this sed expression

2004-01-05 Thread Matthew Seaman
On Mon, Jan 05, 2004 at 07:49:43PM +0800, Zhang Weiwu wrote:
> Hello. I've spent an hour trying to figure out a series of sed commands to process
> some text (without any luck; you know I'm kind of a newbie). I really
> appreciate your help.
> 
> The original text file is in this form -- for each line:
> one Chinese word, then one or two English words separated by a space.
> 
> I wish to change it to:
> 1) target file: one English word, then a space, then the Chinese word
> corresponding to that English word.
> 2) if in the original file a Chinese word is followed by more than one
> English word on the same line, repeat the Chinese word to satisfy 1).
> 
> Define: Chinese word = one or more contiguous bytes of data where each byte
> is greater than 128 in value. (This is true of the GB2312 Chinese charset,
> which this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
> 
> Say, for the original file:
> ===
> 一a av
> 可歌可泣aaav
> 无可奉告aacm
> ===
> The target file should be:
> ===
> a 一
> av 一
> aaav 可歌可泣
> aacm 无可奉告
> ===
> 
> I tried things like s/\(.*\)\([a-z]*\)/\2 \1/, but the first \(.*\) is
> too greedy and also swallows the [a-z] characters that follow.

Dunno about sed(1) but you could do the job like this:

perl -ne '($c, $e) = m/^([\x{81}-\x{ff}]+)([a-z ]+)$/; foreach $x (split / /, $e) { print "$x $c\n"; }' filename
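Matthew's approach (take the leading Chinese bytes, split the rest on spaces, print one line per English word) can also be sketched in awk for anyone without Perl at hand. This is an illustration under the same `LC_ALL=C` assumption, and `split_loop` is a hypothetical name; `match()` finds the first a-z letter, and `RSTART` marks where the English part begins:

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-oriented locale, [a-z] is ASCII-only

# Hypothetical helper: awk version of the capture-and-split loop.
split_loop() {
    awk '
    match($0, /[a-z]/) {                      # RSTART = position of first letter
        c = substr($0, 1, RSTART - 1)         # the Chinese word
        n = split(substr($0, RSTART), w, " ") # the English words
        for (i = 1; i <= n; i++)
            print w[i] " " c
    }'
}

printf '%s\n' '一a av' '可歌可泣aaav' | split_loop
```

Lines containing no a-z letter at all are skipped, since the `match()` pattern is false for them.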

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK


help me with this sed expression

2004-01-05 Thread Zhang Weiwu
Hello. I've spent an hour trying to figure out a series of sed commands to process
some text (without any luck; you know I'm kind of a newbie). I really
appreciate your help.

The original text file is in this form -- for each line:
one Chinese word, then one or two English words separated by a space.

I wish to change it to:
1) target file: one English word, then a space, then the Chinese word
corresponding to that English word.
2) if in the original file a Chinese word is followed by more than one
English word on the same line, repeat the Chinese word to satisfy 1).

Define: Chinese word = one or more contiguous bytes of data where each byte
is greater than 128 in value. (This is true of the GB2312 Chinese charset,
which this email is written in.)
Define: English word = one or more contiguous bytes of [a-z].

Say, for the original file:
===
一a av
可歌可泣aaav
无可奉告aacm
===
The target file should be:
===
a 一
av 一
aaav 可歌可泣
aacm 无可奉告
===

I tried things like s/\(.*\)\([a-z]*\)/\2 \1/, but the first \(.*\) is
too greedy and also swallows the [a-z] characters that follow.
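Why this first attempt fails can be seen by running it on one of the sample lines (a sketch, assuming `LC_ALL=C`; `greedy_try` is a hypothetical name). POSIX regular expressions give each subexpression, from left to right, the longest match consistent with the overall match, so `\(.*\)` takes the whole line and `\([a-z]*\)` is left with the empty string:

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-oriented locale

# Hypothetical helper running the attempted substitution.
greedy_try() {
    # \(.*\) grabs the entire line, so \2 is empty and the result is
    # just a space followed by the unchanged line.
    sed 's/\(.*\)\([a-z]*\)/\2 \1/'
}

printf '%s\n' '可歌可泣aaav' | greedy_try
# prints " 可歌可泣aaav"
```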

Thank you.
