Re: [SLUG] I wish to lowercase a character in a sed script

2005-03-16 Thread Angus Lees
At Mon, 14 Mar 2005 17:57:08 +1100, Michael Lake wrote:
> Suggested a perl script:
> cat titles.html | perl -ne 'm/\.html">([^,]{1,}),/; $name=lc($1); $_ =~ s/\.html/\.html#$name/; print $_;'

equivalent to:
 perl -pe 's/\.html">([^,]+),/.html#\L$1\E">$1,/' titles.html

(sorry, should have paid attention earlier :P )

-- 
 - Gus

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] I wish to lowercase a character in a sed script

2005-03-13 Thread Daniel Bush
Michael Lake wrote:
> Hi all
> I have a titles.html file from someone that has several hundred
> authors listed in a table, e.g.
> <tr><td class="col1"><a href="111_12.html">Agrawal, B.M. and Kumar, Virendra</a></td>
>
> At present the above link goes to the top of that file (the contents
> of that journal issue) but I want the link to go directly to the
> author's article in that directory. There are already name anchors in
> the file but they are lower case, such as: <a name="agrawal"></a>
>
> The script below will extract the author's name from after the
> link so that <a href="111_12.html">Agrawal, B.M becomes
> <a href="111_12.html#Agrawal">Agrawal, B.M
>
> but the name anchors in the many journal files are all lower case like
> this:
> <a name="agrawal"></a>
>
> thus my links don't work.
> #!/bin/bash
> # <tr><td class="col1"><a href="111_12.html">Agrawal, B.M. and Kumar, Virendra</a></td>
> # <tr><td class="col1"><a href="111_12.html#Agrawal">Agrawal, B.M. and Kumar, Virendra</a></td>
> cat titles.html | sed 's/col1"><a href="\(.*\)\.html">\([A-Z][a-z]*\),/col1"><a href="\1.html#\2">\2,/' > test.html
>
> How can I lower case the anchors i.e. #Agrawal to #agrawal? I know
> that tr can do that but the above is in a sed script and I can't use
> tr there.
> sed does not have a lower function.
> Maybe I have to do it in two passes somehow?

Ouch.  Do you have to use sed?
If you have perl installed, you could replace
   sed '...'
with
   perl -ne '...'
and you could probably solve the problem with something like:
   cat titles.html | perl -ne 'm/\.html">([^,]{1,}),/; $name=lc($1); $_ =~ s/\.html/\.html#$name/; print $_;'
That may be clumsy by perl standards, but I think it works, at least if
you have one instance per line in the html file.



Re: [SLUG] I wish to lowercase a character in a sed script

2005-03-13 Thread SEKINE Tatsuo
Hi

From: Daniel Bush [EMAIL PROTECTED]
Date: Sun, 13 Mar 2005 22:35:48 +1100

>> #!/bin/bash
>> # <tr><td class="col1"><a href="111_12.html">Agrawal, B.M. and Kumar, Virendra</a></td>
>> # <tr><td class="col1"><a href="111_12.html#Agrawal">Agrawal, B.M. and Kumar, Virendra</a></td>
>> cat titles.html | sed 's/col1"><a href="\(.*\)\.html">\([A-Z][a-z]*\),/col1"><a href="\1.html#\2">\2,/' > test.html
>>
>> How can I lower case the anchors i.e. #Agrawal to #agrawal? I know
>> that tr can do that but the above is in a sed script and I can't use
>> tr there.
>> sed does not have a lower function.
>> Maybe I have to do it in two passes somehow?
>
> Ouch.  Do you have to use sed?
> If you have perl installed, you could replace
(snip)

sed is not well suited to this, but the following script
may work with GNU sed:

#!/bin/bash
# <tr><td class="col1"><a href="111_12.html">Agrawal, B.M. and Kumar, Virendra</a></td>
# <tr><td class="col1"><a href="111_12.html#agrawal">Agrawal, B.M. and Kumar, Virendra</a></td>

cat titles.html | sed -n '
/col1"><a href=/ {
# copy whole line to hold space
h
# pick up the first letter of the surname
s/.*col1"><a href="[^"]*.html">\(.\).*/\1/
# lowercase it (y takes explicit character lists, not ranges)
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# add it to the end of hold space
H
# retrieve hold space
x
# construct line
s/\(.*col1"><a href="[^"]*.html\)">\(.\)\([^,]*\)\(,.*\)\(.\)$/\1#\5\3">\2\3\4/
P
b
}
p' > test.html

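>> sed does not have a lower function.

sed does have the y command, which transliterates characters. It is
part of POSIX sed, not a GNU extension, but it does not understand
ranges, so both alphabets have to be written out in full:
  y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
For a single letter, a chain of substitutions does the same job:
  s/A/a/; s/B/b/; ...; s/Z/z/;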
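
To see what y does with a bracketed "range", here is a minimal sketch. The three names are made up for illustration and the variable names are arbitrary. y maps characters position by position, so the brackets and the hyphen are treated as ordinary characters.

```shell
#!/bin/sh
# y/[A-Z]/[a-z]/ maps "[" to "[", "A" to "a", "-" to "-", "Z" to "z"
# and "]" to "]". No other letter is touched.
ranged=$(printf 'Agrawal Baker Zhang\n' | sed 'y/[A-Z]/[a-z]/')

# Spelling out both alphabets lowercases every capital letter.
full=$(printf 'Agrawal Baker Zhang\n' |
  sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/')

echo "$ranged"   # agrawal Baker zhang
echo "$full"     # agrawal baker zhang
```

A test file whose first author starts with A would not show the problem, so it is worth trying a name such as Baker as well.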

-- 
SEKINE Tatsuo:
 [EMAIL PROTECTED] System Design & Research Inst. Co.,Ltd.


RE: [SLUG] I wish to lowercase a character in a sed script

2005-03-13 Thread Roger Barnes
> #!/bin/bash
> # <tr><td class="col1"><a href="111_12.html">Agrawal, B.M. and Kumar, Virendra</a></td>
> # <tr><td class="col1"><a href="111_12.html#Agrawal">Agrawal, B.M. and Kumar, Virendra</a></td>
> cat titles.html | sed 's/col1"><a href="\(.*\)\.html">\([A-Z][a-z]*\),/col1"><a href="\1.html#\2">\2,/' > test.html
>
> How can I lower case the anchors i.e. #Agrawal to #agrawal?
> I know that tr can do that but the above is in a sed script
> and I can't use tr there.
> sed does not have a lower function.
> Maybe I have to do it in two passes somehow?

Can you not simply add \l (force next element to lowercase) in your
replacement?

Eg (untested) ...

cat titles.html | sed 's/col1"><a href="\(.*\)\.html">\([A-Z][a-z]*\),/col1"><a href="\1.html#\l\2">\2,/' > test.html

- Rog


Re: [SLUG] I wish to lowercase a character in a sed script

2005-03-13 Thread SEKINE Tatsuo
From: Roger Barnes [EMAIL PROTECTED]
Date: Mon, 14 Mar 2005 08:35:23 +1100

> Can you not simply add \l (force next element to lowercase) in your
> replacement?
>
> Eg (untested) ...
>
> cat titles.html | sed 's/col1"><a href="\(.*\)\.html">\([A-Z][a-z]*\),/col1"><a href="\1.html#\l\2">\2,/' > test.html

It may work with GNU sed version 4.x (or later); \l is a GNU extension
rather than part of POSIX sed.
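
As a rough sketch of how the GNU case-conversion escapes behave (GNU sed assumed; the surname McDonald and the short sample row are made up for illustration): \l lowercases only the next character, while \L ... \E lowercases everything in between.

```shell
#!/bin/sh
# Hypothetical row with a second capital inside the surname.
row='<a href="111_12.html">McDonald, A.</a>'

# \l affects only the first character of the inserted group.
first=$(printf '%s\n' "$row" |
  sed 's/\.html">\([^,]*\),/.html#\l\1">\1,/')

# \L ... \E lowercases the whole inserted group.
whole=$(printf '%s\n' "$row" |
  sed 's/\.html">\([^,]*\),/.html#\L\1\E">\1,/')

echo "$first"   # <a href="111_12.html#mcDonald">McDonald, A.</a>
echo "$whole"   # <a href="111_12.html#mcdonald">McDonald, A.</a>
```

For surnames with a single leading capital the two give the same result, so \l is enough for the common case.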


Re: [SLUG] I wish to lowercase a character in a sed script

2005-03-13 Thread Michael Lake
Thank you all
Here is a summary of how it all went.
1. Roger Barnes:
> Can you not simply add \l (force next element to lowercase) in your
> replacement?
Yes - it works fine. My version of sed is the GNU one, 4.1.2. The \l
works perfectly. It's not mentioned in the man pages for sed but it is
mentioned in the info pages, which I abhor :-)

2. Daniel Bush
Suggested a perl script:
cat titles.html | perl -ne 'm/\.html">([^,]{1,}),/; $name=lc($1); $_ =~ s/\.html/\.html#$name/; print $_;'

Yep that worked too. I tend to use perl for web stuff and don't do 
enough one liners like the above. It's neat.

3. Tatsuo Sekine
Suggested grabbing the letter, then transliterating (what a wonderful word)
it with the y command, then appending it to the end of this thing called
hold space with an H, and then swapping the contents of the hold space and
the pattern space using x (the exchange command).
Well, I have certainly learnt something there :-)
Oh, it worked fine too.

Caveats: In the html authors file there were some hyphenated names like
Aldrich-Wright and some names with blanks like De Deckker. Each of the
above methods, coupled with the regex I use, leaves a few things
to fix manually, but it's only about 6 to do.
In the titles file there was one author per table row and
cat titles.html | grep '<tr>' | wc -l
showed there were 541 rows.
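
One way to list the rows that still need fixing by hand, as a sketch under assumptions: the rows below are made up (file names and initials are hypothetical) and use the reconstructed markup, and the filter simply drops every row whose surname is a single capitalised word.

```shell
#!/bin/sh
# Made-up rows; only the first surname fits [A-Z][a-z]* followed by a comma.
rows='<tr><td class="col1"><a href="111_12.html">Agrawal, B.M.</a></td>
<tr><td class="col1"><a href="112_03.html">Aldrich-Wright, J.</a></td>
<tr><td class="col1"><a href="113_07.html">De Deckker, P.</a></td>'

# Keep the link rows, then drop the ones the simple pattern already covers.
leftover=$(printf '%s\n' "$rows" |
  grep 'col1"><a href=' |
  grep -v '\.html">[A-Z][a-z]*,')

echo "$leftover"
count=$(printf '%s\n' "$leftover" | grep -c 'href=')
echo "$count rows to check by hand"
```

Reading titles.html in place of the sample rows would give the short list of hyphenated and multi-word names.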

You have saved me hours of work and I have some sed and perl snippets to
save away in my HTML help file :-)
Thanks all.

--
Michael Lake
Chemistry, Materials & Forensic Science, UTS
Ph: 9514 1725 Fx: 9514 1460