Re: [SLUG] I wish to lowercase a character in a sed script
At Mon, 14 Mar 2005 17:57:08 +1100, Michael Lake wrote:
> Suggested a perl script:
> cat titles.html | perl -ne 'm/\.html>([^,]{1,}),/; $name=lc($1); $_ =~ s/\.html/\.html#$name/; print $_;'

equivalent to:

perl -pe 's/\.html>([^,]+),/.html#\L$1\E>$1,/' titles.html

(sorry, should have paid attention earlier :P )

--
 - Gus
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] I wish to lowercase a character in a sed script
Michael Lake wrote:
> Hi all
>
> I have a titles.html file from someone that has several hundred authors
> listed in a table, e.g.
>
>   <tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>
>
> At present the above link goes to the top of that file (the contents of
> that journal issue) but I want the link to go directly to the author's
> article in that file. There are already name anchors in the file but
> they are lower case, such as:
>
>   <a name=agrawal></a>
>
> The script below will extract the author's name from after the link so that
>   <a href=111_12.html>Agrawal, B.M
> becomes
>   <a href=111_12.html#Agrawal>Agrawal, B.M
> but the name anchors in the many journal files are all lower case like this:
>   <a name=agrawal></a>
> thus my links don't work.
>
> #!/bin/bash
> # <tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>
> # <tr><td class=col1><a href=111_12.html#Agrawal>Agrawal, B.M. and Kumar, Virendra</a></td>
> cat titles.html | sed 's/col1><a href=\(.*\)\.html>\([A-Z][a-z]*\),/col1><a href=\1.html#\2>\2,/' > test.html
>
> How can I lowercase the anchors, i.e. #Agrawal to #agrawal?
> I know that tr can do that but the above is in a sed script and I can't
> use tr there. sed does not have a lower function. Maybe I have to do it
> in two passes somehow?

Ouch. Do you have to use sed? If you have perl installed, you could replace
sed '...' with perl -ne '...' and you could probably solve the problem with
something like:

cat titles.html | perl -ne 'm/\.html>([^,]{1,}),/; $name=lc($1); $_ =~ s/\.html/\.html#$name/; print $_;'

That may be clumsy by perl standards, but I think it works, at least if you
have one instance per line in the html file.
Re: [SLUG] I wish to lowercase a character in a sed script
Hi

From: Daniel Bush [EMAIL PROTECTED]
Date: Sun, 13 Mar 2005 22:35:48 +1100

> > #!/bin/bash
> > # <tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>
> > # <tr><td class=col1><a href=111_12.html#Agrawal>Agrawal, B.M. and Kumar, Virendra</a></td>
> > cat titles.html | sed 's/col1><a href=\(.*\)\.html>\([A-Z][a-z]*\),/col1><a href=\1.html#\2>\2,/' > test.html
> >
> > How can I lowercase the anchors, i.e. #Agrawal to #agrawal?
> > I know that tr can do that but the above is in a sed script and I can't
> > use tr there. sed does not have a lower function. Maybe I have to do it
> > in two passes somehow?
>
> Ouch. Do you have to use sed? If you have perl installed, you could replace
(snip)

sed is not useful for this purpose, but the following script may work with
GNU sed:

#!/bin/bash
# <tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>
# <tr><td class=col1><a href=111_12.html#agrawal>Agrawal, B.M. and Kumar, Virendra</a></td>
cat titles.html | sed -n '
/col1><a href=/ {
    # copy whole line to hold space
    h
    # pick up the letter
    s/.*col1><a href=[^>]*.html>\(.\).*/\1/
    # y maps character by character and does not expand ranges
    y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
    # add it to the end of hold space
    H
    # retrieve hold space
    x
    # construct line
    s/\(.*col1><a href=[^>]*.html\)>\(.\)\([^,]*\)\(,.*\)\(.\)$/\1#\5\3>\2\3\4/
    P
    b
}
p' > test.html

> > sed does not have a lower function.

GNU sed has the y command. If you want to use an XPG4/POSIX-conformant sed,
please use
    s/A/a/; s/B/b/; ...; s/Z/z/;
as a substitute for the y command.

--
SEKINE Tatsuo: [EMAIL PROTECTED]
System Design Research Inst. Co.,Ltd.
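The hold-space steps in the message above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `lower_anchor` is hypothetical, and the row markup is assumed to be `<tr><td class=col1><a href=...>`, since the archive stripped the angle brackets. One detail worth knowing is that sed's `y` command maps characters one for one and does not expand ranges, so `y/[A-Z]/[a-z]/` would only touch the literal characters `[`, `A`, `-`, `Z` and `]`. The sketch therefore spells out both alphabets, and it matches the embedded newline explicitly with `\n`.

```shell
#!/bin/bash
# lower_anchor (hypothetical name): read table rows on stdin and add
# "#surname" to the link target, with the first letter of the surname
# lowercased by way of the hold space.
lower_anchor() {
    sed -n '
/col1><a href=/ {
    # h: copy the whole row into the hold space
    h
    # reduce the pattern space to the first letter of the surname
    s/.*col1><a href=[^>]*\.html>\(.\).*/\1/
    # y has no ranges, so both alphabets are written out in full
    y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
    # H: append a newline plus the lowercased letter to the hold space
    H
    # x: swap, so the pattern space now holds row, newline, letter
    x
    # rebuild the row; \5 is the lowercased letter after the newline
    s/\(.*col1><a href=[^>]*\.html\)>\(.\)\([^,]*\)\(,.*\)\n\(.\)$/\1#\5\3>\2\3\4/
    p
    b
}
p'
}

row='<tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>'
printf '%s\n' "$row" | lower_anchor
```

Rows that do not contain `col1><a href=` fall through to the final `p` and are printed unchanged, so the function can be run over the whole file.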
RE: [SLUG] I wish to lowercase a character in a sed script
> #!/bin/bash
> # <tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>
> # <tr><td class=col1><a href=111_12.html#Agrawal>Agrawal, B.M. and Kumar, Virendra</a></td>
> cat titles.html | sed 's/col1><a href=\(.*\)\.html>\([A-Z][a-z]*\),/col1><a href=\1.html#\2>\2,/' > test.html
>
> How can I lowercase the anchors, i.e. #Agrawal to #agrawal?
> I know that tr can do that but the above is in a sed script and I can't
> use tr there. sed does not have a lower function. Maybe I have to do it
> in two passes somehow?

Can you not simply add \l (force next element to lowercase) in your
replacement?

Eg (untested) ...

cat titles.html | sed 's/col1><a href=\(.*\)\.html>\([A-Z][a-z]*\),/col1><a href=\1.html#\l\2>\2,/' > test.html

- Rog
Re: [SLUG] I wish to lowercase a character in a sed script
From: Roger Barnes [EMAIL PROTECTED]
Date: Mon, 14 Mar 2005 08:35:23 +1100

> Can you not simply add \l (force next element to lowercase) in your
> replacement?
>
> Eg (untested) ...
>
> cat titles.html | sed 's/col1><a href=\(.*\)\.html>\([A-Z][a-z]*\),/col1><a href=\1.html#\l\2>\2,/' > test.html

It may work with GNU sed version 4.x (or later).
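A minimal sketch of the `\l` suggestion, assuming GNU sed (the `\l`, `\L` and `\E` replacement escapes are GNU extensions and are not available in every sed). The function names `add_anchor` and `add_anchor_all` are hypothetical, and the row markup is assumed to be `<tr><td class=col1><a href=...>`, since the archive stripped the angle brackets. The second function is a variation, not something proposed in the thread: it uses `\L...\E` to lowercase the whole captured surname rather than just its first letter.

```shell
#!/bin/bash
# Sketch only: requires GNU sed for \l, \L and \E.

# add_anchor (hypothetical name): append "#surname" to the link target,
# lowercasing only the first character of the surname with \l.
add_anchor() {
    sed 's/col1><a href=\(.*\)\.html>\([A-Z][a-z]*\),/col1><a href=\1.html#\l\2>\2,/'
}

# add_anchor_all (hypothetical name): same substitution, but \L...\E
# lowercases the entire surname, which also covers inner capitals.
add_anchor_all() {
    sed 's/col1><a href=\(.*\)\.html>\([A-Za-z]*\),/col1><a href=\1.html#\L\2\E>\2,/'
}

row='<tr><td class=col1><a href=111_12.html>Agrawal, B.M. and Kumar, Virendra</a></td>'
printf '%s\n' "$row" | add_anchor
```

Because the surname pattern in `add_anchor` is `[A-Z][a-z]*`, only the first letter can be upper case, so `\l` alone is enough there.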
Re: [SLUG] I wish to lowercase a character in a sed script
Thank you all

Here is a summary of how it all went.

1. Roger Barnes:
> Can you not simply add \l (force next element to lowercase) in your
> replacement?

Yes, it works fine. My version of sed is the GNU one, 4.1.2. The \l works
perfectly. It's not mentioned in the man pages for sed but it is mentioned
in the info pages, which I abhor :-)

2. Daniel Bush
Suggested a perl script:

cat titles.html | perl -ne 'm/\.html>([^,]{1,}),/; $name=lc($1); $_ =~ s/\.html/\.html#$name/; print $_;'

Yep, that worked too. I tend to use perl for web stuff and don't do enough
one-liners like the above. It's neat.

3. Tatsuo Sekine
Suggested to grab the letter, then transliterate (what a wonderful word) it
using sed's y command, then append it to the end of this thing called hold
space with an H, and then swap the contents of the hold space and the pattern
space using x (the exchange command). Well, I have certainly learned
something there :-) Oh, it worked fine too.

Caveats:
In the html authors file there were some hyphenated names like Aldrich-Wright
and some names with blanks like De Deckker. Each of the above methods,
coupled with the regex I use, results in a few things to fix manually, but
it's only about 6 to do.
In the titles file there was one author per table row and
cat titles | grep '<tr>' | wc -l
showed there were 541 rows.

You have saved me hours of work and I have some sed and perl snippets to
save away in my HTML help file :-)

Thanks all.

--
Michael Lake
Chemistry, Materials Forensic Science, UTS
Ph: 9514 1725 Fx: 9514 1460
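The handful of rows left to fix by hand (hyphenated surnames, surnames containing a blank) can be listed rather than hunted for. The sketch below is one possible way to do that and was not part of the thread: the function name `unlinked_rows` is hypothetical, and the row markup is assumed to be `<td class=col1><a href=...>`. It prints every row whose link still ends in a bare `.html` with no `#anchor`, which are the rows the substitution did not match.

```shell
#!/bin/bash
# unlinked_rows (hypothetical name): print rows on stdin whose link target
# still ends in ".html" with no "#anchor" appended.
unlinked_rows() {
    grep 'col1><a href=[^>#]*\.html>'
}

done_row='<tr><td class=col1><a href=111_12.html#agrawal>Agrawal, B.M.</a></td>'
todo_row='<tr><td class=col1><a href=111_20.html>De Deckker, P.</a></td>'
printf '%s\n' "$done_row" "$todo_row" | unlinked_rows
```

Run against the generated file (for example `unlinked_rows < test.html`), it gives the list of rows to edit manually; piping the result through `wc -l` gives the count.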