Re: Another grep question
On Tue, Feb 08, 2005 at 03:44:47AM +0100, Anthony Atkielski wrote:
> Giorgos Keramidas writes:
>
> GK> It may not be related to what you are seeing, but grep(1)
> GK> is locale-aware.  What it considers a text character
> GK> depends on the current locale settings.
>
> I tried setting LC_ALL to en_US.UTF-8, en_US.ISO8859-15, and
> en_US.ISO8859-1, with no effect.
>
> The character in question is an opening double quotation mark in the
> Windows character set.  I want to find it in my Web pages and replace
> it by an appropriate HTML escape sequence.  I know it's out there, but
> grep isn't finding it, or I'm not telling it how to find the character
> correctly.

Ah -- well, the beauty of Unix is that if the first tool you think of
doesn't do the job, then the next one probably will.  You can use perl
to match and replace arbitrary characters:

    % perl -pi.bak -e 's/\x93/&ldquo;/g' foo.html

Or you could go for the bulk method and run HTML tidy(1) over the file,
which is usually pretty good at converting any old HTML into something
that will pass validation:

    (ports: www/tidy)        http://www.w3c.org/People/Raggett/tidy/
    (ports: www/tidy-devel)  http://tidy.sourceforge.net/

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
8 Dane Court Manor, School Rd, Tilmanstone, Kent, CT14 0JL UK
PGP: http://www.infracaninophile.co.uk/pgpkey
Tel: +44 1304 617253
Re: Another grep question
On Monday 07 February 2005 05:56 pm, Anthony Atkielski wrote:
> Does anyone know why
>
>     grep -R \0x93 /www/htdocs
>
> turns up only binary files, even when I know there are text files in
> the directory that contain this character?  Is there something special
> about the way I specify the search string that causes grep to behave
> differently?  When I specify an 8-bit character like this alone for a
> search, it finds only binary files, even though this is a text
> character--as if it is looking at the search string and deciding that
> I want to search only binary files.  The man page doesn't seem to say
> anything about this.  Is it my imagination?

I made a text file named test.log containing:

    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    grep -R \0x93 /www/htdocs
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD
    aj[[CFPWJJVCVMLKFD

The result of:

    grep -R \0x93 test.log

is:

    grep -R \0x93 /www/htdocs

Maybe you should test again.

-Mike
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
RE: Another grep question
Anthony Atkielski wrote:
> Does anyone know why
>
>     grep -R \0x93 /www/htdocs
>
> turns up only binary files, even when I know there are text files in
> the directory that contain this character?  Is there something special
> about the way I specify the search string that causes grep to behave
> differently?  When I specify an 8-bit character like this alone for a
> search, it finds only binary files, even though this is a text
> character--as if it is looking at the search string and deciding that
> I want to search only binary files.

It may not be related to what you are seeing, but grep(1) is
locale-aware.  What it considers a text character depends on the
current locale settings.  In my account, which has a Greek locale
setup, it will consider all Greek 8-bit characters as text.
Re: Another grep question
Giorgos Keramidas writes:

GK> It may not be related to what you are seeing, but grep(1)
GK> is locale-aware.  What it considers a text character
GK> depends on the current locale settings.

I tried setting LC_ALL to en_US.UTF-8, en_US.ISO8859-15, and
en_US.ISO8859-1, with no effect.

The character in question is an opening double quotation mark in the
Windows character set.  I want to find it in my Web pages and replace
it by an appropriate HTML escape sequence.  I know it's out there, but
grep isn't finding it, or I'm not telling it how to find the character
correctly.

-- 
Anthony
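One way to confirm that the byte really is in a file, independent of grep and of the locale, is to count it directly. This is a minimal sketch using only POSIX tr, wc and od; the sample file and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file standing in for a web page.
sample=$(mktemp)
printf 'one \223 two \223 three\n' > "$sample"

# Delete every byte except octal 223 (hex 93), then count what is left.
# The arithmetic expansion strips any padding that wc adds to its output.
count=$(( $(LC_ALL=C tr -cd '\223' < "$sample" | wc -c) ))

echo "0x93 occurs $count time(s)"

# od prints the raw bytes in hex, so the character can be checked by eye.
od -An -tx1 "$sample"
```

A non-zero count shows the character is present even when a grep invocation reports nothing, which points at the search pattern rather than the data.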
Re: Another grep question
Michael C. Shultz writes:

> I made a text file named test.log containing:
>
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     grep -R \0x93 /www/htdocs
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>     aj[[CFPWJJVCVMLKFD
>
> The result of:
>
>     grep -R \0x93 test.log
>
> is:
>
>     grep -R \0x93 /www/htdocs
>
> Maybe you should test again.

I'm looking for the hex character 93, which is an opening double
quotation mark in the Windows character set, not the literal string
\0x93.  Unless I'm mistaken, \0x93 in a regular expression means the
character whose hex value is 93.

-- 
Anthony
Re: Another grep question
On 2005-02-08 03:49, Anthony Atkielski [EMAIL PROTECTED] wrote:
> I'm looking for the hex character 93, which is an opening double
> quotation mark in the Windows character set, not the literal string
> \0x93.  Unless I'm mistaken, \0x93 in a regular expression means the
> character whose hex value is 93.

Not really, unless you have a shell that understands this sort of thing
and expands the command line arguments to arbitrary 8-bit characters.

Otherwise, \0x93 means: a literal '0' character (escaped with '\'),
followed by 'x', then followed by '9' and '3'.
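The difference between the pattern the shell actually passes and the byte that was intended can be seen side by side. This is a sketch under stated assumptions: the scratch directory and file names are hypothetical, and LC_ALL=C is set so that grep treats every byte as a single character. POSIX printf accepts octal escapes, and octal 223 is hex 93.

```shell
#!/bin/sh
# Scratch directory and file names are hypothetical.
dir=$(mktemp -d)
printf 'He said \223hello\n' > "$dir/quoted.html"   # holds the byte 0x93
printf 'plain 0x93 text\n'   > "$dir/literal.txt"   # holds the characters 0x93

# Unquoted, the shell removes the backslash, so grep is handed "0x93".
literal_hits=$(LC_ALL=C grep -Rl \0x93 "$dir")

# printf turns the octal escape \223 into the single byte hex 93.
byte=$(printf '\223')
byte_hits=$(LC_ALL=C grep -Rl "$byte" "$dir")

echo "pattern 0x93 matched: $literal_hits"
echo "byte 0x93 matched:    $byte_hits"
```

The first search finds only the file containing the four characters 0x93, which is consistent with the test.log result earlier in the thread; the second finds the file that holds the actual quotation-mark byte.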