Re: Another grep question

2005-02-08 Thread Matthew Seaman
On Tue, Feb 08, 2005 at 03:44:47AM +0100, Anthony Atkielski wrote:
 Giorgos Keramidas writes:
 
 GK> It may not be related to what you are seeing, but grep(1)
 GK> is locale-aware.  What it considers a text character
 GK> depends on the current locale settings.
 
 I tried setting LC_ALL to en_US.UTF-8, en_US.ISO8859-15, and
 en_US.ISO8859-1, with no effect.  The character in question is an
 opening double quotation mark in the Windows character set.  I want to
 find it in my Web pages and replace it by an appropriate HTML escape
 sequence.  I know it's out there, but grep isn't finding it, or I'm not
 telling it how to find the character correctly.

Ah -- well, the beauty of Unix is that if the first tool you think of
doesn't do the job, then the next one probably will.

You can use perl to match and replace arbitrary characters:

% perl -pi.bak -e 's/\x93/&ldquo;/g' foo.html

Or you could go for the bulk method and run HTML tidy(1) over the
file, which is usually pretty good at converting any old HTML into
something that will pass validation:

(ports: www/tidy)   http://www.w3c.org/People/Raggett/tidy/
(ports: www/tidy-devel) http://tidy.sourceforge.net/

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.   8 Dane Court Manor
  School Rd
PGP: http://www.infracaninophile.co.uk/pgpkey Tilmanstone
Tel: +44 1304 617253  Kent, CT14 0JL UK




Re: Another grep question

2005-02-07 Thread Michael C. Shultz
On Monday 07 February 2005 05:56 pm, Anthony Atkielski wrote:
 Does anyone know why

 grep -R \0x93 /www/htdocs

 turns up only binary files, even when I know there are text files in
 the directory that contain this character?  Is there something
 special about the way I specify the search string that causes grep to
 behave differently?  When I specify an 8-bit character like this
 alone for a search, it finds only binary files, even though this is a
 text character--as if it is looking at the search string and deciding
 that I want to search only binary files.

 The man page doesn't seem to say anything about this.  Is it my
 imagination?

I made a text file named test.log containing:

aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
grep -R \0x93 /www/htdocs
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD
aj[[CFPWJJVCVMLKFD

the result of:

 grep -R \0x93 test.log

is:

grep -R \0x93 /www/htdocs


Maybe you should test again

-Mike



___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: Another grep question

2005-02-07 Thread Giorgos Keramidas
Anthony Atkielski wrote:
 Does anyone know why
 
 grep -R \0x93 /www/htdocs
 
 turns up only binary files, even when I know there are text
 files in the directory that contain this character?  Is
 there something special about the way I specify the search
 string that causes grep to behave differently?  When I
 specify an 8-bit character like this alone for a
 search, it finds only binary files, even though this is a text
 character--as if it is looking at the search string and
 deciding that I want to search only binary files.

It may not be related to what you are seeing, but grep(1)
is locale-aware.  What it considers a text character
depends on the current locale settings.

In my account, which has a Greek locale setup, it will
consider all Greek 8-bit characters as text.





Re: Another grep question

2005-02-07 Thread Anthony Atkielski
Giorgos Keramidas writes:

GK> It may not be related to what you are seeing, but grep(1)
GK> is locale-aware.  What it considers a text character
GK> depends on the current locale settings.

I tried setting LC_ALL to en_US.UTF-8, en_US.ISO8859-15, and
en_US.ISO8859-1, with no effect.  The character in question is an
opening double quotation mark in the Windows character set.  I want to
find it in my Web pages and replace it by an appropriate HTML escape
sequence.  I know it's out there, but grep isn't finding it, or I'm not
telling it how to find the character correctly.

-- 
Anthony




Re: Another grep question

2005-02-07 Thread Anthony Atkielski
Michael C. Shultz writes:

 I made a text file named test.log containing:

 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 grep -R \0x93 /www/htdocs
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD
 aj[[CFPWJJVCVMLKFD

 the result of:

  grep -R \0x93 test.log

 is:

 grep -R \0x93 /www/htdocs


 Maybe you should test again

I'm looking for the hex character 93, which is an opening double
quotation mark in the Windows character set, not the literal string
\0x93.  Unless I'm mistaken, \0x93 in a regular expression means the
character whose hex value is 93.

-- 
Anthony




Re: Another grep question

2005-02-07 Thread Giorgos Keramidas
On 2005-02-08 03:49, Anthony Atkielski wrote:
 I'm looking for the hex character 93, which is an opening double
 quotation mark in the Windows character set, not the literal string
 \0x93.  Unless I'm mistaken, \0x93 in a regular expression means
 the character whose hex value is 93.

Not really, unless you have a shell that understands this sort of
escape and expands it into the corresponding 8-bit character before
grep ever sees the argument.

Otherwise, \0x93 means:

A literal (escaped with '\') '0' character,
followed by 'x', then
followed by '9' and '3'.
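
A small sketch of the difference, under some assumptions: a POSIX
shell, a grep that accepts -a, and made-up scratch file names.  It
shows that the unquoted \0x93 reaches grep as the four characters
0x93, and that one way to hand grep the real byte is to build it with
printf(1) and search in the C locale.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
tmp=$(mktemp -d)
printf 'plain ascii line\n'            > "$tmp/ascii.txt"
printf 'say \223hello\n'               > "$tmp/quote.txt"    # \223 octal is byte 0x93
printf 'grep -R \\0x93 /www/htdocs\n'  > "$tmp/literal.txt"  # the literal text \0x93

# Unquoted, the shell strips the backslash, so grep receives the pattern 0x93
# and matches only files that contain those four characters.
literal_hits=$(cd "$tmp" && LC_ALL=C grep -l \0x93 *.txt)

# Build the actual byte with printf and pass it as the pattern.
# LC_ALL=C makes grep work on single bytes; -a treats every file as text.
byte=$(printf '\223')
byte_hits=$(cd "$tmp" && LC_ALL=C grep -a -l -e "$byte" *.txt)

echo "pattern 0x93 (four characters) matched: $literal_hits"
echo "byte 0x93 matched:                      $byte_hits"
rm -rf "$tmp"
```

The same approach should carry over to a recursive search of a
document tree by adding -R and a directory argument.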
