On Tue, 04 Mar 2008 10:49:54 +0530, Pradnyesh Sawant wrote:

> I have a file which contains chinese characters. I just want to find out
> all the places that these chinese characters occur.
>
> The following script doesn't seem to work :(
>
> **********************************************************************
> class RemCh(object):
>     def __init__(self, fName):
>         self.pattern = re.compile(r'[\u2F00-\u2FDF]+')
>         fp = open(fName, 'r')
>         content = fp.read()
>         s = re.search('[\u2F00-\u2fdf]', content, re.U)
>         if s:
>             print s.group(0)
> if __name__ == '__main__':
>     rc = RemCh('/home/pradnyesh/removeChinese/delFolder.php')
> **********************************************************************
>
> the php file content is something like the following:
>
> **********************************************************************
> // Check if the folder still has subscribed blogs
> $subCount = function1($param1, $param2);
> if ($subCount > 0) {
>     $errors['summary'] = 'æÂï½ æ½å¤æ¤Ã¥Ã¯Â«Ã¥Ã©Ã©Â§Ã§Â²Ã¨';
>     $errorMessage = 'æÂï½ æ½å¤æ¤Ã¥Ã¯Â«Ã¥Ã©Ã©Â§Ã§Â²Ã¨';
> }

Looks like a UTF-8 encoded file viewed as ISO-8859-1. So you should decode
`content` to unicode before searching for the Chinese characters.

Ciao,
	Marc 'BlackJack' Rintsch
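
A minimal sketch of that advice: decode the raw bytes as UTF-8 first, then run the regular expression over the resulting text. It is written for Python 3 rather than the Python 2 of the quoted script. The helper name `find_chinese`, the sample string, and the range U+4E00-U+9FFF (the CJK Unified Ideographs block) are assumptions made for illustration, not part of the original script. The range in the quoted script, U+2F00-U+2FDF, is the Kangxi Radicals block, which ordinary Chinese text rarely uses, so it would probably not match even after decoding.

```python
import re

# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
# Widen the range if the file may contain other CJK blocks.
CJK_RUN = re.compile('[\u4e00-\u9fff]+')


def find_chinese(raw_bytes, encoding='utf-8'):
    """Decode raw_bytes and return a list of (offset, run) pairs.

    offset is the character index of each run in the decoded text.
    """
    text = raw_bytes.decode(encoding)
    return [(m.start(), m.group(0)) for m in CJK_RUN.finditer(text)]


# Hypothetical sample line, stored as UTF-8 bytes the way the file would be.
sample = "$errors['summary'] = '\u60a8\u597d';".encode('utf-8')

for offset, run in find_chinese(sample):
    print(offset, run)
```

For a real file, read it in binary mode (`open(fName, 'rb').read()`) and pass the bytes to the helper. Decoding the same bytes as ISO-8859-1 yields only code points below U+0100, which is why the search finds nothing when the file is treated as Latin-1.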