Re: Japanese text search problem

2001-08-10 Thread Benjamin Franz
On Fri, 10 Aug 2001, Martin Duerst wrote: > At 12:17 01/08/08 -0700, Benjamin Franz wrote: > > >In UTF8 the 'frame' problem doesn't exist because character start > >bytes _ALWAYS_ have bit eight set to 0 while continuation bytes _ALWAYS_ > >have bit eight set to 1. 'quotemeta' works fine if you u
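For reference, the UTF-8 byte layout under discussion in this excerpt can be checked directly. In UTF-8, ASCII bytes have the high bit clear (0xxxxxxx), the first byte of a multibyte character has the form 11xxxxxx, and every following byte has the form 10xxxxxx. The sketch below is not from the original message; the helper name `utf8_byte_kind` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a single UTF-8 byte by its high bits.
sub utf8_byte_kind {
    my ($byte) = @_;
    return 'ascii'        if ($byte & 0x80) == 0x00;   # 0xxxxxxx
    return 'continuation' if ($byte & 0xC0) == 0x80;   # 10xxxxxx
    return 'lead';                                     # 11xxxxxx (validity not checked)
}

# UTF-8 bytes of the katakana SHI (U+30B7), followed by an ASCII '['.
my @bytes = (0xE3, 0x82, 0xB7, 0x5B);
my @kinds = map { utf8_byte_kind($_) } @bytes;

print join(' ', @kinds), "\n";
```

Because no byte of a multibyte UTF-8 character falls in the ASCII range, a regex metacharacter such as '[' can never appear inside one.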

Re: Japanese text search problem

2001-08-09 Thread Martin Duerst
At 12:17 01/08/08 -0700, Benjamin Franz wrote: >Oh, yeah. I forgot about that since I don't normally keep stuff in >JIS/SJIS/EUC-JP once I've acquired it. I always make my working store >UTF8. In UTF8 the 'frame' problem doesn't exist because character start >bytes _ALWAYS_ have bit eight set to

Re: Japanese text search problem

2001-08-08 Thread Benjamin Franz
On Wed, 8 Aug 2001, Dan Kogai wrote: > on 01.8.8 1:14 AM, Benjamin Franz at [EMAIL PROTECTED] wrote: > > On Tue, 7 Aug 2001, Ashutosh Salgarkar wrote: > > > > my $safe_key = quotemeta($key1); > > $searchStr =~ m/$safe_key/; > > > > is probably what you want. I am presuming you are trying to use m
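A minimal sketch of the quoted `quotemeta` advice, assuming the keyword is stored as Shift_JIS bytes (the thread does not say which encoding the original poster used). In Shift_JIS the long-vowel mark in the keyword is encoded as 0x81 0x5B, and 0x5B is ASCII '[', so the raw key is not a valid pattern. The haystack below is hypothetical.

```perl
use strict;
use warnings;

# Shift_JIS bytes of the keyword from the thread, written as escapes so
# this script itself stays plain ASCII (assumption: the data is Shift_JIS).
my $key1      = "\x83\x56\x83\x8A\x81\x5B\x83\x59";
my $searchStr = "ABC" . $key1 . "XYZ";   # hypothetical text to search

# Unescaped, the 0x5B byte ('[') opens a character class that is never
# closed, so compiling the pattern dies; eval catches that.
my $raw_ok = eval { $searchStr =~ m/$key1/; 1 } ? 1 : 0;

# quotemeta backslash-escapes every non-word byte, so the key is
# matched literally.
my $safe_key = quotemeta($key1);
my $safe_ok  = ($searchStr =~ m/$safe_key/) ? 1 : 0;

print "raw pattern compiled: $raw_ok\n";
print "quoted pattern matched: $safe_ok\n";
```

This only removes the metacharacter problem. A byte-level match in a multibyte encoding can still begin in the middle of a character, which escaping does not prevent.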

Re: Japanese text search problem

2001-08-07 Thread Markus Kuhn
On Wed, 8 Aug 2001, Dan Kogai wrote: > I confess; so do I for most of the times. The biggest problem is that > there are still too few tools to edit utf8 files. Vim 6.0 supports UTF-8, and so does Emacs 20 (though still with compromises). Links and further UTF-8 editors are listed in http:

Re: Japanese text search problem

2001-08-07 Thread Markus Kuhn
On 7 Aug 2001, Andreas Marcel Riechert wrote: > Why should Unicode be the "de facto standard for internal > representation"? ...or "internal standard" to whom, or what? Because every new system designed after around 1995 has based its character encoding completely on ISO 10646. All the rest is slo

Re: Japanese text search problem

2001-08-07 Thread Andreas Marcel Riechert
Dan Kogai <[EMAIL PROTECTED]> writes: > on 01.8.8 1:54 AM, Andreas Marcel Riechert at [EMAIL PROTECTED] wrote: > > Why should Unicode be the "de facto standard for internal > > representation"? ...or "internal standard" to whom, or what? In perl > > that could happen, but as a general statement I

Re: Japanese text search problem

2001-08-07 Thread Dan Kogai
on 01.8.8 1:14 AM, Benjamin Franz at [EMAIL PROTECTED] wrote: > On Tue, 7 Aug 2001, Ashutosh Salgarkar wrote: > > my $safe_key = quotemeta($key1); > $searchStr =~ m/$safe_key/; > > is probably what you want. I am presuming you are trying to use m// to > search for exact string matches rather than

Re: Japanese text search problem

2001-08-07 Thread Benjamin Franz
On Tue, 7 Aug 2001, Ashutosh Salgarkar wrote: > Hi all, > > We are trying to search japanese keyword using a search string(in perl using pattern matching). > We are facing problem while searching a particular keyword as given below, > $searchStr =~ m/$key1/i > > when $key1 contains シリーズ > We ge

Re: Japanese text search problem

2001-08-07 Thread Dan Kogai
on 01.8.7 10:53 PM, Jarkko Hietaniemi at [EMAIL PROTECTED] wrote: >> * Use perl 5.6.0 or above > > I would strongly urge using 5.6.1 at this point: several Unicode > bugs in 5.6.0 were fixed for 5.6.1. Everyone hear that? Use 5.6.1. That is also the version that is covered by 3rd Camel Book

Re: Japanese text search problem

2001-08-07 Thread Dan Kogai
on 01.8.8 1:54 AM, Andreas Marcel Riechert at [EMAIL PROTECTED] wrote: > Why should Unicode be the "de facto standard for internal > representation"? ...or "internal standard" to whom, or what? In perl > that could happen, but as a general statement I cannot agree, but > anyway I would like to hea

Re: Japanese text search problem

2001-08-07 Thread Andreas Marcel Riechert
Dan Kogai <[EMAIL PROTECTED]> writes: >Japanese is notorious for the number of character encodings used. JIS, > shift JIS, EUC, and now Unicode. JIS (ISO-2022-JP to be more exact) is a de > facto standard for e-mails. shift JIS is de facto standard for Win/Mac > files. EUC is de facto sta

Re: Japanese text search problem

2001-08-07 Thread Jarkko Hietaniemi
> * Use perl 5.6.0 or above I would strongly urge using 5.6.1 at this point: several Unicode bugs in 5.6.0 were fixed for 5.6.1. > * convert any string to utf8 using Jcode or other modules In the upcoming (hopefully in a few months) 5.8.0 the Unicode support is even better (e.g. regexes work mu
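The "convert to utf8, then match" step can be sketched as follows. This is an illustration under assumptions, not the method from the message: it uses the core `Encode` module (shipped with Perl 5.8 and later) rather than Jcode, assumes the input is Shift_JIS, and uses a made-up haystack.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Keyword from the thread as Shift_JIS bytes (assumed input encoding).
my $sjis_key  = "\x83\x56\x83\x8A\x81\x5B\x83\x59";
my $sjis_text = "Perl " . $sjis_key . " 2001";   # hypothetical haystack

# Decode the bytes into Perl character strings before matching.
my $key  = decode('shiftjis', $sjis_key);
my $text = decode('shiftjis', $sjis_text);

# The key is now four characters rather than eight bytes.
my $key_chars = length $key;

# \Q...\E still guards against metacharacters; /i now operates on
# whole characters instead of individual bytes.
my $found  = ($text =~ /\Q$key\E/i) ? 1 : 0;
my $offset = $found ? $-[0] : -1;   # character offset of the match

print "key length in characters: $key_chars\n";
print "found: $found at character offset $offset\n";
```

Matching on decoded strings means a match can only start on a character boundary, which a byte-level search in Shift_JIS or EUC-JP does not guarantee.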

Re: Japanese text search problem

2001-08-07 Thread Dan Kogai
on 01.8.7 9:34 PM, Jarkko Hietaniemi at [EMAIL PROTECTED] wrote: > On Tue, Aug 07, 2001 at 05:37:00PM +0530, Ashutosh Salgarkar wrote: >> Hi all, >> >> We are trying to search japanese keyword using a search string(in perl using >> pattern matching). >> We are facing problem while searching a par

Re: Japanese text search problem

2001-08-07 Thread Jarkko Hietaniemi
On Tue, Aug 07, 2001 at 05:37:00PM +0530, Ashutosh Salgarkar wrote: > Hi all, > > We are trying to search japanese keyword using a search string(in perl using pattern matching). > We are facing problem while searching a particular keyword as given below, > $searchStr =~ m/$key1/i > > when $key