I think I got it -- new patch in bugzilla.

On 3/6/07, jamie <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-03-05 at 21:55 -0500, Edward Duffy wrote:
> > On 3/5/07, jamie <[EMAIL PROTECTED]> wrote:
> > > On Mon, 2007-03-05 at 18:19 -0500, Edward Duffy wrote:
> > > > Hi Guys -
> > > >
> > > > I just wrote a patch for #377891[1], could I get some of you to test
> > > > it.  I ran some pdfs I found with google.fr and google.it, and it
> > > > seems to be working correctly...but more eyes the better.
> > >
> > > great stuff but we only support utf-8 - are all those language modules
> > > utf-8 based?
> > >
> >
> > Both from http://software.wise-guys.nl/libtextcat/languages.html
> > """Our main focus will be on compiling a list of fingerprints of UTF-8
> > encoded languages, since Unicode is clearly the way to go and UTF-8 is
> > usually the best way to do Unicode."""
> >
> > It works (for my tests) if I encode the buffer to UTF-8 first, and
> > I've been able to get away with just sending the first 1K of the file.
>
>
> before I accept patch can you:
>
> 1) just include langs we have stopwords/stemmers for
> 2) check and verify each lang we support with utf8 content
> 3) if (2) fails use g_convert to convert utf8 to necessary char_set
>
> I will fiddle with configure.ac once you have done the above,
>
>
>
> jamie.
>
>
>
>
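
For anyone testing along, the "encode the buffer to UTF-8 first and send only the first 1K" idea quoted above can be sketched roughly as follows. This is an illustration only, not the patch in bugzilla: the real code is C (where g_convert would do the charset conversion), and the function name `utf8_sample`, the default source charset, and the 1024-byte limit are assumptions made for the example.

```python
import codecs


def utf8_sample(raw: bytes, source_charset: str = "iso-8859-1", limit: int = 1024) -> bytes:
    """Return the first `limit` bytes of `raw`, re-encoded as UTF-8.

    Hypothetical helper: the name, default charset and limit are assumptions.
    """
    head = raw[:limit]
    # Cutting at a fixed byte count can split a multibyte character.
    # An incremental decoder called with final=False keeps an incomplete
    # trailing sequence buffered instead of raising, so it is simply dropped.
    decoder = codecs.getincrementaldecoder(source_charset)(errors="replace")
    text = decoder.decode(head, final=False)
    return text.encode("utf-8")


if __name__ == "__main__":
    latin1_bytes = "café au lait".encode("iso-8859-1")
    print(utf8_sample(latin1_bytes))
```

The resulting UTF-8 sample is what would then be handed to the language classifier; if a given language model turned out not to be UTF-8 based, the same conversion could be run in the other direction.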
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
