On 30 April 2013 11:44, Horacio Sanson wrote:
> Great to see Sup getting back on track again.
>
> I submitted some patches for the Gmail dumper of Heliotrope some time ago,
> but the lack of support for non-alphabetic languages (Japanese, Chinese)
> made it impossible for me to keep using heliotrope/turnsole.
>
> The main issue with supporting Japanese/Chinese in heliotrope was that
> whistlepig (the indexer) could not tokenize these languages. Also, the
> half-baked UTF-8 support caused several issues with these languages.
>
> I would like to help test and implement support for these languages,
> starting with Japanese, but I would need some guidance. First, I would
> like to know if there is a way to configure the Xapian tokenizer
> (segmenter) within Sup. Please keep in mind that I am new to both Sup and
> Xapian.
Hi Horacio,

consider opening an issue at https://github.com/sup-heliotrope/sup/issues to make sure this doesn't disappear.

Some changes will probably be made to the indexer when moving to Mail (from RMail), but I hope to be able to migrate the existing index. Perhaps it's time to get it right for arbitrary languages as well.

I am unfamiliar with Japanese/Chinese: does UTF-8 cover the needs? Mail is better at handling UTF-8, and I think there was a fork that had some extra support for Japanese.

Regards,
Gaute

_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel