Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
> "PG" == Paul Gevers writes:

    PG> I fully understand the latter rationale, but I thought you meant
    PG> (by "some Indian languages do that") that it was already
    PG> possible. What are they then doing if not "fixing" festival
    PG> itself?

I guess they do that because those scripts can't be represented in common
8-bit codings (if they can fit in any reasonable 8-bit set at all).

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
On 20-05-14 21:20, Milan Zamazal wrote:
>> "PG" == Paul Gevers writes:
>
> PG> On 20-05-14 17:01, Milan Zamazal wrote:
>
> >> In theory, it's possible to implement the support for a given
> >> language in UTF-8 (some Indian languages do that), but it's
> >> cumbersome as all the UTF handling must be done manually, without
> >> any support from Festival.
>
> PG> I don't understand your remark. Why would UTF-8 to ISO-8859-X be
> PG> any easier than ISO-8859-X to UTF-8?
>
> The text processing part of a language support in Festival processes
> text input. The easiest way to implement the processing is to use 8-bit
> (single byte) input coding. Using multibyte characters within Festival
> is like using them in old C without wchar etc., i.e. something nobody
> does unless really needed. So in order to use UTF-8 input directly
> instead of some 8-bit coding, one would probably implement something
> like UTF-8 -> 8-bit coding conversion in the Festival language pack,
> which makes little sense when better and working conversion tools (such
> as iconv) are already available.

I fully understand the latter rationale, but I thought you meant (by "some
Indian languages do that") that it was already possible. What are they then
doing if not "fixing" festival itself?

Paul
Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
> "PG" == Paul Gevers writes:

    PG> On 20-05-14 17:01, Milan Zamazal wrote:

    >> In theory, it's possible to implement the support for a given
    >> language in UTF-8 (some Indian languages do that), but it's
    >> cumbersome as all the UTF handling must be done manually, without
    >> any support from Festival.

    PG> I don't understand your remark. Why would UTF-8 to ISO-8859-X be
    PG> any easier than ISO-8859-X to UTF-8?

The text processing part of a language support in Festival processes
text input. The easiest way to implement the processing is to use 8-bit
(single byte) input coding. Using multibyte characters within Festival
is like using them in old C without wchar etc., i.e. something nobody
does unless really needed. So in order to use UTF-8 input directly
instead of some 8-bit coding, one would probably implement something
like UTF-8 -> 8-bit coding conversion in the Festival language pack,
which makes little sense when better and working conversion tools (such
as iconv) are already available.
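To illustrate the single-byte versus multibyte point, here is a minimal sketch (not taken from any Festival language pack) that counts the bytes a byte-oriented text processor would see for one Czech letter. The variable names are made up for the example, and it assumes an `iconv` that knows ISO-8859-2.

```shell
#!/bin/sh
# "č" (U+010D) as UTF-8, written with octal escapes so this script stays ASCII.
c_caron_utf8='\304\215'

# Number of bytes a byte-oriented processor sees when the input is UTF-8.
utf8_len=$(printf "$c_caron_utf8" | wc -c | tr -d ' ')

# Number of bytes for the same letter after recoding to ISO 8859-2.
latin2_len=$(printf "$c_caron_utf8" | iconv -f UTF-8 -t ISO-8859-2 | wc -c | tr -d ' ')

echo "UTF-8 bytes: $utf8_len, ISO 8859-2 bytes: $latin2_len"
```

With an 8-bit coding every letter is exactly one byte, so per-character rules can index bytes directly; with UTF-8 the same letter spans two bytes that the rules would have to reassemble by hand.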
Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
On 20-05-14 17:01, Milan Zamazal wrote:
> In theory, it's possible to implement the support for a given language
> in UTF-8 (some Indian languages do that), but it's cumbersome as all the
> UTF handling must be done manually, without any support from Festival.

I don't understand your remark. Why would UTF-8 to ISO-8859-X be any
easier than ISO-8859-X to UTF-8? As nowadays most files in Debian (and in
the wild?) are in UTF-8, the number of files to convert would be lower if
the input is expected in UTF-8.

> It's easier to recode the input text using iconv or another tool.

That works both ways, doesn't it?

> Perhaps the information about ISO 8859-2 coding should be added to some
> of the README files in festival-czech (maybe the bug reporter would like
> to provide a patch?).

Sure, documentation is great, but I would also appreciate it if the
majority of files just worked instead of the minority. However, I have
no idea how to determine whether I am right about the numbers of
non-UTF-8 versus UTF-8 files.

Paul
Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
> "PG" == Paul Gevers writes:

    PG> As a side note, I think that Czech cannot be expressed in
    PG> ISO-8859-15, but I don't think festival is limited to -15, but
    PG> supports the other ISO-8859 character sets as well.

Basically, Festival doesn't support any encoding and it works with
whatever is defined in the particular language support. This is
ISO 8859-2 in the case of Czech.

In theory, it's possible to implement the support for a given language
in UTF-8 (some Indian languages do that), but it's cumbersome as all the
UTF handling must be done manually, without any support from Festival.
It's easier to recode the input text using iconv or another tool.

Perhaps the information about ISO 8859-2 coding should be added to some
of the README files in festival-czech (maybe the bug reporter would like
to provide a patch?).
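A minimal sketch of the recoding step for Czech, assuming an `iconv` that knows ISO-8859-2. The helper name `utf8_to_latin2` and the file names in the comments are made up for illustration, and the commented Festival invocation is an assumption that was not tested here.

```shell
#!/bin/sh
# Hypothetical helper: recode UTF-8 on stdin to ISO 8859-2 on stdout.
utf8_to_latin2() {
    iconv -f UTF-8 -t ISO-8859-2
}

# "č" (U+010D) as UTF-8, written with octal escapes so this script stays ASCII.
# After recoding it should be the single ISO 8859-2 byte 0xe8.
latin2_hex=$(printf '\304\215' | utf8_to_latin2 | od -An -tx1 | tr -d ' \n')
echo "$latin2_hex"

# Possible use with the Czech voice (assumed invocation, not run here):
#   utf8_to_latin2 < text_utf8.txt > text_latin2.txt
#   festival --language czech --tts text_latin2.txt
```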
Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
Hi

On 05-11-12 13:38, Michal Suchanek wrote:
> I tried running festival with one of the Czech voices (festival
> --language czech)
>
> That language uses characters outside of the 7-bit ASCII range heavily
> but any occurrence of such a character causes festival to say 'unknown'.

From the README file in a related package (festival-ca):
"""
Festival expects **ISO-8859-15** encoding. Be sure that you use this
encoding in your terminal or files. If your system uses UTF-8 (as do
many distributions today) you need to convert the file before reading.
Some front-ends, as gnopernicus, do the conversions for you.

You can use the "save as" options in `gedit`; or use programs to convert
the format, as `iconv`:

    $ iconv -f utf8 -t ISO-8859-15//TRANSLIT myfile_utf8.text > myfile_latin1.text
"""

I am aware that this does not solve your issue, nor does it invalidate
this bug, but it explains what the issue is and how you can easily work
around it.

I think the bug should be reassigned to festival and the bug should be
titled: "festival should support UTF-8"

As a side note, I think that Czech cannot be expressed in ISO-8859-15,
but I don't think festival is limited to -15, but supports the other
ISO-8859 character sets as well.

Paul
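The side note about ISO-8859-15 can be checked with a small sketch like the one below. The helper name `can_encode` is made up for the example, and it assumes an `iconv` (such as the glibc one) that exits non-zero when a character has no representation in the target charset and `//TRANSLIT` is not requested.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the UTF-8 text on stdin can be
# represented in the charset named by $1.
can_encode() {
    iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# "č" (U+010D) as UTF-8, written with octal escapes so this script stays ASCII.
if printf '\304\215' | can_encode ISO-8859-15; then latin9=yes; else latin9=no; fi
if printf '\304\215' | can_encode ISO-8859-2; then latin2=yes; else latin2=no; fi

echo "ISO-8859-15: $latin9, ISO-8859-2: $latin2"
```

Under those assumptions the letter is rejected for ISO-8859-15 and accepted for ISO-8859-2, which is why the conversion target has to match the language support rather than the festival-ca README.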
Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english
Package: festival
Version: 1:2.1~release-5+b1
Severity: normal

Hello,

I tried running festival with one of the Czech voices (festival
--language czech)

That language uses characters outside of the 7-bit ASCII range heavily
but any occurrence of such a character causes festival to say 'unknown'.

There is probably some encoding issue but my locale is generally correct
and I don't see any option to set encoding.

Thanks

Michal

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (910, 'testing'), (900, 'stable'), (410, 'unstable'), (200, 'experimental'), (150, 'precise-updates'), (150, 'precise-security'), (150, 'precise')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.5-trunk-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages festival depends on:
ii  adduser         3.113+nmu3
ii  alsa-utils      1.0.25-3
ii  dpkg            1.16.9
ii  install-info    4.13a.dfsg.1-10
ii  libaudiofile1   0.3.4-2
ii  libc6           2.13-35
ii  libesd0         0.2.41-10+b1
ii  libestools2.1   1:2.1~release-4
ii  libgcc1         1:4.7.1-7
ii  libncurses5     5.9-10
ii  libstdc++6      4.7.1-7
ii  libtinfo5       5.9-10
ii  lsb-base        4.1+Debian7
ii  sgml-base       1.26+nmu3
ii  sysv-rc         2.88dsf-32

Versions of packages festival recommends:
ii  festvox-kallpc16k [festival-voice]  1.4.0-5
ii  festvox-kdlpc16k [festival-voice]   1.4.0-5

Versions of packages festival suggests:
pn  festival-freebsoft-utils
pn  pidgin-festival

-- no debconf information