Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2014-05-20 Thread Milan Zamazal
> "PG" == Paul Gevers  writes:

PG> I fully understand the latter rationale, but I thought you meant
PG> (by "some Indian languages do that") that it was already
PG> possible. What are they then doing if not "fixing" festival
PG> itself?

I guess they do that because those scripts can't be represented in
common 8-bit codings (if they can fit in any reasonable 8-bit set at
all).


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2014-05-20 Thread Paul Gevers
On 20-05-14 21:20, Milan Zamazal wrote:
>> "PG" == Paul Gevers  writes:
> 
> PG> On 20-05-14 17:01, Milan Zamazal wrote:
> >> In theory, it's possible to implement the support for a given
> >> language in UTF-8 (some Indian languages do that), but it's
> >> cumbersome as all the UTF handling must be done manually, without
> >> any support from Festival.
> 
> PG> I don't understand your remark. Why would UTF-8 to ISO-8859-X be
> PG> any easier than ISO-8859-X to UTF-8?
> 
> The text processing part of a language support in Festival processes
> text input.  The easiest way to implement the processing is to use 8-bit
> (single byte) input coding.  Using multibyte characters within Festival
> is like using them in old C without wchar etc., i.e. something nobody
> does unless really needed.  So in order to use UTF-8 input directly
> instead of some 8-bit coding, one would probably implement something
> like UTF-8 -> 8-bit coding conversion in the Festival language pack,
> which makes little sense when better and working conversion tools (such
> as iconv) are already available.

I fully understand the latter rationale, but I thought you meant (by
"some Indian languages do that") that it was already possible. What are
they then doing if not "fixing" festival itself?

Paul



signature.asc
Description: OpenPGP digital signature


Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2014-05-20 Thread Milan Zamazal
> "PG" == Paul Gevers  writes:

PG> On 20-05-14 17:01, Milan Zamazal wrote:
>> In theory, it's possible to implement the support for a given
>> language in UTF-8 (some Indian languages do that), but it's
>> cumbersome as all the UTF handling must be done manually, without
>> any support from Festival.

PG> I don't understand your remark. Why would UTF-8 to ISO-8859-X be
PG> any easier than ISO-8859-X to UTF-8?

The text processing part of a language support in Festival processes
text input.  The easiest way to implement the processing is to use 8-bit
(single byte) input coding.  Using multibyte characters within Festival
is like using them in old C without wchar etc., i.e. something nobody
does unless really needed.  So in order to use UTF-8 input directly
instead of some 8-bit coding, one would probably implement something
like UTF-8 -> 8-bit coding conversion in the Festival language pack,
which makes little sense when better and working conversion tools (such
as iconv) are already available.
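
To illustrate the single-byte versus multibyte point, here is a minimal
Python sketch (not Festival code; the sample word and the helper name
byte_report are only assumptions for the example). It shows how the same
Czech word looks to code that walks the input byte by byte:

```python
# Illustration only: how one Czech word looks to byte-oriented code.
word = "řeč"  # Czech for "speech"


def byte_report(text):
    """Return (character count, UTF-8 byte count, ISO 8859-2 byte count)."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("iso8859-2")),
    )


# In ISO 8859-2 every character is exactly one byte, so indexing by byte
# and indexing by character are the same thing.
# In UTF-8, "ř" and "č" take two bytes each, so byte-wise processing sees
# more symbols than there are letters.
print(byte_report(word))
print(list(word.encode("utf-8")))
print(list(word.encode("iso8859-2")))
```

With an 8-bit coding, a lookup table indexed by byte value is enough to
map each letter; with UTF-8 the language support would first have to
reassemble the byte sequences itself.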





Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2014-05-20 Thread Paul Gevers
On 20-05-14 17:01, Milan Zamazal wrote:
> In theory, it's possible to implement the support for a given language
> in UTF-8 (some Indian languages do that), but it's cumbersome as all the
> UTF handling must be done manually, without any support from Festival.

I don't understand your remark. Why would UTF-8 to ISO-8859-X be any
easier than ISO-8859-X to UTF-8? As nowadays most files in Debian
(and in the wild?) are in UTF-8, the number of files to convert would be
lower if the input is expected in UTF-8.

> It's easier to recode the input text using iconv or another tool.

That works both ways, doesn't it?

> Perhaps the information about ISO 8859-2 coding should be added to some
> of the README files in festival-czech (maybe the bug reporter would like
> to provide a patch?).

Sure, documentation is great, but I would also appreciate it if the
majority of files just worked rather than the minority. However, I
have no idea how to determine whether I am right about the number of
non-UTF-8 versus UTF-8 files.
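
One rough way to estimate those numbers is to check which files decode
cleanly as UTF-8. The following Python sketch is only a heuristic under
stated assumptions: the function names classify_file and count_encodings
are made up for this example, binary files end up in the "other" bucket,
and pure ASCII files are counted separately because they are valid in
both UTF-8 and the ISO-8859 character sets.

```python
import os


def classify_file(path):
    """Return 'ascii', 'utf-8', or 'other' for the bytes stored at path."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: possibly an 8-bit coding, possibly binary data.
        return "other"


def count_encodings(root):
    """Walk the tree below root and count files per category."""
    counts = {"ascii": 0, "utf-8": 0, "other": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[classify_file(os.path.join(dirpath, name))] += 1
    return counts
```

Calling count_encodings() on a directory of text files would give a first
impression, although it cannot tell the individual 8-bit codings apart.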

Paul






Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2014-05-20 Thread Milan Zamazal
> "PG" == Paul Gevers  writes:

PG> As a side note, I think that Czech cannot be expressed in
PG> ISO-8859-15, but I don't think festival is limited to -15; I
PG> believe it supports the other ISO-8859 character sets as well.

Basically, Festival doesn't support any encoding and it works with
whatever is defined in the particular language support.  This is
ISO 8859-2 in the case of Czech.

In theory, it's possible to implement the support for a given language
in UTF-8 (some Indian languages do that), but it's cumbersome as all the
UTF handling must be done manually, without any support from Festival.
It's easier to recode the input text using iconv or another tool.
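
As a sketch of the recoding step, assuming Python is used in place of
iconv, the conversion from UTF-8 to ISO 8859-2 could look like this. The
function name recode_for_czech is hypothetical, and characters that do
not exist in ISO 8859-2 are replaced with "?" here, which is a choice
made for the example rather than what iconv does by default:

```python
def recode_for_czech(utf8_bytes):
    """Convert UTF-8 encoded input to ISO 8859-2 bytes.

    Characters without an ISO 8859-2 equivalent become b"?".
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode("iso8859-2", errors="replace")


# Example: the bytes a UTF-8 terminal would send for a Czech greeting.
greeting = "Dobrý den".encode("utf-8")
print(recode_for_czech(greeting))
```

The resulting bytes are what the Czech language support expects to
receive.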

Perhaps the information about ISO 8859-2 coding should be added to some
of the README files in festival-czech (maybe the bug reporter would like
to provide a patch?).





Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2014-05-19 Thread Paul Gevers
Hi

On 05-11-12 13:38, Michal Suchanek wrote:
> I tried running festival with one of the Czech voices (festival
> --language czech)
> 
> That language uses characters outside of the 7-bit ASCII range heavily
> but any occurrence of such a character causes festival to say 'unknown'.

From the README file in a related package (festival-ca):
"""
Festival expects **ISO-8859-15** encoding. Be sure that you use
this encoding in your terminal or files. If your system uses UTF-8 (as
do many distributions today) you need to convert the file before reading.
Some front-ends, as gnopernicus, do the conversions for you.

You can use the "save as" options in `gedit`; or use programs to convert
the format, as `iconv`:

$ iconv -f utf8 -t ISO-8859-15//TRANSLIT myfile_utf8.text > myfile_latin1.text
"""

I am aware that this does not solve your issue, nor does it invalidate
this bug, but it explains what the issue is and how you can easily work
around it. I think the bug should be reassigned to festival and the bug
should be titled: "festival should support UTF-8"

As a side note, I think that Czech cannot be expressed in ISO-8859-15,
but I don't think festival is limited to -15; I believe it supports the
other ISO-8859 character sets as well.
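
Whether a given text fits into a particular ISO-8859 part can be checked
quickly. This is a small Python sketch, with the helper name encodable_in
and the sample sentence chosen only for illustration:

```python
def encodable_in(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# A well-known Czech pangram that uses most of the accented letters.
sample = "Příliš žluťoučký kůň úpěl ďábelské ódy"

for charset in ("iso8859-15", "iso8859-2"):
    print(charset, encodable_in(sample, charset))
```

ISO-8859-15 lacks letters such as "ř" and "ů", so the check fails there,
while ISO-8859-2 covers the whole sentence.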

Paul





Bug#692369: festival: cannot speak text containing non-ascii characters - required for non-english

2012-11-05 Thread Michal Suchanek
Package: festival
Version: 1:2.1~release-5+b1
Severity: normal


Hello,

I tried running festival with one of the Czech voices (festival
--language czech)

That language uses characters outside of the 7-bit ASCII range heavily
but any occurrence of such a character causes festival to say 'unknown'.

There is probably some encoding issue but my locale is generally correct
and I don't see any option to set encoding.

Thanks

Michal

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (910, 'testing'), (900, 'stable'), (410, 'unstable'), (200, 'experimental'), (150, 'precise-updates'), (150, 'precise-security'), (150, 'precise')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.5-trunk-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages festival depends on:
ii  adduser        3.113+nmu3
ii  alsa-utils     1.0.25-3
ii  dpkg           1.16.9
ii  install-info   4.13a.dfsg.1-10
ii  libaudiofile1  0.3.4-2
ii  libc6          2.13-35
ii  libesd0        0.2.41-10+b1
ii  libestools2.1  1:2.1~release-4
ii  libgcc1        1:4.7.1-7
ii  libncurses5    5.9-10
ii  libstdc++6     4.7.1-7
ii  libtinfo5      5.9-10
ii  lsb-base       4.1+Debian7
ii  sgml-base      1.26+nmu3
ii  sysv-rc        2.88dsf-32

Versions of packages festival recommends:
ii  festvox-kallpc16k [festival-voice]  1.4.0-5
ii  festvox-kdlpc16k [festival-voice]   1.4.0-5

Versions of packages festival suggests:
pn  festival-freebsoft-utils  <none>
pn  pidgin-festival           <none>

-- no debconf information

