Bug#892228: libsphinxbase3: Causes pocketsphinx to FTBFS on 64-bit big-endian architectures (fills testsuite logs on disk with errors)

2018-03-07 Thread James Clarke
On 7 Mar 2018, at 21:39, Samuel Thibault wrote:
> 
> Hello,
> 
> James Clarke, on mer. 07 mars 2018 00:33:05 +, wrote:
>> The build for pocketsphinx fails on 64-bit big-endian architectures,
> 
> I know, I had already reported the issue a long time ago, without
> feedback.

Yeah, I found the upstream issue after I reported this, but that doesn't
quite convey the problem seen here!

>> failing with "No space left on device", as the testsuite log files
>> fill up with hundreds of gigabytes of warnings.
> 
> Ouch!  Perhaps we should just abort the build before that happens for
> now.

Probably; either abort based on DEB_HOST_ARCH_ENDIAN and DEB_HOST_ARCH_BITS
(though maybe the 32-bit big-endian builds are broken enough to not be useful
and should be disabled too), or I guess we can mark it Not-For-Us on the
wanna-build side.

James



Bug#892228: libsphinxbase3: Causes pocketsphinx to FTBFS on 64-bit big-endian architectures (fills testsuite logs on disk with errors)

2018-03-07 Thread Samuel Thibault
Hello,

James Clarke, on mer. 07 mars 2018 00:33:05 +, wrote:
> The build for pocketsphinx fails on 64-bit big-endian architectures,

I know, I had already reported the issue a long time ago, without
feedback.

> failing with "No space left on device", as the testsuite log files
> fill up with hundreds of gigabytes of warnings.

Ouch!  Perhaps we should just abort the build before that happens for
now.

Samuel



Bug#892228: libsphinxbase3: Causes pocketsphinx to FTBFS on 64-bit big-endian architectures (fills testsuite logs on disk with errors)

2018-03-06 Thread James Clarke
Package: libsphinxbase3
Version: 0.8+5prealpha+1-1
Severity: important
Tags: upstream
Control: affects -1 src:pocketsphinx

Hi,
The build for pocketsphinx fails on 64-bit big-endian architectures, failing
with "No space left on device", as the testsuite log files fill up with
hundreds of gigabytes of warnings. The first indication of the problem in the
log files is:

> Sorry, this does not support more than 33554432 n-grams of a particular 
> order.  Edit util/bit_packing.hh and fix the bit packing functions

where 33554432 is 0x02000000, i.e. 2 byte-swapped. This error isn't fatal
though, and libsphinxbase3 continues to try to build the trie, with tons of
duplicate word warnings, as it's reading all kinds of garbage. The issues stem
from widespread use of fread to read multi-byte values with no regard for
their endianness; the first error, the wrong number of n-grams, comes from
reading into the "counts" array in ngram_model_trie_read_bin. The library
has functions like bio_fread which can do the byte-swapping for the caller, so
presumably these should be used instead, though for this file format there does
not seem to be an easy way to determine the endianness of the file based on
some header magic like for some of the others (but maybe it's intended to
always be little-endian).

32-bit big-endian architectures have the same underlying bugs, but it seems
they die a lot earlier, failing to calloc huge sizes (presumably these same
calls are made on 64-bit architectures but can be satisfied thanks to
overcommitting) and thus don't actually try to build the trie and spew all the
warnings.

There are "only" 62 calls to fread in sphinxbase (and a further 45 in
pocketsphinx) so it shouldn't be too hard for someone with knowledge of the
codebase to audit their uses, especially since my guess is that most of them
can be turned into something like `bio_fread(..., IS_BIG_ENDIAN)`. Similarly,
the corresponding fwrite calls should be audited too.

Regards,
James