Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-06 Thread peter green

Riku Voipio wrote:

I think nofpu would be good for Raspbian. Any lost audio quality would be
unnoticeable on the Raspberry's analog audio output ;)

Peter, what's the recommended way to recognize Raspbian in debian/rules?
  

dpkg-vendor --derives-from raspbian
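Here is a sketch of the decision the rules file has to make. It is written in Python purely for illustration: debian/rules itself is a Makefile, so the real version would be a make conditional. It assumes the mapping proposed in this thread (armel gets arm_nofpu, armhf gets arm_fpu, and Raspbian gets arm_nofpu as Riku suggests). It also relies on dpkg-vendor --derives-from exiting with status 0 when the build vendor is, or derives from, the named vendor. derives_from and choose_cpu_flag are hypothetical helper names, and host_arch stands for the value of DEB_HOST_ARCH.

```python
import subprocess
from typing import Optional


def derives_from(vendor: str) -> bool:
    """True if `dpkg-vendor --derives-from VENDOR` succeeds (exit status 0)."""
    result = subprocess.run(["dpkg-vendor", "--derives-from", vendor], check=False)
    return result.returncode == 0


def choose_cpu_flag(host_arch: str, is_raspbian: bool) -> Optional[str]:
    """Hypothetical mapping from build environment to an mpg123 configure flag."""
    if is_raspbian:
        # Raspbian's armhf targets ARMv6 + VFPv2 without NEON: prefer integer math.
        return "--with-cpu=arm_nofpu"
    if host_arch == "armhf":
        # arm_fpu chooses between NEON and plain VFP at runtime.
        return "--with-cpu=arm_fpu"
    if host_arch == "armel":
        # No hardware FPU assumed; softfloat would make the float decoder crawl.
        return "--with-cpu=arm_nofpu"
    # Any other architecture: leave mpg123's own default alone.
    return None
```

On a build host, you would call this as choose_cpu_flag(host_arch, derives_from("raspbian")). Returning the flag explicitly for every ARM case, rather than relying on configure's autodetection, also follows Peter's argument for being explicit.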


--
To UNSUBSCRIBE, email to debian-arm-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/531922fc.40...@p10link.net



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-05 Thread Riku Voipio
On Tue, Mar 04, 2014 at 11:49:45AM +0100, Thomas Orgis wrote:
 In any case ... Riku: Care to run timings of MAD on your
 configurations? I'm interested in how fast it is producing that 24 bit
 output on limited CPUs.

time madplay -d -o null: convergence_-_points_of_view/*.mp3 > /dev/null

Cortex A15:

real    0m33.154s
user    0m33.045s
sys     0m0.110s

ARMv5:

real    1m35.923s
user    1m18.290s
sys     0m0.070s

Seems mpg123 wins bragging rights :) thanks, awesome work!
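To put these timings next to mpg123, here is a rough sketch. It converts the bash time fields to seconds and divides them by the t_s16/s figures from Riku's 2014-03-03 benchmark-cpu.pl runs in this thread (ARM decoder: 24.22 s on the Cortex A15, 49.12 s on the ARMv5). The thread does not say whether both sets of runs used the same downclocked A15 settings, so treat the ratios as indicative only. parse_time is a hypothetical helper.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` field such as '1m18.290s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# madplay user times from this message
madplay_user = {
    "Cortex A15": parse_time("0m33.045s"),
    "ARMv5": parse_time("1m18.290s"),
}
# mpg123 arm_nofpu t_s16/s figures Riku posted on 2014-03-03
mpg123_s16 = {"Cortex A15": 24.22, "ARMv5": 49.12}

for system, mad in madplay_user.items():
    ratio = mad / mpg123_s16[system]
    print(f"{system}: madplay needs {ratio:.2f}x the user CPU time of mpg123 (ARM, s16)")
```

This comes out at roughly 1.4x on the A15 and 1.6x on the ARMv5.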

Riku


-- 
Archive: https://lists.debian.org/20140305091407.ga16...@afflict.kos.to



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-05 Thread Riku Voipio
On Tue, Mar 04, 2014 at 02:59:44AM +, peter green wrote:
 On Sun, Mar 02, 2014 at 09:06:44AM -0500, Reinhard Tartler wrote:
 
 That sounds like the mpg123 package should use:
 on armel: --with-cpu=arm_nofpu
 on armhf: --with-cpu=arm_fpu
 
 
 Does this make sense to everybody?
 Seems sane to me. armv7 devices without neon are relatively uncommon
 so while it's important that they are supported it's IMO not vitally
 important to squeeze out every last drop of performance from them.
 
 I wonder what we should use on Raspbian? I haven't tested on a Pi
 yet but it seems that on all tests I've seen so far the generic fpu
 code is quite a bit slower than the arm nofpu code. Is there any
 quality difference from using a fpu vs nonfpu decoder? If so how
 much performance degradation do you believe should be accepted in
 exchange for that quality improvement?

I think nofpu would be good for Raspbian. Any lost audio quality would be
unnoticeable on the Raspberry's analog audio output ;)

Peter, what's the recommended way to recognize Raspbian in debian/rules?

Riku


-- 
Archive: https://lists.debian.org/20140305093430.gb16...@afflict.kos.to



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 04 Mar 2014 02:59:44 +, peter green plugw...@p10link.net wrote:

 Is there any quality 
 difference from using a fpu vs nonfpu decoder?

Technically, there is. See these numbers for generic fpu and non-fpu
code with and without --enable-int-quality given to configure (it enables
better rounding for a small performance hit; you might want to activate
that by default).

In numbers, the difference is this:

== src/mpg123.fpu_accurate.compliance.txt ==

 Layer 3 
-- 16 bit signed integer output
compl.bit:  RMS=4.300914e-06 (PASS) maxdiff=7.688999e-06 (PASS)
-- 32 bit integer output
compl.bit:  RMS=2.152784e-08 (PASS) maxdiff=1.769513e-07 (PASS)
-- 24 bit integer output
compl.bit:  RMS=4.206462e-08 (PASS) maxdiff=1.788139e-07 (PASS)
-- 32 bit floating point output
compl.bit:  RMS=2.153045e-08 (PASS) maxdiff=1.769513e-07 (PASS)

== src/mpg123.fpu.compliance.txt ==

 Layer 3 
-- 16 bit signed integer output
compl.bit:  RMS=8.907757e-06 (LIMITED) maxdiff=1.531839e-05 (PASS)
-- 32 bit integer output
compl.bit:  RMS=2.152589e-08 (PASS) maxdiff=1.769513e-07 (PASS)
-- 24 bit integer output
compl.bit:  RMS=4.205495e-08 (PASS) maxdiff=1.788139e-07 (PASS)
-- 32 bit floating point output
compl.bit:  RMS=2.153045e-08 (PASS) maxdiff=1.769513e-07 (PASS)

== src/mpg123.nofpu_accurate.compliance.txt ==

 Layer 3 
-- 16 bit signed integer output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
-- 32 bit integer output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
-- 24 bit integer output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
-- 32 bit floating point output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)

== src/mpg123.nofpu.compliance.txt ==

 Layer 3 
-- 16 bit signed integer output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
-- 32 bit integer output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
-- 24 bit integer output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
-- 32 bit floating point output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
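Thomas's PS further down sums up --enable-int-quality as "basically one bit of precision". Each halving of the RMS error is worth one extra bit, so that claim can be checked directly against the 16 bit rows above. The small sketch below uses only the numbers printed here. It comes out at roughly 1.05 bits for the fpu decoder and 0.87 bits for nofpu.

```python
import math

# Layer 3 RMS errors for 16 bit output, copied from the compliance runs above.
RMS_16BIT = {
    "fpu": 8.907757e-06,
    "fpu_accurate": 4.300914e-06,
    "nofpu": 7.927192e-06,
    "nofpu_accurate": 4.344827e-06,
}


def bits_gained(rms_before: float, rms_after: float) -> float:
    """Each halving of the RMS error corresponds to one bit of precision."""
    return math.log2(rms_before / rms_after)


for decoder in ("fpu", "nofpu"):
    gain = bits_gained(RMS_16BIT[decoder], RMS_16BIT[decoder + "_accurate"])
    print(f"{decoder}: --enable-int-quality gains about {gain:.2f} bits")
```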

With a nofpu decoder, you always get the precision of 16 bit output,
because floating-point output is converted from the 16 bit values. But,
especially with --enable-int-quality, this is a fully compliant
MPEG audio decoder with all the precision that you need for normal
playback situations.

MAD claims 24 bit precision with integer math
(just about matching mpg123's 24 bit output with FPU decoder, see
http://www.underbit.com/resources/mpeg/audio/compliance, RMS=4.906e−08).
I suspect, though, that MAD will be considerably slower than mpg123's
arm_nofpu decoder. On my Core2Duo P8800, madplay with libmad 0.15.1
needs about 7.4 s to 8.5 s decoding to null output (with either speed or
accuracy optimization). The mpg123 numbers for the generic variants
(accurate == --enable-int-quality):

== src/mpg123.fpu_accurate.bench.txt ==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         6.16     5.85

== src/mpg123.fpu.bench.txt ==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         6.05     5.83

== src/mpg123.nofpu_accurate.bench.txt ==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         6.67     6.81

== src/mpg123.nofpu.bench.txt ==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         6.01     6.16
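You can read the size of that "some hit" off the four tables. The short sketch below computes the extra user CPU time of each accurate build relative to its plain counterpart, using only the numbers copied from above. It gives about 11 % for nofpu and under 2 % for fpu on this Core2.

```python
# User CPU seconds (t_s16/s, t_f32/s) on the Core2Duo P8800, from the tables above.
BENCH = {
    "fpu": (6.05, 5.83),
    "fpu_accurate": (6.16, 5.85),
    "nofpu": (6.01, 6.16),
    "nofpu_accurate": (6.67, 6.81),
}
FORMATS = ("s16", "f32")


def overhead_percent(base: float, variant: float) -> float:
    """Extra CPU time of `variant` relative to `base`, in percent."""
    return (variant / base - 1.0) * 100.0


for decoder in ("fpu", "nofpu"):
    for idx, fmt in enumerate(FORMATS):
        pct = overhead_percent(BENCH[decoder][idx], BENCH[decoder + "_accurate"][idx])
        print(f"{decoder:5} {fmt}: +{pct:.1f} % with --enable-int-quality")
```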

You see, there is some hit from accurate rounding, but it is in a
different league compared to the difference between fpu and nofpu on a
NEON-less ARM device (and yes, on an x86 CPU, generic FPU code is faster
when actually producing float output).

Oh, and remember: This is for mpg123 with handbrakes on, using Taihei's
assembly optimizations, the decoding time is about halved on the Core2.
Similarly, I'd like to see numbers for madplay on ARM (ideally on
machines with and without fpu, to get a picture of the difference we
are talking about):

sh$ time -d -o null convergence_-_points_of_view/*.mp3

I don't know offhand how mpg123 nofpu stacks up against that, but there
should be a considerable difference in speed. My guess is that, on
limited hardware without NEON, you'd prefer stutter-free playback with
least CPU power draw. When utmost theoretical quality really matters or
you intend extensive post-processing of the data --- especially using
an audio player that works with floating point math internally, like
audacious --- then employing a more capable CPU with NEON is something
I expect. The mpg123 nofpu decoder, according to Riku's numbers, is still
a good choice for systems with an FPU but no NEON, but the generic
floating point decoder is not that far behind in speed (compared to
softfloat) and offers proper floating point accuracy as a bonus.

Generally, it is a safe bet that any normal 

Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 4 Mar 2014 11:49:45 +0100, Thomas Orgis thomas-fo...@orgis.org wrote:

 sh$ time -d -o null convergence_-_points_of_view/*.mp3

That should be

sh$ time madplay -d -o null: convergence_-_points_of_view/*.mp3

... as you may have guessed (notice the added :).


Alrighty then,

Thomas


signature.asc
Description: PGP signature


Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Felipe Sateler
On Mon, Mar 3, 2014 at 11:59 PM, peter green plugw...@p10link.net wrote:
 wonder what we should use on raspbian? I haven't tested on a Pi yet but it
 seems that on all tests i've seen so-far the generic fpu code is quite a bit
 slower than the arm nofpu code.


Indeed, it seems to be:

==
felipe@felipepi:mpg123-20140302115523-nofpu% ./scripts/benchmark-cpu.pl src/mpg123 ../convergence_-_points_of_view/*.mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
ARM             86.26    90.66

=

felipe@felipepi:mpg123-20140302115523% ./scripts/benchmark-cpu.pl src/mpg123 ../convergence_-_points_of_view/*.mp3
Found 2 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         102.80   100.06
generic_dither  121.10   100.84

=

-- 

Saludos,
Felipe Sateler


-- 
Archive: 
https://lists.debian.org/caafdzj8kwhvhz62pwgrx9xepzc_zyzqr9gf4kozlluwnq6b...@mail.gmail.com



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Lennart Sorensen
On Tue, Mar 04, 2014 at 02:59:44AM +, peter green wrote:
 Seems sane to me. armv7 devices without neon are relatively uncommon
 so while it's important that they are supported it's IMO not vitally
 important to squeeze out every last drop of performance from them.

I don't agree.  At least the first Tegra chips did not have neon, and the
Marvell chips often don't have neon (the newer ones are starting to, now
that they are moving to Cortex-A designs rather than Marvell custom
cores like the PJ4 used in the Armada 510 in the CuBox, for example,
but many chips don't have neon).  Do the Qualcomm designs have neon?
I have been mostly ignoring them due to the anti open source attitude
of Qualcomm.

If --with-cpu=arm_fpu selects neon or VFPv3 automatically, then I think
armhf is perfect for all armv7 devices.

 I wonder what we should use on Raspbian? I haven't tested on a Pi
 yet but it seems that on all tests I've seen so far the generic fpu
 code is quite a bit slower than the arm nofpu code. Is there any
 quality difference from using a fpu vs nonfpu decoder? If so how
 much performance degradation do you believe should be accepted in
 exchange for that quality improvement?

So VFP2 is slower than integer math?  Interesting.

 IMO it's often better to be explicit about this sort of thing. While
 upstream's defaults may align with debian armhf's requirements at the
 present time and on the present build hardware such defaults are
 subject to change either as a result of upstream changes in new
 versions or as a result of different build hardware.

I suppose that makes sense.  Avoids unexpected surprises later.

-- 
Len Sorensen


-- 
Archive: https://lists.debian.org/20140304155447.gv17...@csclub.uwaterloo.ca



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 4 Mar 2014 11:10:25 -0300, Felipe Sateler fsate...@debian.org wrote:

 #decoder        t_s16/s  t_f32/s
 ARM             86.26    90.66
 generic         102.80   100.06
 generic_dither  121.10   100.84

Yes, a difference, but arguably a lot less than comparing VFP code to
NEON. With the feature to produce float output from all decoders, it is
your (debian's) option to prefer decoding speed by building a libmpg123
with arm_nofpu and using it on armhf machines without NEON via the
library loading mechanism. Or you decide to offer proper floating
point output that needs some 25-50 % more CPU time.

I am even more interested in a comparison with the runtime of madplay
in that configuration. Perhaps its fixed-point math with 24 bit output
is still faster than using the VFP with mpg123. Of course, I'd be
interested to know if that's not the case (mpg123 rulez!;-). But if it
is, it wouldn't totally surprise me.


Alrighty then,

Thomas

PS: You still have to decide on --enable-int-quality, which has a
smaller impact on CPU time and is worth basically one bit of precision.




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Felipe Sateler
On Tue, Mar 4, 2014 at 2:26 PM, Thomas Orgis thomas-fo...@orgis.org wrote:
 On Tue, 4 Mar 2014 11:10:25 -0300, Felipe Sateler fsate...@debian.org wrote:

 #decoder        t_s16/s  t_f32/s
 ARM             86.26    90.66
 generic         102.80   100.06
 generic_dither  121.10   100.84

 Yes, a difference, but arguably a lot less than comparing VFP code to
 NEON. With the feature to produce float output from all decoders, it is
 your (debian's) option to prefer decoding speed by building a libmpg123
 with arm_nofpu and use it on armhf machines without NEON via the
 library loading mechanism. Or you decide for offering proper floating
 point output that needs some 25-50 % more CPU time.

 I am even more interested in a comparison with the runtime of madplay
 in that configuration. Perhaps its fixed-point math with 24 bit output
 is still faster than using the VFP with mpg123. Of course, I'd be
 interested to know if that's not the case (mpg123 rulez!;-). But if it
 is, it wouldn't totally surprise me.

madplay -d -o null: convergence_-_points_of_view/*.mp3 > /dev/null
130.22s user 1.88s system 93% cpu 2:21.91 total

That's with the following mad:

MPEG Audio Decoder 0.15.1 (beta)
  Copyright (C) 2000-2004 Underbit Technologies, Inc.
  Build options: NDEBUG FPM_ARM ASO_IMDCT ASO_INTERLEAVE1

ID3 Tag Library 0.15.1 (beta)
  Copyright (C) 2000-2004 Underbit Technologies, Inc.
  Build options: NDEBUG

madplay 0.15.2 (beta)
  Copyright (C) 2000-2004 Robert Leslie
  Build options: AUDIO_DEFAULT=audio_alsa ENABLE_NLS

This is the madplay straight from raspbian, not sure if some other
configure flag was to be tested.



-- 

Saludos,
Felipe Sateler


-- 
Archive: 
https://lists.debian.org/caafdzj-qw8nx-4gujcj+kvtn9lz76mp1tcnaszh1tdzkftq...@mail.gmail.com



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 4 Mar 2014 16:25:17 -0300, Felipe Sateler fsate...@debian.org wrote:

  #decoder        t_s16/s  t_f32/s
  ARM             86.26    90.66
  generic         102.80   100.06
  generic_dither  121.10   100.84

 madplay -d -o null: convergence_-_points_of_view/*.mp3 > /dev/null
 130.22s user 1.88s system 93% cpu 2:21.91 total

Interesting. So the VFP is not that bad: You get superior output (not
noticeably, but measurable in the digital domain) from mpg123's generic
decoder in about 75 % of the decoding time.
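Here is that comparison spelled out with the Raspberry Pi figures quoted above. This is a sketch: all values are the user CPU seconds Felipe posted, and madplay's 24 bit output is compared against mpg123's outputs without any further normalization. The generic decoder lands at roughly 77 to 79 % of madplay's time. It needs about 10 to 19 % more CPU time than the integer ARM decoder.

```python
# User CPU seconds on the Raspberry Pi, (t_s16/s, t_f32/s), from Felipe's runs.
MPG123 = {
    "ARM": (86.26, 90.66),              # arm_nofpu build
    "generic": (102.80, 100.06),        # arm_fpu build, plain VFP
    "generic_dither": (121.10, 100.84),
}
MADPLAY_USER = 130.22
FORMATS = ("s16", "f32")

for idx, fmt in enumerate(FORMATS):
    generic = MPG123["generic"][idx]
    extra = (generic / MPG123["ARM"][idx] - 1.0) * 100.0
    share = generic / MADPLAY_USER * 100.0
    print(f"{fmt}: generic needs +{extra:.0f} % vs. ARM, {share:.0f} % of madplay's time")
```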

The lower-quality 16 bit integer decoder of mpg123 is considerably
faster. So, on an armel system without VFP, it makes sense to employ
libmad to achieve 24 bit accuracy with reasonable CPU cost, if you
insist on that accuracy. But with VFP, using mpg123 gives you full 32
bit floating point output with less CPU load. For NEON, it's not even a
question.

I think I can live with that situation;-) both MAD and mpg123 achieve
their goals. MAD gets the best precision out of integer math, mpg123
offers something faster everywhere, possibly with less, but also
possibly with more (irrelevant, 24 bit is _really_ enough) precision.

One might also benchmark a decoder based on ffmpeg, which has both
fixed-point and floating-point decoders, but I don't have a good
command line for that at hand (used mplayer -ac mpg123 and mplayer -ac
ffmp3[float] in the past). Anyhow, I'm leaving the scope here. I should
get going and release mpg123 1.19.0.

 This is the madplay straight from raspbian, not sure if some other
 configure flag was to be tested.

Optimizing for speed vs. quality might be an option ... but that's
somehow missing the point of preferring libmad.


Alrighty then,

Thomas




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread Riku Voipio
On Sun, Mar 02, 2014 at 12:02:40PM +0100, Thomas Orgis wrote:
 On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma t...@mac.com wrote:
 
  OK, after some investigation with armhf cross environment and qemu, finally 
  the current mpg123 svn (r3517) should work 
 
 After Taihei didn't stop at this (big thanks from here!), we got a new
 snapshot,
 
   http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,
 
 that will hopefully become mpg123 1.19.0 soon (not 1.18.x
 because of feature additions regarding this very debian issue). The
 main points:
 
 - float output with all decoders (also arm_nofpu)
 - ARM decoders (esp. NEON) working with debian toolchain
 - new --with-cpu=arm_fpu choice with runtime detection to switch
   between NEON or normal FPU
 
 So, the number of builds for optimal treatment of differing platforms
 reduces to two:
 
 1. --with-cpu=arm_nofpu
 2. --with-cpu=arm_fpu

Awesome work!

 I hope we can all be happy about that. I'd also be glad to get some
 confirmation from debian that it really works now. Release will be
 imminent, then.

Here's some test results

On a cortex-a15 system arm_nofpu: (ubuntu armhf)

#decoder        t_s16/s  t_f32/s
ARM             24.22    25.02

On a cortex-a15 system arm_fpu: (ubuntu armhf)

#decoder        t_s16/s  t_f32/s
NEON            14.33    14.90
generic         36.25    27.46
generic_dither  39.52    27.44

The A15 core was downclocked and cpufreq disabled to ensure
stable results.

ARMv5 system arm_nofpu (debian armel)

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
ARM             49.12    63.17

ARMv5 system arm_fpu (debian sid)

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         491.75   468.37
generic_dither  535.50   468.38

armel is with softfloat emulation, so horrible times were expected - 
the main point of that last run was to verify that NEON runtime
detection works (Seems so).

Riku


-- 
Archive: https://lists.debian.org/20140303085058.ga1...@afflict.kos.to



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread Lennart Sorensen
On Sun, Mar 02, 2014 at 12:02:40PM +0100, Thomas Orgis wrote:
 After Taihei didn't stop at this (big thanks from here!), we got a new
 snapshot,
 
   http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,
 
 that will hopefully become mpg123 1.19.0 soon (not 1.18.x
 because of feature additions regarding this very debian issue). The
 main points:
 
 - float output with all decoders (also arm_nofpu)
 - ARM decoders (esp. NEON) working with debian toolchain
 - new --with-cpu=arm_fpu choice with runtime detection to switch
   between NEON or normal FPU
 
 So, the number of builds for optimal treatment of differing platforms
 reduces to two:
 
 1. --with-cpu=arm_nofpu
 2. --with-cpu=arm_fpu
 
 I hope we can all be happy about that. I'd also be glad to get some
 confirmation from debian that it really works now. Release will be
 imminent, then.
 
 Thanks for staying with us with all the chattering about this ...

I now see (with arm_fpu of course, which it seems to have auto detected 
correctly):

perl scripts/benchmark-cpu.pl `which mpg123` /convergence_-_points_of_view/*mp3
Found 3 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
NEON            7.58     7.84
generic         19.23    14.56
generic_dither  20.97    14.54

Looks good.  I ran it 3 times and they were very close, and the cpu pinned
itself at 1.5GHz during the test, and went back to 1.0GHz when idle again.
One of the two cores was very bored though with nothing to do.

-- 
Len Sorensen


-- 
Archive: https://lists.debian.org/20140303170248.gt17...@csclub.uwaterloo.ca



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread Lennart Sorensen
On Sun, Mar 02, 2014 at 09:06:44AM -0500, Reinhard Tartler wrote:
 That sounds like the mpg123 package should use:
 
 on armel: --with-cpu=arm_nofpu
 on armhf: --with-cpu=arm_fpu
 
 
 Does this make sense to everybody?

I think so.  armhf's current debian rules automatically picked arm_fpu
with the new version's configure script, so at least that one doesn't
seem to need any explicit help.  armel might though.

 Thank you for handling this issue (and basically every other issue
 that popped up in Debian for mpg123) so quickly!

-- 
Len Sorensen


-- 
Archive: https://lists.debian.org/20140303170444.gu17...@csclub.uwaterloo.ca



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread peter green

On Sun, Mar 02, 2014 at 09:06:44AM -0500, Reinhard Tartler wrote:


That sounds like the mpg123 package should use:
 
on armel: --with-cpu=arm_nofpu

on armhf: --with-cpu=arm_fpu


Does this make sense to everybody?
  
Seems sane to me. armv7 devices without neon are relatively uncommon so 
while it's important that they are supported it's IMO not vitally 
important to squeeze out every last drop of performance from them.


I wonder what we should use on Raspbian? I haven't tested on a Pi yet
but it seems that on all tests I've seen so far the generic fpu code is
quite a bit slower than the arm nofpu code. Is there any quality
difference from using a fpu vs nonfpu decoder? If so how much
performance degradation do you believe should be accepted in exchange
for that quality improvement?


Lennart Sorensen wrote:

I think so.  armhf's current debian rules automatically picked arm_fpu
with the new version's configure script, so at least that one doesn't
seem to need any explicit help.  armel might though.
  
IMO it's often better to be explicit about this sort of thing. While 
upstream's defaults may align with debian armhf's requirements at the
present time and on the present build hardware such defaults are subject 
to change either as a result of upstream changes in new versions or as a 
result of different build hardware.




--
Archive: https://lists.debian.org/531541a0.3010...@p10link.net



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-02 Thread Thomas Orgis
On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma t...@mac.com wrote:

 OK, after some investigation with armhf cross environment and qemu, finally 
 the current mpg123 svn (r3517) should work 

After Taihei didn't stop at this (big thanks from here!), we got a new
snapshot,

http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,

that will hopefully become mpg123 1.19.0 soon (not 1.18.x
because of feature additions regarding this very debian issue). The
main points:

- float output with all decoders (also arm_nofpu)
- ARM decoders (esp. NEON) working with debian toolchain
- new --with-cpu=arm_fpu choice with runtime detection to switch
  between NEON or normal FPU

So, the number of builds for optimal treatment of differing platforms
reduces to two:

1. --with-cpu=arm_nofpu
2. --with-cpu=arm_fpu
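The arm_fpu build decides between its NEON and generic (VFP) decoders at runtime. This thread does not show how mpg123 does that internally. As a rough illustration of the idea only, a Linux userspace check on a 32-bit ARM kernel could look for the neon flag in the Features line of /proc/cpuinfo. pick_decoder and pick_decoder_on_this_machine are hypothetical names, not mpg123 functions.

```python
from pathlib import Path


def pick_decoder(cpuinfo_text: str) -> str:
    """Return "NEON" if the Features line lists neon, else the generic FPU decoder."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return "NEON" if "neon" in value.split() else "generic"
    # No Features line at all (e.g. not a 32-bit ARM kernel): play it safe.
    return "generic"


def pick_decoder_on_this_machine() -> str:
    """Read /proc/cpuinfo (Linux only) and apply pick_decoder()."""
    return pick_decoder(Path("/proc/cpuinfo").read_text())
```

A Cortex-A board that reports "vfp ... neon vfpv3" would get NEON. A Raspberry Pi, which reports vfp but no neon, would fall back to generic.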

I hope we can all be happy about that. I'd also be glad to get some
confirmation from debian that it really works now. Release will be
imminent, then.

Thanks for staying with us with all the chattering about this ...


Alrighty then,

Thomas





Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-02 Thread Reinhard Tartler
On Sun, Mar 2, 2014 at 6:02 AM, Thomas Orgis thomas-fo...@orgis.org wrote:
 On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma t...@mac.com wrote:

 OK, after some investigation with armhf cross environment and qemu, finally 
 the current mpg123 svn (r3517) should work

 After Taihei didn't stop at this (big thanks from here!), we got a new
 snapshot,

 http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,

 that will hopefully become mpg123 1.19.0 soon (not 1.18.x
 because of feature additions regarding this very debian issue). The
 main points:

 - float output with all decoders (also arm_nofpu)
 - ARM decoders (esp. NEON) working with debian toolchain
 - new --with-cpu=arm_fpu choice with runtime detection to switch
   between NEON or normal FPU

 So, the number of builds for optimal treatment of differing platforms
 reduces to two:

 1. --with-cpu=arm_nofpu
 2. --with-cpu=arm_fpu

 I hope we can all be happy about that. I'd also be glad to get some
 confirmation from debian that it really works now. Release will be
 imminent, then.

That sounds like the mpg123 package should use:

on armel: --with-cpu=arm_nofpu
on armhf: --with-cpu=arm_fpu


Does this make sense to everybody?

 Thanks for staying with us with all the chattering about this ...

Thank you for handling this issue (and basically every other issue
that popped up in Debian for mpg123) so quickly!


-- 
regards,
Reinhard


-- 
Archive: 
https://lists.debian.org/caj0cceaxdetftr8svyg7vnvkgvxcy80xrxt32ljtmxl_pfo...@mail.gmail.com



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-01 Thread Thomas Orgis
On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma t...@mac.com wrote:

 OK, after some investigation with armhf cross environment and qemu, finally 
 the current mpg123 svn (r3517) should work (including arm_nofpu decoder).
 
 The key is the .type directive. Without this directive, the linker doesn't
 distinguish ARM functions from Thumb functions, and interworking doesn't work
 properly.

Great! So, folks, please check that

http://mpg123.de/snapshot/mpg123-2014030100.tar.bz2

does the trick with all decoders now. Performance numbers from the
benchmark script would be nice. I'll release 1.18.1 after confirmation
and we finally can settle this.


Alrighty then,

Thomas




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-01 Thread Thomas Orgis
On Sat, 1 Mar 2014 09:56:46 +0100, Thomas Orgis thomas-fo...@orgis.org wrote:

 Great! So, folks, please check that
 
   http://mpg123.de/snapshot/mpg123-2014030100.tar.bz2
 
 does the trick with all decoders now. Performance numbers from the
 benchmark script would be nice. I'll release 1.18.1 after confirmation

Sorry, I meant 1.18.2, of course. Also, I fixed the benchmark script to
check the return value with

http://mpg123.de/snapshot/mpg123-20140301101020.tar.bz2

just in case things are still broken.


Alrighty then,

Thomas




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-28 Thread Taihei Momma
OK, after some investigation with armhf cross environment and qemu, finally the 
current mpg123 svn (r3517) should work (including arm_nofpu decoder).

The key is the .type directive. Without this directive, the linker doesn't
distinguish ARM functions from Thumb functions, and interworking doesn't work
properly.

Regards,
Taihei Momma

--
Archive: https://lists.debian.org/c4774c20-17e6-47a5-8cb2-c71dbeff3...@mac.com



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Thomas Orgis
On Mon, 24 Feb 2014 12:27:36 -0500, Lennart Sorensen lsore...@csclub.uwaterloo.ca wrote:

 Any help from this:
 
 Program received signal SIGILL, Illegal instruction.
 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
 48  vpush   {q4-q7}

What the ... ? This does not make sense. I (and actually, with I, I
mean Taihei who knows more about ARM assembly;-). The vpush pseudo
instruction should be harmless in our context. Quote from Taihei:

I don't know why. Actually vpush is a pseudo instruction, and
vpush {q4-q7} should be assembled into vstmdb sp!, {d8-d15}
(machine code is ed2d8b10). I'm curious what their assembler
(GNU as?) assembles it into.
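Taihei's expected encoding can be checked by hand. Each q register aliases a pair of d registers (qN is d2N and d2N+1), so {q4-q7} is {d8-d15}. The minimal sketch below decodes only the unconditional store form of the double-precision VSTM encoding, using the field layout cond | 110 | P U D W 0 | Rn | Vd | 1011 | imm8 from the ARM Architecture Reference Manual, and rejects everything else. decode_vstm is a hypothetical helper, not part of any tool mentioned here.

```python
def decode_vstm(word: int) -> str:
    """Decode an unconditional double-precision VSTMIA/VSTMDB word into assembly text."""
    cond = word >> 28
    if cond != 0xE or ((word >> 25) & 0b111) != 0b110 or ((word >> 8) & 0xF) != 0b1011:
        raise ValueError("not an unconditional double-precision VSTM/VLDM word")
    if (word >> 20) & 1:
        raise ValueError("L bit set: this is a load (VLDM), not a store")
    p, u, d, w = ((word >> bit) & 1 for bit in (24, 23, 22, 21))
    rn = (word >> 16) & 0xF
    vd = (word >> 12) & 0xF
    imm8 = word & 0xFF
    if (p, u) == (0, 1):
        mode = "ia"                 # increment after
    elif (p, u, w) == (1, 0, 1):
        mode = "db"                 # decrement before, with writeback (= vpush on sp)
    else:
        raise ValueError("addressing mode not handled by this sketch")
    first = (d << 4) | vd           # D:Vd selects the first D register
    count = imm8 // 2               # imm8 counts 32-bit words, two per D register
    base = "sp" if rn == 13 else f"r{rn}"
    writeback = "!" if w else ""
    return f"vstm{mode} {base}{writeback}, {{d{first}-d{first + count - 1}}}"


print(decode_vstm(0xED2D8B10))
```

For ed2d8b10, this yields vstmdb sp!, {d8-d15}, which matches Taihei's expectation.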


Well ... what does

sh$ objdump -S src/libmpg123/.libs/dct64_neon.o

say? Any hint from the debian ARM folks with experience about funny
behaviour for stand-alone assembly files? I also wonder if this is
generally broken on debian (since certain toolchain version) or on
certain CPUs only. I repeat: This code worked before:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667653#35


Alrighty then,

Thomas





Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Taihei Momma
Wait, code alignment issue?

#0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
 ^
not a multiple of 4.

I've just committed a fix to mpg123 repository to align the function by 4 
bytes. I supposed this was fixed before, but actually dct64 part was omitted: 
http://www.mpg123.de/cgi-bin/scm/mpg123?view=revision&revision=3003

I hope this should fix the SIGILL issue.
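Here is the alignment argument in numbers. ARM-state instructions are 4 bytes wide and must sit on 4-byte boundaries, but the faulting PC is 2 modulo 4. In the minimal sketch below, align_up is a hypothetical helper that mirrors what a 4-byte alignment directive does to the following label. This assumes mpg123's ALIGN4 macro expands to such a directive; its definition is not shown in the thread. A halfword-aligned PC is also what Thumb-state execution looks like. That fits the interworking problem Taihei later fixed with the .type directive.

```python
def align_up(addr: int, n: int) -> int:
    """Round addr up to the next multiple of n; n must be a power of two."""
    if n <= 0 or (n & (n - 1)) != 0:
        raise ValueError("alignment must be a positive power of two")
    return (addr + n - 1) & ~(n - 1)


pc = 0xB6FB9332  # faulting PC from the gdb backtrace above
print(f"{pc:#x} mod 4 = {pc % 4}")  # 2: not word-aligned
print(f"next 4-byte boundary: {align_up(pc, 4):#x}")
```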

Regards,
Taihei Momma

--
Archive: https://lists.debian.org/b4a4b91e-2ab4-447f-835c-3be85d411...@mac.com



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Thomas Orgis
On Tue, 25 Feb 2014 17:37:41 +0900, Taihei Momma t...@mac.com wrote:

 #0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
  ^
 not a multiple of 4.

Oh, d'oh! It could be that simple.

 I've just committed a fix to mpg123 repository 

I generated a new snapshot,

http://mpg123.org/snapshot/mpg123-20140225111416.tar.bz2 ,

and also attached the patch for the rather small change that hopefully
has a big effect. Care to test this?


Alrighty then,

Thomas

-- 
Thomas Orgis - Source Mage GNU/Linux Developer (http://www.sourcemage.org)
OrgisNetzOrganisation ---)=- http://orgis.org
GPG public key D446D524: http://thomas.orgis.org/public_key
Fingerprint: 7236 3885 A742 B736 E0C8 9721 9B4C 52BC D446 D524
Index: src/libmpg123/dct64_neon_float.S
===
--- src/libmpg123/dct64_neon_float.S	(Revision 3514)
+++ src/libmpg123/dct64_neon_float.S	(Revision 3515)
@@ -44,6 +44,7 @@
 	.word 1060439283
 	.word 1060439283
 	.globl ASM_NAME(dct64_real_neon)
+	ALIGN4
 ASM_NAME(dct64_real_neon):
 	vpush		{q4-q7}
 
Index: src/libmpg123/dct64_neon.S
===
--- src/libmpg123/dct64_neon.S	(Revision 3514)
+++ src/libmpg123/dct64_neon.S	(Revision 3515)
@@ -44,6 +44,7 @@
 	.word 1060439283
 	.word 1060439283
 	.globl ASM_NAME(dct64_neon)
+	ALIGN4
 ASM_NAME(dct64_neon):
 	vpush		{q4-q7}
 




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Lennart Sorensen
On Tue, Feb 25, 2014 at 11:18:50AM +0100, Thomas Orgis wrote:
 On Tue, 25 Feb 2014 17:37:41 +0900, Taihei Momma t...@mac.com wrote:
 
  #0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
   ^
  not a multiple of 4.
 
 Oh, d'oh! It could be that simple.
 
  I've just committed a fix to mpg123 repository 
 
 I generated a new snapshot,
 
   http://mpg123.org/snapshot/mpg123-20140225111416.tar.bz2 ,
 
 and also attached the patch for the rather small change that hopefully
 has a big effect. Care to test this?
 
 
 Alrighty then,
 
 Thomas
 
 -- 
 Thomas Orgis - Source Mage GNU/Linux Developer (http://www.sourcemage.org)
 OrgisNetzOrganisation ---)=- http://orgis.org
 GPG public key D446D524: http://thomas.orgis.org/public_key
 Fingerprint: 7236 3885 A742 B736 E0C8 9721 9B4C 52BC D446 D524

 Index: src/libmpg123/dct64_neon_float.S
 ===
 --- src/libmpg123/dct64_neon_float.S  (Revision 3514)
 +++ src/libmpg123/dct64_neon_float.S  (Revision 3515)
 @@ -44,6 +44,7 @@
   .word 1060439283
   .word 1060439283
   .globl ASM_NAME(dct64_real_neon)
 + ALIGN4
  ASM_NAME(dct64_real_neon):
   vpush   {q4-q7}
  
 Index: src/libmpg123/dct64_neon.S
 ===
 --- src/libmpg123/dct64_neon.S  (Revision 3514)
 +++ src/libmpg123/dct64_neon.S  (Revision 3515)
 @@ -44,6 +44,7 @@
   .word 1060439283
   .word 1060439283
   .globl ASM_NAME(dct64_neon)
 + ALIGN4
  ASM_NAME(dct64_neon):
   vpush   {q4-q7}
  

root@rceng05:/mpg123-20140225111416# gdb --args /tmp/mpginst/usr/local/bin/mpg123 -e s16 -q --cpu NEON -t /convergence_-_points_of_view/*mp3
GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/mpginst/usr/local/bin/mpg123...done.
(gdb) run
Starting program: /tmp/mpginst/usr/local/bin/mpg123 -e s16 -q --cpu NEON -t 
/convergence_-_points_of_view/01\ -\ Bleed.mp3 
/convergence_-_points_of_view/02\ -\ Strike\ the\ end.mp3 
/convergence_-_points_of_view/03\ -\ Listen.mp3 
/convergence_-_points_of_view/04\ -\ Six\ feet\ under.mp3 
/convergence_-_points_of_view/05\ -\ Always\ the\ same.mp3 
/convergence_-_points_of_view/06\ -\ Breath.mp3 
/convergence_-_points_of_view/07\ -\ Vanished\ memories.mp3 
/convergence_-_points_of_view/08\ -\ Silent.mp3 
/convergence_-_points_of_view/09\ -\ Nothing\ else.mp3 
/convergence_-_points_of_view/10\ -\ Train\ to\ leave.mp3

Program received signal SIGILL, Illegal instruction.
0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
49  vpush   {q4-q7}
(gdb) disassemble
Dump of assembler code for function INT123_dct64_neon:
   0xb6fb9330 <+0>:    vpush      {d8-d15}
   0xb6fb9334 <+4>:    sub        r3, pc, #140    ; 0x8c
   0xb6fb9338 <+8>:    vld1.32    {d0-d3}, [r2]!
   0xb6fb933c <+12>:   vld1.32    {d4-d7}, [r2]!
   0xb6fb9340 <+16>:   vld1.32    {d8-d11}, [r2]!
   0xb6fb9344 <+20>:   vld1.32    {d12-d15}, [r2]
   0xb6fb9348 <+24>:   vld1.32    {d24-d27}, [r3 :128]!
   0xb6fb934c <+28>:   vld1.32    {d28-d31}, [r3 :128]!
   0xb6fb9350 <+32>:   vrev64.32  q4, q4
   0xb6fb9354 <+36>:   vrev64.32  q5, q5
   0xb6fb9358 <+40>:   vrev64.32  q6, q6
   0xb6fb935c <+44>:   vrev64.32  q7, q7
   0xb6fb9360 <+48>:   vswp       d8, d9
   0xb6fb9364 <+52>:   vswp       d10, d11
   0xb6fb9368 <+56>:   vswp       d12, d13
   0xb6fb936c <+60>:   vswp       d14, d15
   0xb6fb9370 <+64>:   vsub.f32   q8, q0, q7
   0xb6fb9374 <+68>:   vsub.f32   q9, q1, q6
   0xb6fb9378 <+72>:   vsub.f32   q10, q2, q5
   0xb6fb937c <+76>:   vsub.f32   q11, q3, q4
   0xb6fb9380 <+80>:   vadd.f32   q0, q0, q7
   0xb6fb9384 <+84>:   vadd.f32   q1, q1, q6
   0xb6fb9388 <+88>:   vadd.f32   q2, q2, q5
   0xb6fb938c <+92>:   vadd.f32   q3, q3, q4
   0xb6fb9390 <+96>:   vmul.f32   q4, q8, q12
   0xb6fb9394 <+100>:  vmul.f32   q5, q9, q13
   0xb6fb9398 <+104>:  vmul.f32   q6, q10, q14
   0xb6fb939c <+108>:  vmul.f32   q7, q11, q15
   0xb6fb93a0 <+112>:  vld1.32    {d24-d27}, [r3 :128]!
   0xb6fb93a4 <+116>:  vld1.32    {d28-d31}, [r3 :128]
   0xb6fb93a8 <+120>:  vrev64.32  q2, q2
   0xb6fb93ac <+124>:  vrev64.32  q3, q3
   0xb6fb93b0 <+128>:  vrev64.32  q6, q6
   0xb6fb93b4 <+132>:  vrev64.32  q7, q7
   0xb6fb93b8 <+136>:  vswp       d4, d5
   0xb6fb93bc <+140>:  vswp       d6, d7
   0xb6fb93c0 <+144>:  vswp       d12, d13
   0xb6fb93c4 <+148>:  vswp       d14, d15
   0xb6fb93c8 <+152>:  vsub.f32   q8, q0, q3
   0xb6fb93cc <+156>:  vsub.f32   q9, q1, q2
   

Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Thomas Orgis
On Tue, 25 Feb 2014 11:20:06 -0500,
Lennart Sorensen lsore...@csclub.uwaterloo.ca wrote:

 On Tue, Feb 25, 2014 at 11:18:50AM +0100, Thomas Orgis wrote:
  On Tue, 25 Feb 2014 17:37:41 +0900,
  Taihei Momma t...@mac.com wrote:
  
   #0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
^
   not a multiple of 4.

  Index: src/libmpg123/dct64_neon.S
  ===
  --- src/libmpg123/dct64_neon.S  (Revision 3514)
  +++ src/libmpg123/dct64_neon.S  (Revision 3515)
  @@ -44,6 +44,7 @@
  .word 1060439283
  .word 1060439283
  .globl ASM_NAME(dct64_neon)
  +   ALIGN4
   ASM_NAME(dct64_neon):
  vpush   {q4-q7}
   

Now ... 

 Program received signal SIGILL, Illegal instruction.
 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
 49  vpush   {q4-q7}

That address didn't change. I suggest we align the function symbol
itself instead; it seems we were accidentally off by one line:

ALIGN4
.globl ASM_NAME(dct64_neon)
ASM_NAME(dct64_neon):

looks better to me (at least that's how we did it for all other
functions;-). Care to test the current

http://mpg123.org/snapshot/mpg123-20140225173909.tar.bz2 ?

Sorry for the inconvenience, but I don't have a setup handy to test
this myself.


Alrighty then,

Thomas





Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Lennart Sorensen
On Wed, Feb 26, 2014 at 01:59:12AM +0900, Taihei Momma wrote:
 On 2014/02/26, at 1:44, Thomas Orgis wrote:
 
  That address didn't change.
 
 
 Well, the function itself is properly aligned (so my fix didn't take effect 
 anyway).
  0xb6fb9330 +0: vpush   {d8-d15}
  0xb6fb9334 +4: sub r3, pc, #140; 0x8c
 
 But the processor decoded the first instruction as 2-byte (thumb?), then 
 increased PC by 2. And it raised SIGILL at
  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
 
 
 So, I guess
  - assembler emits a bad machine code for vpush
 or
  - kernel is not configured properly to run vfp instructions

Is that a kernel option?  I wouldn't have thought armhf would run without
that (unless no floating point code is ever being run).

Well the kernel that is running has this:

CONFIG_VFP=y
CONFIG_VFPv3=y
CONFIG_NEON=y

 I'd like to look into objdump -d result to check the machine code. 

Remember Debian armhf is -mthumb by default.  Any assembly code needs
to be properly flagged with .arm, or .syntax unified or whatever is
appropriate (still trying to wrap my head around this myself), that is,
if the assembly code is written in ARM rather than Thumb-2 assembly.
At least that's my understanding so far.  If I add .syntax unified and
.fpu neon, then I no longer have to pass -mfpu=neon in the CFLAGS to
get it to compile, but it still fails.  I am just about to test the new
version to see if that helps anything.

The disassembly in gdb shows 4-byte alignment, but the address of the
illegal instruction is 2 bytes past the vpush instruction's address.

In fact if I add -marm to the CFLAGS, then it seems to work, so the .S
files are not being flagged correctly as being arm code, or they are
missing thumb interworking bits or something.
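Taihei's "2-byte" observation is consistent with the CPU executing ARM-encoded code in Thumb state. In Thumb-2, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction; any other halfword is a complete 16-bit instruction. The C sketch below applies that rule to the A32 encoding of vpush {d8-d15}, 0xED2D8B10, whose first halfword in little-endian memory is 0x8B10. The helper names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb-2 rule: bits [15:11] of the first halfword equal to 0b11101,
 * 0b11110 or 0b11111 mark a 32-bit instruction; any other value is a
 * complete 16-bit instruction. */
static bool thumb_is_32bit(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)(first_halfword >> 11);
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Halfwords of a little-endian 32-bit word, in the order a Thumb-state
 * CPU would fetch them. */
static uint16_t low_half(uint32_t word)  { return (uint16_t)(word & 0xFFFFu); }
static uint16_t high_half(uint32_t word) { return (uint16_t)(word >> 16); }

/* A32 (ARM-state) encoding of "vpush {d8-d15}", condition AL. */
static const uint32_t A32_VPUSH_D8_D15 = 0xED2D8B10u;
```

So a Thumb-state CPU would take 0x8B10 as a complete 16-bit instruction and advance the PC by 2, landing on 0xED2D, which starts a 32-bit instruction whose second half is borrowed from the following ARM instruction. That matches a SIGILL at function start + 2, but it does not by itself show which missing directive or flag caused the state mismatch.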

root@rceng05:/mpg123-20140225173909# perl scripts/benchmark-cpu.pl src/mpg123 /convergence_-_points_of_view/*mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
NEON            7.52     7.65

That was with CFLAGS=-g -mcpu=cortex-a15 -mfpu=neon -marm

Without -marm, it crashes with illegal instruction.  But since -mthumb
is the default on armhf, then passing -marm seems wrong.

-- 
Len Sorensen


-- 
Archive: https://lists.debian.org/20140225174228.gm17...@csclub.uwaterloo.ca



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread peter green

Taihei Momma wrote:

But the processor decoded the first instruction as 2-byte (thumb?),

Note that debian armhf builds C code in thumb2 mode by default.
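One way to confirm which instruction set a given compiler invocation targets is to check GCC's predefined macros: __arm__ on 32-bit ARM, __thumb__ when generating Thumb code, __thumb2__ for Thumb-2, and __ARM_NEON (or the older __ARM_NEON__) when NEON code generation is enabled, e.g. via -mfpu=neon. A minimal sketch, assuming GCC or Clang; the function names are illustrative:

```c
/* Report the code-generation mode this translation unit was compiled in.
 * Relies on GCC/Clang predefined macros; other compilers may differ. */
static const char *isa_mode(void)
{
#if defined(__arm__) && defined(__thumb2__)
    return "thumb2";
#elif defined(__arm__) && defined(__thumb__)
    return "thumb1";
#elif defined(__arm__)
    return "arm";
#else
    return "not-arm32";
#endif
}

/* Nonzero if NEON code generation was enabled at compile time.
 * This says nothing about what the CPU supports at run time. */
static int neon_enabled_at_build(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Given the Thumb-2 default described above, a plain armhf build would be expected to report "thumb2", and adding -marm as in Lennart's test would switch it to "arm".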


--
Archive: https://lists.debian.org/530cd784.8010...@p10link.net



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-24 Thread Lennart Sorensen
On Sat, Feb 22, 2014 at 10:05:35AM +0100, Thomas Orgis wrote:
 On Fri, 21 Feb 2014 11:25:12 -0500,
 Lennart Sorensen lsore...@csclub.uwaterloo.ca wrote:
 
  Testing with the neon build I get a return code of 4, and it seems to
  be failing to run.  It was a pain to even get it to compile.  Using just
  the configure option, the assembler complained about the NEON instructions
  being invalid for the chosen cpu type.  Adding -mfpu=neon to the CFLAGS
  made it able to compile, but it still crashes with illegal instruction.
  I tried with CFLAGS set to -mcpu=cortex-a15 -mfpu=neon, and that still
  gives illegal instruction when running it.
 
 This is weird. What happened on the Debian side since
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667653#35 ? We have
 the current code working on this setup:
 
 device: iPod touch 4G with iOS 5.1.1
 toolchain: gcc 4.2.1 (from Xcode 3.2.6) on OSX 10.6.8, clang 3.3 (from Xcode
 5.0.2) on OSX 10.9.1 (double checked)
 configure script option: --host=armv7-apple-darwin --with-cpu=arm_nofpu[neon] 
 --with-audio=dummy --disable-shared --enable-static [--enable-int-quality]
 
 Taihei also just checked the compliance of the decoder choices
 including NEON. That illegal instruction ... care to fire up the
 debugger to tell us where it actually occurs? The NEON assembly is
 written as plain assembler input (cpp + as), you can see the
 instructions we use right there and it doesn't differ from iOS.
 
  It might be a good idea to have the benchmark script actually check the
  return code of system()
 
 Yes.
 
  I was building and testing under Debian armhf sid.
  gcc (Debian 4.8.2-16) 4.8.2
  
  CPU is a dual Cortex-A15 1.5GHz (TI OMAP 57xx).
 
 
 Alrighty then,

Any help from this:

(gdb) run
Starting program: /tmp/mpginst/usr/local/bin/mpg123 -e s16 -q --cpu NEON -t 
/convergence_-_points_of_view/01\ -\ Bleed.mp3 
/convergence_-_points_of_view/02\ -\ Strike\ the\ end.mp3 
/convergence_-_points_of_view/03\ -\ Listen.mp3 
/convergence_-_points_of_view/04\ -\ Six\ feet\ under.mp3 
/convergence_-_points_of_view/05\ -\ Always\ the\ same.mp3 
/convergence_-_points_of_view/06\ -\ Breath.mp3 
/convergence_-_points_of_view/07\ -\ Vanished\ memories.mp3 
/convergence_-_points_of_view/08\ -\ Silent.mp3 
/convergence_-_points_of_view/09\ -\ Nothing\ else.mp3 
/convergence_-_points_of_view/10\ -\ Train\ to\ leave.mp3

Program received signal SIGILL, Illegal instruction.
0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
48  vpush   {q4-q7}
(gdb) where
#0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
#1  0xb6fab71c in INT123_synth_1to1_stereo_neon (bandPtr_l=<optimized out>, 
bandPtr_r=0x36400, fr=0x291d8) at synth.c:892
#2  0xb6fb8328 in INT123_do_layer3 (fr=0x291d8) at layer3.c:2060
#3  0xb6fa725e in decode_the_frame (fr=fr@entry=0x291d8) at libmpg123.c:699
#4  0xb6fa823e in mpg123_decode_frame_64 (mh=0x291d8, num=num@entry=0x28490 
<framenum>, audio=audio@entry=0xbefff8e8, bytes=bytes@entry=0xbefff8f0) at 
libmpg123.c:838
#5  0x00012fce in play_frame () at mpg123.c:667
#6  0xb2f0 in main (sys_argc=<optimized out>, sys_argv=<optimized out>) at 
mpg123.c:1177

-- 
Len Sorensen


-- 
Archive: http://lists.debian.org/20140224172736.gk17...@csclub.uwaterloo.ca



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-22 Thread Thomas Orgis
On Fri, 21 Feb 2014 11:25:12 -0500,
Lennart Sorensen lsore...@csclub.uwaterloo.ca wrote:

 Testing with the neon build I get a return code of 4, and it seems to
 be failing to run.  It was a pain to even get it to compile.  Using just
 the configure option, the assembler complained about the NEON instructions
 being invalid for the chosen cpu type.  Adding -mfpu=neon to the CFLAGS
 made it able to compile, but it still crashes with illegal instruction.
 I tried with CFLAGS set to -mcpu=cortex-a15 -mfpu=neon, and that still
 gives illegal instruction when running it.

This is weird. What happened on the Debian side since
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667653#35 ? We have
the current code working on this setup:

device: iPod touch 4G with iOS 5.1.1
toolchain: gcc 4.2.1 (from Xcode 3.2.6) on OSX 10.6.8, clang 3.3 (from Xcode
5.0.2) on OSX 10.9.1 (double checked)
configure script option: --host=armv7-apple-darwin --with-cpu=arm_nofpu[neon] 
--with-audio=dummy --disable-shared --enable-static [--enable-int-quality]

Taihei also just checked the compliance of the decoder choices
including NEON. That illegal instruction ... care to fire up the
debugger to tell us where it actually occurs? The NEON assembly is
written as plain assembler input (cpp + as), you can see the
instructions we use right there and it doesn't differ from iOS.

 It might be a good idea to have the benchmark script actually check the
 return code of system()

Yes.
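The check agreed on here is simple to express. The benchmark script itself is Perl, so the following is only a C sketch of the same idea using POSIX system() and the <sys/wait.h> status macros: a run that dies early is reported as a failure instead of as an implausibly fast timing (compare the 0.03 s NEON result elsewhere in this thread). The helper name run_checked is hypothetical. One possible reading of the "return code of 4" above, not confirmed in the thread, is that it was the raw status of a process killed by SIGILL, which is signal 4 on Linux.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run cmd through the shell. Returns the exit code reported for it
 * (0 on success), or -1 if the shell could not be started or was itself
 * killed by a signal. A shell may also report a crashed child as exit
 * code 128+signal, which is likewise non-zero. */
static int run_checked(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        perror("system");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "'%s' killed by signal %d\n", cmd, WTERMSIG(status));
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

A benchmark loop would then discard, or loudly flag, any timing whose run_checked() result is not 0.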

 I was building and testing under Debian armhf sid.
 gcc (Debian 4.8.2-16) 4.8.2
 
 CPU is a dual Cortex-A15 1.5GHz (TI OMAP 57xx).


Alrighty then,

Thomas




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-21 Thread Lennart Sorensen
On Fri, Feb 21, 2014 at 01:29:40AM +, peter green wrote:
 Thomas Orgis wrote:
 So, I got conversion to float implemented now and tested with the
 generic_nofpu decoder on x86-64. It _should_ of course work with ARM,
 too;-) If you'd like to check the current snapshot of mpg123,
 
  http://mpg123.org/snapshot/mpg123-20140220132548.tar.bz2 ,
 
 you hopefully will find that any normal build of mpg123 (unless
 specifying --disable-float explicitly) now offers all usual formats. As
 a bonus, I even implemented the 8 Bit A-Law output, which has always
 just been a placeholder (nobody missed it, apparently).
 
 I'd be interested in some timings of
 
  mpg123 -t -e s16 test.mp3
  mpg123 -t -e f32 test.mp3
 
 with the various builds you'll do for the ARM variants. Best would be running
 
  perl scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3
 
 with
 
  http://mpg123.orgis.org/convergence_-_points_of_view.tar.gz
 
 as reference album, as mentioned on
 
  http://mpg123.orgis.org/benchmarking.shtml
 
 to be able to compare the performance of the code and machine to
 others. This yields output like this:
 
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 x86-64          3.39     4.05
 generic         6.15     6.01
 generic_dither  6.36     5.97
 
 ... or this, with --with-cpu=generic_fpu:
 
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 generic         6.14     6.29
 
 (on a Core2Duo machine)
 Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat
 out of date) debian sid armhf chroot I tested with perl
 scripts/benchmark-cpu.pl src/mpg123
 convergence_-_points_of_view/*.mp3 in the following configurations.
 
 Built with ./configure --with-cpu=arm_nofpu
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 ARM             30.36    34.26
 
 Built with ./configure --with-cpu=generic_fpu
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 generic         148.66   138.49
 
 Build with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 NEON            0.03     0.04
 
 I found the neon result unbelievable so I decided to run the test
 program you mentioned to me in my private mail asking about how to
 run the benchmarks.
 root@plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
 
  Layer 1 
 -- 16 bit signed integer output
 fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
 fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
 fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
 fl4.bit:RMS=1.510105e-01 (FAIL) maxdiff=5.277658e-01 (FAIL)
 fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
 fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
 fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
 fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
 -- 32 bit integer output
 fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
 fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
 fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
 fl4.bit:RMS=1.513207e-01 (FAIL) maxdiff=4.787517e-01 (FAIL)
 fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
 fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
 fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
 fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
 -- 24 bit integer output
 fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
 fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
 fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
 fl4.bit:RMS=1.494715e-01 (FAIL) maxdiff=4.984906e-01 (FAIL)
 fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
 fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
 fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
 fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
 -- 32 bit floating point output
 fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
 fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
 fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
 fl4.bit:RMS=1.137037e-01 (FAIL) maxdiff=4.459082e-01 (FAIL)
 fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
 fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
 fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
 fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
 
  Layer 2 
 -- 16 bit signed integer output
 fl10.bit:  

Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-20 Thread Thomas Orgis
   I see. In that case, I'll have to leave the package as is until
   something along those lines is implemented.

So, I got conversion to float implemented now and tested with the
generic_nofpu decoder on x86-64. It _should_ of course work with ARM,
too;-) If you'd like to check the current snapshot of mpg123,

http://mpg123.org/snapshot/mpg123-20140220132548.tar.bz2 ,

you hopefully will find that any normal build of mpg123 (unless
specifying --disable-float explicitly) now offers all usual formats. As
a bonus, I even implemented the 8 Bit A-Law output, which has always
just been a placeholder (nobody missed it, apparently).
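For readers wondering what "conversion to float" involves for a fixed-point decoder: the integer samples only need rescaling into the [-1, 1) range that f32 output uses. A minimal C sketch for 16-bit input, assuming a 1/32768 scale; mpg123's actual conversion code may work on its internal fixed-point representation instead, and s16_to_f32 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n signed 16-bit samples to 32-bit float in [-1.0, 1.0).
 * -32768 maps to exactly -1.0; 32767 maps to just below +1.0. */
static void s16_to_f32(const int16_t *in, float *out, size_t n)
{
    const float scale = 1.0f / 32768.0f; /* 2^-15, exact in binary float */
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i] * scale;
}
```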

I'd be interested in some timings of

mpg123 -t -e s16 test.mp3
mpg123 -t -e f32 test.mp3

with the various builds you'll do for the ARM variants. Best would be running

perl scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3

with

http://mpg123.orgis.org/convergence_-_points_of_view.tar.gz

as reference album, as mentioned on

http://mpg123.orgis.org/benchmarking.shtml

to be able to compare the performance of the code and machine to
others. This yields output like this:

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
x86-64          3.39     4.05
generic         6.15     6.01
generic_dither  6.36     5.97

... or this, with --with-cpu=generic_fpu:

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         6.14     6.29

(on a Core2Duo machine).

 Yes, you can do that - build several copies of the library and use the
 hwcaps / auxv approach to pick the best one for the hardware at link
 time.
 
 NEON detection may come... but if we have linker selection, that would
 be covered right now.
 
 Yup.

Seconding the second part: Linker selection it is. NEON runtime
detection just isn't fun in user code.

The bright side: If the multiple builds are setup and tested, I can
safely release mpg123-1.19.0 with the changes and we finally have this
settled.


Alrighty then,

Thomas




Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-20 Thread peter green

Thomas Orgis wrote:

So, I got conversion to float implemented now and tested with the
generic_nofpu decoder on x86-64. It _should_ of course work with ARM,
too;-) If you'd like to check the current snapshot of mpg123,

http://mpg123.org/snapshot/mpg123-20140220132548.tar.bz2 ,

you hopefully will find that any normal build of mpg123 (unless
specifying --disable-float explicitly) now offers all usual formats. As
a bonus, I even implemented the 8 Bit A-Law output, which has always
just been a placeholder (nobody missed it, apparently).

I'd be interested in some timings of

mpg123 -t -e s16 test.mp3
mpg123 -t -e f32 test.mp3

with the various builds you'll do for the ARM variants. Best would be running

perl scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3

with

http://mpg123.orgis.org/convergence_-_points_of_view.tar.gz

as reference album, as mentioned on

http://mpg123.orgis.org/benchmarking.shtml

to be able to compare the performance of the code and machine to
others. This yields output like this:

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
x86-64          3.39     4.05
generic         6.15     6.01
generic_dither  6.36     5.97

... or this, with --with-cpu=generic_fpu:

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         6.14     6.29

(on a Core2Duo machine)
Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat out 
of date) debian sid armhf chroot I tested with perl 
scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3 
in the following configurations.


Built with ./configure --with-cpu=arm_nofpu
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
ARM             30.36    34.26

Built with ./configure --with-cpu=generic_fpu
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
generic         148.66   138.49

Build with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s  t_f32/s
NEON            0.03     0.04

I found the neon result unbelievable so I decided to run the test program 
you mentioned to me in my private mail asking about how to run the 
benchmarks.
root@plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123


 Layer 1 
-- 16 bit signed integer output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.510105e-01 (FAIL) maxdiff=5.277658e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
-- 32 bit integer output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.513207e-01 (FAIL) maxdiff=4.787517e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
-- 24 bit integer output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.494715e-01 (FAIL) maxdiff=4.984906e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
-- 32 bit floating point output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.137037e-01 (FAIL) maxdiff=4.459082e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)

 Layer 2 
-- 16 bit signed integer output
fl10.bit:   RMS=3.528939e-02 (FAIL) maxdiff=6.501251e-02 (FAIL)
fl11.bit:   RMS=3.528947e-02 (FAIL) maxdiff=6.501383e-02 (FAIL)
fl12.bit:   RMS=3.528948e-02 

Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-20 Thread Thomas Orgis
I'm adding the mpg123 assembly guru to the CC list, as I imagine he
would be interested in why his ARM NEON code doesn't work on a Cortex
A8 chip here. Needless to say, it worked before (on other systems).
Also, the precision of the arm_nofpu code does not look right. This
topic is now shifting towards mpg123 development, but as long as it's
only on this debian platform that it's not working, I guess it is
on-topic for debian, too.

On Fri, 21 Feb 2014 01:29:40 +,
peter green plugw...@p10link.net wrote:

 Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat out 
 of date) debian sid armhf chroot

 Built with ./configure --with-cpu=arm_nofpu
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 ARM             30.36    34.26
 
 Built with ./configure --with-cpu=generic_fpu
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 generic         148.66   138.49

That seems to prove a point about trying to use the nofpu build. How
does --with-cpu=generic_nofpu stack up for this machine? Also regarding
the compliance test later on ...

 Build with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
 #mpg123 benchmark (user CPU time in seconds for decoding)
 #decoder        t_s16/s  t_f32/s
 NEON            0.03     0.04

Yeah, as we see

 Illegal instruction

this is most interesting. I defer to Taihei, as I don't have a NEON
setup at hand (need to get a Debian chroot going on my phone).

 root@plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
 
  Layer 1 
 -- 16 bit signed integer output
 fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
 fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)

That doesn't look pretty to me. Does it _sound_ like (metal) music? (If
there is no audio chip, decode to WAV with -w output.wav and limit the
number of frames via -n 500; I happily accept snippets.)

 root@plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-generic_fpu/src/libmpg123/.libs/ perl compliance.pl /mpg123-20140220132548-generic_fpu/src/mpg123
 
  Layer 1 
 -- 16 bit signed integer output
 fl1.bit:RMS=8.683659e-06 (PASS) maxdiff=1.525879e-05 (PASS)
 fl2.bit:RMS=8.686681e-06 (PASS) maxdiff=1.525879e-05 (PASS)
 fl3.bit:RMS=8.737660e-06 (PASS) maxdiff=1.525879e-05 (PASS)

Yes, that is better. Can you compare --with-cpu=generic_nofpu to
isolate this to the assembly version for ARM? This is how it looks with
generic_nofpu on my box:

sh$ perl ../test/compliance.pl src/mpg123

 Layer 1 
-- 16 bit signed integer output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
-- 32 bit integer output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
-- 24 bit integer output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
-- 32 bit floating point output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
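The PASS/LIMITED/FAIL labels above appear consistent with the ISO/IEC 11172-4 accuracy criteria for output normalized to full scale 1.0: full accuracy needs an RMS error below 2^-15/sqrt(12) (about 8.81e-06) and a maximum absolute difference of at most 2^-14 (about 6.10e-05); limited accuracy relaxes the RMS bound to 2^-11/sqrt(12) (about 1.41e-04). The C sketch below assumes those thresholds; it is not taken from compliance.pl, and all names are hypothetical.

```c
#include <stddef.h>

/* Assumed ISO/IEC 11172-4 limits for samples normalized to +-1.0.
 * RMS < L is tested as mean square < L*L, which avoids needing sqrt(). */
#define FULL_MS_LIMIT    ((1.0 / 1073741824.0) / 12.0) /* (2^-15/sqrt(12))^2 */
#define LIMITED_MS_LIMIT ((1.0 / 4194304.0) / 12.0)    /* (2^-11/sqrt(12))^2 */
#define MAXDIFF_LIMIT    (1.0 / 16384.0)               /* 2^-14 */

typedef enum { ACC_PASS, ACC_LIMITED, ACC_FAIL } accuracy;

/* Mean of the squared differences between decoded and reference samples. */
static double mean_sq_diff(const double *out, const double *ref, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = out[i] - ref[i];
        sum += d * d;
    }
    return n ? sum / (double)n : 0.0;
}

/* Largest absolute difference between decoded and reference samples. */
static double max_abs_diff(const double *out, const double *ref, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = out[i] - ref[i];
        if (d < 0.0)
            d = -d;
        if (d > m)
            m = d;
    }
    return m;
}

static accuracy classify_rms(double mean_sq)
{
    if (mean_sq < FULL_MS_LIMIT)
        return ACC_PASS;
    if (mean_sq < LIMITED_MS_LIMIT)
        return ACC_LIMITED;
    return ACC_FAIL;
}

static accuracy classify_maxdiff(double maxdiff)
{
    return maxdiff <= MAXDIFF_LIMIT ? ACC_PASS : ACC_FAIL;
}
```

Under these limits, the arm_nofpu RMS values around 3.5e-02 are more than three orders of magnitude beyond even the limited-accuracy bound, which is why comparing against generic_nofpu helps separate an ARM assembly problem from a general fixed-point one.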

 

Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Thomas Orgis
On Mon, 17 Feb 2014 10:00:48 +0200,
Riku Voipio riku.voi...@iki.fi wrote:

 Thanks Peter for explaining, this was how I ended up the suggestion
 in the bug.
 
  I see. In that case, I'll have to leave the package as is until
  something along those lines is implemented.
 
 Yes. The ideal solution is for the upstream to implement cpu runtime
 detection that:
 
 1) uses neon if it is available
 2) falls back to fixed point if app requested 16-bit playback
 3) finally falls back to generic fpu code if neither of above applies
 
 Any packaging level workaround is going to be suboptimal for someone.
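The selection order proposed above can be written as a small decision function. This is only a sketch: the enum values are hypothetical labels for the three code paths discussed in this thread (NEON, the fixed-point decoder and generic FPU code), not mpg123 API, and how NEON support is detected is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical labels for the decoder families discussed in the thread. */
typedef enum {
    DEC_NEON,        /* NEON-optimized synth */
    DEC_FIXED_POINT, /* integer decoder, suited to 16-bit output */
    DEC_GENERIC_FPU  /* portable floating-point C code */
} decoder_choice;

/* Order from the proposal: 1) NEON if available, 2) fixed point if
 * 16-bit output was requested, 3) otherwise generic FPU code. */
static decoder_choice choose_decoder(bool cpu_has_neon, bool wants_s16)
{
    if (cpu_has_neon)
        return DEC_NEON;
    if (wants_s16)
        return DEC_FIXED_POINT;
    return DEC_GENERIC_FPU;
}
```

Note that the second rule depends on the output format the application asks for, which is information only available at run time inside the library.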

Isn't the approach for the linker to select libraries like libavcodec
on the table anymore? I see that I'll have to add that float conversion
code to keep the features across all builds, but selecting a vfp and
non-vfp variant for fixed point or floating point via the linker seems
like the most clean approach you are going to get.

NEON detection may come... but if we have linker selection, that would
be covered right now.

So ... can I get away with adding that stupid float conversion, so
folks have reasonable performance in likely applications of debian on
ARM, please? ;-)


Alrighty then,

Thomas

PS: I'll have to remove those experimental markings from the nofpu
variants in configure help. They are getting old.





Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Steve McIntyre
On Mon, Feb 17, 2014 at 11:43:16AM +0100, Thomas Orgis wrote:
On Mon, 17 Feb 2014 10:00:48 +0200,
Riku Voipio riku.voi...@iki.fi wrote:

 Thanks Peter for explaining, this was how I ended up the suggestion
 in the bug.
 
  I see. In that case, I'll have to leave the package as is until
  something along those lines is implemented.
 
 Yes. The ideal solution is for the upstream to implement cpu runtime
 detection that:
 
 1) uses neon if it is available
 2) falls back to fixed point if app requested 16-bit playback
 3) finally falls back to generic fpu code if neither of above applies
 
 Any packaging level workaround is going to be suboptimal for someone.

Isn't the approach for the linker to select libraries like libavcodec
on the table anymore? I see that I'll have to add that float conversion
code to keep the features across all builds, but selecting a vfp and
non-vfp variant for fixed point or floating point via the linker seems
like the most clean approach you are going to get.

Yes, you can do that - build several copies of the library and use the
hwcaps / auxv approach to pick the best one for the hardware at link
time.

NEON detection may come... but if we have linker selection, that would
be covered right now.

Yup.
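The hwcaps/auxv approach relies on the kernel advertising CPU features in the AT_HWCAP entry of the auxiliary vector, which the dynamic linker can use to choose among library copies built for different feature sets. The C sketch below only reads that bitmask with glibc's getauxval() and maps it to a variant name. The bit values are the 32-bit ARM Linux HWCAP_VFP (1 << 6) and HWCAP_NEON (1 << 12); the variant names are assumptions, not the layout any Debian package actually uses.

```c
#if defined(__linux__)
#include <sys/auxv.h> /* getauxval(), glibc 2.16+ */
#endif

/* 32-bit ARM Linux hwcap bits (kernel asm/hwcap.h). */
#define ARM_HWCAP_VFP  (1UL << 6)
#define ARM_HWCAP_NEON (1UL << 12)

/* Map an ARM hwcap bitmask to a (hypothetical) library variant, best first. */
static const char *pick_variant(unsigned long hwcap)
{
    if (hwcap & ARM_HWCAP_NEON)
        return "neon";
    if (hwcap & ARM_HWCAP_VFP)
        return "vfp";
    return "baseline";
}

/* Hwcap bits of the running CPU; 0 unless this is 32-bit ARM Linux,
 * since other architectures assign different meanings to the bits. */
static unsigned long arm_hwcap(void)
{
#if defined(__linux__) && defined(__arm__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}
```

In the linker-selection scheme this decision happens once, when the library is loaded, so the decoder code itself needs no runtime detection.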

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
Is there anybody out there?


-- 
Archive: http://lists.debian.org/20140217123430.ga12...@einval.com



Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Sune Vuorela
On 2014-02-17, Steve McIntyre st...@einval.com wrote:
 Yes, you can do that - build several copies of the library and use the
 hwcaps / auxv approach to pick the best one for the hardware at link
 time.

NEON detection may come... but if we have linker selection, that would
be covered right now.

 Yup.

Qt is moving from runtime autodetection to the hwcaps/auxv approach on
all archs, because runtime detection can also have its issues,
especially when you have inlinable code ...

(Yes. I can expand if requested)

/Sune


-- 
Archive: http://lists.debian.org/ldt127$v3t$1...@ger.gmane.org