Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-06-04 Thread Graham Inggs
Hi Dirk

On Thu, 4 Jun 2020 at 14:15, Dirk Eddelbuettel  wrote:
> Good to know, thanks for the update!

You are welcome!

On Wed, 3 Jun 2020 at 18:36, Dirk Eddelbuettel  wrote:
>   According to an Intel report back from 2011, -Bsymbolic-functions "is
>   a dangerous option which can often result in some non-intuitive side
>   effects".
>   The report explicitly shows various problems with the option.
>   
> https://software.intel.com/content/www/us/en/develop/articles/performance-tools-for-software-developers-bsymbolic-can-cause-dangerous-side-effects.html
>
>   In the light of the above, it's a real wonder that Ubuntu uses the
>   option at all.

By the way, note that -Bsymbolic != -Bsymbolic-functions

From dpkg's changelog, it seems Ubuntu has been linking with
-Bsymbolic-functions by default since about 2008.
codesearch.debian.net only shows about 62 source packages (of the 10s
of thousands in Debian) where this needs to be overridden.
It seems overriding it in openblas only became necessary since the
restructuring in 0.3.7+ds-2, so no versions prior to Focal should be
affected.

Regards
Graham



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-06-04 Thread Dirk Eddelbuettel


Hi Graham,

On 4 June 2020 at 10:50, Graham Inggs wrote:
| On Wed, 3 Jun 2020 at 18:36, Dirk Eddelbuettel  wrote:
| > Graham do you think you can get it turned off for at least openblas?
| 
| Already fixed in Groovy [1] and the Stable Release Update [2] for
| Focal is in the queue [3] awaiting review by the SRU Team.

Good to know, thanks for the update!

Dirk
 
| Regards
| Graham
| 
| 
| [1] https://launchpad.net/ubuntu/+source/openblas
| [2] https://wiki.ubuntu.com/StableReleaseUpdates#Procedure
| [3] https://launchpad.net/ubuntu/focal/+queue?queue_state=1

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-06-04 Thread Graham Inggs
Hi Dirk

On Wed, 3 Jun 2020 at 18:36, Dirk Eddelbuettel  wrote:
> Graham do you think you can get it turned off for at least openblas?

Already fixed in Groovy [1] and the Stable Release Update [2] for
Focal is in the queue [3] awaiting review by the SRU Team.

Regards
Graham


[1] https://launchpad.net/ubuntu/+source/openblas
[2] https://wiki.ubuntu.com/StableReleaseUpdates#Procedure
[3] https://launchpad.net/ubuntu/focal/+queue?queue_state=1



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-06-03 Thread Dirk Eddelbuettel


Conrad (of Armadillo fame) sent me this (and ok'ed passing it on):

  According to an Intel report back from 2011, -Bsymbolic-functions "is
  a dangerous option which can often result in some non-intuitive side
  effects".
  The report explicitly shows various problems with the option.
  
https://software.intel.com/content/www/us/en/develop/articles/performance-tools-for-software-developers-bsymbolic-can-cause-dangerous-side-effects.html

  In the light of the above, it's a real wonder that Ubuntu uses the
  option at all.

  Perhaps Ubuntu developers meant well, but are not aware of the side effects?

Graham do you think you can get it turned off for at least openblas?

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-06-01 Thread Dirk Eddelbuettel


For completeness, Conrad Sanderson (who is the main author of Armadillo,
which is used inter alia by MLPACK) posted another good summary with links to
other projects including also a launchpad bug report:
https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/1870138

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Dirk Eddelbuettel


On 30 May 2020 at 21:39, Sébastien Villemot wrote:
| Control: tags -1 + patch
| 
| Hi Graham,
| 
| On Saturday 30 May 2020 at 15:09 +0200, Graham Inggs wrote:
| 
| > I was able to reproduce this in Ubuntu 20.04 on i7-2600 with the
| > Rscript -e "example(solve)"
| > test case.  Rebuilding with
| > export DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"
| > in debian/rules solved it for me.
| > 
| > On Sat, 30 May 2020 at 09:15, Sébastien Villemot <sebast...@debian.org> wrote:
| > > Just one question: did you try to rebuild it from source without
| > > changing anything? Maybe it’s just the rebuild that fixed it, and not
| > > the flag change.
| > 
| > I confirm that a no-change rebuild had no effect.
| 
| Thanks to you (and to Mo) for doing those rebuilds.
| 
| > Could we get this fixed in Debian?  At worst, it should be a no-op in
| > Debian, and should someone try rebuilding locally with
| > -Bsymbolic-functions they won't fall into this trap.
| 
| Sure, we’ll fix it in Debian (a similar fix is present in src:lapack,
| by the way).
| 
| It could however take some time, because we are currently waiting for
| NEW clearance from the most recent upload of src:openblas.

Awesome too. Really good to see everybody remains on top of this.

Dirk
 
| Best,
| 
| -- 
| ⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
| ⣾⠁⢠⠒⠀⣿⡁  Debian Developer
| ⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
| ⠈⠳⣄  https://www.debian.org
| 

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Dirk Eddelbuettel


On 30 May 2020 at 15:09, Graham Inggs wrote:
| Hi
| 
| I was able to reproduce this in Ubuntu 20.04 on i7-2600 with the
| Rscript -e "example(solve)"
| test case.  Rebuilding with
| export DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"
| in debian/rules solved it for me.
| 
| On Sat, 30 May 2020 at 09:15, Sébastien Villemot  wrote:
| > Just one question: did you try to rebuild it from source without
| > changing anything? Maybe it’s just the rebuild that fixed it, and not
| > the flag change.
| 
| I confirm that a no-change rebuild had no effect.
| 
| I suspect this is the same problem that was discovered in LP: #1860601
| [1], but I was unable to reproduce it locally.
| I'll take care of fixing this in Ubuntu Groovy and SRU for Ubuntu Focal.

Lovely!  I was wondering how we could possibly reach out and get to someone!

Dirk

| Could we get this fixed in Debian?  At worst, it should be a no-op in
| Debian, and should someone try rebuilding locally with
| -Bsymbolic-functions they won't fall into this trap.
| 
| Regards
| Graham
| 
| 
| [1] https://pad.lv/1860601

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Sébastien Villemot
Control: tags -1 + patch

Hi Graham,

On Saturday 30 May 2020 at 15:09 +0200, Graham Inggs wrote:

> I was able to reproduce this in Ubuntu 20.04 on i7-2600 with the
> Rscript -e "example(solve)"
> test case.  Rebuilding with
> export DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"
> in debian/rules solved it for me.
> 
> On Sat, 30 May 2020 at 09:15, Sébastien Villemot <sebast...@debian.org> wrote:
> > Just one question: did you try to rebuild it from source without
> > changing anything? Maybe it’s just the rebuild that fixed it, and not
> > the flag change.
> 
> I confirm that a no-change rebuild had no effect.

Thanks to you (and to Mo) for doing those rebuilds.

> Could we get this fixed in Debian?  At worst, it should be a no-op in
> Debian, and should someone try rebuilding locally with
> -Bsymbolic-functions they won't fall into this trap.

Sure, we’ll fix it in Debian (a similar fix is present in src:lapack,
by the way).

It could however take some time, because we are currently waiting for
NEW clearance from the most recent upload of src:openblas.

Best,

-- 
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
⠈⠳⣄  https://www.debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Graham Inggs
Hmm, my last reply to this bug seems to have gone astray.



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Mo Zhou
> Just one question: did you try to rebuild it from source without
> changing anything? Maybe it’s just the rebuild that fixed it, and not
> the flag change.

Rebuilt without change: hang with libopenblas0-pthread

Rebuilt with USE_TLS=0: hang

Rebuilt with USE_SIMPLE_THREADED_LEVEL3=1: hang

Rebuilt with "-Wl,-Bsymbolic-functions" stripped: pass



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Graham Inggs
Hi

I was able to reproduce this in Ubuntu 20.04 on i7-2600 with the
Rscript -e "example(solve)"
test case.  Rebuilding with
export DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"
in debian/rules solved it for me.

On Sat, 30 May 2020 at 09:15, Sébastien Villemot  wrote:
> Just one question: did you try to rebuild it from source without
> changing anything? Maybe it’s just the rebuild that fixed it, and not
> the flag change.

I confirm that a no-change rebuild had no effect.

I suspect this is the same problem that was discovered in LP: #1860601
[1], but I was unable to reproduce it locally.
I'll take care of fixing this in Ubuntu Groovy and SRU for Ubuntu Focal.

Could we get this fixed in Debian?  At worst, it should be a no-op in
Debian, and should someone try rebuilding locally with
-Bsymbolic-functions they won't fall into this trap.

Regards
Graham


[1] https://pad.lv/1860601



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-30 Thread Sébastien Villemot
On Saturday 30 May 2020 at 01:19 +, Mo Zhou wrote:
> Control: tags -1 -moreinfo
> 
> Hi Sébastien,
> 
> Good catch! I tried to remove the mentioned LDFLAG
> 
>   DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"
> 
> and rebuilt the openblas 0.3.8 package.
> 
> Then the deadlock issue disappeared.

Just one question: did you try to rebuild it from source without
changing anything? Maybe it’s just the rebuild that fixed it, and not
the flag change.

-- 
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
⠈⠳⣄  https://www.debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-29 Thread Dirk Eddelbuettel


On 30 May 2020 at 01:19, Mo Zhou wrote:
| Control: tags -1 -moreinfo
| 
| Hi Sébastien,
| 
| Good catch! I tried to remove the mentioned LDFLAG
| 
|   DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"
| 
| and rebuilt the openblas 0.3.8 package.
| 
| Then the deadlock issue disappeared.

Wow. Nice work.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-29 Thread Mo Zhou
Control: tags -1 -moreinfo

Hi Sébastien,

Good catch! I tried to remove the mentioned LDFLAG

  DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions"

and rebuilt the openblas 0.3.8 package.

Then the deadlock issue disappeared.



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-29 Thread Sébastien Villemot
On Friday 29 May 2020 at 14:55 +0200, Sébastien Villemot wrote:

> Since Ubuntu did not modify the Debian source package, it means that
> it’s the way of generating the openblas binary within Ubuntu that
> causes the bug.
> 
> Compilation flags (as given by dpkg-buildflags) are currently the same
> between Debian and Ubuntu, so it does not come from that.

Erratum: LDFLAGS is different, since Ubuntu adds
-Wl,-Bsymbolic-functions.

I think it has been like that for quite some time, so it seems
unlikely to be the source of the problem, but one test would be to
recompile openblas on Ubuntu with
DEB_LDFLAGS_MAINT_STRIP="-Wl,-Bsymbolic-functions" in the environment.
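
The effect of that variable can be previewed before committing to a full
rebuild. The following is only a sketch, assuming a Debian or Ubuntu system
with dpkg-dev installed. DEB_LDFLAGS_MAINT_STRIP is a variable that
dpkg-buildflags reads from the environment; the strip_flag helper is
hypothetical and merely mimics the stripping on a plain string, so the idea
can be tried where dpkg-buildflags is not available.

```shell
#!/bin/sh
# strip_flag FLAGS FLAG: print FLAGS with every whole-word FLAG removed.
# Hypothetical helper for illustration; it is not part of dpkg.
strip_flag() {
    out=""
    for word in $1; do
        [ "$word" = "$2" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

# Stand-in flag string, so this runs without dpkg-dev.
strip_flag "-Wl,-Bsymbolic-functions -Wl,-z,relro" "-Wl,-Bsymbolic-functions"

# On a real build host, compare what dpkg-buildflags reports with and
# without the strip variable set in the environment.
if command -v dpkg-buildflags >/dev/null 2>&1; then
    echo "before: $(dpkg-buildflags --get LDFLAGS)"
    echo "after:  $(DEB_LDFLAGS_MAINT_STRIP='-Wl,-Bsymbolic-functions' dpkg-buildflags --get LDFLAGS)"
fi
```

If the "before" line shows -Wl,-Bsymbolic-functions and the "after" line does
not, a package build started from the same environment should pick up the
stripped flags as well.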

-- 
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
⠈⠳⣄  https://www.debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-29 Thread Sébastien Villemot
On Friday 29 May 2020 at 01:13 +, Mo Zhou wrote:

> Clarification: possibly an Ubuntu bug

> The way to reproduce with nspawn/chroot + ubuntu focal (20.04)
> if you don't have docker
> 
> 1. mkdir Focal
> 2. debootstrap focal Focal/
> 3. systemd-nspawn -D Focal
> 4. apt update -y; apt upgrade -y
> 5. apt install -y r-base-core
> 6. Rscript -e "example(solve)"  # good with netlib
> 7. apt install -y libopenblas-dev
> 8. Rscript -e "example(solve)"  # hang
> 
> The way to reproduce with *. + debian
> 
> 1. Not yet reproducible.

Thanks Mo for spotting this.

Indeed I can replicate the bug in a focal chroot, but not in a sid
chroot.

Moreover, if I install the Debian binary package for
libopenblas0-pthread (0.3.8+ds-1) on top of a focal chroot, then the bug disappears.

Since Ubuntu did not modify the Debian source package, it means that
it’s the way of generating the openblas binary within Ubuntu that
causes the bug.

Compilation flags (as given by dpkg-buildflags) are currently the same
between Debian and Ubuntu, so it does not come from that.

Maybe this has to do with internal default flags of GCC? Or some other
factor in the toolchain? I’m not familiar enough with Ubuntu to be able
to answer that.

-- 
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
⠈⠳⣄  https://www.debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-28 Thread Dirk Eddelbuettel


On 29 May 2020 at 01:13, Mo Zhou wrote:
| Control: severity -1 important
| Control: tags -1 +moreinfo
| Clarification: possibly an Ubuntu bug

You may be right!  I just double checked the earliest report (on the
r-sig-debian list) and it too was on Ubuntu 20.04!

| Hello guys,
| 
| The way to reproduce with docker + ubuntu devel (20.10)
| 
| 1. docker image pull ubuntu:devel
| 2. docker run -ti ubuntu:devel
| 3. apt update -y ; apt upgrade -y
| 4. apt install -y r-base-core
| 5. Rscript -e "example(solve)"  # good with netlib
| 6. apt install -y libopenblas-dev
| 7. Rscript -e "example(solve)"  # hang
| 
| The way to reproduce with nspawn/chroot + ubuntu focal (20.04)
| if you don't have docker
| 
| 1. mkdir Focal
| 2. debootstrap focal Focal/
| 3. systemd-nspawn -D Focal
| 4. apt update -y; apt upgrade -y
| 5. apt install -y r-base-core
| 6. Rscript -e "example(solve)"  # good with netlib
| 7. apt install -y libopenblas-dev
| 8. Rscript -e "example(solve)"  # hang
| 
| The way to reproduce with *. + debian
| 
| 1. Not yet reproducible.

That is ... interesting.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-28 Thread Mo Zhou
Control: severity -1 important
Control: tags -1 +moreinfo
Clarification: possibly an Ubuntu bug

Hello guys,

The way to reproduce with docker + ubuntu devel (20.10)

1. docker image pull ubuntu:devel
2. docker run -ti ubuntu:devel
3. apt update -y ; apt upgrade -y
4. apt install -y r-base-core
5. Rscript -e "example(solve)"  # good with netlib
6. apt install -y libopenblas-dev
7. Rscript -e "example(solve)"  # hang

The way to reproduce with nspawn/chroot + ubuntu focal (20.04)
if you don't have docker

1. mkdir Focal
2. debootstrap focal Focal/
3. systemd-nspawn -D Focal
4. apt update -y; apt upgrade -y
5. apt install -y r-base-core
6. Rscript -e "example(solve)"  # good with netlib
7. apt install -y libopenblas-dev
8. Rscript -e "example(solve)"  # hang

The way to reproduce with *. + debian

1. Not yet reproducible.
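
Because the hung session reportedly does not react to Ctrl-C, scripting the
final step under a deadline makes repeated runs less painful. This is only a
sketch, assuming GNU coreutils' timeout is available; check_hang is a
hypothetical helper name, and the 120 second limit is an arbitrary choice.

```shell
#!/bin/sh
# check_hang SECONDS CMD...: run CMD under a deadline and classify the result.
# Hypothetical helper. SIGKILL is sent because the hung R session is reported
# to ignore Ctrl-C. A timed-out run is reported as "hang": GNU timeout exits
# with 124 on a plain timeout and 137 (128 + 9) when KILL was delivered.
check_hang() {
    limit=$1
    shift
    status=0
    timeout -s KILL "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)       echo "pass" ;;
        124|137) echo "hang" ;;
        *)       echo "error ($status)" ;;
    esac
}

# Stand-in commands, so the helper can be tried without R installed.
check_hang 5 true
check_hang 1 sleep 10

# The test case from this thread, run only where Rscript is installed.
if command -v Rscript >/dev/null 2>&1; then
    check_hang 120 Rscript -e 'example(solve)'
fi
```

Run inside the chroot or container after each package change, this gives a
one-word verdict per configuration.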



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-28 Thread Dirk Eddelbuettel


On 28 May 2020 at 16:55, Sébastien Villemot wrote:
| Hi Dirk,
| 
| On Thursday 28 May 2020 at 07:07 -0500, Dirk Eddelbuettel wrote:
| > Package: libopenblas-dev
| > Version: 0.3.8+ds-1
| > Severity: serious
| 
| > In short, when libopenblas-dev is installed (as e.g. from r-base-dev as a
| > dependency from libblas-dev, liblapack-dev) then
| > 
| > libopenblas0-pthread
| > 
| > is installed first via our depends ranking as libopenblas-pthread-dev comes
| > first.
| > 
| > This has served us well over the years but can exhibit a bug which I for
| > example saw with (Ubuntu's) 0.3.8+ds-1 package. Running a simple
| > 
| > example(solve)
| > 
| > in R hangs in an unsuspendable session (ie no Ctrl-C, kill is needed).
| > Simplest test is on the command-line via
| > 
| > $ Rscript -e 'example(solve)'
| > 
| > Removing libopenblas0-pthread and installing libopenblas-openmp-dev helps. As
| > does a manual reordering of the alternatives.
| > 
| > This bug is reproducible on my system with a i7-8700k.
| 
| I’ve tried to reproduce it on my i7-8700K, without success. I used a
| clean sid chroot (with r-base 4.0.0-3 and openblas 0.3.9+ds-1).
| Downgrading to openblas 0.3.8+ds-1 (which is the version against which
| you reported the bug) does not change anything.
| 
| So it’s not clear that this bug is tied to specific hardware. At
| least, a given CPU model is not a guarantee of reproducibility.

Darn.  What next?  Shall we compare environment variables (I don't set too
many and can't think of one that does it.)

Or shall we be speculative and try a 'private' package with the two defines
Mo suggested to see if that helps on my side?

Did the Parisian student ever contact you?

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-28 Thread Sébastien Villemot
Hi Dirk,

On Thursday 28 May 2020 at 07:07 -0500, Dirk Eddelbuettel wrote:
> Package: libopenblas-dev
> Version: 0.3.8+ds-1
> Severity: serious

> In short, when libopenblas-dev is installed (as e.g. from r-base-dev as a
> dependency from libblas-dev, liblapack-dev) then
> 
> libopenblas0-pthread
> 
> is installed first via our depends ranking as libopenblas-pthread-dev comes
> first.
> 
> This has served us well over the years but can exhibit a bug which I for
> example saw with (Ubuntu's) 0.3.8+ds-1 package. Running a simple
> 
> example(solve)
> 
> in R hangs in an unsuspendable session (ie no Ctrl-C, kill is needed).
> Simplest test is on the command-line via
> 
> $ Rscript -e 'example(solve)'
> 
> Removing libopenblas0-pthread and installing libopenblas-openmp-dev helps. As
> does a manual reordering of the alternatives.
> 
> This bug is reproducible on my system with a i7-8700k.

I’ve tried to reproduce it on my i7-8700K, without success. I used a
clean sid chroot (with r-base 4.0.0-3 and openblas 0.3.9+ds-1).
Downgrading to openblas 0.3.8+ds-1 (which is the version against which
you reported the bug) does not change anything.

So it’s not clear that this bug is tied to specific hardware. At
least, a given CPU model is not a guarantee of reproducibility.

-- 
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
⠈⠳⣄  https://www.debian.org



Bug#961725: libopenblas-dev: On some cpus, openmp and pthread dead-lock

2020-05-28 Thread Dirk Eddelbuettel


Package: libopenblas-dev
Version: 0.3.8+ds-1
Severity: serious

This is a somewhat 'late' bug report following several on-and-off discussion
threads on debian-science and/or debian-r (and started on the r-sig-debian
list from the R Project).

In short, when libopenblas-dev is installed (as e.g. from r-base-dev as a
dependency from libblas-dev, liblapack-dev) then

libopenblas0-pthread

is installed first via our depends ranking as libopenblas-pthread-dev comes
first.

This has served us well over the years but can exhibit a bug which I for
example saw with (Ubuntu's) 0.3.8+ds-1 package. Running a simple

example(solve)

in R hangs in an unsuspendable session (ie no Ctrl-C, kill is needed).
Simplest test is on the command-line via

$ Rscript -e 'example(solve)'

Removing libopenblas0-pthread and installing libopenblas-openmp-dev helps. As
does a manual reordering of the alternatives.
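
For anyone wanting to check which provider is currently selected, a sketch
along these lines may help. It assumes the BLAS alternative is registered as
libblas.so.3-<multiarch triplet>, which is the naming used on current Debian
and Ubuntu systems as far as I know; blas_alternative is a hypothetical
helper name.

```shell
#!/bin/sh
# blas_alternative TRIPLET: name of the BLAS alternative for a multiarch
# triplet. Hypothetical helper; the naming scheme is an assumption.
blas_alternative() {
    printf 'libblas.so.3-%s\n' "$1"
}

# Ask dpkg for the triplet where possible, else fall back to amd64.
if command -v dpkg-architecture >/dev/null 2>&1; then
    triplet=$(dpkg-architecture -qDEB_HOST_MULTIARCH 2>/dev/null)
fi
triplet=${triplet:-x86_64-linux-gnu}

name=$(blas_alternative "$triplet")
echo "BLAS alternative: $name"

# List the registered providers. Changing the selection needs root, e.g.
#   update-alternatives --config "$name"
if command -v update-alternatives >/dev/null 2>&1; then
    update-alternatives --list "$name" 2>/dev/null || echo "no providers registered"
fi
```

The LAPACK alternative would presumably follow the same pattern with
liblapack.so.3 in place of libblas.so.3.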

This bug is reproducible on my system with a i7-8700k.

The underlying issue may be Intel threading versus GNU threading which also
shows up using e.g. Intel MKL which seems happier with GNU threading.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org