Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2019-01-16 Thread Florian Weimer
* Андрей Доценко:

> The problem occurs only when using semaphores in a library that is not
> linked against pthread.

Yes, that's expected.  Sorry I didn't see this earlier—we have an
upstream bug about this:

  

In general, underlinking produces broken binaries.

Thanks,
Florian



Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2019-01-16 Thread Андрей Доценко
I've made all sem_timedwait calls work. One of the libraries was missing
*pthread* linkage but was loaded into a process that links to pthread itself.
The message

> undefined reference to symbol 'sem_timedwait@@GLIBC_2.2'
>
was not shown in the project. But it is shown in the test case if I remove
pthread linkage.

The problem occurs only when using semaphores in a library that is not
linked against pthread.

I've prepared a minimal test suite using cmake (ctest) and the Check
framework. The test reproduces the problem when the library's pthread linkage
is commented out (the default). It behaves the same as the earlier test
without a separate library when ASAN is enabled. The behaviour is the same
whether compiled with gcc, clang-3.9 or clang-6.0.

The test passes on amd64 distributions and on Ubuntu Trusty i386 (with no
pthread linkage). But it fails on Debian 9 (Stretch) i386 and Debian 10
(Testing) i386. No compilation errors happen.

Command to run test in the test-bugs directory after extraction:
cmake --build . && ctest --verbose


test-bugs.tar.gz
Description: application/gzip


Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2019-01-14 Thread Андрей Доценко
An update to my last answer. A lot of time has passed since I filed this
issue, and the project has changed a lot too. I tested again in Docker and
the critical rt semaphores now work (the changes didn't touch them, as git
blame says). But I still see some other strange behaviour that "cannot
happen" in a project that follows the CERT recommendations. I'll try to debug
it separately. The problem is intermittent, as far as I can see.


Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2019-01-14 Thread Андрей Доценко
No, we use a single architecture for all the processes. Our software works
fine with Ubuntu 14.04 (i386 and amd64 versions) and Debian 9 (amd64). Only
Debian 9 i386 has this problem. The problem reproduces in a Docker i386
container too. It was first found in the Docker image, so I thought it was
caused by running an i386 image on an amd64 kernel, but installing directly
on old i686 hardware reproduces the problem as well.

I'll prepare an i386 Docker image for Debian Buster as soon as I can and
check the problem there.


Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2019-01-11 Thread Aurelien Jarno
On 2019-01-11 16:50, Андрей Доценко wrote:
> >
> > On your side, have you been able to reproduce the problem *without*
> > ASAN, even on a bigger codebase? I wonder if it is actually a side
> > effect of ASAN.
> >
> 
> All *sem_timedwait* calls do not work in the codebase of our project
> without ASAN. So we cannot use i386 hardware in our embedded systems.
> Codebase is about 1000 source files. But the minimal test passes without
> ASAN so I cannot determine what affects sem_timedwait in our project. Any
> thoughts?

Do you use semaphores between 32- and 64-bit processes? That is not
supported; it only worked by chance in older glibc versions
(prior to 2.21 IIRC).
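
One way to catch such a mix early is to stamp the shared mapping with the
creator's ABI and have every other process check the stamp before touching
the semaphore. The sketch below is only an illustration under that
assumption; struct shm_header and both functions are hypothetical names,
not part of glibc or of the reporter's project.

```c
#include <semaphore.h>
#include <stdint.h>

/* Hypothetical header placed at the start of the shared mapping.
 * The two fixed-width fields come first so that they sit at the same
 * offsets in 32-bit and 64-bit builds. */
struct shm_header {
	uint32_t sem_size;	/* sizeof(sem_t) in the creating process */
	uint32_t ptr_size;	/* sizeof(void *) in the creating process */
	sem_t sem;
};

/* Called once by the process that creates the mapping. */
void shm_header_stamp(struct shm_header *h)
{
	h->sem_size = (uint32_t)sizeof(sem_t);
	h->ptr_size = (uint32_t)sizeof(void *);
}

/* Called by every process that attaches to the mapping.
 * Returns 1 if the creator's ABI matches ours, 0 otherwise. */
int shm_header_compatible(const struct shm_header *h)
{
	return h->sem_size == sizeof(sem_t) &&
	       h->ptr_size == sizeof(void *);
}
```

A process that gets 0 from shm_header_compatible() should refuse to use
the semaphore rather than rely on it working by chance.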

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2019-01-11 Thread Андрей Доценко
>
> On your side, have you been able to reproduce the problem *without*
> ASAN, even on a bigger codebase? I wonder if it is actually a side
> effect of ASAN.
>

In our project's codebase, every *sem_timedwait* call fails even without
ASAN, so we cannot use i386 hardware in our embedded systems. The codebase
is about 1000 source files. But the minimal test passes without ASAN, so I
cannot determine what affects sem_timedwait in our project. Any thoughts?



-- 
Андрей Николаевич, software engineer.


Bug#906917: sem_timedwait could always block and returns ETIMEOUT but decrements the value on i686 architecture

2018-12-30 Thread Aurelien Jarno
Hi,

On 2018-08-22 13:02, Андрей Доценко wrote:
> Package: libc6
> Version: 2.24-11+deb9-u3
> 
> Using sem_timedwait on i686 gives random results. In our proprietary
> software a semaphore is used by two processes and located in shared memory
> mapped with mmap. All works under the amd64 architecture and under some
> other distributions. But under Debian Stretch i386 sem_timedwait always
> blocks for the timeout and returns the ETIMEDOUT error. Meanwhile it
> acquires the lock, decreasing the semaphore value.
> 
> I've tried to make a test for this bug. The test reproduces the bug only
> with ASAN enabled. Without ASAN enabled it always passes. I've attached the
> test without ASAN support to show that I'm not missing anything. I can
> modify the test to enable ASAN support, but only if somebody asks.

Thanks, I have been able to reproduce the problem here, even with glibc
2.29. I have attached a version that needs neither cmake nor check.

On your side, have you been able to reproduce the problem *without*
ASAN, even on a bigger codebase? I wonder if it is actually a side
effect of ASAN.

> I've discovered that symbols used by i686 are different from those from
> amd64. On amd64 symbols are:
> ~$ nm "${PROJ_PATH}/Docker/debian/9/amd64/test-bugs/test-bugs"  | grep sem_
>  U sem_destroy@@GLIBC_2.2.5
>  U sem_getvalue@@GLIBC_2.2.5
>  U sem_init@@GLIBC_2.2.5
>  U sem_post@@GLIBC_2.2.5
>  U sem_timedwait@@GLIBC_2.2.5
> 004019b0 t test_process_sem_timedwait
> 004011c0 t test_process_sem_timedwait_nolock
> 
> But under i686 symbols are different:
> ~$ nm "${PROJ_PATH}/Docker/debian/9/i386/test-bugs/test-bugs"  | grep sem_
>  U sem_destroy@@GLIBC_2.1
>  U sem_getvalue@@GLIBC_2.1
>  U sem_init@@GLIBC_2.1
>  U sem_post@@GLIBC_2.1
>  U sem_timedwait@@GLIBC_2.2
> 08049f50 t test_process_sem_timedwait
> 08048ee0 t test_process_sem_timedwait_nolock
> 
> As you can see, the symbols are different for i686. The version of
> sem_init, sem_wait, sem_post, sem_destroy and sem_getvalue is GLIBC_2.1,
> but the version of sem_timedwait is GLIBC_2.2.

This is perfectly normal. glibc 2.2.5 is the first glibc version that
supported the amd64 architecture.

> Replacing sem_timedwait with sem_wait makes all work on i686 architecture.
> So sem_wait is ok, but sem_timedwait is not.

I confirm that.

Regards,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net
/* Build with: gcc -o bug906917 bug906917.c -pthread -fsanitize=address */

#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	/* Process-shared semaphore in an anonymous shared mapping. */
	sem_t *sem =
	mmap(NULL, sizeof(*sem), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	assert(sem != MAP_FAILED);
	int r;
	r = sem_init(sem, 1, 0);
	if (r == -1) {
		perror("sem_init");
	}
	assert(r == 0);

	/* The child posts the semaphore once and exits. */
	pid_t pid = fork();
	if (pid == 0) {
		r = sem_post(sem);
		if (r == -1) {
			exit(1);
		}
		exit(0);
	}
	assert(pid > 0);

	int exit_status;
	pid_t child_pid;
	do {
		child_pid = waitpid(pid, &exit_status, 0);
	}
	while ((child_pid == -1) && (errno == EINTR));
	if (child_pid == -1) {
		perror("waitpid");
	}
	assert(child_pid == pid);
	assert(WIFEXITED(exit_status));
	assert(WEXITSTATUS(exit_status) == 0);

	/* Absolute deadline five seconds from now. */
	struct timespec abstime;
	r = clock_gettime(CLOCK_REALTIME, &abstime);
	assert(r == 0);
	abstime.tv_sec += 5;

	int value = -1;
	sem_getvalue(sem, &value);
	assert(value == 1);

	/* The value is 1, so this wait should succeed immediately. */
	do {
		r = sem_timedwait(sem, &abstime);
	}
	while ((r == -1) && (errno == EINTR));
	if (r == -1) {
		perror("sem_timedwait");
	}
	assert(r == 0);

	value = -1;
	sem_getvalue(sem, &value);
	assert(value == 0);

	sem_destroy(sem);

	return 0;
}