Re: [darktable-dev] OpenCL scheduling profiles

2017-04-09 Thread Ulrich Pegelow

Am 09.04.2017 um 17:29 schrieb Matthias Andree:

What's your number of background threads (fourth entry in core options)?


It's currently set to 2. If I remove it from the configuration file while
darktable is stopped, it reverts to 2 the next time darktable is started
and closed.

Note that I see this quite often, but I can't tell where that time comes from:

[dev] took 4,787 secs (5,388 CPU) to load the image.
[dev] took 4,787 secs (5,388 CPU) to load the image.



You might try higher values like six or eight. The main advantage of many
background threads is hiding I/O latency, and that might be the main issue
here.



Looking at iotop, however, the prime concern appears to be that it maxes
out the external USB3 HDD, reading from NTFS. Reducing to 1 thread stalled
the UI at first, but then some 30 thumbnails came back all at once.



It may well be that the main issue on your system is stalling I/O (for
whatever reason). Please run some experiments from a very fast storage
medium (SSD, RAM disk) to find out whether this is the main cause.



I sometimes see modules like highlight reconstruction, CA correction, or
demosaic ("Entrastern") still being dispatched to the CPU, which is very
slow, although they are normally dispatched to the GPU. Statistics below.
It seems the only module that is supposed to run on the CPU is Gamma, and
it's so blazingly fast that we don't need to care. Sorry for the German
module names, but you get the idea. This is only from launching darktable
in lighttable view:



Some modules have no OpenCL code (AMaZE demosaic, raw denoise, input/output
color profile with LittleCMS2), but I cannot say whether this is the main
cause here. At least several of the modules in the output below do have
OpenCL support. Please try to isolate whether slow CPU processing
correlates with specific images and their history stacks.



$ grep 'on CPU' /tmp/dt-perf-opencl.log  | sort -k7 | uniq -f6 -c | sort -nr
124 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Gamma' on CPU, blended on CPU [thumbnail]
  6 [dev_pixelpipe] took 0,026 secs (0,076 CPU) processed `Entrastern' on CPU, blended on CPU [thumbnail]
  5 [dev_pixelpipe] took 0,276 secs (0,832 CPU) processed `Chromatische Aberration' on CPU, blended on CPU [thumbnail]
  5 [dev_pixelpipe] took 0,019 secs (0,060 CPU) processed `Spitzlicht-Rekonstruktion' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,118 secs (0,348 CPU) processed `Raw-Schwarz-/Weißpunkt' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,052 secs (0,140 CPU) processed `Weißabgleich' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,023 secs (0,036 CPU) processed `Tonemapping' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,008 secs (0,016 CPU) processed `Objektivkorrektur' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,001 secs (0,004 CPU) processed `Ausgabefarbprofil' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,001 secs (0,000 CPU) processed `Eingabefarbprofil' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Schärfen' on CPU, blended on CPU [thumbnail]
  2 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Basiskurve' on CPU, blended on CPU [thumbnail]
  1 [dev_pixelpipe] took 3,126 secs (9,444 CPU) processed `Raw-Entrauschen' on CPU, blended on CPU [thumbnail]
  1 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Drehung' on CPU, blended on CPU [thumbnail]



___
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org



Re: [darktable-dev] All of a sudden, darktable Thread Seg faults

2017-04-09 Thread Roman Lebedev
On Sun, Apr 9, 2017 at 3:43 PM, Ulrich Pegelow
 wrote:
> Am 09.04.2017 um 09:31 schrieb Roman Lebedev:
>>
>> On Sun, Apr 9, 2017 at 9:59 AM, Ulrich Pegelow
>>  wrote:
>>>
>>> Am 08.04.2017 um 20:04 schrieb Roman Lebedev:


 Well, that is *very* strange indeed.
 If it *reliably* happens for you, then maybe you could also bisect this
 within the submodule itself?

>>>
>>> Very clear result:
>>
>> Aha, now that makes little sense.
>> It is likely caused by just one raw image; if you can find it, I'll
>> take it from here.
>>
>>> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 is the first bad commit
>>> commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62
>>> Author: Roman Lebedev 
>>> Date:   Sat Apr 1 13:11:57 2017 +0300
>>>
>>> ThrowException(): and how about this?
>>>
>>> :04 04 84dd635a545bf913c916bc075f152a6718f05b1b
>>> ba3c956bb4ffb0b8f54b55ac4aaaf879334f7327 M  src
>>
>> This commit was reverted in the very next commit, so what is the next
>> bad commit?
>>
>
> Looks like none of the following commits solves the issue.
>
> However, looking at the changes in question I found that the following patch
> in master brings darktable back to normal:
>
> diff --git a/src/librawspeed/common/RawspeedException.h b/src/librawspeed/common/RawspeedException.h
> index 692d3f9..b0ebee6 100644
> --- a/src/librawspeed/common/RawspeedException.h
> +++ b/src/librawspeed/common/RawspeedException.h
> @@ -32,7 +32,7 @@
>  namespace RawSpeed {
>
>  template 
> -[[noreturn]] static inline void __attribute__((noreturn, format(printf, 1, 2)))
> +[[noreturn]] void __attribute__((noreturn, format(printf, 1, 2)))
>  ThrowException(const char* fmt, ...) {
>    static constexpr size_t bufSize = 8192;
>  #if defined(HAVE_THREAD_LOCAL)
>
> That means reverting the change from commit
> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 which has not yet been reverted in
> commit 0967e3c8a528cca0800cc5289cba5c212a385a6b.
>
> Don't ask me 
And, pushed.

> Ulrich



Re: [darktable-dev] OpenCL scheduling profiles

2017-04-09 Thread Ulrich Pegelow

Am 09.04.2017 um 11:00 schrieb Matthias Andree:

Am 08.04.2017 um 14:29 schrieb Ulrich Pegelow:
2. What bothers me, though, are the timeouts and their defaults. In
practice, the darkroom works OK-ish, but the lighttable does not. When
a truckload of small thumbnails (say, lighttable zoomed out to show
10 columns of images) needs to be regenerated for the lighttable, it
*appears* (not yet corroborated with measurements) that bumping up the
timeouts considerably helps to avoid latencies, as though things were
deadlocking and waiting for the timer to break the lock. It might be an
internal issue with the synchronization, though: how fine-grained is
the re-attempt? Is it sleep-and-retry, or does it use some form of
semaphores and signalling at the system level between threads?



What's your number of background threads (fourth entry in core options)?





Re: [darktable-dev] All of a sudden, darktable Thread Seg faults

2017-04-09 Thread Roman Lebedev
On Sun, Apr 9, 2017 at 3:43 PM, Ulrich Pegelow
 wrote:
> Am 09.04.2017 um 09:31 schrieb Roman Lebedev:
>>
>> On Sun, Apr 9, 2017 at 9:59 AM, Ulrich Pegelow
>>  wrote:
>>>
>>> Am 08.04.2017 um 20:04 schrieb Roman Lebedev:


 Well, that is *very* strange indeed.
 If it *reliably* happens for you, then maybe you could also bisect this
 within the submodule itself?

>>>
>>> Very clear result:
>>
>> Aha, now that makes little sense.
>> It is likely caused by just one raw image; if you can find it, I'll
>> take it from here.
>>
>>> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 is the first bad commit
>>> commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62
>>> Author: Roman Lebedev 
>>> Date:   Sat Apr 1 13:11:57 2017 +0300
>>>
>>> ThrowException(): and how about this?
>>>
>>> :04 04 84dd635a545bf913c916bc075f152a6718f05b1b
>>> ba3c956bb4ffb0b8f54b55ac4aaaf879334f7327 M  src
>>
>> This commit was reverted in the very next commit, so what is the next
>> bad commit?
>>
>
> Looks like none of the following commits solves the issue.
>
> However, looking at the changes in question I found that the following patch
> in master brings darktable back to normal:
>
> diff --git a/src/librawspeed/common/RawspeedException.h b/src/librawspeed/common/RawspeedException.h
> index 692d3f9..b0ebee6 100644
> --- a/src/librawspeed/common/RawspeedException.h
> +++ b/src/librawspeed/common/RawspeedException.h
> @@ -32,7 +32,7 @@
>  namespace RawSpeed {
>
>  template 
> -[[noreturn]] static inline void __attribute__((noreturn, format(printf, 1, 2)))
> +[[noreturn]] void __attribute__((noreturn, format(printf, 1, 2)))
>  ThrowException(const char* fmt, ...) {
>    static constexpr size_t bufSize = 8192;
>  #if defined(HAVE_THREAD_LOCAL)
>
> That means reverting the change from commit
> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 which has not yet been reverted in
> commit 0967e3c8a528cca0800cc5289cba5c212a385a6b.
>
> Don't ask me 
Okay, thank you for debugging this :)
I'll try to push that later today.

> Ulrich
Roman.




Re: [darktable-dev] All of a sudden, darktable Thread Seg faults

2017-04-09 Thread Ulrich Pegelow

Am 09.04.2017 um 09:31 schrieb Roman Lebedev:

On Sun, Apr 9, 2017 at 9:59 AM, Ulrich Pegelow
 wrote:

Am 08.04.2017 um 20:04 schrieb Roman Lebedev:


Well, that is *very* strange indeed.
If it *reliably* happens for you, then maybe you could also bisect this
within the submodule itself?



Very clear result:

Aha, now that makes little sense.
It is likely caused by just one raw image; if you can find it, I'll
take it from here.


7f087325d09e2b6d4ecc392f7aee44dd29fafe62 is the first bad commit
commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62
Author: Roman Lebedev 
Date:   Sat Apr 1 13:11:57 2017 +0300

ThrowException(): and how about this?

:04 04 84dd635a545bf913c916bc075f152a6718f05b1b
ba3c956bb4ffb0b8f54b55ac4aaaf879334f7327 M  src

This commit was reverted in the very next commit, so what is the next
bad commit?



Looks like none of the following commits solves the issue.

However, looking at the changes in question I found that the following 
patch in master brings darktable back to normal:


diff --git a/src/librawspeed/common/RawspeedException.h b/src/librawspeed/common/RawspeedException.h
index 692d3f9..b0ebee6 100644
--- a/src/librawspeed/common/RawspeedException.h
+++ b/src/librawspeed/common/RawspeedException.h
@@ -32,7 +32,7 @@
 namespace RawSpeed {

 template 
-[[noreturn]] static inline void __attribute__((noreturn, format(printf, 1, 2)))
+[[noreturn]] void __attribute__((noreturn, format(printf, 1, 2)))
 ThrowException(const char* fmt, ...) {
   static constexpr size_t bufSize = 8192;
 #if defined(HAVE_THREAD_LOCAL)

That means reverting the change from commit 
7f087325d09e2b6d4ecc392f7aee44dd29fafe62 which has not yet been reverted 
in commit 0967e3c8a528cca0800cc5289cba5c212a385a6b.


Don't ask me 

Ulrich





Re: [darktable-dev] OpenCL scheduling profiles

2017-04-09 Thread Matthias Andree
Am 08.04.2017 um 14:29 schrieb Ulrich Pegelow:
> Hi,
>
> I added a bit more flexibility concerning OpenCL device scheduling
> into master. There is a new selection box in preferences (core
> options) that lets you choose among a few typical presets.
>
> The main targets are modern systems with very fast GPUs. By default and
> "traditionally" darktable distributes work between CPU and GPU in the
> darkroom: the GPU processes the center (full) view and the CPU is
> responsible for the preview (navigation) panel. Now that GPUs get
> faster and faster there are systems where the GPU so strongly
> outperforms the CPU that it makes more sense to process preview and
> full pixelpipe on the GPU sequentially.
>
> For that reason the "OpenCL scheduling profile" parameter has three
> options:
>
> * "default" describes the old behavior: work is split between GPU and
> CPU and works best for systems where CPU and GPU performance are on a
> similar level.
>
> * "very fast GPU" tackles the case described above: in darkroom view
> both pixelpipes are sequentially processed by the GPU. This is meant
> for GPUs which strongly outperform the CPU on that system.
>
> * "multiple GPUs" is meant for systems with more than one OpenCL
> device so that the full and the preview pixelpipe get processed by
> separate GPUs.
>
> At first startup darktable tries to find the best suited profile based
> on some benchmarking. You may change the profile at any time; this
> takes effect immediately.
>
> I am interested in your experience, both in terms of automatic
> detection of the best suited profile and in terms of overall
> performance. Please note that this is all about system latency and
> perceived system responsiveness in the darkroom view. Calling
> darktable with '-d perf' will only give you limited insight, so you
> will mostly need to rely on your own judgement.
>

Hi Ulrich,

1. gorgeous, thank you very much!

For me, the benchmarking seems to DTRT™ (do the right thing): it picks
the "very fast GPU" profile on a system with a 2016 NVIDIA GeForce GTX
1060 6 GB and an old 2009 AMD Phenom II X4 2.5 GHz 65 W quad-core CPU.
The code is compiled with -O2 -march=native, with OpenMP and OpenCL
enabled, and I get this:

[opencl_init] here are the internal numbers and names of OpenCL devices available to darktable:
[opencl_init]   0   'GeForce GTX 1060 6GB'
[opencl_init] FINALLY: opencl is AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is ON.
[opencl_create_kernel] successfully loaded kernel `zero' (0) for device 0
[...]
[opencl_init] benchmarking results: 0.029428 seconds for fastest GPU versus 0.382860 seconds for CPU.
[opencl_init] set scheduling profile for very fast GPU.
[opencl_priorities] these are your device priorities:
[opencl_priorities] image   preview export  thumbnail
[opencl_priorities] 0   0   0   0
[opencl_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_priorities] image   preview export  thumbnail
[opencl_priorities] 1   1   1   1
[opencl_synchronization_timeout] synchronization timout set to 0

2. What bothers me, though, are the timeouts and their defaults. In
practice, the darkroom works OK-ish, but the lighttable does not. When
a truckload of small thumbnails (say, lighttable zoomed out to show
10 columns of images) needs to be regenerated for the lighttable, it
*appears* (not yet corroborated with measurements) that bumping up the
timeouts considerably helps to avoid latencies, as though things were
deadlocking and waiting for the timer to break the lock. It might be an
internal issue with the synchronization, though: how fine-grained is
the re-attempt? Is it sleep-and-retry, or does it use some form of
semaphores and signalling at the system level between threads?

I am running with these (possibly ridiculously high) timeout settings
of 15 s. This is normally enough to process an entire export including a
few CPU segments (say, raw denoise: I need it on some high-ISO images,
ISO 6400+, to avoid black blotches or green stipples, but I have some
concerns about its overall quality that don't belong in this thread).

opencl_mandatory_timeout=3000
pixelpipe_synchronization_timeout=3000

3. Would it be sensible to set one of these timeouts considerably higher
than the other?

4. Could '-d perf' log timeouts that change a scheduling decision (i.e.
when a timeout causes a job to be dispatched to a different device),
including the original intent and the actual dispatch target? And
4b. possibly a complete scheduler trace including all dispatch attempts?
That might help debugging in the long run.

