Re: [Gegl-developer] GEGL OpenCL Porting
Have you tried GEGL_DEBUG=opencl?

On Wed, Nov 19, 2014 at 2:32 PM, Nanley Chery nanleych...@gmail.com wrote:

I'm glad we could find this bug. Rolling back to the older version of gegl-operation-point-filter.c and adding support for enums in gegl-operation.c allows my OpenCL kernel to run (among other changes). I will rebase my repo on top of master once it's updated.

The last issue that I'm having is that I get no entry for gegl:video-degradation when I have instrumentation enabled (GEGL_DEBUG_TIME=1). I've been parsing the output to determine the speed of other OpenCL implementations. Any suggestions?

Thanks,
Nanley

On Wed, Nov 19, 2014 at 2:26 PM, Nanley Chery nanleych...@gmail.com wrote:

It seems like the code to initialize and run the OpenCL kernel was lost in this commit:
https://git.gnome.org/browse/gegl/commit/gegl?id=a206f032f77064cf9bff8590ac83ca5b086b53fd

I'm not familiar enough with the codebase to understand the commit message. Why was this functionality removed? Should I add the deleted code into video-degradation's process function?

Thanks,
Nanley

On Wed, Nov 19, 2014 at 12:57 AM, Nanley Chery nanleych...@gmail.com wrote:

I noticed there was more to the brightness-contrast example. I made the adjustments concerning the kernel name and parameter values. The code compiles now.

The current problem that I'm experiencing is that the run-compositions.py test for video-degradation passes with an empty kernel. I'm not sure which code paths are executing to make this work. Any pointers? I'll do some grepping of the source tree in the meantime.

Thanks,
Nanley

On Tue, Nov 18, 2014 at 8:22 PM, Nanley Chery nanleych...@gmail.com wrote:

Wow. Thank you for the tip, CL_CHECK is now giving me an output. This is the error message:

(lt-gegl:10486): GEGL-video-degradation.c-WARNING **: Error in video-degradation.c:236@cl_process - invalid kernel

I thought that I had followed the kernel compilation process correctly. Do you notice any mistake? I have pushed my latest change to the branch.

Nanley

On Tue, Nov 18, 2014 at 8:06 PM, Victor Oliveira victormath...@gmail.com wrote:

Hi Nanley,

I'd recommend you follow the operations/common/brightness-contrast.c file for a point-filter operation (i.e. a pixel-wise filter) instead of doing what you did. Notice that in operations/common/brightness-contrast.c#n153 there's a string brightness_contrast_cl_source, which is defined in opencl/brightness-contrast.cl.h; these headers are auto-generated from the kernels in the opencl folder.

Let me know what happens from that.

Victor

On Tue, Nov 18, 2014 at 4:45 PM, Nanley Chery nanleych...@gmail.com wrote:

Hi Victor,

Thank you very much for taking a look. I understand about the time. Here's the link to my bitbucket branch:
https://bitbucket.org/nanoman281/gegl-cse6230/branch/vid_upstrm

The latest commit is what's causing the video-degradation.xml test to fail (I'm testing using run-compositions.py).

Nanley

On Tue, Nov 18, 2014 at 5:11 PM, Victor Oliveira victormath...@gmail.com wrote:

Hi Nanley,

Just to let you know, I'll need some time to answer that because I'll need to build GIMP on my new laptop. Can you share your code so I can take a look?

Victor

On Tue, Nov 18, 2014 at 12:49 PM, Nanley Chery nanleych...@gmail.com wrote:

Hi Victor,

I'm a student working on OpenCL porting work for my High Performance Computing class. I'm trying to implement an OpenCL port for the newly-committed video-degradation operation. Are you willing to provide guidance on the following roadblock?

The issue that I'm finding is that creating a cl_process method and setting the following variables in gegl_op_class_init is not enough to get the cl_process method called:

operation_class->opencl_support = TRUE;
point_filter_class->cl_process = cl_process;

If I manually try to call the cl_process function in the process method (like in edge-laplace.c), the program terminates in the gegl_cl_set_kernel_args method without an error from CL_CHECK. Is there something I'm missing?

I apologize for mailing you directly instead of writing to the mailing list. I'm a little pressed for time, so I opted for this option.

Regards,
Nanley

___
gegl-developer-list mailing list
List address: gegl-developer-list@gnome.org
List membership: https://mail.gnome.org/mailman/listinfo/gegl-developer-list
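A note for readers following this thread: the symptom Nanley describes (the hook is set but never called, and the test still passes with an empty kernel) is what one would expect if the base class never dispatches to the OpenCL hook and the CPU path runs instead. The following is only a minimal sketch of that dispatch pattern in plain C. Every name in it (MockPointFilterClass, mock_point_filter_run, the toy hooks, and the "0 means success" convention) is a hypothetical stand-in and not GEGL's actual API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT GEGL's types. */
typedef int (*mock_process_fn)(const float *in, float *out, size_t n_pixels);

typedef struct {
    int             opencl_support; /* flag set in class init */
    mock_process_fn cl_process;     /* optional OpenCL hook */
    mock_process_fn process;        /* CPU implementation, always present */
} MockPointFilterClass;

enum { MOCK_RAN_CPU = 0, MOCK_RAN_CL = 1 };

/* Base-class entry point: the only place that ever calls cl_process.
 * If this dispatch is missing, setting the two class fields has no effect. */
static int mock_point_filter_run(const MockPointFilterClass *klass,
                                 int opencl_enabled,
                                 const float *in, float *out, size_t n_pixels)
{
    if (opencl_enabled && klass->opencl_support && klass->cl_process != NULL) {
        /* Assumed convention for this sketch: 0 means the hook succeeded. */
        if (klass->cl_process(in, out, n_pixels) == 0)
            return MOCK_RAN_CL;
    }
    /* Fallback: the CPU path still produces correct output. */
    klass->process(in, out, n_pixels);
    return MOCK_RAN_CPU;
}

/* Toy CPU implementation: copies the input. */
static int mock_cpu_copy(const float *in, float *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; ++i)
        out[i] = in[i];
    return 0;
}

/* Toy "OpenCL" hook that succeeds. */
static int mock_cl_ok(const float *in, float *out, size_t n_pixels)
{
    return mock_cpu_copy(in, out, n_pixels);
}

/* Toy "OpenCL" hook that reports failure without touching the output. */
static int mock_cl_fail(const float *in, float *out, size_t n_pixels)
{
    (void)in; (void)out; (void)n_pixels;
    return 1;
}
```

In this sketch a failing or skipped hook silently falls back to the CPU path, which is one plausible reason a composition test can pass even when the kernel itself does nothing.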
Re: [Gegl-developer] GEGL OpenCL Porting
I put it back, hopefully everything is alright now.

Victor

On Wed, Nov 19, 2014 at 2:41 PM, Nanley Chery nanleych...@gmail.com wrote:

Thanks for the question Victor. I'm actually running a custom perl script to automate the process. Your question led me to find a bug in the script.

Cheers,
Nanley

On Wed, Nov 19, 2014 at 5:33 PM, Victor Oliveira victormath...@gmail.com wrote:

Have you tried GEGL_DEBUG=opencl ?
Re: [Gegl-developer] imgflo: Visually programming image processing pipelines with GEGL Flowhub
It looks cool! Sorry for not having much time to look at the code now, but how does it run GEGL in the browser? Is it calling the GEGL library through node.js?

Victor

On Sat, Apr 12, 2014 at 9:51 AM, Jon Nordby jono...@gmail.com wrote:

Hi,

at Libre Graphics Meeting last week I mentioned in the GEGL and visual-programming workshops that we[1] are looking to build image processing pipelines with GEGL using NoFlo UI/Flowhub[2]. After one week of evenings hacking I'm happy to announce that the very basics are up and running and that we've hit the first milestone: building and processing a GEGL graph!

Screenshot: https://twitter.com/jononor/status/455018717500276736
Code: http://github.com/jonnor/imgflo

Those that are interested are invited to follow and give feedback on the Github page or on the gegl-developer mailing list.

Cheers,
Jon

--
1. The Grid, http://thegrid.io
2. http://noflojs.org/noflo-ui/ http://flowhub.io/
Re: [Gegl-developer] [PATCH] Add opencl implementation of operation channel-mixer
I tried to run the filter on my NVIDIA card and it is very slow and sometimes crashes (invalid command queue, i.e. something really bad happened). Apart from the fact that the cl_process function should accept a size_t instead of a long, I couldn't find any other problems, but the crash persists. Can you take another look at the patch?

Victor

On Wed, Jan 22, 2014 at 12:41 PM, Daniel Sabo daniels...@gmail.com wrote:

It is also possible to call gegl_cl_has_extension("cl_khr_fp64") to check for double support and then pick which kernel source string to build at runtime. But unless it actually needs a double I would avoid it; double will be slower pretty much everywhere, CPU included. (And if it *does* need double precision you probably shouldn't run the OpenCL version on devices that don't support it.)
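Daniel's suggestion (check for cl_khr_fp64, then choose which kernel source to build) can be sketched in plain C without any OpenCL headers. This is only an illustration under stated assumptions: has_extension below is a standalone helper that scans a space-separated extension string of the kind OpenCL reports for CL_DEVICE_EXTENSIONS, it is not gegl_cl_has_extension, and the two source strings and pick_mixer_source are hypothetical placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` occurs as a whole space-separated token in `ext_list`,
 * otherwise 0. Standalone helper for this sketch, not a GEGL function. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += 1; /* keep scanning after a partial match */
    }
    return 0;
}

/* Hypothetical placeholders for two generated kernel source strings. */
static const char *mixer_source_float  = "/* float kernel source */";
static const char *mixer_source_double = "/* double kernel source */";

/* Choose a kernel source at runtime.
 * Returns NULL when double precision is required but unsupported, so the
 * caller can skip OpenCL and fall back to the CPU implementation. */
static const char *pick_mixer_source(const char *ext_list, int needs_double)
{
    if (!needs_double)
        return mixer_source_float;
    if (has_extension(ext_list, "cl_khr_fp64"))
        return mixer_source_double;
    return NULL;
}
```

Returning NULL in the unsupported case follows Daniel's remark that an operation which truly needs doubles should not run its OpenCL version on such devices; preferring the float source whenever possible follows his advice to avoid doubles unless they are actually needed.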
Re: [Gegl-developer] [PATCH] Optimize operation box-blur opencl kernel
Thanks Yongjia, I think this looks like a nice contribution. I'll push the patch soon.

Victor Oliveira

On Wed, Jan 22, 2014 at 12:21 AM, Yongjia Zhang zhang_yong_...@126.com wrote:

From: Yongjia Zhang zhang_yong_...@126.com

This is a better way to accomplish the box-blur cl operation by using ocl's local memory from the opencv source code. It uses the local shared memory to reduce global memory access, which significantly reduces the kernel's processing time by 70 percent compared to the original one. Because of the barriers and local worksize limitation, processing with a radius larger than 110 becomes slower than the original algorithm, so I keep the original kernels in order to deal with box-blur with radius larger than 110. All the tests are based on Intel Beignet and Intel IvyBridge CPU and GPU.

Signed-off-by: Yongjia Zhang yongjia.zh...@intel.com
---
 opencl/box-blur.cl           |  66 +
 opencl/box-blur.cl.h         |  66 +
 operations/common/box-blur.c | 115 ++-
 3 files changed, 201 insertions(+), 46 deletions(-)

diff --git a/opencl/box-blur.cl b/opencl/box-blur.cl
index e99bea4..a1da9de 100644
--- a/opencl/box-blur.cl
+++ b/opencl/box-blur.cl
@@ -43,3 +43,69 @@ __kernel void kernel_blur_ver (__global const float4 *aux,
       out[out_index] = mean / (float)(2 * radius + 1);
     }
 }
+
+__kernel void kernel_box_blur_fast(const __global float4 *in,
+                                         __global float4 *out,
+                                         __local float4 *column_sum,
+                                   const int width,
+                                   const int height,
+                                   const int radius,
+                                   const int size)
+{
+  const int local_id0 = get_local_id(0);
+  const int twice_radius = 2 * radius;
+  const int in_width = twice_radius + width;
+  const int in_height = twice_radius + height;
+  const float4 area = (float4)( (twice_radius+1) * (twice_radius+1) );
+  int column_index_start,column_index_end;
+  int y = get_global_id(1) * size;
+  const int out_x = get_group_id(0)
+    * ( get_local_size(0) - twice_radius ) + local_id0 - radius;
+  const int in_x = out_x + radius;
+  int tmp_size = size;
+  int tmp_index = 0;
+  float4 tmp_sum = (float4)0.0f;
+  float4 total_sum = (float4)0.0f;
+  if( in_x < in_width )
+    {
+      column_index_start = y;
+      column_index_end = y + twice_radius;
+      for( int i=0; i<twice_radius+1; ++i )
+        tmp_sum+=in[(y+i)*in_width+in_x];
+      column_sum[local_id0] = tmp_sum;
+    }
+
+  barrier( CLK_LOCAL_MEM_FENCE );
+
+  while(1)
+    {
+      if( out_x < width )
+        {
+          if( local_id0 >= radius
+              && local_id0 < get_local_size(0) - radius )
+            {
+              total_sum = (float4)0.0f;
+              for( int i=0; i<twice_radius+1; ++i )
+                total_sum += column_sum[local_id0-radius+i];
+              out[y*width+out_x] = total_sum/area;
+            }
+        }
+      if( --tmp_size ==0 || y == height - 1 )
+        break;
+
+      barrier( CLK_LOCAL_MEM_FENCE );
+
+      ++y;
+      if( in_x < in_width )
+        {
+          tmp_sum = column_sum[local_id0];
+          tmp_sum -= in[(column_index_start)*in_width+in_x];
+          tmp_sum += in[(column_index_end+1)*in_width+in_x];
+          ++column_index_start;
+          ++column_index_end;
+          column_sum[local_id0] = tmp_sum;
+        }
+
+      barrier( CLK_LOCAL_MEM_FENCE );
+    }
+}
diff --git a/opencl/box-blur.cl.h b/opencl/box-blur.cl.h
index bfed601..8f6aa81 100644
--- a/opencl/box-blur.cl.h
+++ b/opencl/box-blur.cl.h
@@ -44,4 +44,70 @@ static const char* box_blur_cl_source =
 out[out_index] = mean / (float)(2 * radius + 1); \n
 } \n
 } \n
+ \n
+__kernel void kernel_box_blur_fast(const __global float4 *in, \n
+ __global float4 *out, \n
+ __local float4 *column_sum, \n
+ const int width, \n
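For readers who want the arithmetic behind the patch without the OpenCL details: the kernel keeps one running sum per input column, updates it by subtracting the row that leaves the window and adding the row that enters, and then adds 2 * radius + 1 neighbouring column sums for each output pixel. The following is a minimal single-channel CPU sketch of that idea in plain C, assuming (as the kernel does) an input already padded by `radius` on every side. The function name and its signature are hypothetical, and it does not model work-groups or local memory.

```c
#include <stdlib.h>

/* Box blur using running column sums (CPU reference sketch, one channel).
 * in:  padded image, (width + 2*radius) x (height + 2*radius), row-major
 * out: width x height, row-major
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int box_blur_column_sums(const float *in, float *out,
                                int width, int height, int radius)
{
    const int in_width = width + 2 * radius;
    const int window   = 2 * radius + 1;
    const float area   = (float)(window * window);
    float *column_sum  = malloc((size_t)in_width * sizeof *column_sum);

    if (column_sum == NULL)
        return -1;

    /* Initial column sums cover input rows 0 .. 2*radius (output row 0). */
    for (int x = 0; x < in_width; ++x) {
        float s = 0.0f;
        for (int i = 0; i < window; ++i)
            s += in[i * in_width + x];
        column_sum[x] = s;
    }

    for (int y = 0; y < height; ++y) {
        /* Each output pixel is the sum of `window` adjacent column sums. */
        for (int x = 0; x < width; ++x) {
            float total = 0.0f;
            for (int i = 0; i < window; ++i)
                total += column_sum[x + i];
            out[y * width + x] = total / area;
        }

        /* Slide the window down one row: drop row y, add row y + window. */
        if (y + 1 < height) {
            for (int x = 0; x < in_width; ++x) {
                column_sum[x] -= in[y * in_width + x];
                column_sum[x] += in[(y + window) * in_width + x];
            }
        }
    }

    free(column_sum);
    return 0;
}
```

Each row step costs two reads per column instead of 2 * radius + 1, which is the same saving the patch obtains by holding the column sums in local memory.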
Re: [Gegl-developer] [PATCH] Add opencl implementation of operation channel-mixer
Cool!

Because double precision is not supported (or is very slow) on most GPUs, we try not to use it in our OpenCL code. Maybe this is something we should be more careful about, though. Your point that gegl_cl_compile_and_build should take compiler flags is pretty good and I'll work on that.

About your code, no complaints, except that you could use clamp instead of fmin followed by fmax, but I can do that myself.

Victor Oliveira

On Wed, Jan 22, 2014 at 12:26 AM, Yongjia Zhang zhang_yong_...@126.com wrote:

From: Yongjia Zhang zhang_yong_...@126.com

Although function gegl_chant_class_init had set operation_class->opencl_support=yes in source file channel-mixer.c, it didn't have an OpenCL implementation of operation channel-mixer. In the CPU version of channel-mixer, all needed information is calculated using variable type double. Since gegl_cl_compile_and_build does not take OpenCL kernel build options, this implementation uses type float instead of double. If build options could be taken, querying the device for the cl_khr_fp64 extension and then selecting the proper variable type would be best.

Signed-off-by: Yongjia Zhang yongjia.zh...@intel.com
---
 opencl/channel-mixer.cl           | 40
 opencl/channel-mixer.cl.h         | 41
 operations/common/channel-mixer.c | 80 +++
 3 files changed, 161 insertions(+)
 create mode 100644 opencl/channel-mixer.cl
 create mode 100644 opencl/channel-mixer.cl.h

diff --git a/opencl/channel-mixer.cl b/opencl/channel-mixer.cl
new file mode 100644
index 000..2fb899c
--- /dev/null
+++ b/opencl/channel-mixer.cl
@@ -0,0 +1,40 @@
+#define CM_MIX_PIXEL( ch, r, g, b, norm ) \
+  c = ch.x * r + ch.y * g + ch.z * b; \
+  c *= norm; \
+  mix_return = fmin( 1.0f, fmax( 0.0f, c ) );
+
+__kernel void cl_channel_mixer(__global const float *in,
+                               __global float *out,
+                               float4 ch_red,
+                               float4 ch_green,
+                               float4 ch_blue,
+                               float4 ch_black,
+                               float red_norm,
+                               float green_norm,
+                               float blue_norm,
+                               float black_norm,
+                               int monochrome,
+                               int has_alpha)
+{
+  const int step = (has_alpha == 0 ? 3 : 4 );
+  const int offset = get_global_id(0) * step;
+  float mix_return = 0.0f;
+  float c = 0.0f;
+  if( monochrome )
+    {
+      CM_MIX_PIXEL( ch_black, in[offset], in[offset+1], in[offset+2], black_norm );
+      out[offset] = out[offset+1] = out[offset+2] = mix_return;
+    }
+  else
+    {
+      CM_MIX_PIXEL( ch_red, in[offset], in[offset+1], in[offset+2], red_norm );
+      out[offset] = mix_return;
+      CM_MIX_PIXEL( ch_green, in[offset], in[offset+1], in[offset+2], green_norm );
+      out[offset+1] = mix_return;
+      CM_MIX_PIXEL( ch_blue, in[offset], in[offset+1], in[offset+2], blue_norm );
+      out[offset+2] = mix_return;
+    }
+  if( 4==step )
+    out[offset+3] = in[offset+3];
+}
+
diff --git a/opencl/channel-mixer.cl.h b/opencl/channel-mixer.cl.h
new file mode 100644
index 000..32e12d6
--- /dev/null
+++ b/opencl/channel-mixer.cl.h
@@ -0,0 +1,41 @@
+static const char* channel_mixer_cl_source =
+#define CM_MIX_PIXEL( ch, r, g, b, norm ) \\\n
+ c = ch.x * r + ch.y * g + ch.z * b; \\\n
+ c *= norm; \\\n
+ mix_return = fmin( 1.0f, fmax( 0.0f, c ) ); \n
+ \n
+__kernel void cl_channel_mixer(__global const float *in, \n
+ __global float *out, \n
+ float4 ch_red, \n
+ float4 ch_green, \n
+ float4 ch_blue, \n
+ float4 ch_black, \n
+ float red_norm, \n
+ float green_norm, \n
+ float blue_norm, \n
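Victor's remark about clamp can be illustrated outside OpenCL. The sketch below, in plain C, computes one mixed channel the way the patch's CM_MIX_PIXEL macro does, once with the min/max pair and once with a clamp helper, to show that the two forms agree for ordinary (non-NaN) inputs. All function names here are hypothetical; min_f, max_f and clamp_f are simple stand-ins for OpenCL C's fmin, fmax and clamp and do not reproduce their NaN handling.

```c
/* Simple stand-ins for OpenCL C's fmin/fmax (NaN handling not modelled). */
static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Stand-in for OpenCL C's built-in clamp(x, lo, hi). */
static float clamp_f(float x, float lo, float hi)
{
    return min_f(max_f(x, lo), hi);
}

/* One output channel as in the patch: weighted sum, normalise, limit to [0, 1]. */
static float mix_channel_min_max(float cr, float cg, float cb,
                                 float r, float g, float b, float norm)
{
    float c = (cr * r + cg * g + cb * b) * norm;
    return min_f(1.0f, max_f(0.0f, c));
}

/* The same computation written with a single clamp call. */
static float mix_channel_clamp(float cr, float cg, float cb,
                               float r, float g, float b, float norm)
{
    float c = (cr * r + cg * g + cb * b) * norm;
    return clamp_f(c, 0.0f, 1.0f);
}
```

The clamp form states the intent (limit to a range) in one call, which is presumably why it was suggested; whether it is faster depends on the device compiler.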
Re: [Gegl-developer] [PATCH] Optimize operation box-blur opencl kernel
Indeed, you're right. Just now I noticed the break command before the barrier; that's irreducible control flow. Yongjia, can you change your kernel so all threads execute the barriers?

Victor

On Wed, Jan 22, 2014 at 11:12 AM, Tom Stellard t...@stellard.net wrote:

On Wed, Jan 22, 2014 at 04:21:42PM +0800, Yongjia Zhang wrote:

From: Yongjia Zhang zhang_yong_...@126.com

This is a better way to accomplish the box-blur cl operation by using ocl's local memory from the opencv source code. It uses the local shared memory to reduce global memory access, which significantly reduces the kernel's processing time by 70 percent compared to the original one. Because of the barriers and local worksize limitation, processing with a radius larger than 110 becomes slower than the original algorithm, so I keep the original kernels in order to deal with box-blur with radius larger than 110. All the tests are based on Intel Beignet and Intel IvyBridge CPU and GPU.

Signed-off-by: Yongjia Zhang yongjia.zh...@intel.com
---
 opencl/box-blur.cl           |  66 +
 opencl/box-blur.cl.h         |  66 +
 operations/common/box-blur.c | 115 ++-
 3 files changed, 201 insertions(+), 46 deletions(-)

diff --git a/opencl/box-blur.cl b/opencl/box-blur.cl
index e99bea4..a1da9de 100644
--- a/opencl/box-blur.cl
+++ b/opencl/box-blur.cl
@@ -43,3 +43,69 @@ __kernel void kernel_blur_ver (__global const float4 *aux,
       out[out_index] = mean / (float)(2 * radius + 1);
     }
 }
+
+__kernel void kernel_box_blur_fast(const __global float4 *in,
+                                         __global float4 *out,
+                                         __local float4 *column_sum,
+                                   const int width,
+                                   const int height,
+                                   const int radius,
+                                   const int size)
+{
+  const int local_id0 = get_local_id(0);
+  const int twice_radius = 2 * radius;
+  const int in_width = twice_radius + width;
+  const int in_height = twice_radius + height;
+  const float4 area = (float4)( (twice_radius+1) * (twice_radius+1) );
+  int column_index_start,column_index_end;
+  int y = get_global_id(1) * size;
+  const int out_x = get_group_id(0)
+    * ( get_local_size(0) - twice_radius ) + local_id0 - radius;
+  const int in_x = out_x + radius;
+  int tmp_size = size;
+  int tmp_index = 0;
+  float4 tmp_sum = (float4)0.0f;
+  float4 total_sum = (float4)0.0f;
+  if( in_x < in_width )
+    {
+      column_index_start = y;
+      column_index_end = y + twice_radius;
+      for( int i=0; i<twice_radius+1; ++i )
+        tmp_sum+=in[(y+i)*in_width+in_x];
+      column_sum[local_id0] = tmp_sum;
+    }
+
+  barrier( CLK_LOCAL_MEM_FENCE );
+
+  while(1)
+    {
+      if( out_x < width )
+        {
+          if( local_id0 >= radius
+              && local_id0 < get_local_size(0) - radius )
+            {
+              total_sum = (float4)0.0f;
+              for( int i=0; i<twice_radius+1; ++i )
+                total_sum += column_sum[local_id0-radius+i];
+              out[y*width+out_x] = total_sum/area;
+            }
+        }
+      if( --tmp_size ==0 || y == height - 1 )
+        break;
+
+      barrier( CLK_LOCAL_MEM_FENCE );

Is this barrier call guaranteed to be executed by all threads? If not, then this will produce undefined behavior.

-Tom

+
+      ++y;
+      if( in_x < in_width )
+        {
+          tmp_sum = column_sum[local_id0];
+          tmp_sum -= in[(column_index_start)*in_width+in_x];
+          tmp_sum += in[(column_index_end+1)*in_width+in_x];
+          ++column_index_start;
+          ++column_index_end;
+          column_sum[local_id0] = tmp_sum;
+        }
+
+      barrier( CLK_LOCAL_MEM_FENCE );
+    }
+}
diff --git a/opencl/box-blur.cl.h b/opencl/box-blur.cl.h
index bfed601..8f6aa81 100644
--- a/opencl/box-blur.cl.h
+++ b/opencl/box-blur.cl.h
@@ -44,4 +44,70 @@ static const char* box_blur_cl_source =
 out[out_index] = mean / (float)(2 * radius + 1); \n
 } \n
 } \n
+ \n
+__kernel void kernel_box_blur_fast(const __global float4 *in, \n
+ __global float4 *out,
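Tom's question can be made concrete by counting, for one work-item, how many barriers the loop in the patch would reach. The sketch below does that in plain C by mirroring only the loop's control flow: barriers_patch_loop follows the `break` placement from the patch, while barriers_fixed_trip shows one possible restructuring in which the exit depends only on `size`. Both function names are hypothetical, and the host-side work-group shape is not visible in this thread, so whether two work-items with different starting rows can actually share a work-group is an assumption.

```c
/* Count barriers reached by one work-item under the patch's loop shape.
 * y_start corresponds to get_global_id(1) * size in the kernel. */
static int barriers_patch_loop(int y_start, int size, int height)
{
    int y = y_start;
    int tmp_size = size;
    int barriers = 1;           /* barrier after the initial column sums */

    while (1) {
        /* ... one output row would be written here ... */
        if (--tmp_size == 0 || y == height - 1)
            break;              /* exit depends on y, not only on size */
        ++barriers;             /* first barrier inside the loop */
        ++y;
        /* ... column sums would be updated here ... */
        ++barriers;             /* second barrier inside the loop */
    }
    return barriers;
}

/* One possible restructuring: a fixed trip count, with bounds checks guarding
 * the memory accesses instead of leaving the loop early. */
static int barriers_fixed_trip(int y_start, int size, int height)
{
    int y = y_start;
    int barriers = 1;

    for (int step = 0; step < size; ++step) {
        int row_valid = (y < height);   /* would guard reads and writes only */
        (void)row_valid;
        if (step + 1 == size)
            break;                      /* uniform: depends only on size */
        barriers += 2;
        ++y;
    }
    return barriers;
}
```

With height 10 and size 4, a work-item starting at row 4 reaches 7 barriers in the patch's loop while one starting at row 8 reaches only 3. If both were in the same work-group the barrier counts would disagree, which is the undefined behaviour Tom warns about; the fixed-trip form gives every work-item the same count.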
Re: [Gegl-developer] Using Gegl from gui
A (much) simpler example (by Jon Nordby) is https://git.gnome.org/browse/gegl-qt/.

On Wed, Sep 18, 2013 at 7:45 AM, Nicolas Robidoux nicolas.robid...@gmail.com wrote:

GIMP :)
[Gegl-developer] OpenCL support in GEGL is almost there
Hello everyone,

I'm a GEGL developer and I've been working since last year on implementing OpenCL support in GEGL. For now we have:

- An API to write OpenCL point and area operations
- A way to share image data between operations on the GPU (so we don't have to move the image back and forth between the GPU and the CPU for each operator)
- More than 20 GEGL operations have OpenCL support already
- It's fast
- It's been included in GIMP 2.8RC1

Darktable has its own OpenCL support code (which is pretty good, by the way). I'd like to start talks to avoid duplicated work in both programs and to make an argument for using GEGL in Darktable :)

bye!
Victor Oliveira