Re: [Beignet] [PATCH] create GIT_SHA1 without any dependency
LGTM, pushed, thanks. On Sat, Oct 25, 2014 at 03:10:02AM +0800, Meng Mengmeng wrote: --- src/CMakeLists.txt | 5 ++--- src/git_sha1.sh| 4 ++-- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 0d22589..9e65856 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -110,11 +110,10 @@ SET(CMAKE_C_FLAGS -DHAS_OCLIcd ${CMAKE_C_FLAGS}) endif (OCLIcd_FOUND) set(GIT_SHA1 git_sha1.h) -add_custom_command(OUTPUT ${GIT_SHA1} +add_custom_target(${GIT_SHA1} ALL COMMAND chmod +x ${CMAKE_CURRENT_SOURCE_DIR}/git_sha1.sh COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/git_sha1.sh ${CMAKE_CURRENT_SOURCE_DIR} ${GIT_SHA1} - ) -add_custom_target(GIT_SHA1 ALL DEPENDS ${GIT_SHA1}) +) SET(CMAKE_SHARED_LINKER_FLAGS ${CMAKE_SHARED_LINKER_FLAGS} -Wl,-Bsymbolic,--allow-shlib-undefined) diff --git a/src/git_sha1.sh b/src/git_sha1.sh index 4f6f972..f44f078 100755 --- a/src/git_sha1.sh +++ b/src/git_sha1.sh @@ -4,9 +4,9 @@ SOURCE_DIR=$1 FILE=$2 touch ${SOURCE_DIR}/${FILE}_tmp -if test -d $1/../.git; then +if test -d ${SOURCE_DIR}/../.git; then if which git /dev/null; then -git --git-dir=$1/../.git log -n 1 --oneline | \ +git --git-dir=${SOURCE_DIR}/../.git log -n 1 --oneline | \ sed 's/^\([^ ]*\) .*/#define BEIGNET_GIT_SHA1 git-\1/' \ ${SOURCE_DIR}/${FILE}_tmp fi -- 1.9.3 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [Intel-gfx] Beignet crashes on vanilla 3.17.1 with IVB hardware
I just checked again, and found both 3.17 and 3.17.1 should work fine on IVB with beignet. I just tested beignet on IVB with kernel 3.17.1, all unit tests passed successfully. For IVB user, no need to wait for 3.18. Don't know which application you were testing on your IVB machine. If it could be reproduced easily, please open a bug on fd.o. Thanks, Zhigang Gong. On Fri, Oct 24, 2014 at 10:27:23AM +0800, Zhigang Gong wrote: Hi, For IVB, I just checked the 3.18-rc1, it has the following patch: commit c9224faa59c3071ecfa2d4b24592f4eb61e57069 Author: Brad Volkin bradley.d.vol...@intel.com Date: Tue Jun 17 14:10:34 2014 -0700 drm/i915: Add some L3 registers to the parser whitelist Beignet needs these in order to program the L3 cache config for OpenCL workloads, particularly when using SLM. Signed-off-by: Brad Volkin bradley.d.vol...@intel.com Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch So, beignet should work fine with 3.18 on IVB/BYT. But for the HSW,I'm not quite sure when we could get a workable vanilla kernel. Something I found at the intel-gfx mail list as below and it doesn't sound good. http://lists.freedesktop.org/archives/intel-gfx/2014-May/044694.html http://lists.freedesktop.org/archives/intel-gfx/2014-May/045088.html CC to intel-gfx mail list. Hope we can get an official anwser here. Thanks, Zhigang Gong. On Thu, Oct 23, 2014 at 03:39:35PM +0300, Vasily Khoruzhick wrote: Hi, As you maybe know, any application which uses beignet OpenCL implementation crashes on Ivy Bridge hardware when using vanilla 3.17.1 kernel. I guess it's due to batchbuffer security and patch to disable batchbuffer security is required, but guys, it fails since 3.16, and 3.17 was released quite a while ago. Could you cooperate with i915 driver devs to make Beignet working on vanilla kernel without extra patches? Thanks! Regards, Vasily ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Intel-gfx mailing list intel-...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [Intel-gfx] Beignet crashes on vanilla 3.17.1 with IVB hardware
Hi Zhigang, Luxmark crashes with following backtrace: Program received signal SIGSEGV, Segmentation fault. 0x004cd8b0 in slg::PathOCLRenderEngine::StopLockLess() () (gdb) bt #0 0x004cd8b0 in slg::PathOCLRenderEngine::StopLockLess() () #1 0x00482236 in slg::RenderEngine::Stop() () #2 0x0047be9b in slg::RenderSession::~RenderSession() () #3 0x00468a77 in LuxMarkApp::Stop() () #4 0x00468b46 in LuxMarkApp::InitRendering(LuxMarkAppMode, char const*) () #5 0x0046edbe in MainWindow::event(QEvent*) () #6 0x77099b9c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/libQtGui.so.4 #7 0x770a05e8 in QApplication::notify(QObject*, QEvent*) () from /usr/lib/libQtGui.so.4 #8 0x76b6af7d in QCoreApplication::notifyInternal(QObject*, QEvent*) () from /usr/lib/libQtCore.so.4 #9 0x76b6e341 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () from /usr/lib/libQtCore.so.4 #10 0x76b99e63 in ?? () from /usr/lib/libQtCore.so.4 #11 0x74d05a1d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0 #12 0x74d05d08 in ?? () from /usr/lib/libglib-2.0.so.0 #13 0x74d05dbc in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0 #14 0x76b99fad in QEventDispatcherGlib::processEvents(QFlagsQEventLoop::ProcessEventsFlag) () from /usr/lib/libQtCore.so.4 #15 0x7713d9c6 in ?? () from /usr/lib/libQtGui.so.4 #16 0x76b69ad1 in QEventLoop::processEvents(QFlagsQEventLoop::ProcessEventsFlag) () from /usr/lib/libQtCore.so.4 #17 0x76b69e35 in QEventLoop::exec(QFlagsQEventLoop::ProcessEventsFlag) () from /usr/lib/libQtCore.so.4 #18 0x76b6f3c7 in QCoreApplication::exec() () from /usr/lib/libQtCore.so.4 #19 0x0045afc8 in main () And convert from imagemagick crashes in libgbe.so: Program received signal SIGSEGV, Segmentation fault. 0x7fffeb0df18c in gbe::buildRegInfo(gbe::ir::BasicBlock, gbe::vectorgbe::RegInfoForMov) () from /usr/lib/beignet//libgbe.so (gdb) bt #0 0x7fffeb0df18c in gbe::buildRegInfo(gbe::ir::BasicBlock, gbe::vectorgbe::RegInfoForMov) () from /usr/lib/beignet//libgbe.so #1 0x7fffeb0e048d in gbe::GenWriter::removeLOADIs(gbe::ir::Liveness const, gbe::ir::Function) () from /usr/lib/beignet//libgbe.so #2 0x7fffeb0fb823 in gbe::GenWriter::emitFunction(llvm::Function) () from /usr/lib/beignet//libgbe.so #3 0x7fffeb100b63 in gbe::GenWriter::runOnFunction(llvm::Function) () from /usr/lib/beignet//libgbe.so #4 0x7fffec0129e1 in llvm::FPPassManager::runOnFunction(llvm::Function) () from /usr/lib/beignet//libgbe.so #5 0x7fffebe11083 in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module) () from /usr/lib/beignet//libgbe.so #6 0x7fffec015538 in llvm::legacy::PassManagerImpl::run(llvm::Module) () from /usr/lib/beignet//libgbe.so #7 0x7fffeb11fe0e in gbe::llvmToGen(gbe::ir::Unit, char const*, void const*, int, bool) () from /usr/lib/beignet//libgbe.so #8 0x7fffeb0c4fc1 in gbe::Program::buildFromLLVMFile(char const*, void const*, std::string, int) () from /usr/lib/beignet//libgbe.so #9 0x7fffeb17d0ba in gbe::genProgramNewFromLLVM(unsigned int, char const*, void const*, void const*, unsigned long, char*, unsigned long*, int) () from /usr/lib/beignet//libgbe.so #10 0x7fffeb0c8e09 in gbe::programNewFromSource(unsigned int, char const*, unsigned long, char const*, char*, unsigned long*) () from /usr/lib/beignet//libgbe.so #11 0x7fffef0eac96 in cl_program_build () from /usr/lib/beignet/libcl.so #12 0x7fffef0e39de in clBuildProgram () from /usr/lib/beignet/libcl.so #13 0x7fffef32c4cb in clBuildProgram () from /usr/lib/libOpenCL.so #14 0x77a461ec in ?? () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #15 0x77a4689f in InitOpenCLEnvInternal () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #16 0x77a46b83 in ?? () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #17 0x77a4797a in ?? () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #18 0x77a4808c in InitOpenCLEnv () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #19 0x77941558 in ?? () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #20 0x77948d3e in AccelerateResizeImage () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #21 0x77a95da0 in ResizeImage () from /usr/lib/libMagickCore-6.Q16HDRI.so.2 #22 0x7768d310 in MogrifyImage () from /usr/lib/libMagickWand-6.Q16HDRI.so.2 #23 0x77691bc9 in MogrifyImages () from /usr/lib/libMagickWand-6.Q16HDRI.so.2 #24 0x7761d03b in ConvertImageCommand () from /usr/lib/libMagickWand-6.Q16HDRI.so.2 #25 0x77686ff7 in MagickCommandGenesis () from /usr/lib/libMagickWand-6.Q16HDRI.so.2 #26 0x004008a7 in ?? () #27 0x7703a040 in __libc_start_main () from /usr/lib/libc.so.6 #28 0x004008fb in ?? () Reproducibility is 100% Regards Vasily On Fri, Oct 24, 2014 at 10:05 AM, Zhigang Gong
Re: [Beignet] possilbe bug when run opencv_test_imgproc
Hi, Zhigang, May your platform also get failed cases when run OCL_ImageProc/Filter2D.Mat* although no crash, because I have only Baytail T platfrom. I am not sure the other platform has the same issue. If yes, I could try to investigate the reason. It may have issue of filter2D OpenCL implementation in OpenCV. Thanks. Yan Wang Hi, All, I found one possible bug for review. if run the following: ./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat*. OCL_ImageProc/Filter2D.Mat/256 failed and continue. But the whole test flow will crash in OCL_ImageProc/Filter2D.Mat/257: [ FAILED ] OCL_ImageProc/Filter2D.Mat/240, where GetParam() = (CV_8U, Channels(2), 7, 1, BORDER_CONSTANT, false, false) (6433 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/241 [ OK ] OCL_ImageProc/Filter2D.Mat/241 (358 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/242 [ OK ] OCL_ImageProc/Filter2D.Mat/242 (311 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/243 [ OK ] OCL_ImageProc/Filter2D.Mat/243 (6 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/244 [ OK ] OCL_ImageProc/Filter2D.Mat/244 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/245 [ OK ] OCL_ImageProc/Filter2D.Mat/245 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/246 [ OK ] OCL_ImageProc/Filter2D.Mat/246 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/247 [ OK ] OCL_ImageProc/Filter2D.Mat/247 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/248 [ OK ] OCL_ImageProc/Filter2D.Mat/248 (210 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/249 [ OK ] OCL_ImageProc/Filter2D.Mat/249 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/250 [ OK ] OCL_ImageProc/Filter2D.Mat/250 (208 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/251 [ OK ] OCL_ImageProc/Filter2D.Mat/251 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/252 [ OK ] OCL_ImageProc/Filter2D.Mat/252 (214 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/253 [ OK ] OCL_ImageProc/Filter2D.Mat/253 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/254 [ OK ] OCL_ImageProc/Filter2D.Mat/254 (212 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/255 [ OK ] OCL_ImageProc/Filter2D.Mat/255 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/256 /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:106: Failure Expected: (TestUtils::checkNorm2(dst, udst)) = (threshold), actual: 255 vs 1 Size: [92 x 61] /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:107: Failure Expected: (TestUtils::checkNorm2(dst_roi, udst_roi)) = (threshold), actual: 255 vs 1 Size: [92 x 61] [ FAILED ] OCL_ImageProc/Filter2D.Mat/256, where GetParam() = (CV_8U, Channels(2), 7, 4, BORDER_CONSTANT, false, false) (7413 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/257 opencv_test_imgproc: /home/yanwang/beignet/src/intel/intel_gpgpu.c:703: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu-binded_buf[i]-offset != 0' failed. Segmentation fault (core dumped) But if I run OCL_ImageProc/Filter2D.Mat/257 only, it passed. I think 256 case may influence it. Thanks. Yan Wang ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] possilbe bug when run opencv_test_imgproc
I have BYT box, an IVB machine and a HSW notebook. All of them haven't this issue. Yang rong is working on a race condition patch. This issue may be related. You may try again once he send out the patch. On Fri, Oct 24, 2014 at 5:46 PM, yan.w...@linux.intel.com wrote: Hi, Zhigang, May your platform also get failed cases when run OCL_ImageProc/Filter2D.Mat* although no crash, because I have only Baytail T platfrom. I am not sure the other platform has the same issue. If yes, I could try to investigate the reason. It may have issue of filter2D OpenCL implementation in OpenCV. Thanks. Yan Wang Hi, All, I found one possible bug for review. if run the following: ./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat*. OCL_ImageProc/Filter2D.Mat/256 failed and continue. But the whole test flow will crash in OCL_ImageProc/Filter2D.Mat/257: [ FAILED ] OCL_ImageProc/Filter2D.Mat/240, where GetParam() = (CV_8U, Channels(2), 7, 1, BORDER_CONSTANT, false, false) (6433 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/241 [ OK ] OCL_ImageProc/Filter2D.Mat/241 (358 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/242 [ OK ] OCL_ImageProc/Filter2D.Mat/242 (311 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/243 [ OK ] OCL_ImageProc/Filter2D.Mat/243 (6 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/244 [ OK ] OCL_ImageProc/Filter2D.Mat/244 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/245 [ OK ] OCL_ImageProc/Filter2D.Mat/245 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/246 [ OK ] OCL_ImageProc/Filter2D.Mat/246 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/247 [ OK ] OCL_ImageProc/Filter2D.Mat/247 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/248 [ OK ] OCL_ImageProc/Filter2D.Mat/248 (210 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/249 [ OK ] OCL_ImageProc/Filter2D.Mat/249 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/250 [ OK ] OCL_ImageProc/Filter2D.Mat/250 (208 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/251 [ OK ] OCL_ImageProc/Filter2D.Mat/251 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/252 [ OK ] OCL_ImageProc/Filter2D.Mat/252 (214 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/253 [ OK ] OCL_ImageProc/Filter2D.Mat/253 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/254 [ OK ] OCL_ImageProc/Filter2D.Mat/254 (212 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/255 [ OK ] OCL_ImageProc/Filter2D.Mat/255 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/256 /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:106: Failure Expected: (TestUtils::checkNorm2(dst, udst)) = (threshold), actual: 255 vs 1 Size: [92 x 61] /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:107: Failure Expected: (TestUtils::checkNorm2(dst_roi, udst_roi)) = (threshold), actual: 255 vs 1 Size: [92 x 61] [ FAILED ] OCL_ImageProc/Filter2D.Mat/256, where GetParam() = (CV_8U, Channels(2), 7, 4, BORDER_CONSTANT, false, false) (7413 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/257 opencv_test_imgproc: /home/yanwang/beignet/src/intel/intel_gpgpu.c:703: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu-binded_buf[i]-offset != 0' failed. Segmentation fault (core dumped) But if I run OCL_ImageProc/Filter2D.Mat/257 only, it passed. I think 256 case may influence it. Thanks. Yan Wang ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [Intel-gfx] Beignet crashes on vanilla 3.17.1 with IVB hardware
Hi, Luxmark (both 2.0/2.1) works fine on my IVB machine. The back trace you provided below doesn't indicate it's a beignet related problem. It hadn't enter beignet domain and just crashed in luxmark internal. On Fri, Oct 24, 2014 at 12:04:29PM +0300, Vasily Khoruzhick wrote: Hi Zhigang, Luxmark crashes with following backtrace: Program received signal SIGSEGV, Segmentation fault. 0x004cd8b0 in slg::PathOCLRenderEngine::StopLockLess() () (gdb) bt #0 0x004cd8b0 in slg::PathOCLRenderEngine::StopLockLess() () #1 0x00482236 in slg::RenderEngine::Stop() () #2 0x0047be9b in slg::RenderSession::~RenderSession() () #3 0x00468a77 in LuxMarkApp::Stop() () #4 0x00468b46 in LuxMarkApp::InitRendering(LuxMarkAppMode, char const*) () After a quick analysis, I confirm that the second case is indeed a beignet bug. Beignet lacks of some llvm intrinsics support such as llvm.uadd.with.overflow.i32(). Will fix it next week. Thanks, Zhigang Gong. And convert from imagemagick crashes in libgbe.so: Program received signal SIGSEGV, Segmentation fault. 0x7fffeb0df18c in gbe::buildRegInfo(gbe::ir::BasicBlock, gbe::vectorgbe::RegInfoForMov) () from /usr/lib/beignet//libgbe.so (gdb) bt #0 0x7fffeb0df18c in gbe::buildRegInfo(gbe::ir::BasicBlock, gbe::vectorgbe::RegInfoForMov) () from /usr/lib/beignet//libgbe.so #1 0x7fffeb0e048d in gbe::GenWriter::removeLOADIs(gbe::ir::Liveness const, gbe::ir::Function) () from /usr/lib/beignet//libgbe.so #2 0x7fffeb0fb823 in gbe::GenWriter::emitFunction(llvm::Function) () from /usr/lib/beignet//libgbe.so #3 0x7fffeb100b63 in ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
[Beignet] [PATCH] GBE: fix a wrong type of cl_device_info.
Per OpenCL spec 1.2: CL_DEVICE_IMAGE_MAX_BUFFER_SIZE should be size_t type rather than cl_ulong. This bug will cause problems on i386 platform. Signed-off-by: Zhigang Gong zhigang.g...@intel.com --- src/cl_device_id.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/cl_device_id.h b/src/cl_device_id.h index afc32e2..b2c0e5b 100644 --- a/src/cl_device_id.h +++ b/src/cl_device_id.h @@ -60,7 +60,7 @@ struct _cl_device_id { size_t image3d_max_width; size_t image3d_max_height; size_t image3d_max_depth; - cl_ulong image_mem_size; + size_t image_mem_size; cl_uint max_samplers; size_t max_parameter_size; cl_uint mem_base_addr_align; -- 1.8.3.2 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [Intel-gfx] Beignet crashes on vanilla 3.17.1 with IVB hardware
Hi Zhigang, On Fri, Oct 24, 2014 at 12:13 PM, Zhigang Gong zhigang.g...@linux.intel.com wrote: Hi, Luxmark (both 2.0/2.1) works fine on my IVB machine. The back trace you provided below doesn't indicate it's a beignet related problem. It hadn't enter beignet domain and just crashed in luxmark internal. I'm testing with Luxmark-1.3.1. Luxmark 2.0 works fine. On Fri, Oct 24, 2014 at 12:04:29PM +0300, Vasily Khoruzhick wrote: Hi Zhigang, Luxmark crashes with following backtrace: Program received signal SIGSEGV, Segmentation fault. 0x004cd8b0 in slg::PathOCLRenderEngine::StopLockLess() () (gdb) bt #0 0x004cd8b0 in slg::PathOCLRenderEngine::StopLockLess() () #1 0x00482236 in slg::RenderEngine::Stop() () #2 0x0047be9b in slg::RenderSession::~RenderSession() () #3 0x00468a77 in LuxMarkApp::Stop() () #4 0x00468b46 in LuxMarkApp::InitRendering(LuxMarkAppMode, char const*) () After a quick analysis, I confirm that the second case is indeed a beignet bug. Beignet lacks of some llvm intrinsics support such as llvm.uadd.with.overflow.i32(). Will fix it next week. Ok, thank you! Thanks, Zhigang Gong. Regards Vasily ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] possilbe bug when run opencv_test_imgproc
Sure. I could try Yang Rong's patch. BTW, I also meet the following failed cases. Could you please confirm them? Thanks. [ FAILED ] 3 tests, listed below: [ FAILED ] OCL_ImgProc/Canny.Accuracy/8, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(false), UseRoi(false)) [ FAILED ] OCL_ImgProc/Canny.Accuracy/10, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(true), UseRoi(false)) [ FAILED ] OCL_Imgproc/HoughLines.RealImage/2, where GetParam() = (1, 0.00872665, 80) Yan Wang I have BYT box, an IVB machine and a HSW notebook. All of them haven't this issue. Yang rong is working on a race condition patch. This issue may be related. You may try again once he send out the patch. On Fri, Oct 24, 2014 at 5:46 PM, yan.w...@linux.intel.com wrote: Hi, Zhigang, May your platform also get failed cases when run OCL_ImageProc/Filter2D.Mat* although no crash, because I have only Baytail T platfrom. I am not sure the other platform has the same issue. If yes, I could try to investigate the reason. It may have issue of filter2D OpenCL implementation in OpenCV. Thanks. Yan Wang Hi, All, I found one possible bug for review. if run the following: ./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat*. OCL_ImageProc/Filter2D.Mat/256 failed and continue. But the whole test flow will crash in OCL_ImageProc/Filter2D.Mat/257: [ FAILED ] OCL_ImageProc/Filter2D.Mat/240, where GetParam() = (CV_8U, Channels(2), 7, 1, BORDER_CONSTANT, false, false) (6433 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/241 [ OK ] OCL_ImageProc/Filter2D.Mat/241 (358 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/242 [ OK ] OCL_ImageProc/Filter2D.Mat/242 (311 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/243 [ OK ] OCL_ImageProc/Filter2D.Mat/243 (6 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/244 [ OK ] OCL_ImageProc/Filter2D.Mat/244 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/245 [ OK ] OCL_ImageProc/Filter2D.Mat/245 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/246 [ OK ] OCL_ImageProc/Filter2D.Mat/246 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/247 [ OK ] OCL_ImageProc/Filter2D.Mat/247 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/248 [ OK ] OCL_ImageProc/Filter2D.Mat/248 (210 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/249 [ OK ] OCL_ImageProc/Filter2D.Mat/249 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/250 [ OK ] OCL_ImageProc/Filter2D.Mat/250 (208 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/251 [ OK ] OCL_ImageProc/Filter2D.Mat/251 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/252 [ OK ] OCL_ImageProc/Filter2D.Mat/252 (214 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/253 [ OK ] OCL_ImageProc/Filter2D.Mat/253 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/254 [ OK ] OCL_ImageProc/Filter2D.Mat/254 (212 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/255 [ OK ] OCL_ImageProc/Filter2D.Mat/255 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/256 /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:106: Failure Expected: (TestUtils::checkNorm2(dst, udst)) = (threshold), actual: 255 vs 1 Size: [92 x 61] /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:107: Failure Expected: (TestUtils::checkNorm2(dst_roi, udst_roi)) = (threshold), actual: 255 vs 1 Size: [92 x 61] [ FAILED ] OCL_ImageProc/Filter2D.Mat/256, where GetParam() = (CV_8U, Channels(2), 7, 4, BORDER_CONSTANT, false, false) (7413 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/257 opencv_test_imgproc: /home/yanwang/beignet/src/intel/intel_gpgpu.c:703: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu-binded_buf[i]-offset != 0' failed. Segmentation fault (core dumped) But if I run OCL_ImageProc/Filter2D.Mat/257 only, it passed. I think 256 case may influence it. Thanks. Yan Wang ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] possilbe bug when run opencv_test_imgproc
All of these three failures are already tracked in JIRA. If you have access to JIRA, you can check them easily. Thanks, Zhigang Gong. On Fri, Oct 24, 2014 at 10:33 PM, yan.w...@linux.intel.com wrote: Sure. I could try Yang Rong's patch. BTW, I also meet the following failed cases. Could you please confirm them? Thanks. [ FAILED ] 3 tests, listed below: [ FAILED ] OCL_ImgProc/Canny.Accuracy/8, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(false), UseRoi(false)) [ FAILED ] OCL_ImgProc/Canny.Accuracy/10, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(true), UseRoi(false)) [ FAILED ] OCL_Imgproc/HoughLines.RealImage/2, where GetParam() = (1, 0.00872665, 80) Yan Wang I have BYT box, an IVB machine and a HSW notebook. All of them haven't this issue. Yang rong is working on a race condition patch. This issue may be related. You may try again once he send out the patch. On Fri, Oct 24, 2014 at 5:46 PM, yan.w...@linux.intel.com wrote: Hi, Zhigang, May your platform also get failed cases when run OCL_ImageProc/Filter2D.Mat* although no crash, because I have only Baytail T platfrom. I am not sure the other platform has the same issue. If yes, I could try to investigate the reason. It may have issue of filter2D OpenCL implementation in OpenCV. Thanks. Yan Wang Hi, All, I found one possible bug for review. if run the following: ./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat*. OCL_ImageProc/Filter2D.Mat/256 failed and continue. But the whole test flow will crash in OCL_ImageProc/Filter2D.Mat/257: [ FAILED ] OCL_ImageProc/Filter2D.Mat/240, where GetParam() = (CV_8U, Channels(2), 7, 1, BORDER_CONSTANT, false, false) (6433 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/241 [ OK ] OCL_ImageProc/Filter2D.Mat/241 (358 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/242 [ OK ] OCL_ImageProc/Filter2D.Mat/242 (311 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/243 [ OK ] OCL_ImageProc/Filter2D.Mat/243 (6 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/244 [ OK ] OCL_ImageProc/Filter2D.Mat/244 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/245 [ OK ] OCL_ImageProc/Filter2D.Mat/245 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/246 [ OK ] OCL_ImageProc/Filter2D.Mat/246 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/247 [ OK ] OCL_ImageProc/Filter2D.Mat/247 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/248 [ OK ] OCL_ImageProc/Filter2D.Mat/248 (210 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/249 [ OK ] OCL_ImageProc/Filter2D.Mat/249 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/250 [ OK ] OCL_ImageProc/Filter2D.Mat/250 (208 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/251 [ OK ] OCL_ImageProc/Filter2D.Mat/251 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/252 [ OK ] OCL_ImageProc/Filter2D.Mat/252 (214 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/253 [ OK ] OCL_ImageProc/Filter2D.Mat/253 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/254 [ OK ] OCL_ImageProc/Filter2D.Mat/254 (212 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/255 [ OK ] OCL_ImageProc/Filter2D.Mat/255 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/256 /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:106: Failure Expected: (TestUtils::checkNorm2(dst, udst)) = (threshold), actual: 255 vs 1 Size: [92 x 61] /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:107: Failure Expected: (TestUtils::checkNorm2(dst_roi, udst_roi)) = (threshold), actual: 255 vs 1 Size: [92 x 61] [ FAILED ] OCL_ImageProc/Filter2D.Mat/256, where GetParam() = (CV_8U, Channels(2), 7, 4, BORDER_CONSTANT, false, false) (7413 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/257 opencv_test_imgproc: /home/yanwang/beignet/src/intel/intel_gpgpu.c:703: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu-binded_buf[i]-offset != 0' failed. Segmentation fault (core dumped) But if I run OCL_ImageProc/Filter2D.Mat/257 only, it passed. I think 256 case may influence it. Thanks. Yan Wang ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] possilbe bug when run opencv_test_imgproc
Could you give me one URL? Thanks. Yan Wang All of these three failures are already tracked in JIRA. If you have access to JIRA, you can check them easily. Thanks, Zhigang Gong. On Fri, Oct 24, 2014 at 10:33 PM, yan.w...@linux.intel.com wrote: Sure. I could try Yang Rong's patch. BTW, I also meet the following failed cases. Could you please confirm them? Thanks. [ FAILED ] 3 tests, listed below: [ FAILED ] OCL_ImgProc/Canny.Accuracy/8, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(false), UseRoi(false)) [ FAILED ] OCL_ImgProc/Canny.Accuracy/10, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(true), UseRoi(false)) [ FAILED ] OCL_Imgproc/HoughLines.RealImage/2, where GetParam() = (1, 0.00872665, 80) Yan Wang I have BYT box, an IVB machine and a HSW notebook. All of them haven't this issue. Yang rong is working on a race condition patch. This issue may be related. You may try again once he send out the patch. On Fri, Oct 24, 2014 at 5:46 PM, yan.w...@linux.intel.com wrote: Hi, Zhigang, May your platform also get failed cases when run OCL_ImageProc/Filter2D.Mat* although no crash, because I have only Baytail T platfrom. I am not sure the other platform has the same issue. If yes, I could try to investigate the reason. It may have issue of filter2D OpenCL implementation in OpenCV. Thanks. Yan Wang Hi, All, I found one possible bug for review. if run the following: ./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat*. OCL_ImageProc/Filter2D.Mat/256 failed and continue. But the whole test flow will crash in OCL_ImageProc/Filter2D.Mat/257: [ FAILED ] OCL_ImageProc/Filter2D.Mat/240, where GetParam() = (CV_8U, Channels(2), 7, 1, BORDER_CONSTANT, false, false) (6433 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/241 [ OK ] OCL_ImageProc/Filter2D.Mat/241 (358 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/242 [ OK ] OCL_ImageProc/Filter2D.Mat/242 (311 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/243 [ OK ] OCL_ImageProc/Filter2D.Mat/243 (6 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/244 [ OK ] OCL_ImageProc/Filter2D.Mat/244 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/245 [ OK ] OCL_ImageProc/Filter2D.Mat/245 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/246 [ OK ] OCL_ImageProc/Filter2D.Mat/246 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/247 [ OK ] OCL_ImageProc/Filter2D.Mat/247 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/248 [ OK ] OCL_ImageProc/Filter2D.Mat/248 (210 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/249 [ OK ] OCL_ImageProc/Filter2D.Mat/249 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/250 [ OK ] OCL_ImageProc/Filter2D.Mat/250 (208 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/251 [ OK ] OCL_ImageProc/Filter2D.Mat/251 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/252 [ OK ] OCL_ImageProc/Filter2D.Mat/252 (214 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/253 [ OK ] OCL_ImageProc/Filter2D.Mat/253 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/254 [ OK ] OCL_ImageProc/Filter2D.Mat/254 (212 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/255 [ OK ] OCL_ImageProc/Filter2D.Mat/255 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/256 /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:106: Failure Expected: (TestUtils::checkNorm2(dst, udst)) = (threshold), actual: 255 vs 1 Size: [92 x 61] /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:107: Failure Expected: (TestUtils::checkNorm2(dst_roi, udst_roi)) = (threshold), actual: 255 vs 1 Size: [92 x 61] [ FAILED ] OCL_ImageProc/Filter2D.Mat/256, where GetParam() = (CV_8U, Channels(2), 7, 4, BORDER_CONSTANT, false, false) (7413 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/257 opencv_test_imgproc: /home/yanwang/beignet/src/intel/intel_gpgpu.c:703: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu-binded_buf[i]-offset != 0' failed. Segmentation fault (core dumped) But if I run OCL_ImageProc/Filter2D.Mat/257 only, it passed. I think 256 case may influence it. Thanks. Yan Wang ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] possilbe bug when run opencv_test_imgproc
Hi, The 2 bugs are: 1. https://jira01.devtools.intel.com/browse/VIZ-4490 VIZ-4490 [HSW/IVB/BYT-M] After update opencv, 1 opencv case (opencv_test_imgproc/OCL_Imgproc/HoughLines)fail 2. https://jira01.devtools.intel.com/browse/VIZ-4489 VIZ-4489 [IVB/HSW/BYT-M] After update opencv, some opencv cases(eg.opencv_test_imgproc/OCL_ImgProc/Canny)fail Thanks, Meng -Original Message- From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of yan.w...@linux.intel.com Sent: Friday, October 24, 2014 10:59 PM To: Zhigang Gong Cc: yan.w...@linux.intel.com; beignet@lists.freedesktop.org Subject: Re: [Beignet] possilbe bug when run opencv_test_imgproc Could you give me one URL? Thanks. Yan Wang All of these three failures are already tracked in JIRA. If you have access to JIRA, you can check them easily. Thanks, Zhigang Gong. On Fri, Oct 24, 2014 at 10:33 PM, yan.w...@linux.intel.com wrote: Sure. I could try Yang Rong's patch. BTW, I also meet the following failed cases. Could you please confirm them? Thanks. [ FAILED ] 3 tests, listed below: [ FAILED ] OCL_ImgProc/Canny.Accuracy/8, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(false), UseRoi(false)) [ FAILED ] OCL_ImgProc/Canny.Accuracy/10, where GetParam() = (Channels(3), AppertureSize(3), L2gradient(true), UseRoi(false)) [ FAILED ] OCL_Imgproc/HoughLines.RealImage/2, where GetParam() = (1, 0.00872665, 80) Yan Wang I have BYT box, an IVB machine and a HSW notebook. All of them haven't this issue. Yang rong is working on a race condition patch. This issue may be related. You may try again once he send out the patch. On Fri, Oct 24, 2014 at 5:46 PM, yan.w...@linux.intel.com wrote: Hi, Zhigang, May your platform also get failed cases when run OCL_ImageProc/Filter2D.Mat* although no crash, because I have only Baytail T platfrom. I am not sure the other platform has the same issue. If yes, I could try to investigate the reason. It may have issue of filter2D OpenCL implementation in OpenCV. Thanks. Yan Wang Hi, All, I found one possible bug for review. if run the following: ./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat*. OCL_ImageProc/Filter2D.Mat/256 failed and continue. But the whole test flow will crash in OCL_ImageProc/Filter2D.Mat/257: [ FAILED ] OCL_ImageProc/Filter2D.Mat/240, where GetParam() = (CV_8U, Channels(2), 7, 1, BORDER_CONSTANT, false, false) (6433 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/241 [ OK ] OCL_ImageProc/Filter2D.Mat/241 (358 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/242 [ OK ] OCL_ImageProc/Filter2D.Mat/242 (311 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/243 [ OK ] OCL_ImageProc/Filter2D.Mat/243 (6 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/244 [ OK ] OCL_ImageProc/Filter2D.Mat/244 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/245 [ OK ] OCL_ImageProc/Filter2D.Mat/245 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/246 [ OK ] OCL_ImageProc/Filter2D.Mat/246 (203 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/247 [ OK ] OCL_ImageProc/Filter2D.Mat/247 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/248 [ OK ] OCL_ImageProc/Filter2D.Mat/248 (210 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/249 [ OK ] OCL_ImageProc/Filter2D.Mat/249 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/250 [ OK ] OCL_ImageProc/Filter2D.Mat/250 (208 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/251 [ OK ] OCL_ImageProc/Filter2D.Mat/251 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/252 [ OK ] OCL_ImageProc/Filter2D.Mat/252 (214 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/253 [ OK ] OCL_ImageProc/Filter2D.Mat/253 (207 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/254 [ OK ] OCL_ImageProc/Filter2D.Mat/254 (212 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/255 [ OK ] OCL_ImageProc/Filter2D.Mat/255 (7 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/256 /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:106: Failure Expected: (TestUtils::checkNorm2(dst, udst)) = (threshold), actual: 255 vs 1 Size: [92 x 61] /home/yanwang/opencv/modules/imgproc/test/ocl/test_filter2d.cpp:107: Failure Expected: (TestUtils::checkNorm2(dst_roi, udst_roi)) = (threshold), actual: 255 vs 1 Size: [92 x 61] [ FAILED ] OCL_ImageProc/Filter2D.Mat/256, where GetParam() = (CV_8U, Channels(2), 7, 4, BORDER_CONSTANT, false, false) (7413 ms) [ RUN ] OCL_ImageProc/Filter2D.Mat/257 opencv_test_imgproc: /home/yanwang/beignet/src/intel/intel_gpgpu.c:703: intel_gpgpu_check_binded_buf_address: Assertion `gpgpu-binded_buf[i]-offset != 0' failed. Segmentation fault (core dumped) But if I run OCL_ImageProc/Filter2D.Mat/257 only, it passed. I think 256 case may influence it. Thanks. Yan Wang
[Beignet] beignet not working with Wolfram Mathematica
Hello, I'm trying to run beignet from Wolfram Mathematica 10.0.1, I have beignet 0.9.2, Fedora 20 x86_64, kernel 3.16.3, i5-3230M. First, there is an issue which is probably to be blamed on Mathematica (you need to manually load /usr/lib64/beignet/libcl.so with LibraryLoad), I suppose that Mathematica doesn't know about the icd system. Anyway, after that, this test program src = __kernel void myKernel( __global mint * global0Id, __global mint * global1Id, mint width, mint height) { int xIndex = get_global_id(0); int yIndex = get_global_id(1); int index = xIndex + yIndex*width; if (xIndex width yIndex height) { global0Id[index] = get_local_id(0); global1Id[index] = get_local_id(1); } }; which is suggested in the examples of the Mathematica implementation (http://reference.wolfram.com/language/OpenCLLink/tutorial/Programming.html) produces all zeroes. If instead the pocl driver is loaded, the kernel is executed correctly. Also I could run the LuxMark benchmarks (even though on some tests I see yellow spots that I believe are glitches). I am sorry if the question is a bit vague, but I don't know much about OpenCL and I indeed wanted to start learning while using Mathematica, which is a tool that I already need for other reasons. Lorenzo ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [PATCH 6/6] BDW: Add function intel_gpgpu_bind_buf for gen8.
This patchset LGTM On 一, 2014-09-29 at 13:37 +0800, Yang Rong wrote: From: Junyan He junyan...@linux.intel.com Must call cl_bind_buf instead of intel_gpgpu_bind_buf directly in intel_gpgpu. Signed-off-by: Junyan He junyan...@linux.intel.com --- src/intel/intel_gpgpu.c | 36 +++- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/src/intel/intel_gpgpu.c b/src/intel/intel_gpgpu.c index 6b8fa38..eedfe31 100644 --- a/src/intel/intel_gpgpu.c +++ b/src/intel/intel_gpgpu.c @@ -818,13 +818,13 @@ intel_gpgpu_setup_bti_gen8(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, ss0-ss8_9.surface_base_addr_lo = (buf-offset64 + internal_offset) 0x; ss0-ss8_9.surface_base_addr_hi = ((buf-offset64 + internal_offset) 32) 0x; dri_bo_emit_reloc(gpgpu-aux_buf.bo, - I915_GEM_DOMAIN_RENDER, - I915_GEM_DOMAIN_RENDER, - internal_offset, - gpgpu-aux_offset.surface_heap_offset + - heap-binding_table[index] + - offsetof(gen8_surface_state_t, ss1), - buf); +I915_GEM_DOMAIN_RENDER, +I915_GEM_DOMAIN_RENDER, +internal_offset, +gpgpu-aux_offset.surface_heap_offset + +heap-binding_table[index] + +offsetof(gen8_surface_state_t, ss1), +buf); } static int @@ -981,6 +981,18 @@ intel_gpgpu_bind_buf(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t offset, intel_gpgpu_setup_bti(gpgpu, buf, internal_offset, size, bti); } +static void +intel_gpgpu_bind_buf_gen8(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t offset, + uint32_t internal_offset, uint32_t size, uint8_t bti) +{ + assert(gpgpu-binded_n max_buf_n); + gpgpu-binded_buf[gpgpu-binded_n] = buf; + gpgpu-target_buf_offset[gpgpu-binded_n] = internal_offset; + gpgpu-binded_offset[gpgpu-binded_n] = offset; + gpgpu-binded_n++; + intel_gpgpu_setup_bti_gen8(gpgpu, buf, internal_offset, size, bti); +} + static int intel_gpgpu_set_scratch(intel_gpgpu_t * gpgpu, uint32_t per_thread_size) { @@ -1011,7 +1023,7 @@ intel_gpgpu_set_stack(intel_gpgpu_t *gpgpu, uint32_t offset, uint32_t size, uint drm_intel_bufmgr *bufmgr = gpgpu-drv-bufmgr; gpgpu-stack_b.bo = drm_intel_bo_alloc(bufmgr, STACK, size, 64); - intel_gpgpu_bind_buf(gpgpu, gpgpu-stack_b.bo, offset, 0, size, bti); + cl_gpgpu_bind_buf((cl_gpgpu)gpgpu, (cl_buffer)gpgpu-stack_b.bo, offset, 0, size, bti); } static void @@ -1427,7 +1439,7 @@ intel_gpgpu_set_printf_buf(intel_gpgpu_t *gpgpu, uint32_t i, uint32_t size, uint } memset(bo-virtual, 0, size); drm_intel_bo_unmap(bo); - intel_gpgpu_bind_buf(gpgpu, bo, offset, 0, size, bti); + cl_gpgpu_bind_buf((cl_gpgpu)gpgpu, (cl_buffer)bo, offset, 0, size, bti); return 0; } @@ -1526,6 +1538,12 @@ intel_set_gpgpu_callbacks(int device_id) cl_gpgpu_set_printf_info = (cl_gpgpu_set_printf_info_cb *)intel_gpgpu_set_printf_info; cl_gpgpu_get_printf_info = (cl_gpgpu_get_printf_info_cb *)intel_gpgpu_get_printf_info; + if (IS_BROADWELL(device_id)) { +cl_gpgpu_bind_buf = (cl_gpgpu_bind_buf_cb *)intel_gpgpu_bind_buf_gen8; +cl_gpgpu_get_cache_ctrl = (cl_gpgpu_get_cache_ctrl_cb *)intel_gpgpu_get_cache_ctrl_gen8; +return; + } + if (IS_HASWELL(device_id)) { cl_gpgpu_bind_image = (cl_gpgpu_bind_image_cb *) intel_gpgpu_bind_image_gen75; cl_gpgpu_alloc_constant_buffer = (cl_gpgpu_alloc_constant_buffer_cb *) intel_gpgpu_alloc_constant_buffer_gen75; ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [PATCH 5/5] BDW: Add class Gen8Context.
Yes, I am also plan to change GenContext to a pure virtual class, and I think It is better to do this when optimize the long operations in BDW. -Original Message- From: He Junyan [mailto:junyan...@inbox.com] Sent: Thursday, October 9, 2014 13:09 To: Yang, Rong R Cc: beignet@lists.freedesktop.org Subject: Re: [Beignet] [PATCH 5/5] BDW: Add class Gen8Context. This patchset is OK and will not cause regression on previous platform. In this patch set, the GenEncoder will be a pure virtual class and all platform encoders will derive from it. But the GenContext still represents the Gen7 context. I think it is better to follow the same way as the encoder to make the architecture clearer. On 一, 2014-09-29 at 13:37 +0800, Yang Rong wrote: Now Gen8Context is almost same as Gen75Context, but still derive Gen8Context from GenContext for clearly. Signed-off-by: Yang Rong rong.r.y...@intel.com --- backend/src/CMakeLists.txt | 2 + backend/src/backend/gen8_context.cpp | 113 +++ backend/src/backend/gen8_context.hpp | 63 +++ backend/src/backend/gen_program.cpp | 3 + 4 files changed, 181 insertions(+) create mode 100644 backend/src/backend/gen8_context.cpp create mode 100644 backend/src/backend/gen8_context.hpp diff --git a/backend/src/CMakeLists.txt b/backend/src/CMakeLists.txt index 2daa630..c5d388e 100644 --- a/backend/src/CMakeLists.txt +++ b/backend/src/CMakeLists.txt @@ -96,6 +96,8 @@ set (GBE_SRC backend/gen_context.cpp backend/gen75_context.hpp backend/gen75_context.cpp +backend/gen8_context.hpp +backend/gen8_context.cpp backend/gen_program.cpp backend/gen_program.hpp backend/gen_program.h diff --git a/backend/src/backend/gen8_context.cpp b/backend/src/backend/gen8_context.cpp new file mode 100644 index 000..a9914f6 --- /dev/null +++ b/backend/src/backend/gen8_context.cpp @@ -0,0 +1,113 @@ +/* + * Copyright © 2012 Intel Corporation + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see http://www.gnu.org/licenses/. + * + */ + +/** + * \file gen8_context.cpp + */ + +#include backend/gen8_context.hpp +#include backend/gen8_encoder.hpp +#include backend/gen_program.hpp +#include backend/gen_defs.hpp +#include backend/gen_encoder.hpp +#include backend/gen_insn_selection.hpp +#include backend/gen_insn_scheduling.hpp +#include backend/gen_reg_allocation.hpp +#include sys/cvar.hpp +#include ir/function.hpp +#include ir/value.hpp +#include cstring + +namespace gbe +{ + void Gen8Context::emitSLMOffset(void) { +if(kernel-getUseSLM() == false) + return; + +const GenRegister slm_offset = ra-genReg(GenRegister::ud1grf(ir::ocl::slmoffset)); +const GenRegister slm_index = GenRegister::ud1grf(0, 0); +//the slm index is hold in r0.0 24-27 bit, in 4K unit, shift left 12 to get byte unit +p-push(); + p-curr.execWidth = 1; + p-curr.predicate = GEN_PREDICATE_NONE; + p-SHR(slm_offset, slm_index, GenRegister::immud(12)); +p-pop(); + } + + void Gen8Context::allocSLMOffsetCurbe(void) { +if(fn.getUseSLM()) + allocCurbeReg(ir::ocl::slmoffset, GBE_CURBE_SLM_OFFSET); } + + uint32_t Gen8Context::alignScratchSize(uint32_t size){ +if(size == 0) + return 0; +uint32_t i = 2048; +while(i size) i *= 2; +return i; + } + + void Gen8Context::emitStackPointer(void) { +using namespace ir; + +// Only emit stack pointer computation if we use a stack +if (kernel-getCurbeOffset(GBE_CURBE_STACK_POINTER, 0) = 0) + return; + +// Check that everything is consistent in the kernel code +const uint32_t perLaneSize = kernel-getStackSize(); +const uint32_t perThreadSize = perLaneSize * this-simdWidth; +GBE_ASSERT(perLaneSize 0); +GBE_ASSERT(isPowerOf2(perLaneSize) == true); +GBE_ASSERT(isPowerOf2(perThreadSize) == true); + +// Use shifts rather than muls which are limited to 32x16 bit sources +const uint32_t perLaneShift = logi2(perLaneSize); +const uint32_t perThreadShift = logi2(perThreadSize); +const GenRegister selStatckPtr = this-simdWidth == 8 ? + GenRegister::ud8grf(ir::ocl::stackptr) : + GenRegister::ud16grf(ir::ocl::stackptr); +const GenRegister stackptr =
Re: [Beignet] [PATCH 1/6] BDW: Add gen8 into intel_driver_init
This patch itself LGTM. But I will move it after other patches. Thus once this patch get applied, it can accept Gen8 device and work as expected. Thanks. On Mon, Sep 29, 2014 at 01:37:44PM +0800, Yang Rong wrote: From: Junyan He junyan...@linux.intel.com Signed-off-by: Junyan He junyan...@linux.intel.com --- src/cl_command_queue.c | 2 +- src/intel/intel_driver.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/src/cl_command_queue.c b/src/cl_command_queue.c index 4cbb4eb..48deba0 100644 --- a/src/cl_command_queue.c +++ b/src/cl_command_queue.c @@ -410,7 +410,7 @@ cl_command_queue_ND_range(cl_command_queue queue, } #endif /* USE_FULSIM */ - if (ver == 7 || ver == 75) + if (ver == 7 || ver == 75 || ver == 8) TRY (cl_command_queue_ND_range_gen7, queue, k, work_dim, global_wk_off, global_wk_sz, local_wk_sz); else FATAL (Unknown Gen Device); diff --git a/src/intel/intel_driver.c b/src/intel/intel_driver.c index 66f2bcf..2c2ed5f 100644 --- a/src/intel/intel_driver.c +++ b/src/intel/intel_driver.c @@ -183,7 +183,9 @@ intel_driver_init(intel_driver_t *driver, int dev_fd) else FATAL (Unsupported Gen for emulation); #else - if (IS_GEN75(driver-device_id)) + if (IS_GEN8(driver-device_id)) +driver-gen_ver = 8; + else if (IS_GEN75(driver-device_id)) driver-gen_ver = 75; else if (IS_GEN7(driver-device_id)) driver-gen_ver = 7; -- 1.8.3.2 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [PATCH 1/3] GBE: Fix a bug when setting flag register
This patch LGTM, will push latter, thanks. On Fri, Oct 10, 2014 at 03:01:25PM +0800, Ruiling Song wrote: we should use simd1, instead of simd8/simd16. Signed-off-by: Ruiling Song ruiling.s...@intel.com --- backend/src/backend/gen_context.cpp | 20 ++-- backend/src/backend/gen_context.hpp |1 + 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp index 8844233..245c318 100644 --- a/backend/src/backend/gen_context.cpp +++ b/backend/src/backend/gen_context.cpp @@ -763,14 +763,14 @@ namespace gbe p-SHL(c, e, a); p-SHL(d, f, a); p-OR(e, d, b); -p-MOV(flagReg, GenRegister::immuw(0x)); +setFlag(flagReg, GenRegister::immuw(0x)); p-curr.predicate = GEN_PREDICATE_NORMAL; p-curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p-CMP(GEN_CONDITIONAL_Z, a, zero); p-SEL(d, d, e); p-curr.predicate = GEN_PREDICATE_NONE; p-AND(a, a, GenRegister::immud(32)); -p-MOV(flagReg, GenRegister::immuw(0x)); +setFlag(flagReg, GenRegister::immuw(0x)); p-curr.predicate = GEN_PREDICATE_NORMAL; p-curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p-CMP(GEN_CONDITIONAL_Z, a, zero); @@ -791,14 +791,14 @@ namespace gbe p-SHR(c, f, a); p-SHR(d, e, a); p-OR(e, d, b); -p-MOV(flagReg, GenRegister::immuw(0x)); +setFlag(flagReg, GenRegister::immuw(0x)); p-curr.predicate = GEN_PREDICATE_NORMAL; p-curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p-CMP(GEN_CONDITIONAL_Z, a, zero); p-SEL(d, d, e); p-curr.predicate = GEN_PREDICATE_NONE; p-AND(a, a, GenRegister::immud(32)); -p-MOV(flagReg, GenRegister::immuw(0x)); +setFlag(flagReg, GenRegister::immuw(0x)); p-curr.predicate = GEN_PREDICATE_NORMAL; p-curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p-CMP(GEN_CONDITIONAL_Z, a, zero); @@ -820,7 +820,7 @@ namespace gbe p-ASR(c, f, a); p-SHR(d, e, a); p-OR(e, d, b); -p-MOV(flagReg, GenRegister::immuw(0x)); +setFlag(flagReg, GenRegister::immuw(0x)); p-curr.predicate = GEN_PREDICATE_NORMAL; p-curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p-CMP(GEN_CONDITIONAL_Z, a, zero); @@ -828,7 +828,7 @@ namespace gbe p-curr.predicate = GEN_PREDICATE_NONE; p-AND(a, a, GenRegister::immud(32)); p-ASR(f, f, GenRegister::immd(31)); -p-MOV(flagReg, GenRegister::immuw(0x)); +setFlag(flagReg, GenRegister::immuw(0x)); p-curr.predicate = GEN_PREDICATE_NORMAL; p-curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p-CMP(GEN_CONDITIONAL_Z, a, zero); @@ -842,6 +842,14 @@ namespace gbe NOT_IMPLEMENTED; } } + void GenContext::setFlag(GenRegister flagReg, GenRegister src) { +p-push(); +p-curr.noMask = 1; +p-curr.execWidth = 1; +p-curr.predicate = GEN_PREDICATE_NONE; +p-MOV(flagReg, src); +p-pop(); + } void GenContext::saveFlag(GenRegister dest, int flag, int subFlag) { p-push(); diff --git a/backend/src/backend/gen_context.hpp b/backend/src/backend/gen_context.hpp index 4a01fd5..7a51f57 100644 --- a/backend/src/backend/gen_context.hpp +++ b/backend/src/backend/gen_context.hpp @@ -116,6 +116,7 @@ namespace gbe void I64FullAdd(GenRegister high1, GenRegister low1, GenRegister high2, GenRegister low2); void I32FullMult(GenRegister high, GenRegister low, GenRegister src0, GenRegister src1); void I64FullMult(GenRegister dst1, GenRegister dst2, GenRegister dst3, GenRegister dst4, GenRegister x_high, GenRegister x_low, GenRegister y_high, GenRegister y_low); +void setFlag(GenRegister flag, GenRegister src); void saveFlag(GenRegister dest, int flag, int subFlag); void UnsignedI64ToFloat(GenRegister dst, GenRegister high, GenRegister low, GenRegister exp, GenRegister mantissa, GenRegister tmp, GenRegister flag); -- 1.7.10.4 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
[Beignet] [PATCH] Fix HSW thread_n = 64 assert.
In function cl_get_kernel_max_wg_sz, hsw's thread count may large than 64, add a max limit. Signed-off-by: Yang Rong rong.r.y...@intel.com --- src/cl_device_id.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/cl_device_id.c b/src/cl_device_id.c index a0d0db6..7944ca4 100644 --- a/src/cl_device_id.c +++ b/src/cl_device_id.c @@ -633,7 +633,7 @@ cl_check_builtin_kernel_dimension(cl_kernel kernel, cl_device_id device) LOCAL size_t cl_get_kernel_max_wg_sz(cl_kernel kernel) { - size_t work_group_size; + size_t work_group_size, thread_cnt; int simd_width = interp_kernel_get_simd_width(kernel-opaque); int vendor_id = kernel-program-ctx-device-vendor_id; if (!interp_kernel_use_slm(kernel-opaque)) { @@ -642,9 +642,13 @@ cl_get_kernel_max_wg_sz(cl_kernel kernel) else work_group_size = kernel-program-ctx-device-max_compute_unit * kernel-program-ctx-device-max_thread_per_unit * simd_width; - } else -work_group_size = kernel-program-ctx-device-max_compute_unit * simd_width * + } else { +thread_cnt = kernel-program-ctx-device-max_compute_unit * kernel-program-ctx-device-max_thread_per_unit / kernel-program-ctx-device-sub_slice_count; +if(thread_cnt 64) + thread_cnt = 64; +work_group_size = thread_cnt * simd_width; + } return work_group_size; } -- 1.8.3.2 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [PATCH] Fix AUX buffer for really page aligned
three comments in line, thanks. -Original Message- From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Zhenyu Wang Sent: Wednesday, October 22, 2014 4:11 PM To: beignet@lists.freedesktop.org Subject: [Beignet] [PATCH] Fix AUX buffer for really page aligned Apply ALIGN() for aux buffer size from beginning has no effect on final target size. Move to the end of all state offsets set for alignment. Signed-off-by: Zhenyu Wang zhen...@linux.intel.com --- src/intel/intel_gpgpu.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/intel/intel_gpgpu.c b/src/intel/intel_gpgpu.c index 105a077..98b32bf 100644 --- a/src/intel/intel_gpgpu.c +++ b/src/intel/intel_gpgpu.c @@ -759,8 +759,6 @@ intel_gpgpu_state_init(intel_gpgpu_t *gpgpu, dri_bo_unreference(gpgpu-aux_buf.bo); gpgpu-aux_buf.bo = NULL; - //surface heap must be 4096 bytes aligned because state base address use 20bit for the address - size_aux = ALIGN(size_aux, 4096); [Yejun] Yes, there is no effect at the beginning. Actually, the code here is to highlight the alignment requirement, so future maintainer knows it especially if he wants to move the layout from the beginning to the middle. gpgpu-aux_offset.surface_heap_offset = size_aux; size_aux += sizeof(surface_heap_t); @@ -784,7 +782,10 @@ intel_gpgpu_state_init(intel_gpgpu_t *gpgpu, gpgpu-aux_offset.sampler_border_color_state_offset = size_aux; size_aux += GEN_MAX_SAMPLERS * sizeof(gen7_sampler_border_color_t); [Yejun] aux buffer contains several parts, each part has its own alignment requirement, so we can see several 'ALIGN' in above code. The alignment is handled for each part, it is not feasible to move one/all of them to the last. - bo = dri_bo_alloc(gpgpu-drv-bufmgr, AUX_BUFFER, size_aux, 0); + //surface heap must be 4096 bytes aligned because state base address + use 20bit for the address size_aux = ALIGN(size_aux, 4096); + + bo = dri_bo_alloc(gpgpu-drv-bufmgr, AUX_BUFFER, size_aux, 4096); [Yejun] the last parameter '4096' is to control the size of the whole buffer (to be page aligned), not to meet the align requirement of the base address, and it is not explicitly required and is ignored in function drm_intel_gem_bo_alloc_internal. if (!bo || dri_bo_map(bo, 1) != 0) { fprintf(stderr, %s:%d: %s.\n, __FILE__, __LINE__, strerror(errno)); if (bo) -- 2.1.1 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] Problems with recent beignet
On Sat, Oct 18, 2014 at 04:26:02PM +0200, Martin Hauke wrote: Hi Zhigang, On 18.10.2014 04:36, you wrote: building git master on OpenSUSE 13.1 actually fails since this OpenSUSE version ships only with libdrm2-2.4.46-3.2.2 As you may know, we are working on enable BDW. And the new version libdrm is required for that purpose. Considering libdrm 2.4.52 was released more than half year ago, we didn't think this requirement is a real issue. If there is a strong reason that there are systems won't upgrade to the libdrm 2.4.52 or newer. Please let us know. Thanks. OpenSUSE 13.2 will be released in the next weeks and is will be based on current OpenSUSE Factory packages so it will ship at least libdrm 2.4.58 For OpenSUSE 13.1 users i've included libdrm 2.4.58 in my obs repositories: https://build.opensuse.org/package/show/home:mnhauke:opencl:stable/beignet https://build.opensuse.org/package/show/home:mnhauke:opencl:testing/beignet OpenSUSE 13.1 ships LLVM 3.3 and building beignet git master still fails: https://build.opensuse.org/package/live_build_log/home:mnhauke:opencl:testing/beignet/openSUSE_13.1/x86_64 snip--- backend/src/llvm/llvm_unroll.cpp:47:32: fatal error: llvm/IR/Dominators.h: No such file or directory snip--- Is LLVM 3.3 still supported with recent beignet versions? LLVM 3.3 should be supported. If it was broken, there is a bug. I just pushed a patch which could fix the problem you met above. Thanks, Zhigang Gong. ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
[Beignet] [PATCH V2] Add the disasm support for Gen8
From: Junyan He junyan...@linux.intel.com Signed-off-by: Junyan He junyan...@linux.intel.com --- backend/src/backend/gen/gen_mesa_disasm.c | 1110 +++-- backend/src/backend/gen_defs.hpp |2 + 2 files changed, 582 insertions(+), 530 deletions(-) diff --git a/backend/src/backend/gen/gen_mesa_disasm.c b/backend/src/backend/gen/gen_mesa_disasm.c index 2412404..876bbeb 100644 --- a/backend/src/backend/gen/gen_mesa_disasm.c +++ b/backend/src/backend/gen/gen_mesa_disasm.c @@ -1,4 +1,4 @@ -/* +/* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or @@ -67,7 +67,6 @@ static const struct { [GEN_OPCODE_LZD] = { .name = lzd, .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_FBH] = { .name = fbh, .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_FBL] = { .name = fbl, .nsrc = 1, .ndst = 1 }, - [GEN_OPCODE_CBIT] = { .name = cbit, .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_F16TO32] = { .name = f16to32, .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_F32TO16] = { .name = f32to16, .nsrc = 1, .ndst = 1 }, @@ -143,7 +142,7 @@ static const char *_abs[2] = { [1] = (abs), }; -static const char *vert_stride[16] = { +static const char *vert_stride_gen7[16] = { [0] = 0, [1] = 1, [2] = 2, @@ -153,6 +152,15 @@ static const char *vert_stride[16] = { [6] = 32, [15] = VxH, }; +static const char *vert_stride_gen8[16] = { + [0] = 0, + [1] = 1, + [2] = 2, + [3] = 4, + [4] = 8, + [5] = 16, + [6] = 32, +}; static const char *width[8] = { [0] = 1, @@ -234,11 +242,17 @@ static const char *pred_ctrl_align1[16] = { [11] = .all16h, }; -static const char *thread_ctrl[4] = { +static const char *thread_ctrl_gen7[4] = { + [0] = , + [2] = switch +}; +static const char *thread_ctrl_gen8[4] = { [0] = , + [1] = atomic, [2] = switch }; + static const char *dep_ctrl[4] = { [0] = , [1] = NoDDClr, @@ -246,11 +260,6 @@ static const char *dep_ctrl[4] = { [3] = NoDDClr,NoDDChk, }; -static const char *mask_ctrl[4] = { - [0] = , - [1] = nomask, -}; - static const char *access_mode[2] = { [0] = align1, [1] = align16, @@ -351,7 +360,7 @@ static const char *gateway_sub_function[8] = { [7] = reserved }; -static const char *math_function[16] = { +static const char *math_function_gen7[16] = { [GEN_MATH_FUNCTION_INV] = inv, [GEN_MATH_FUNCTION_LOG] = log, [GEN_MATH_FUNCTION_EXP] = exp, @@ -365,6 +374,20 @@ static const char *math_function[16] = { [GEN_MATH_FUNCTION_INT_DIV_QUOTIENT] = intdiv, [GEN_MATH_FUNCTION_INT_DIV_REMAINDER] = intmod, }; +static const char *math_function_gen8[16] = { + [GEN_MATH_FUNCTION_INV] = inv, + [GEN_MATH_FUNCTION_LOG] = log, + [GEN_MATH_FUNCTION_EXP] = exp, + [GEN_MATH_FUNCTION_SQRT] = sqrt, + [GEN_MATH_FUNCTION_RSQ] = rsq, + [GEN_MATH_FUNCTION_SIN] = sin, + [GEN_MATH_FUNCTION_COS] = cos, + [GEN_MATH_FUNCTION_FDIV] = fdiv, + [GEN_MATH_FUNCTION_POW] = pow, + [GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER] = intdivmod, + [GEN8_MATH_FUNCTION_INVM] = invm, + [GEN8_MATH_FUNCTION_RSQRTM] = rsqrtm, +}; static const char *math_saturate[2] = { [0] = , @@ -452,14 +475,82 @@ static const char *data_port1_data_cache_msg_type[] = { static int column; -static int string (FILE *file, const char *string) +static int gen_version; + +#define GEN_BITS_FIELD(inst, gen) \ + ({\ +int bits; \ +if (gen_version 80) \ + bits = ((const union Gen7NativeInstruction *)inst)-gen; \ +else\ + bits = ((const union Gen8NativeInstruction *)inst)-gen; \ +bits; \ + }) + +#define GEN_BITS_FIELD2(inst, gen7, gen8) \ + ({\ +int bits; \ +if (gen_version 80) \ + bits = ((const union Gen7NativeInstruction *)inst)-gen7; \ +else\ + bits = ((const union Gen8NativeInstruction *)inst)-gen8; \ +bits; \ + }) + +#define PRED_CTRL(inst)GEN_BITS_FIELD(inst, header.predicate_control) +#define PRED_INV(inst) GEN_BITS_FIELD(inst, header.predicate_inverse) +#define FLAG_REG_NR(inst) GEN_BITS_FIELD2(inst, bits2.da1.flag_reg_nr, bits1.da1.flag_reg_nr) +#define FLAG_SUB_REG_NR(inst) GEN_BITS_FIELD2(inst, bits2.da1.flag_sub_reg_nr, bits1.da1.flag_sub_reg_nr) +#define ACCESS_MODE(inst) GEN_BITS_FIELD(inst, header.access_mode) +#define MASK_CONTROL(inst) GEN_BITS_FIELD2(inst, header.mask_control, bits1.da1.mask_control) +#define
Re: [Beignet] Which version of mesa that Beignet could compile with?
On Sun, 2014-10-19 at 22:15 +0800, Boxiang Sun wrote: Did you set the 'MESA_SOURCE_FOUND' to '1' and the path of mesa source directory in Beignet cmake file? I'm pretty sure that the current master could not compiles with Mesa 10.3.x if without some modifications. Regards, Sun 2014-10-19 3:13 GMT+08:00 Yichao Yu yyc1...@gmail.com: On Sat, Oct 18, 2014 at 11:44 AM, Boxiang Sun daetalu...@gmail.com wrote: Hi, I know the Beignet could not been build with current version of Really? Both the current master and the 0.9.3 release compiles with mesa 10.3.1 just fine here. mesa(10.3.x5). But I tired the 9.2.5. It still could not been build. So which version of mesa that the current Beignet could been build? I mean could build successfully without any modification. I've not tested recently, but the Beignet Mesa EGL extension has been broken for a long time. I looked into fixing it, but it's a big job, and really needs to be worked on by an experienced Mesa developer as it requires support from the Mesa side. The (currently?) broken version was a bit hacky and depended on Mesa functionality that has since been removed. ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [PATCH] Fix AUX buffer for really page aligned
On 2014.10.22 08:49:53 +, Guo, Yejun wrote: three comments in line, thanks. Thanks for review this. gpgpu-aux_offset.surface_heap_offset = size_aux; size_aux += sizeof(surface_heap_t); @@ -784,7 +782,10 @@ intel_gpgpu_state_init(intel_gpgpu_t *gpgpu, gpgpu-aux_offset.sampler_border_color_state_offset = size_aux; size_aux += GEN_MAX_SAMPLERS * sizeof(gen7_sampler_border_color_t); [Yejun] aux buffer contains several parts, each part has its own alignment requirement, so we can see several 'ALIGN' in above code. The alignment is handled for each part, it is not feasible to move one/all of them to the last. The goal is to make sure final aux buffer size is page aligned after adding up all required aligned space size. My patch only counts on final size. - bo = dri_bo_alloc(gpgpu-drv-bufmgr, AUX_BUFFER, size_aux, 0); + //surface heap must be 4096 bytes aligned because state base address + use 20bit for the address size_aux = ALIGN(size_aux, 4096); + + bo = dri_bo_alloc(gpgpu-drv-bufmgr, AUX_BUFFER, size_aux, 4096); [Yejun] the last parameter '4096' is to control the size of the whole buffer (to be page aligned), not to meet the align requirement of the base address, and it is not explicitly required and is ignored in function drm_intel_gem_bo_alloc_internal. It's for 'alignment' parameter. Yes current libdrm will optimize it as to be page aligned, but as your comment above it's a good note to say that we need aux buffer allocation to be page aligned, like for other objects, printf buf, timestamp, etc. -- Open Source Technology Center, Intel ltd. $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827 signature.asc Description: Digital signature ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] [PATCH v2 1/5] Make use of write enable flag for mem bo map
This patchset LGTM, thanks. -Original Message- From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Zhenyu Wang Sent: Thursday, October 23, 2014 15:19 To: beignet@lists.freedesktop.org Subject: [Beignet] [PATCH v2 1/5] Make use of write enable flag for mem bo map Use drm/intel optimization for mem bo mapping in case of read or write. So we could be possibly waiting less. This also adds 'map_flags' check in clEnqueueMapBuffer/clEnqueueMapImage for actual read or write mapping. But currently leave clMapBufferIntel untouched which might break ABI/API. v2: Fix write_map flag in clEnqueueMapBuffer/clEnqueueMapImage. Signed-off-by: Zhenyu Wang zhen...@linux.intel.com --- src/cl_api.c | 6 +- src/cl_command_queue.c | 2 +- src/cl_enqueue.c | 18 +- src/cl_enqueue.h | 1 + src/cl_mem.c | 18 +- src/cl_mem.h | 4 ++-- 6 files changed, 27 insertions(+), 22 deletions(-) diff --git a/src/cl_api.c b/src/cl_api.c index 8a2e999..05d3093 100644 --- a/src/cl_api.c +++ b/src/cl_api.c @@ -2653,6 +2653,8 @@ clEnqueueMapBuffer(cl_command_queue command_queue, data-size= size; data-ptr = ptr; data-unsync_map = 1; + if (map_flags (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION)) +data-write_map = 1; if(handle_events(command_queue, num_events_in_wait_list, event_wait_list, event, data, CL_COMMAND_MAP_BUFFER) == CL_ENQUEUE_EXECUTE_IMM) { @@ -2735,6 +2737,8 @@ clEnqueueMapImage(cl_command_queue command_queue, data-region[0] = region[0]; data-region[1] = region[1]; data-region[2] = region[2]; data-ptr = ptr; data-unsync_map = 1; + if (map_flags (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION)) +data-write_map = 1; if(handle_events(command_queue, num_events_in_wait_list, event_wait_list, event, data, CL_COMMAND_MAP_IMAGE) == CL_ENQUEUE_EXECUTE_IMM) { @@ -3203,7 +3207,7 @@ clMapBufferIntel(cl_mem mem, cl_int *errcode_ret) void *ptr = NULL; cl_int err = CL_SUCCESS; CHECK_MEM (mem); - ptr = cl_mem_map(mem); + ptr = cl_mem_map(mem, 1); error: if (errcode_ret) *errcode_ret = err; diff --git a/src/cl_command_queue.c b/src/cl_command_queue.c index 48deba0..d07774f 100644 --- a/src/cl_command_queue.c +++ b/src/cl_command_queue.c @@ -336,7 +336,7 @@ cl_fulsim_read_all_surfaces(cl_command_queue queue, cl_kernel k) assert(mem-bo); chunk_n = cl_buffer_get_size(mem-bo) / chunk_sz; chunk_remainder = cl_buffer_get_size(mem-bo) % chunk_sz; -to = cl_mem_map(mem); +to = cl_mem_map(mem, 1); for (j = 0; j chunk_n; ++j) { char name[256]; sprintf(name, dump%03i.bmp, curr); diff --git a/src/cl_enqueue.c b/src/cl_enqueue.c index af118ad..2e43122 100644 --- a/src/cl_enqueue.c +++ b/src/cl_enqueue.c @@ -38,7 +38,7 @@ cl_int cl_enqueue_read_buffer(enqueue_data* data) void* src_ptr; struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)mem; - if (!(src_ptr = cl_mem_map_auto(data-mem_obj))) { + if (!(src_ptr = cl_mem_map_auto(data-mem_obj, 0))) { err = CL_MAP_FAILURE; goto error; } @@ -66,7 +66,7 @@ cl_int cl_enqueue_read_buffer_rect(enqueue_data* data) mem-type == CL_MEM_SUBBUFFER_TYPE); struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)mem; - if (!(src_ptr = cl_mem_map_auto(mem))) { + if (!(src_ptr = cl_mem_map_auto(mem, 0))) { err = CL_MAP_FAILURE; goto error; } @@ -112,7 +112,7 @@ cl_int cl_enqueue_write_buffer(enqueue_data *data) struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)mem; void* dst_ptr; - if (!(dst_ptr = cl_mem_map_auto(data-mem_obj))) { + if (!(dst_ptr = cl_mem_map_auto(data-mem_obj, 1))) { err = CL_MAP_FAILURE; goto error; } @@ -140,7 +140,7 @@ cl_int cl_enqueue_write_buffer_rect(enqueue_data *data) mem-type == CL_MEM_SUBBUFFER_TYPE); struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)mem; - if (!(dst_ptr = cl_mem_map_auto(mem))) { + if (!(dst_ptr = cl_mem_map_auto(mem, 1))) { err = CL_MAP_FAILURE; goto error; } @@ -188,7 +188,7 @@ cl_int cl_enqueue_read_image(enqueue_data *data) const size_t* origin = data-origin; const size_t* region = data-region; - if (!(src_ptr = cl_mem_map_auto(mem))) { + if (!(src_ptr = cl_mem_map_auto(mem, 0))) { err = CL_MAP_FAILURE; goto error; } @@ -231,7 +231,7 @@ cl_int cl_enqueue_write_image(enqueue_data *data) cl_mem mem = data-mem_obj; CHECK_IMAGE(mem, image); - if (!(dst_ptr = cl_mem_map_auto(mem))) { + if (!(dst_ptr = cl_mem_map_auto(mem, 1))) { err = CL_MAP_FAILURE; goto error; } @@ -260,7 +260,7 @@ cl_int cl_enqueue_map_buffer(enqueue_data *data) //because using unsync map in clEnqueueMapBuffer, so force use map_gtt
Re: [Beignet] [PATCH 3/5] Remove intel_gpgpu_check_binded_buf_address()
This assertion is just to make sure we will not get a NULL pointer for a normal buffer. The OpenCL spec doesn't give a very specific statement about a NULL buffer object. But it does allow to pass a NULL to a buffer object. Thus some one may implement the following kernel: __kernel foo( global uint * input, global uint *output1, global uint *output2) { ... if (output1) output1[get_global_id(0)] = result0; if (output2) output2[get_global_id(1)] = result1; } If we pass in a NULL output1 which should be a normal allocated buffer, it breaks the above code. Without PPGTT, this assertion works fine till now. If with PPGTT, we may hit this assertion, but we can't just simply remove this assertion. Do you have good suggestion to always avoid allocate 0 offset for a valid buffer object? All the other patches in this series LGTM, I will push them latter. Let's defer this one to next week. Thanks, Zhigang Gong. On Thu, Oct 23, 2014 at 03:19:24PM +0800, Zhenyu Wang wrote: On recent kernel with full PPGTT support, we can possibly bind buffer offset with 0, but intel_gpgpu_check_binded_buf_address() always thinks it's invalid, which is not true. So simply remove the check. Signed-off-by: Zhenyu Wang zhen...@linux.intel.com --- src/intel/intel_gpgpu.c | 9 - 1 file changed, 9 deletions(-) diff --git a/src/intel/intel_gpgpu.c b/src/intel/intel_gpgpu.c index 6cd73d6..b7958d5 100644 --- a/src/intel/intel_gpgpu.c +++ b/src/intel/intel_gpgpu.c @@ -694,14 +694,6 @@ intel_gpgpu_batch_reset(intel_gpgpu_t *gpgpu, size_t sz) { return intel_batchbuffer_reset(gpgpu-batch, sz); } -/* check we do not get a 0 starting address for binded buf */ -static void -intel_gpgpu_check_binded_buf_address(intel_gpgpu_t *gpgpu) -{ - uint32_t i; - for (i = 0; i gpgpu-binded_n; ++i) -assert(gpgpu-binded_buf[i]-offset != 0); -} static void intel_gpgpu_flush_batch_buffer(intel_batchbuffer_t *batch) @@ -717,7 +709,6 @@ intel_gpgpu_flush(intel_gpgpu_t *gpgpu) if (!gpgpu-batch || !gpgpu-batch-buffer) return; intel_gpgpu_flush_batch_buffer(gpgpu-batch); - intel_gpgpu_check_binded_buf_address(gpgpu); } static int -- 2.1.1 ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] Performances of beignet ... macbook pro 13
Jérôme, Thanks for sharing the very interesting performance data. We will look into the large image issue. Thanks, Zhigang Gong. ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] johntheripper/OpenCL clGetEventProfilingInfo issue
This should be an application bug, according to OpenCL 1.2 spec: CL_PROFILING_INFO_NOT_AVAILABLE if the CL_QUEUE_PROFILING_ENABLE flag is not set for the command-queue, if the execution status of the command identified by event is not CL_COMPLETE or if event is a user event object. To make sure an event's state to be CL_COMPLETE, you need to call clWaitForEvents() rather than clFinish(). According to spec, clFinish() is used to : blocks until all previously queued OpenCL commands in command_queue are issued to the associated device and have completed. It is not to update all the related event's state. And it is too heavy, as it will wait for the command to be completed. The event's CL_COMPLETE state means the command has been flushed into the GPU's command buffer and may haven't completed. It's used to do GPU command queue side synchronization. clFinish() is to synchronize with host CPU. I would recommend you to call clWaitForEvents before you call the clGetEventProfilingInfo(). If you still met problems with that change, please let us know. Thanks, Zhigang Gong. On Sat, Oct 25, 2014 at 6:13 AM, Oleksii Shevchuk public.ava...@gmail.com wrote: Hi list! I'm trying to use beignet (67650189c145a65addffdcd4d8ff452709bd4149) with johntheripper (ceb8aa1899817965afd26ffb651932e475a7bbc7 at https://github.com/magnumripper/JohnTheRipper) on i7/HD4000. Test suite reports that tests are passed: . compiler_abs_diff_int()Instruction (#42) src too large pooloffset 11 Instruction (#20) src too large pooloffset 11 [SUCCESS] . [SUCCESS] double_precision_check() - WARN: GPU doesn't have correct double precision. Got 9.995699E-05, expected 0.000101 summary: -- total: 421 run: 421 pass: 419 fail: 0 pass rate: 1.00 But with john I ran into the next problem: john -format=rar-opencl -te Will run 4 OpenMP threads Device 0: Intel(R) HD Graphics IvyBridge M GT2 OpenCL error (CL_PROFILING_INFO_NOT_AVAILABLE) in file (common-opencl.c) at line (1210) - (Failed in clGetEventProfilingInfo I) Here is this place: https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/src/common-opencl.c#L1205 According to the cl_api.c, in the clGetEventProfilingInfo, status is not equals to CL_COMPLETE, and all other conditions were ok. I add this to cl_api.c / clGetEventProfilingInfo fprintf(stderr, ER: %d? %d? %d?\n,. event-type == CL_COMMAND_USER, !(event-queue-props CL_QUEUE_PROFILING_ENABLE),. event-status != CL_COMPLETE); And result was: LD_PRELOAD=/home/avatar/Software/beignet-build/src/libcl.so ../run/john -format=encfs-opencl -te Will run 4 OpenMP threads Device 0: Intel(R) HD Graphics IvyBridge M GT2 ER: 0? 0? 1? OpenCL error (CL_PROFILING_INFO_NOT_AVAILABLE) in file (common-opencl.c) at line (1211) - (Failed in clGetEventProfilingInfo I) According to the documentation, before requesting clGetEventProfilingInfo, the queue should be created with CL_QUEUE_PROFILING_ENABLE. So I add clFinish after all queues creation with this flag. Result was the same. So, the question is, are there problems in the beignet implementation, or in their code? Thanks. // wbr // alxchk ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet
Re: [Beignet] beignet not working with Wolfram Mathematica
As you are using 3.16.x kernel, the most possible solution is in the README.md's known issue section: Almost all unit tests fail on Linux kernel 3.15/3.16. There is a known issue in some versions of linux kernel which enable register whitelist feature but miss some necessary registers which are required for beignet. The problematic version are around 3.15 and 3.16 which have commit f0a346b... but haven't commit c9224f... If it is the case, you can apply c9224f... manually and rebuild the kernel or just disable the parse command by invoke the following command (use Ubuntu as an example): # echo 0 /sys/module/i915/parameters/enable_cmd_parser And I strongly recommend you to use the latest git master version beignet. If you still have problem, please feel free to report here. Thanks. On Sun, Oct 5, 2014 at 2:22 AM, Lorenzo Pistone blaffabla...@gmail.com wrote: Hello, I'm trying to run beignet from Wolfram Mathematica 10.0.1, I have beignet 0.9.2, Fedora 20 x86_64, kernel 3.16.3, i5-3230M. First, there is an issue which is probably to be blamed on Mathematica (you need to manually load /usr/lib64/beignet/libcl.so with LibraryLoad), I suppose that Mathematica doesn't know about the icd system. Anyway, after that, this test program src = __kernel void myKernel( __global mint * global0Id, __global mint * global1Id, mint width, mint height) { int xIndex = get_global_id(0); int yIndex = get_global_id(1); int index = xIndex + yIndex*width; if (xIndex width yIndex height) { global0Id[index] = get_local_id(0); global1Id[index] = get_local_id(1); } }; which is suggested in the examples of the Mathematica implementation (http://reference.wolfram.com/language/OpenCLLink/tutorial/Programming.html) produces all zeroes. If instead the pocl driver is loaded, the kernel is executed correctly. Also I could run the LuxMark benchmarks (even though on some tests I see yellow spots that I believe are glitches). I am sorry if the question is a bit vague, but I don't know much about OpenCL and I indeed wanted to start learning while using Mathematica, which is a tool that I already need for other reasons. Lorenzo ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet ___ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet