Hello pocl-developers,

I've had some success implementing part of a new device driver in
pocl, where I'll be running on an x86_64 host and the device will be
ARM-ish based.

I've got a laundry list of questions, but overall it's gone quite
smoothly, and has been a lot of fun.

--------------------------------------------------------------------------------

It seems that cl_device_id.uninit() is never called?

--------------------------------------------------------------------------------

I notice in devices.c:pocl_init_devices(), that there is either an
environment variable:

    device_list = getenv(POCL_DEVICES_ENV);

or a fallback to a hard coded list:

    device_list = <list>;

I was wondering what your thoughts are on adding a device API like
.discover() or .probe(), so that each driver can report whether it
found any of its devices?

i.e. something like:

    pocl_device_types[i].discover();

Also, I've been toying with the idea of having multiple devices of the
same type, e.g. like if you plugged two GPUs into the same board. Any
thoughts on that?
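
To make the idea concrete, here is a minimal sketch in C of what I have
in mind. Every name in it (sketch_device_ops, probe,
sketch_count_devices) is hypothetical and not part of pocl's current
device API; returning a count rather than a boolean is just one way to
also cover the "two GPUs on one board" case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver operations table; pocl's real struct differs. */
struct sketch_device_ops {
  const char *name;
  /* Returns how many devices of this type are present (0 if none). */
  unsigned (*probe)(void);
};

/* Example probes: a host CPU driver always finds one device, while an
   accelerator driver might find zero or several. */
static unsigned sketch_probe_cpu(void) { return 1; }
static unsigned sketch_probe_two_boards(void) { return 2; }
static unsigned sketch_probe_absent(void) { return 0; }

/* Sum the devices reported by every driver that implements probe().
   Drivers without a probe() callback are skipped. */
static unsigned sketch_count_devices(const struct sketch_device_ops *ops,
                                     size_t n_ops)
{
  unsigned total = 0;
  assert(ops != NULL || n_ops == 0);
  for (size_t i = 0; i < n_ops; ++i)
    if (ops[i].probe != NULL)
      total += ops[i].probe();
  return total;
}
```

pocl_init_devices() could then size its device array from the total
instead of parsing a fixed list, with the environment variable kept as
an override.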

--------------------------------------------------------------------------------

I'm building the host side for a staged install on x86_64, and after
installing pocl onto my staged system I get this ICD file:

    bash$ cat /etc/OpenCL/vendors/pocl.icd
    libpocl.so.1.2.0

yet annoyingly the ICD loader can't find the pocl lib when it calls
dlopen() on that name, since dlopen only checks:

       o   The cache file /etc/ld.so.cache (maintained by ldconfig(8)) is
           checked to see whether it contains an entry for filename.

       o   The directories /lib and /usr/lib are searched (in that order).

And I have:

    bash$ ldconfig -p | grep pocl
        libpoclu.so.1 (libc6,x86-64) => /usr/local/lib/libpoclu.so.1
        libpoclu.so (libc6,x86-64) => /usr/local/lib/libpoclu.so
        libpocl.so.1 (libc6,x86-64) => /usr/local/lib/libpocl.so.1
        libpocl.so (libc6,x86-64) => /usr/local/lib/libpocl.so

Changing pocl.icd.in to:

    @libdir@/libpocl.so.VER

Seemed to help me out there...

--------------------------------------------------------------------------------

Looking at this hunk in lib/CL/clCreateBuffer.c:

      device_ptr = device->malloc(device->data, flags, size, host_ptr);
...
      if (flags & (CL_MEM_ALLOC_HOST_PTR | CL_MEM_USE_HOST_PTR))
        mem->mem_host_ptr = host_ptr;

I'm confused by the _ALLOC_HOST_PTR. From the OpenCL doc:

    This flag specifies that the application wants the OpenCL
    implementation to allocate memory from host accessible memory.

i.e. I don't see host_ptr being set anywhere, and I'm pretty sure
host_ptr will be NULL in this case, so mem->mem_host_ptr would end up
NULL too?
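
What I would have expected is roughly the following sketch. It is not
pocl code: the function name, the "owned" out-parameter and the use of
plain malloc() as a stand-in for host-accessible memory are all my
assumptions, and the flag bits are redefined locally (intended to
mirror cl.h) so the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-ins for the OpenCL flag bits (assumed to match cl.h). */
#define SKETCH_MEM_USE_HOST_PTR   (1u << 3)
#define SKETCH_MEM_ALLOC_HOST_PTR (1u << 4)

/* Decide which host-side pointer a buffer should record.
   Sets *owned to 1 when the implementation allocated the memory and
   must free it later. Returns NULL when no host pointer applies or
   when the allocation fails. */
static void *sketch_pick_host_ptr(unsigned flags, size_t size,
                                  void *host_ptr, int *owned)
{
  assert(owned != NULL);
  *owned = 0;
  if (flags & SKETCH_MEM_USE_HOST_PTR)
    return host_ptr;            /* caller supplied the storage */
  if (flags & SKETCH_MEM_ALLOC_HOST_PTR) {
    void *p = malloc(size);     /* stand-in for host-accessible memory */
    *owned = (p != NULL);
    return p;
  }
  return NULL;
}
```

The point is only that the two flags take different paths: with
_USE_HOST_PTR the caller's pointer is recorded, while with
_ALLOC_HOST_PTR somebody on the implementation side has to produce one.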

--------------------------------------------------------------------------------

What are your thoughts on Debian and/or RPM packaging of libpocl?

--------------------------------------------------------------------------------

I noticed cellspu.c seems to be using:

      al = &(kernel->dyn_arguments[i]);

Whereas all the other devices are using run.arguments ?

--------------------------------------------------------------------------------

There is this hunk in the OpenCL spec:

    If the argument is declared with the __local qualifier, the
    arg_value entry must be NULL

yet I can generate:

    arg[0]: Local arg size 8 (0x92d838)

Should this be detectable as an error somewhere?

i.e. the user was able to call:

    clSetKernelArg(kernel, idx, sizeof(ptr), ptr);

For a __local void *ptr argument.
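
A check along these lines is what I was imagining. This is only a
sketch: the function name and the is_local parameter are hypothetical,
the status codes are local stand-ins intended to mirror
CL_INVALID_ARG_VALUE and CL_INVALID_ARG_SIZE, and which error to return
for a non-NULL value is my assumption rather than something I have
confirmed against the spec's error list.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for OpenCL status codes (assumed to match cl.h). */
#define SKETCH_SUCCESS            0
#define SKETCH_INVALID_ARG_VALUE  (-50)
#define SKETCH_INVALID_ARG_SIZE   (-51)

/* Validate one kernel argument before it is stored.
   is_local is 1 when the kernel parameter is declared __local. */
static int sketch_check_kernel_arg(int is_local, size_t arg_size,
                                   const void *arg_value)
{
  assert(is_local == 0 || is_local == 1);
  if (is_local) {
    if (arg_value != NULL)
      return SKETCH_INVALID_ARG_VALUE;  /* __local args carry only a size */
    if (arg_size == 0)
      return SKETCH_INVALID_ARG_SIZE;   /* a zero-byte local buffer is useless */
  }
  return SKETCH_SUCCESS;
}
```

With that in place, the clSetKernelArg(kernel, idx, sizeof(ptr), ptr)
call above would be rejected instead of producing a "Local arg" entry.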

--------------------------------------------------------------------------------

When I compile, I see a few gcc warnings fly past, any plans for
"-Wall -Werror" ?

--------------------------------------------------------------------------------

I notice in pocl_device.h there is:

    typedef void (*pocl_workgroup) (void **, struct pocl_context *);

Leading to:

    void *arguments[kernel->num_args + kernel->num_locals];

    w (arguments, pc);

Does this mean all arguments are effectively promoted to sizeof(void *)?

i.e. I guess there is no problem with 16-bit and 8-bit scalar
arguments here.

What happens if the host has 64-bit pointers and the device has 32-bit
pointers? I guess *I* should carefully set up the argument list before
sending it to the device, constructing it from 32-bit quantities.
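
Something like this sketch is what I mean by constructing the list
myself. It assumes the driver tracks device addresses on the host as
64-bit values and that the device wants an array of 32-bit slots; the
function name and that layout are hypothetical, not how pocl marshals
arguments today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack device addresses into the 32-bit argument slots a 32-bit device
   expects. Returns 0 on success, or -1 if an address does not fit in
   32 bits (slots before the offending one have already been written). */
static int sketch_pack_args32(const uint64_t *device_addrs, size_t n,
                              uint32_t *out)
{
  assert(device_addrs != NULL && out != NULL);
  for (size_t i = 0; i < n; ++i) {
    if (device_addrs[i] > UINT32_MAX)
      return -1;                /* would be truncated on the device */
    out[i] = (uint32_t)device_addrs[i];
  }
  return 0;
}
```

Failing loudly on an out-of-range address seems safer than silently
truncating a host pointer that leaked into the list by mistake.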

--------------------------------------------------------------------------------

I'm building the host side for a staged install on x86_64, as well as
compiling for an ARM target, so I'm passing the following to configure:

    LLVM_CONFIG=/home/alun/local-llvm/usr/local/bin/llvm-config

(Since I'm doing a DESTDIR:=~/local-llvm/ install, then tarball)

This of course means that I have a problem with @CLANG@

    config.h:

    /* LLVM compiler executable. */
    #define LLC "/home/alun/local-llvm/usr/local/bin/llc"

    /* clang executable. */
    #define CLANG "/home/alun/local-llvm/usr/local/bin/clang"


So far I've just been hacking the scripts/pocl-* but some nicer
solution would be good...?

- I guess I don't care about these paths:

*** lib/CL/devices/common.c:llvm_codegen()
    CLANG " -target %s %s -c -o %s.o %s",
    LLC " " HOST_LLC_FLAGS " -o %s %s",
    LINK_CMD " -target "OCL_KERNEL_TARGET

Unless I begin to try to call llvm_codegen() from my new
device.run()... but more about that in a minute...

--------------------------------------------------------------------------------

So that was the trivial stuff :) Now onto the harder issue.

I'm reasonably familiar with the autoconf host/build/target
selections, but I know it's a pain when you have multiple targets,
i.e. like when I've been building LLVM with:

    --enable-targets=arm,x86,x86_64

For building pocl, it seems like the goal of configure.ac is to get
this triple of variables:

    OCL_TARGETS, KERNEL_DIR, OCL_KERNEL_TARGET

And the default AC $TARGET stuff is unused.


KERNEL_DIR and OCL_KERNEL_TARGET then get used as the
llvm_target_triplet and such in:

    lib/CL/devices/basic/basic.h
    lib/CL/devices/pthread/pocl-pthread.h


But they also get used in

*** lib/CL/devices/common.c:llvm_codegen()
    CLANG " -target %s %s -c -o %s.o %s",
    LLC " " HOST_LLC_FLAGS " -o %s %s",
    LINK_CMD " -target "OCL_KERNEL_TARGET

- certainly meaning I can't use llvm_codegen() unless I refactor it to
  require the target and options to be passed in.
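
As a rough illustration of that refactor, here is a sketch of a helper
that builds the clang command from a per-device triple and flags
instead of compile-time macros. The function name and parameter list
are hypothetical; only the format string is taken from the existing
llvm_codegen() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the clang command line from a per-device triple and flags.
   Returns 0 on success, or -1 if the command did not fit in buf. */
static int sketch_build_clang_cmd(char *buf, size_t buf_size,
                                  const char *clang, const char *triple,
                                  const char *flags, const char *output,
                                  const char *input)
{
  assert(buf != NULL && buf_size > 0);
  int n = snprintf(buf, buf_size, "%s -target %s %s -c -o %s.o %s",
                   clang, triple, flags, output, input);
  return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```

Each device driver would then supply its own triple and flags (for
example from fields in its device struct), and the host drivers would
keep passing the configure-time values.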

Then I end up with a problem in lib/kernel/arm/Makefile.am

Since I have:

    KERNEL_TARGET=@OCL_KERNEL_TARGET@
    TARGET_DIR=arm

and that'll get me issues like:

,----
| /home/alun/local-llvm/usr/local/bin/clang -Xclang -ffake-address-space-map -emit-llvm   -fsigned-char -c -target x86_64-unknown-linux-gnu -o add_sat.cl.bc -x cl ./../add_sat.cl -include ../../../include/arm/types.h -include /home/alun/work/pocl-2/include/_kernel.h
| In file included from <built-in>:158:
| In file included from <command line>:2:
| /home/alun/work/pocl-2/include/_kernel.h:223:1: error: static_assert failed "size_t"
| _CL_STATIC_ASSERT(size_t, sizeof(size_t) == sizeof(void*));
| ^                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| /home/alun/work/pocl-2/include/_kernel.h:76:37: note: expanded from macro '_CL_STATIC_ASSERT'
| #  define _CL_STATIC_ASSERT(_t, _x) _Static_assert(_x, #_t)
|                                     ^
`----

I notice cellspu is avoiding this world with:

    CLANGFLAGS += -target cellspu-v0
    CLANG_DEFAULT_INCLUDES = -include $(top_builddir)/include/cellspu/types.h

I guess this is the issue of ARM being a host and device target, and I
was wondering what thoughts you had there?

Currently I've tweaked the arm/Makefile.am to:

    KERNEL_TARGET:=arm-linux-gnueabihf
    TARGET_DIR=arm

    EXTRA_CLANGFLAGS:= \
        -mcpu=cortex-a9 \
        -mfloat-abi=softfp \
        -mfpu=neon \
        -mfpu=vfpv3


But this then means that while the kernel-arm-linux-gnueabihf.bc gets
made correctly, the kernel doesn't, so I get:

,----
| WARNING: Linking two modules of different target triples:
| /usr/local/lib/pocl/arm/kernel-arm-linux-gnueabihf.bc:
| 'armv7-linux-gnueabihf' and 'armv4t-linux-gnueabihf'
`----

i.e. I think that, in addition to the llvm target triple, the device
driver needs some way to set the compilation flags? I wonder if it
shouldn't be something like:

    Usage: $0 [-t <llvm_target_triplet> -f <flags>] -o output input


btw I had to remove this section in pocl-kernel.in:

,----
| #pure clang doesn't allow "-target tce-tut-llvm"
| case $target in
|   tce-*)
|     target_flags="" ;;
|   *)
|     target_flags="-target $target";;
| esac
| @CLANG@ @HOST_CLANG_FLAGS@ $target_flags -c -o ${output_file}.o -x c - <<EOF
`----

Since I was getting:

,----
| /usr/local/bin/clang -target arm-linux-gnueabihf -c -o /tmp/poclN2oGr9/newdev/dot_product/descriptor.so.o -x c -
| /usr/bin/as: unrecognized option '-mfloat-abi=softfp'
`----

i.e. I'm not sure we want -target here for a host compilation ?


One last point, I also noticed that configure.ac declares
TARGET_LLC_FLAGS, but it seems to be unused.

--------------------------------------------------------------------------------

thanks for all the good work,

A.

-- 
Alun Evans
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
