I have spent the past week or so making pocl build on ARMv6, i.e. a Raspberry Pi. The basic problem I encountered is the following:
The target triple is insufficient to describe what code should be generated, because it does not describe all the features that a CPU provides. Some people call this the "hardware floating point problem", although this is something of a misnomer: trying to support hardware floating point operations triggers the problem, but the problem is not intrinsically about hardware floating point.

This is already somewhat of a problem on Intel, where the triple does not specify, e.g., whether AVX instructions are available. On Intel, this may lead to sub-optimal code, which is not something one would immediately notice; one typically has to disassemble the generated kernel code to see it, and most people don't do that.

On ARM, different CPU types offer vastly different features, and for performance reasons ARM offers several incompatible ABIs. Unfortunately, the target triple does not choose the ABI! The reason is somewhat indirect: although the ABI is actually specified in the target triple, LLVM ignores it (!) unless one also specifies a CPU type that has sufficient features to use this ABI. Otherwise, LLVM generates code for a "basic" CPU, which may lack features, and then (silently!) switches over to a different ABI. I would consider this a design bug in LLVM, but that's what we have.

(In my case, host and target are the same, and the default target triple and CPU type that llc uses are already what I want. Still, since pocl explicitly specifies the (very same) target triple, LLVM stops using the default CPU type and uses a more basic CPU type that does not actually support the ABI specified in this target triple...)

To remedy this, one needs to specify -march= or -mcpu= in various places. I have not yet determined the minimum set of such options that would lead to correct and/or efficient code, but at the very least the llc invocation in devices/common.c seems to be affected. To add insult to injury, different LLVM tools use different option names (why?) to specify the target triple: the clang front-end uses -target, whereas llc uses -mtriple. Finding the list of available triples, CPU types, and architecture attributes is also an adventure.

I think the following is necessary for pocl: when configuring a target, we should not look for a target triple, but should instead accept a set of options that may include CPU type, FPU type, ABI specification, and architecture attributes. This would likely also improve performance on other architectures, such as x86_64. We may need to ask for this information twice, since clang and llc expect it in different forms, but maybe we can also translate between the two ourselves.

Of course, at the moment, configuring host and target is a bit of a mess, and it's not clear which configuration (environment) variable is used under what circumstances; a bit of cleanup here would help.

-erik

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/

My email is as private as my paper mail. I therefore support encrypting and signing email messages. Get my PGP key from http://pgp.mit.edu/.
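The proposal in this message (one target description, rendered into both clang-style and llc-style options) could look roughly like the following Python sketch. The `TargetDesc` type and the `clang_args`/`llc_args` helpers are hypothetical, not pocl code, and the Raspberry Pi values (triple `armv6-unknown-linux-gnueabihf`, CPU `arm1176jzf-s`, FPU `vfp`, llc attribute `+vfp2`) are assumptions for illustration. The FPU is supplied separately to each tool rather than translated, since mapping clang's `-mfpu=` names onto llc's `-mattr=` attributes is left open here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetDesc:
    """Hypothetical target description: one set of settings from which
    both clang and llc options are derived (names are illustrative)."""
    triple: str
    cpu: Optional[str] = None        # e.g. "arm1176jzf-s"; None lets LLVM choose
    fpu: Optional[str] = None        # clang -mfpu= value, e.g. "vfp"
    float_abi: Optional[str] = None  # "hard" or "soft"
    llc_attrs: Optional[str] = None  # llc -mattr= value, e.g. "+vfp2"


def clang_args(t: TargetDesc) -> List[str]:
    """Render the description as clang front-end options."""
    args = ["-target", t.triple]  # clang takes the triple as a separate argument
    if t.cpu:
        args.append(f"-mcpu={t.cpu}")
    if t.fpu:
        args.append(f"-mfpu={t.fpu}")
    if t.float_abi:
        args.append(f"-mfloat-abi={t.float_abi}")
    return args


def llc_args(t: TargetDesc) -> List[str]:
    """Render the same description as llc options."""
    args = [f"-mtriple={t.triple}"]  # llc spells it -mtriple=, not -target
    if t.cpu:
        args.append(f"-mcpu={t.cpu}")  # passing the CPU keeps llc from using a "basic" one
    if t.llc_attrs:
        args.append(f"-mattr={t.llc_attrs}")
    if t.float_abi:
        args.append(f"-float-abi={t.float_abi}")
    return args


if __name__ == "__main__":
    # Assumed settings for a Raspberry Pi (ARMv6 with VFP, hard-float ABI).
    rpi = TargetDesc(
        triple="armv6-unknown-linux-gnueabihf",
        cpu="arm1176jzf-s",
        fpu="vfp",
        float_abi="hard",
        llc_attrs="+vfp2",
    )
    print("clang", " ".join(clang_args(rpi)))
    print("llc", " ".join(llc_args(rpi)))
```

Run as a script, this prints `clang -target armv6-unknown-linux-gnueabihf -mcpu=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard` and `llc -mtriple=armv6-unknown-linux-gnueabihf -mcpu=arm1176jzf-s -mattr=+vfp2 -float-abi=hard`. Keeping a single description and generating both spellings from it means the CPU type can never be passed to one tool and forgotten for the other.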
_______________________________________________ pocl-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/pocl-devel
