Hello Noah,

> I would think a better default location for the pocl cache (linux) would be 
> derived from $TMPDIR rather than $HOME.

Having it in /tmp makes the cache non-persistent on many systems, which kind of 
defeats the purpose of having a cache in the first place... perhaps there is a 
more suitable place, but I'm not aware of one.

> I wonder if sys::fs::createUniqueFile() is not so unique after all at this 
> scale?  Could this lead to a sort of race between the create and 
> open(exclusive)...?

I'm 99.9% sure it's unique. I'm not sure what race you have in mind, but IIRC 
LLVM just appends a random string to a template filename, then tries 
open(O_CREAT | O_EXCL), and repeats if that fails. Pocl then closes the 
descriptor and hands over the filename to Clang's preprocessor. It's possible 
Clang removes the file before re-opening to write into it, or there is 
something else going on which triggers a bug.
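
A minimal sketch of the create-and-retry pattern described above, in Python for 
brevity. This is not LLVM's code: the function name, the naming scheme and the 
retry limit are made up for illustration. It only shows why the creation step 
itself cannot hand the same new file to two processes on a filesystem that 
honours O_EXCL, and where the descriptor is closed before the path is passed on.

```python
import os
import secrets
import tempfile


def create_unique_file(prefix, suffix, directory, max_attempts=128):
    """Create a new file with a random name and return (fd, path).

    Hypothetical helper: mimics "random name + open(O_CREAT | O_EXCL) + retry".
    """
    for _ in range(max_attempts):
        name = f"{prefix}-{secrets.token_hex(4)}{suffix}"
        path = os.path.join(directory, name)
        try:
            # O_EXCL makes open() fail if the path already exists, so two
            # processes cannot both "create" the same file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
        except FileExistsError:
            continue  # name collision: pick another random name
        return fd, path
    raise FileExistsError(f"no unique name found in {directory}")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    fd, path = create_unique_file("tempfile", ".cl", workdir)
    # Hand-off step: the descriptor is closed and only the name is passed on.
    os.close(fd)
    # Whoever receives the name has to open it again by path; anything that
    # happens to the file between close() and this open() is not covered by
    # the O_EXCL guarantee.
    with open(path, "w") as out:
        out.write("/* preprocessed source would go here */\n")
    print(path)
```

The exclusive create only protects the moment of creation. After the close, 
correctness depends on the filesystem showing the same file to every later 
open by name.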

Regards,
-- mb

________________________________
From: Noah Reddell <[email protected]>
Sent: Friday, December 28, 2018 12:30:01 AM
To: Portable Computing Language development discussion
Subject: Re: [pocl-devel] intermittent clang ComputeLineNumbers SegFault

Hi Michal,
    Thank you for the suggestion of POCL_CACHE_DIR. Setting this to a tmpfs 
unique to each compute node immediately worked around the issue.
    I can now reliably run my application.
    On most Cray systems, $HOME is a DFS mount when mounted on compute nodes. 
I'm sure there are many similarities between DFS and NFS.

    I would think a better default location for the pocl cache (linux) would be 
derived from $TMPDIR rather than $HOME.

    I wonder if sys::fs::createUniqueFile() is not so unique after all at this 
scale?  Could this lead to a sort of race between the create and 
open(exclusive)...?

Cheers,

Noah





On Thu, Dec 27, 2018 at 11:14 AM Michal Babej <[email protected]> wrote:

Hello,


> Is pocl or clang trying to write anything to the working directory?  In my 
> restricted case, /tmp is private to each compute node and thus each process.


Not to the working directory (AFAIK; I haven't inspected the entire Clang 
codebase), but pocl writes to its own cache directory, which by default is 
$HOME/.cache/pocl/kcache; you can change it to a different directory by setting 
the POCL_CACHE_DIR env variable.


IIRC there have been some issues before, when people had the cache dir located 
on NFS shares; is that your case (is your $HOME shared)? You could try 
pointing POCL_CACHE_DIR to /tmp/pocl_cache and see if it makes the problem go 
away. It's possible pocl / Clang makes some assumption about the filesystem 
which does not hold for NFS.
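
A small launcher-side sketch of that suggestion, in Python. Only the 
POCL_CACHE_DIR variable comes from pocl; the helper names, the directory naming 
scheme (host name plus user id) and the application path in the usage note are 
assumptions for illustration. It presumes the variable has to be in the 
environment before the application initialises pocl.

```python
import os
import socket
import tempfile


def node_local_pocl_cache(base=None):
    """Create and return a per-node, per-user cache directory.

    Hypothetical helper; 'base' defaults to the system temp dir, which on
    Linux follows $TMPDIR and falls back to /tmp.
    """
    base = base or tempfile.gettempdir()
    name = f"pocl_cache-{socket.gethostname()}-{os.getuid()}"
    path = os.path.join(base, name)
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path


def launch_env(base=None):
    """Return a copy of the current environment with POCL_CACHE_DIR set."""
    env = dict(os.environ)
    env["POCL_CACHE_DIR"] = node_local_pocl_cache(base)
    return env
```

A launcher could then start the application with something like 
subprocess.run(["./my_app"], env=launch_env()), where ./my_app is a placeholder. 
A cache kept under /tmp is usually lost on reboot, so this trades persistence 
for staying off the shared filesystem.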


In the backtrace you pasted, it seems to be crashing in the preprocessing 
phase. Here pocl writes to a temporary file created by LLVM's 
sys::fs::createUniqueFile(), which in turn uses open() with the exclusive flag 
on a randomized path.


Regards,

-- mb

________________________________
From: Noah Reddell <[email protected]>
Sent: Saturday, December 22, 2018 12:09:55 AM
To: [email protected]
Subject: [pocl-devel] intermittent clang ComputeLineNumbers SegFault

Hi,

    I figured it is about time I gave pocl a try with my physics simulation 
code. I've been using Intel's OpenCL library for computing on Cray systems 
with Xeon CPUs.
    Today I built pocl (today's git master) on a Cray XC40 using 
clang+llvm-7.0.0-x86_64-linux-sles12.3.
    I was able to run a simple Hello World kernel as well as clinfo. When 
running my physics application at the necessary scale, I'm seeing about 0.2% of 
clBuildProgram calls fail with a SEGFAULT, all with a common stack signature 
(pasted below).
    I'm not sure why this would be so intermittent. I've tried reducing to 
one process per compute node, so only one clBuildProgram would be executing on 
that node at a time. In this testing, that leaves 90 processes doing the same 
program compile simultaneously in the same working directory. Is pocl or 
clang trying to write anything to the working directory? In my restricted 
case, /tmp is private to each compute node and thus each process.
    Googling for similar stack language, I find one mention that may well be 
the same bug:
https://www.mail-archive.com/[email protected]/msg28677.html
https://bugs.llvm.org/show_bug.cgi?id=39833

    "poclcc" is successful with the same OpenCL kernel source.  I assume I'd 
need to run it hundreds of times, perhaps in parallel to potentially trigger 
the same bug.

      Any advice would be appreciated.  Now that I've thought through the 
situation, I think I should probably create an account and contribute to the 
LLVM bug 39833 discussion with a me-too.

Cheers,

Noah Reddell


  WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358
  [email protected]:37
  compile_and_link_program@pocl_build.c:624
  pocl_llvm_build_program@pocl_llvm_build.cc:489
  clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)@0x2aaaabebfd07
  clang::FrontendAction::Execute()@0x2aaaabf1c106
  clang::PrintPreprocessedAction::ExecuteAction()@0x2aaaabf22328
  clang::DoPrintPreprocessedInput(clang::Preprocessor&, llvm::raw_ostream*, clang::PreprocessorOutputOptions const&)@0x2aaaabf51226
  clang::Preprocessor::EnterMainSourceFile()@0x2aaaacc1cabc
  clang::Preprocessor::EnterSourceFile(clang::FileID, clang::DirectoryLookup const*, clang::SourceLocation)@0x2aaaacbf7407
  (anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID)@0x2aaaabf5212d
  clang::SourceManager::getPresumedLoc(clang::SourceLocation, bool) const@0x2aaaacc4e00e
  clang::SourceManager::getLineNumber(clang::FileID, unsigned int, bool*) const@0x2aaaacc4e43a
  ComputeLineNumbers(clang::DiagnosticsEngine&, clang::SrcMgr::ContentCache*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>&, clang::SourceManager const&, bool&)@0x2aaaacc4e683



_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel