Thanks. That’s true, but there are cases (like making BLAS calls) where I have 
to nest more than 4 `withUnsafeMutable…` closures. It’s safe but really clumsy. 
I just wish there were a cleaner way that looks like the do-notation in Haskell 
or for-notation in Scala.
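
One way to flatten the nesting, sketched here under assumptions: the helper `withUnsafeMutablePointers` below is hypothetical (it is not a standard library function in Swift 3 or later) and is written for three scalar values only. It hides the nested `withUnsafeMutablePointer(to:_:)` calls behind a single closure, so the pointers stay valid for exactly the duration of `body`.

```swift
// Hypothetical helper: wraps three nested withUnsafeMutablePointer calls
// so the call site needs only one closure.
func withUnsafeMutablePointers<A, B, C, Result>(
    _ a: inout A, _ b: inout B, _ c: inout C,
    _ body: (UnsafeMutablePointer<A>,
             UnsafeMutablePointer<B>,
             UnsafeMutablePointer<C>) throws -> Result
) rethrows -> Result {
    return try withUnsafeMutablePointer(to: &a) { pa in
        try withUnsafeMutablePointer(to: &b) { pb in
            try withUnsafeMutablePointer(to: &c) { pc in
                // All three pointers are valid only inside this call.
                try body(pa, pb, pc)
            }
        }
    }
}

// Usage: the three arguments must be distinct variables.
var x = 1.0, y = 2.0, z = 0.0
withUnsafeMutablePointers(&x, &y, &z) { px, py, pz in
    pz.pointee = px.pointee + py.pointee
}
print(z)
```

This only moves the nesting into one place; a BLAS-style call over arrays would need an analogous wrapper around `withUnsafeMutableBufferPointer`, and one overload per arity.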

-Richard

> On Dec 16, 2016, at 14:16, Joe Groff <jgr...@apple.com> wrote:
> 
>> 
>> On Dec 16, 2016, at 12:10 PM, Richard Wei <rxr...@gmail.com> wrote:
>> 
>> `sync` is not escaping. Shadow copy causes the device memory to make a copy, 
>> which can’t be a solution either. I’ll file a radar.
>> 
>>> Note that, independently, this part looks fishy:
>>> 
>>>> try! fill<<<(blockSize, blockCount)>>>[
>>>>          .pointer(to: &self)
>>> 
>>> UnsafePointers formed by passing an argument inout are not valid after the 
>>> called function returns, so if this function is forming and returning a 
>>> pointer, it will likely result in undefined behavior. You should use 
>>> `withUnsafePointer(to: &self) { p in ... }` to get a usable pointer.
>> 
>> This part is a semi-“type-safe” wrapper around the CUDA kernel launcher. Its 
>> purpose is to make explicit which arguments get passed to the kernel as 
>> pointers; in this case, self is a `DeviceArray`. `.pointer` is a factory 
>> method under `KernelArgument`. Since most arguments to the kernel are passed 
>> in by pointer, IMO using a bunch of `withUnsafe...` clauses would only make 
>> it look unnecessarily clumsy.
> 
> Unfortunately, that's the only defined way to write this code. The pointer 
> you get as an argument from `&self` is only good until pointer(to:) returns, 
> so it won't be guaranteed to be valid by the time you use it, and the 
> compiler will assume that `self` doesn't change afterward, so any mutation 
> done to `self` through that pointer will lead to undefined behavior. 
> Rewriting this code to use withUnsafePointer should also work around the 
> capture bug without requiring a shadow copy:
> 
>   let blockSize = min(512, count)
>   let blockCount = (count+blockSize-1)/blockSize
>   withUnsafePointer(to: &self) { selfPtr in
>     device.sync { // Launch CUDA kernel
>       try! fill<<<(blockSize, blockCount)>>>[
>         .pointer(to: selfPtr), .value(value), .value(Int64(count))
>       ]
>     }
>   }
> 
> -Joe
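
The lifetime rule Joe describes can be shown with a small stand-alone sketch. Everything named here is an assumption for illustration: `Buffer` stands in for `DeviceArray`, and `hypotheticalLaunch` stands in for the kernel launch; neither comes from the original code. The point is only that the pointer is used entirely inside the `withUnsafeMutablePointer` closure and never escapes it.

```swift
// Stand-in for DeviceArray (assumption, not the real type).
struct Buffer {
    var first: Int64 = 0
}

// Stand-in for a kernel launch that writes through the pointer.
func hypotheticalLaunch(_ target: UnsafeMutablePointer<Buffer>, _ value: Int64) {
    target.pointee.first = value
}

extension Buffer {
    mutating func fill(with value: Int64) {
        // selfPtr is valid only until this closure returns, so all
        // work that needs the pointer happens inside the closure.
        withUnsafeMutablePointer(to: &self) { selfPtr in
            hypotheticalLaunch(selfPtr, value)
        }
    }
}

var buffer = Buffer()
buffer.fill(with: 7)
print(buffer.first)
```

By contrast, a function that takes `&self` as an argument, stores the resulting pointer, and returns it would hand back a pointer that is no longer guaranteed to be valid, which is the pattern Joe flags above.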

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users
