Hi David,

I vaguely remember CUDA 11.2 was quite a buggy version, at least for our purposes. Maybe we used to have problems with that version too, but I don't recall clearly. Thankfully 11.3 came out soon after, and right now we are using CUDA 11.6 with no problems. I'm letting you know because I don't think you are stuck with CUDA 10; you can give the newest version a try should you be interested.

Thank you,
Ruochun
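As an aside, here is a minimal way to confirm which CUDA driver and runtime a binary actually sees, which can help diagnose machine-to-machine differences like the one in this thread. This is plain CUDA runtime API, not Chrono-specific:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Print the CUDA driver and runtime versions visible to this binary.
    // Versions are encoded as 1000*major + 10*minor, e.g. 11060 = CUDA 11.6.
    int main() {
        int driver = 0, runtime = 0;
        cudaDriverGetVersion(&driver);
        cudaRuntimeGetVersion(&runtime);
        std::printf("driver %d, runtime %d\n", driver, runtime);
        return 0;
    }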
On Tuesday, May 17, 2022 at 9:50:13 PM UTC-5 [email protected] wrote:

Hi Ruochun,

It looks like the problem was the CUDA version used on the original machine. The machine that was having issues was using CUDA 11.2.2, but the other system was using CUDA 10.1.243. After switching the original problematic machine to 10.1.243, the script ran without issue.

Thanks!
David

On Monday, May 16, 2022 at 7:34:23 PM UTC-6 Ruochun Zhang wrote:

Hi David,

Glad that worked for you. In general, that "negative SD" problem means that particles somehow got out of the simulation "world", which is usually a consequence of unusually large penetrations (and the subsequent huge velocities). To avoid that, the typical things to do are reducing the time step size and checking that you don't instantiate particles overlapping with each other. I know that the GPU execution order will make each DEM simulation slightly different from the others, but statistically they should be the same, and since I (and you, on the second machine) can consistently run that script, I don't think this is the cause; it is more likely that the operating systems caused the code to compile differently on these two machines.

I would be interested in knowing what you find out in the end; it would be a help to me.

Thank you!
Ruochun
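To make the "reduce the step size and avoid initial overlap" advice concrete, here is a minimal sketch against the Chrono::GPU API. The radius, lattice spacing, step size, and box dimensions are placeholders, and the particle-setting call is named SetParticles in recent Chrono versions (SetParticlePositions in some older ones):

    #include "chrono/utils/ChUtilsSamplers.h"
    #include "chrono_gpu/physics/ChSystemGpu.h"

    using namespace chrono;

    // Sketch: seed spheres on an HCP lattice so no two particles start
    // overlapping, and pick a conservative fixed step size.
    void SetupWithoutOverlap(gpu::ChSystemGpu& gpu_sys, float radius) {
        // Lattice spacing a bit above one diameter guarantees zero initial overlap
        utils::HCPSampler<float> sampler(2.05f * radius);
        auto points = sampler.SampleBox(ChVector<float>(0, 0, 10),  // center (placeholder)
                                        ChVector<float>(5, 5, 5));  // half-dims (placeholder)
        gpu_sys.SetParticles(points);

        // A smaller fixed step is the first remedy against large
        // penetrations and the "negative SD" type of failure
        gpu_sys.SetFixedStepSize(1e-5f);
    }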
On Monday, May 16, 2022 at 7:40:36 PM UTC-5 [email protected] wrote:

Hi Ruochun,

I just tried the script on a different machine using the feature/gpu branch and increasing max_touched to 20, and the script worked, so the issue must just be something with the setup on the system I was using. I'll put an update here once I find out what the differences are between the two machines, in case anyone else has a similar issue.

Thanks a lot for your help!
David

On Monday, May 16, 2022 at 2:47:21 PM UTC-6 David Reger wrote:

I gave it a try with my original mesh and your new mesh, and both still gave the negative local… error around frame 90. You're just using the Chrono version from the repo with the feature/gpu branch, right? If you haven't already, could you try a fresh clone of the repo, apply the max_touched change, and then run the script to see if it's successful, just to make sure that we're both doing the exact same thing and seeing a different outcome?

Thanks!
David

On Monday, May 16, 2022 at 2:28:23 PM UTC-6 Ruochun Zhang wrote:

Hi David,

It's a bit weird; I checked, and I changed almost nothing. I did comment out lines 120~122 (because your json file doesn't have rolling friction defined), but I tested adding them back and it affected nothing; I can still run it. Are you running it with your original mesh? If so, can you have a try with the mesh I attached in an earlier post and let me know if it helps? If it does not help, we can go from there; however, I'd be very confused at that point.

Thank you,
Ruochun

On Monday, May 16, 2022 at 2:43:17 PM UTC-5 [email protected] wrote:

Hi Ruochun,

Sorry, I had made some changes to my script. I redownloaded the original scripts I provided here earlier, and rebuilt Chrono with the feature/gpu branch from a fresh repo clone, with the touched-by-sphere change. After doing all of this and running the exact same script that I had uploaded originally, I now got a "negative local pod in SD" error around frame 90. This is a bit strange since you had managed to run that script successfully, and everything was a clean install with the same script that I uploaded, so it should've had the same outcome as your run. Did you make any changes to the script/json?

On Monday, May 16, 2022 at 12:29:58 PM UTC-6 Ruochun Zhang wrote:

Hi David,

Oh sorry, before you do that, could you try this: I assume you cloned Chrono and built from source. Can you check out the *feature/gpu* branch first, then apply the MAX_SPHERES_TOUCHED_BY_SPHERE change, and then build and try again with the script you failed to run initially? I did apply a bug fix in the *feature/gpu* branch that is probably not in the *develop* branch yet, and I hope to rule out the possibility that this bug was hurting you.

Thank you,
Ruochun

On Monday, May 16, 2022 at 1:23:06 PM UTC-5 Ruochun Zhang wrote:

Hi David,

I am pretty sure that script worked for me until reaching a steady state, like in the picture attached. One thing is that I'd be quite surprised if MAX_SPHERES_TOUCHED_BY_SPHERE = 200 did not make the kernels fail to compile... I'd say something like 32 is the maximum you should assign to it. Maybe you should try something like 30 to see if it works. But if it still gives the same error, we have to have a look at the script. Is it still the same script you attached?

Changing particle sizes has a large impact on the physics, and the "contacts over limit" problem can happen naturally (like in your first question), or as a result of non-physical behavior in the simulation, which is often related to improper sim parameters with respect to the sphere radius. So it's hard to say without context. One thing you should do, of course, is visualize the simulation results before the crash and see if there is anything non-physical.

Thank you,
Ruochun
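A minimal sketch of the kind of per-frame dump that makes such visualization possible. The file naming and frame spacing are arbitrary, and WriteParticleFile is the name used in recent Chrono::GPU versions (WriteFile in some older ones):

    #include "chrono_gpu/physics/ChSystemGpu.h"
    #include <string>

    // Sketch: advance the simulation frame by frame and write one CSV of
    // particle states per frame, so the last frames before a crash can be
    // inspected in a visualization tool such as ParaView.
    void RunAndDump(chrono::gpu::ChSystemGpu& gpu_sys, float frame_dt, int n_frames) {
        for (int f = 0; f < n_frames; f++) {
            gpu_sys.AdvanceSimulation(frame_dt);
            gpu_sys.WriteParticleFile("output/step" + std::to_string(f) + ".csv");
        }
    }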
On Monday, May 16, 2022 at 10:41:03 AM UTC-5 [email protected] wrote:

Actually, it looks like the particle source still isn't working, even when increasing MAX_SPHERES_TOUCHED_BY_SPHERE up to 200. The simulation will run for longer, but still fails with the same contact pairs error. Interestingly, it seems to fail sooner if I make the particle source radius smaller (it fails after 627 pebbles added (step 34) when the source radius is 0.26, and after 31499 pebbles added (step 85) when the source radius is 1.1). Do I just need to increase the number further, or is this a different issue?

Thanks!
David

On Monday, May 16, 2022 at 8:55:47 AM UTC-6 David Reger wrote:

Hi Ruochun,

Thanks for the help, it seems to be working now! I was able to get the particle relocation working as well.

I am interested in the new solver. Let me know when a release/test build is available for it; I'd like to try it out to see if it's faster for these applications.

Thanks!
David

On Friday, May 13, 2022 at 3:43:36 PM UTC-6 Ruochun Zhang wrote:

Hi David,

This issue is a weakness in the default assumption we made that a sphere can have at most 12 contacts. This assumption is made to save GPU memory and to help identify some large-penetration problems in simulations, which are typical with an insufficient time step size. It is fine with near-rigid spherical contacts, but problematic when meshes are involved (each mesh facet in contact with a sphere eats up one slot as well). Imagine a sphere sitting on the tip of a needle made of mesh: it could be in contact with tens of mesh facets, and we haven't even counted the sphere neighbors it can potentially have.

The fix is easy: please go to the file *ChGpuDefines.h* (in chrono\src\chrono_gpu), and replace
*#define MAX_SPHERES_TOUCHED_BY_SPHERE 12*
with
*#define MAX_SPHERES_TOUCHED_BY_SPHERE 20*
or some even larger number if you need it. Rebuild, and your script should run fine. Note the error messages are hard-coded to say 12 is not enough whenever *MAX_SPHERES_TOUCHED_BY_SPHERE* is exceeded, so if 20 is not enough and you need even more, just change it and do not let the error messages confuse you.

Another thing is that it is better to use meshes with relatively uniform triangle sizes. I attached a rebuilt mesh based on your original one. It's optional and does not seem to affect this simulation, but it's good practice.

To answer your other questions: unfortunately, C::GPU does not currently have an *efficient* way of streaming particles into the system. The method you are using (re-initialization) is probably what I would do too if I had to. With a problem size similar to yours, it should be fine. And C::GPU does not have an official API that allows manual particle position changes. However, this should be fairly straightforward to implement. The naive approach is, of course, to do it on the host side with a for loop. If you care about efficiency, then we should instead add one custom GPU kernel call at the end of each iteration that scans the z coordinates of all particles and adds an offset to them if they are below a certain value. It would be nice if you could tailor it to your needs, but if you need help implementing this custom kernel, you can let us know (it may be good to add it as a permanent feature).
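For illustration, a minimal sketch of what such a recycling kernel could look like. Everything here is hypothetical: Chrono::GPU stores particle positions in its own internal layout, so the raw z array, the kernel name, and the launch parameters would all need to be adapted to the library's data structures:

    // Hypothetical kernel: lift any particle that has fallen below z_min
    // by z_offset, "recycling" it to the top of the domain. In practice
    // one would likely also reset that particle's velocity.
    __global__ void RecycleParticlesBelow(float* pos_z,
                                          unsigned int n_particles,
                                          float z_min,
                                          float z_offset) {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n_particles && pos_z[i] < z_min) {
            pos_z[i] += z_offset;
        }
    }

    // Host side, once at the end of each iteration (d_pos_z is a
    // hypothetical device pointer to the particle z coordinates):
    //   unsigned int threads = 128;
    //   unsigned int blocks  = (n_particles + threads - 1) / threads;
    //   RecycleParticlesBelow<<<blocks, threads>>>(d_pos_z, n_particles, z_min, z_offset);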
Lastly, I don't know if you are interested, but in the new generation of DEM simulator that we are currently developing, apart from supporting non-trivial particle geometries, there will be *efficient* ways to do both things (sleeper and active entities; periodic boundaries with no extra cost). It is not out yet, however.

Thank you,
Ruochun

On Thursday, May 12, 2022 at 10:47:27 PM UTC-5 [email protected] wrote:

Hello,

I have been working on using the GPU module in Project Chrono to fill a vessel with spherical particles. I have been able to do so successfully by using the method from the demos: generating particle sheets and allowing them to settle in the vessel. Recently, however, I have been attempting to fill the vessel with a "particle source" method that continuously streams particles into the domain until a certain number of particles is reached. I am unsure if this method is officially supported by the GPU module, and I keep encountering a crash: I receive the error *No available contact pair slots for body # and body #* after the simulation has progressed. It seems to occur sometime after the particles hit the bottom of the vessel. I have tried reducing my timestep, reducing the "flow rate" of incoming particles, changing the height of the particle inflow, and altering some stiffness/damping constants, but this error always seems to happen soon after the particles make contact with the vessel. I have attached my input files; any help would be appreciated.

An unrelated question: does the GPU module support changing particle positions during the simulation (i.e. taking all particles below a certain z and moving them to the top, to "recycle" them continuously during the simulation)?

Thanks!
David
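A rough sketch of the bookkeeping behind the re-initialization "particle source" discussed above: keep the full particle list on the host, append a batch whenever the source fires, and rebuild the system from it. The helper name is made up, and the exact Set*/Initialize calls depend on the Chrono version in use:

    #include "chrono/core/ChVector.h"
    #include <vector>

    using namespace chrono;

    // Hypothetical host-side state: positions of every particle created so far.
    std::vector<ChVector<float>> all_points;

    // Each time the source fires, append the new batch and rebuild the
    // GPU system from the complete list. Re-initialization is costly, but
    // workable at this problem size, as noted above.
    void EmitBatch(const std::vector<ChVector<float>>& new_batch) {
        all_points.insert(all_points.end(), new_batch.begin(), new_batch.end());
        // ...then construct a fresh ChSystemGpu, hand it all_points (plus
        // saved velocities), and Initialize() it again.
    }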
