Thanks! Also, I have another, unrelated question. I want to assign particles a group ID based on their position when a simulation is first started (starting from a checkpoint file), so that I can track the particles in each group as the simulation progresses. I just want to dump the group IDs as a column in the particle output file. Could you give some guidance on which files I would need to modify to add this functionality?
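Something like the untested sketch below is what I'm picturing. GetNumParticles(), GetParticlePosition(), and WriteParticleFile() are the names I found in ChSystemGpu.h (please correct me if they differ on this branch), and the z-binning rule and helper names are just placeholders:

    // Sketch: tag each particle with a group ID computed from its initial
    // position, then dump a CSV with the group ID as an extra column.
    #include <fstream>
    #include <string>
    #include <vector>

    #include "chrono_gpu/physics/ChSystemGpu.h"

    using namespace chrono;
    using namespace chrono::gpu;

    // Placeholder rule: slice the initial bed into horizontal bins of height dz.
    std::vector<int> AssignGroups(ChSystemGpu& sys, float z0, float dz) {
        std::vector<int> group(sys.GetNumParticles());
        for (unsigned int i = 0; i < sys.GetNumParticles(); i++)
            group[i] = static_cast<int>((sys.GetParticlePosition(i).z() - z0) / dz);
        return group;
    }

    // Called at every output frame, instead of (or alongside) WriteParticleFile().
    void WriteFrameWithGroups(ChSystemGpu& sys, const std::vector<int>& group,
                              const std::string& fname) {
        std::ofstream out(fname);
        out << "x,y,z,group\n";
        for (unsigned int i = 0; i < sys.GetNumParticles(); i++) {
            ChVector<float> p = sys.GetParticlePosition(i);
            out << p.x() << "," << p.y() << "," << p.z() << "," << group[i] << "\n";
        }
    }

The one thing I'm unsure about is whether the particle ordering behind GetParticlePosition(i) is guaranteed to stay fixed over the run; if it is, nothing in the library itself would need to change.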
Thanks!
David

On Tuesday, May 17, 2022 at 9:22:18 PM UTC-6 Ruochun Zhang wrote:

Hi David,

I vaguely remember CUDA 11.2 being quite a buggy version, at least for our purposes. Maybe we used to have problems with that version too, but I don't recall clearly. Thankfully 11.3 came out soon after, and right now we are using CUDA 11.6 with no problems. I'm letting you know because I don't think you are stuck with CUDA 10; you can give the newest version a try if you are interested.

Thank you,
Ruochun

On Tuesday, May 17, 2022 at 9:50:13 PM UTC-5 [email protected] wrote:

Hi Ruochun,

It looks like the problem was the CUDA version used on the original machine. The machine that was having issues was using CUDA 11.2.2, while the other system was using CUDA 10.1.243. After switching the problematic machine to 10.1.243, the script ran without issue.

Thanks!
David

On Monday, May 16, 2022 at 7:34:23 PM UTC-6 Ruochun Zhang wrote:

Hi David,

Glad that worked for you. In general, that "negative SD" problem means particles somehow got out of the simulation "world", usually as a consequence of unusually large penetrations (and the resulting huge velocities). To avoid it, the typical things to do are reducing the time step size and checking that you don't instantiate particles overlapping with each other; for the latter, a sampler that enforces a minimum separation is an easy guarantee (see the sketch below).
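(Sketch only; utils::PDSampler from chrono/utils/ChUtilsSamplers.h is what the GPU demos use, and the box numbers here are placeholders:)

    // Sketch: generate a non-overlapping initial packing inside a box.
    // PDSampler enforces a minimum separation between sampled points, so
    // spheres of radius `radius` never start interpenetrated.
    #include <vector>

    #include "chrono/core/ChVector.h"
    #include "chrono/utils/ChUtilsSamplers.h"

    using namespace chrono;

    std::vector<ChVector<float>> MakePacking(float radius) {
        // A separation slightly above one diameter leaves a small safety gap.
        utils::PDSampler<float> sampler(2.05f * radius);
        ChVector<float> center(0, 0, 10);  // placeholder box center
        ChVector<float> hdims(5, 5, 3);    // placeholder box half-dimensions
        return sampler.SampleBox(center, hdims);
    }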
I know that the GPU execution order will make each DEM simulation slightly different from the others, but statistically they should be the same; and since I (and you, on the second machine) can consistently run that script, I don't think this is the cause. It is more likely that the operating systems caused the code to compile differently on these two machines.

I would be interested in knowing what you find out in the end; it would be a help to me.

Thank you!
Ruochun

On Monday, May 16, 2022 at 7:40:36 PM UTC-5 [email protected] wrote:

Hi Ruochun,

I just tried the script on a different machine, using the feature/gpu branch with max_touched increased to 20, and it worked, so the issue must be something in the setup of the system I was using. I'll post an update here once I find out what the differences between the two machines are, in case anyone else has a similar issue.

Thanks a lot for your help!
David

On Monday, May 16, 2022 at 2:47:21 PM UTC-6 David Reger wrote:

I gave it a try with both my original mesh and your new mesh, and both still gave the negative local… error around frame 90. You're just using the Chrono version from the repo with the feature/gpu branch, right? If you haven't already, could you try a fresh clone of the repo, apply the max_touched change, and then run the script to see if it succeeds, just to make sure that we're both doing the exact same thing and seeing a different outcome?

Thanks!
David

On Monday, May 16, 2022 at 2:28:23 PM UTC-6 Ruochun Zhang wrote:

Hi David,

It's a bit weird; I checked, and I changed almost nothing. I did comment out lines 120~122 (because your json file does not define rolling friction), but I tested adding them back and it affected nothing; I can still run it. Are you running it with your original mesh? If so, can you try the mesh I attached in an earlier post and let me know if it helps? If it does not help, we can go from there, though I'd be very confused at that point.

Thank you,
Ruochun

On Monday, May 16, 2022 at 2:43:17 PM UTC-5 [email protected] wrote:

Hi Ruochun,

Sorry, I had made some changes to my script. I re-downloaded the original scripts I provided here earlier and rebuilt Chrono from a fresh clone of the repo, on the feature/gpu branch, with the touched-by-sphere change. After doing all of this and running the exact same script I had uploaded originally, I now get a "negative local pos in SD" error around frame 90. This is a bit strange, since you managed to run that script successfully, and everything here was a clean install with the same script I uploaded, so it should have had the same outcome as your run. Did you make any changes to the script/json?

On Monday, May 16, 2022 at 12:29:58 PM UTC-6 Ruochun Zhang wrote:

Hi David,

Oh, sorry, before you do that, could you try this: I assume you cloned Chrono and built from source. Could you check out the *feature/gpu* branch first, then apply the MAX_SPHERES_TOUCHED_BY_SPHERE change, and then build and try again with the script you initially failed to run? I applied a bug fix on the *feature/gpu* branch that is probably not in the *develop* branch yet, and I want to rule out the possibility that this bug was hurting you.

Thank you,
Ruochun

On Monday, May 16, 2022 at 1:23:06 PM UTC-5 Ruochun Zhang wrote:

Hi David,

I am pretty sure that script worked for me until it reached a steady state, as in the attached picture. One thing: I'd be quite surprised if the kernels did not fail to compile with MAX_SPHERES_TOUCHED_BY_SPHERE = 200... I'd say something like 32 is the maximum you should assign to it. Maybe try something like 30 and see if it works. If it still gives the same error, we will have to look at the script. Is it still the same script you attached?

Changing particle sizes has a large impact on the physics, and the "contacts over limit" problem can happen naturally (as in your first question) or as a result of non-physical behavior in the simulation, which is often related to improper simulation parameters with respect to the sphere radius. So it's hard to say without context. One thing you should do, of course, is visualize the simulation results before the crash and look for anything non-physical.

Thank you,
Ruochun

On Monday, May 16, 2022 at 10:41:03 AM UTC-5 [email protected] wrote:

Actually, it looks like the particle source still isn't working, even with MAX_SPHERES_TOUCHED_BY_SPHERE increased to 200. The simulation runs for longer, but still fails with the same contact-pairs error.
Interestingly, it seems to fail sooner when I make the particle source radius smaller: it fails after 627 pebbles have been added (step 34) with a source radius of 0.26, but after 31499 pebbles (step 85) with a source radius of 1.1. Do I just need to increase the number further, or is this a different issue?

Thanks!
David

On Monday, May 16, 2022 at 8:55:47 AM UTC-6 David Reger wrote:

Hi Ruochun,

Thanks for the help, it seems to be working now! I was able to get the particle relocation working as well.

I am interested in the new solver. Let me know when a release/test build is available for it; I'd like to try it out and see if it's faster for these applications.

Thanks!
David

On Friday, May 13, 2022 at 3:43:36 PM UTC-6 Ruochun Zhang wrote:

Hi David,

This issue comes from a weakness in our default assumption that a sphere can have at most 12 contacts. The assumption is made to save GPU memory and to help identify large-penetration problems in a simulation, which are typical of an insufficient time step size. It is fine for near-rigid spherical contacts, but problematic when meshes are involved, since each mesh facet in contact with a sphere eats up one slot as well. Imagine a sphere sitting on the tip of a needle made of mesh: it could be in contact with tens of mesh facets, and we haven't even counted the sphere neighbors it can potentially have.

The fix is easy: go to the file *ChGpuDefines.h* (in chrono\src\chrono_gpu) and replace
*#define MAX_SPHERES_TOUCHED_BY_SPHERE 12*
with
*#define MAX_SPHERES_TOUCHED_BY_SPHERE 20*
or an even larger number if you need it. Rebuild, and your script should run fine. Note that the error messages are hard-coded to say 12 is not enough whenever *MAX_SPHERES_TOUCHED_BY_SPHERE* is exceeded, so if 20 is not enough and you need even more, just change it and do not let the error messages confuse you.

Another thing: it is better to use meshes with relatively uniform triangle sizes. I attached a rebuilt mesh based on your original one. It's optional and does not seem to affect this simulation, but it's good practice.

To answer your other questions: unfortunately, C::GPU does not currently have an *efficient* way of streaming particles into the system. The method you are using (re-initialization) is probably what I would do too if I had to, and with a problem size similar to yours it should be fine. C::GPU also does not have an official API for manually changing particle positions, but this should be fairly straightforward to implement. The naive approach is, of course, to do it on the host side with a for loop, along the lines of the sketch below. If you care about efficiency, we should instead add one custom GPU kernel call at the end of each iteration that scans the z coordinates of all particles and adds an offset to those below a certain value. It would be nice if you can tailor it to your needs, but if you need help implementing this custom kernel, let us know (it may be good to add it as a permanent feature).
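(A rough, untested host-side sketch; GetParticlePosition and GetParticleVelocity are the current accessor names, the setter has changed names between versions (SetParticles / SetParticlePositions), and you would still go through the same re-initialization you already use for the particle source:)

    // Sketch: naive host-side "recycling". Pull every particle below z_min
    // back up by `lift`, keep its velocity, and feed the arrays back through
    // the re-initialization path used for the particle source.
    #include <vector>

    #include "chrono_gpu/physics/ChSystemGpu.h"

    using namespace chrono;
    using namespace chrono::gpu;

    void RecycleParticles(ChSystemGpu& sys, float z_min, float lift) {
        unsigned int n = sys.GetNumParticles();
        std::vector<ChVector<float>> pos, vel;
        pos.reserve(n);
        vel.reserve(n);
        for (unsigned int i = 0; i < n; i++) {
            ChVector<float> p = sys.GetParticlePosition(i);
            if (p.z() < z_min)
                p.z() += lift;  // move it back to the top of the domain
            pos.push_back(p);
            vel.push_back(sys.GetParticleVelocity(i));
        }
        // Setter name varies by version; re-Initialize() if yours requires it.
        sys.SetParticles(pos, vel);
    }

The GPU-kernel version would do the same comparison and offset in place on the device arrays, which avoids the two host-device copies per call.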
Lastly, I don't know if you are interested, but in the new generation of DEM simulator that we are currently developing, apart from support for non-trivial particle geometries, there will be *efficient* ways to do both things (sleeper and active entities; periodic boundaries at no extra cost). It is not out yet, however.

Thank you,
Ruochun

On Thursday, May 12, 2022 at 10:47:27 PM UTC-5 [email protected] wrote:

Hello,

I have been working on using the GPU module in Project Chrono to fill a vessel with spherical particles. I have been able to do so successfully with the method from the demos: generating particle sheets and allowing them to settle in the vessel. Recently, however, I have been attempting to fill the vessel with a "particle source" method that continuously streams particles into the domain until a certain number of particles is reached. I am unsure whether this method is officially supported by the GPU module, and I keep encountering a crash: I receive the error *No available contact pair slots for body # and body #* after the simulation has progressed for a while. It seems to occur sometime after the particles hit the bottom of the vessel. I have tried reducing my time step, reducing the "flow rate" of incoming particles, changing the height of the particle inflow, and altering some stiffness/damping constants, but this error always seems to happen soon after the particles make contact with the vessel. I have attached my input files; any help would be appreciated.

An unrelated question: does the GPU module support changing particle positions during the simulation (i.e., taking all particles below a certain z and moving them to the top, to "recycle" them continuously)?

Thanks!
David
