Hi Yves,

The fact that you have to create inactive particles at dummy locations, and that the scaling is not ideal, could both be remedied by a mechanism that fully freezes a family (rather than merely disabling its contacts and fixing it in position). As I said last time, I haven't had the time to implement that. But with what we have now, you are doing this correctly.
To understand how the performance can be improved, we have to understand the time spent on each sub-task. There are a few possibilities that I can think of just by looking at the script:

1. If *dt* is extremely small (close to the step size itself), then perhaps most of the runtime is spent on changing families. We can do very little if the physics of the problem calls for that.

2. But I guess *dt* is in fact some meaningful amount of time (say 0.05 s). In that case, the first concern is that adding spheres one by one makes for an extremely long simulation, thousands of seconds. What we can do to speed up such a long simulation is limited, since there is a limit on how much the step size can be optimized. You should first ask whether you need to do this at all, or at least make sure you do it only once and save the simulation state for future reuse. Then, in this case, the majority of the runtime should be in *DoDynamics(dt)*, and we need to know the load balance (CollaborationStats and TimingStats; see the P.S. below). It might be that contact detection is so heavy, while the number of contacts is so low, that dT is essentially always waiting for kT. The first 50k or so particles do not scale anyway (a GPU is not saturated at that size); past that, the scaling is probably affected more by the load balance.

3. I assume that by scaling you mean: with, say, 100k particles currently in the simulation, having 400k yet-to-be-inserted particles makes it run roughly twice as slow as having 150k yet-to-be-inserted particles. Let's make sure we are saying the same thing, because obviously the total runtime scales with *N_target*: you loop that many times.

Thank you,
Ruochun
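
P.S. To see where the time goes, printing the solver's collaboration and timing statistics after a stretch of stepping is usually enough. A minimal sketch (written from memory; double-check the exact method names against your DEME version):

# ... after DEMSim.Initialize() and a number of DEMSim.DoDynamics(dt) calls ...
DEMSim.ShowThreadCollaborationStats()  # e.g. how often dT had to wait on kT
DEMSim.ShowTimingStats()               # wall time per sub-task (contact detection, force calculation, ...)
DEMSim.ClearTimingStats()              # reset the counters before timing the next phase

If dT spends most of its time waiting for kT, then contact detection, not force computation, is the bottleneck.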
On Tuesday, March 19, 2024 at 10:39:42 PM UTC+8 [email protected] wrote:

> Hi Ruochun,
>
> Based on that discussion, I developed a routine that creates the target
> number of spheres (say, 500,000), puts them into an "idle" family which
> should not interact with the environment, and then converts them one by
> one to fill the core.
> Here is a simple snippet of an example:
>
> import numpy as np
>
> # Dummy input
> World = {'half_width': 5, 'height': 20, 'family': 0}
> sphere = {'radius': 0.05, 'mass': 1.0, 'family': 1, 'idle_family': 2,
>           'N_target': 500000}
> sphere['template'] = DEMSim.LoadSphereType(sphere['mass'],
>                                            sphere['radius'], material)
>
> # Set up an idle family which does not interact with the environment
> DEMSim.DisableFamilyOutput(sphere['idle_family'])
> DEMSim.SetFamilyFixed(sphere['idle_family'])
> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'],
>                                      sphere['idle_family'])
> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'],
>                                      sphere['family'])
> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'],
>                                      World['family'])
>
> # Add the idle spheres at dispersed dummy positions before initializing
> dummy_pos = [[np.random.uniform(-World['half_width'] + sphere['radius'],
>                                 World['half_width'] - sphere['radius']),
>               np.random.uniform(-World['half_width'] + sphere['radius'],
>                                 World['half_width'] - sphere['radius']),
>               0.] for _ in range(sphere['N_target'])]
> dummy_vel = [[0., 0., 0.]] * sphere['N_target']
> sphere['object'] = DEMSim.AddClumps(sphere['template'], dummy_pos)
> sphere['object'].SetVel(dummy_vel)
> sphere['object'].SetFamilies([sphere['idle_family']] * sphere['N_target'])
> sphere['tracker'] = DEMSim.Track(sphere['object'])
>
> # Initialize
> sphere['N_inserted'] = 0
> DEMSim.Initialize()
>
> # Run, inserting a sphere at the top of the geometry whenever possible
> while sphere['N_inserted'] < sphere['N_target']:
>     if can_insert():
>         i = sphere['N_inserted']
>         sphere['tracker'].SetPos(pos=[0, 0, World['height'] - sphere['radius']],
>                                  offset=i)
>         sphere['tracker'].SetVel(vel=[0, 0, 0], offset=i)
>         sphere['tracker'].SetFamily(fam_num=sphere['family'], offset=i)
>         sphere['N_inserted'] += 1
>     DEMSim.DoDynamics(dt)
>
> As you can see, I have to put the spheres at dispersed dummy positions,
> otherwise I get an error about having too many spheres in the same bin.
> Is that the right way for me to do it?
> Then, time-wise, each step is faster than before (thanks to the removal
> of UpdateClumps) but still quite slow, and it scales linearly with the
> number of target spheres rather than with the number of currently
> inserted spheres.
> Am I doing this wrong?
>
> Thank you
>
> On Saturday, February 3, 2024 at 7:21:11 AM UTC-5 Ruochun Zhang wrote:
>
>> Hi Yves,
>>
>> In terms of further simplifying the computation, my understanding is
>> that if the scale of your simulation is around 50,000 or 100,000
>> particles, then saving time by partially "relaxing" the simulation
>> domain is probably not necessary. The number of bodies is low anyway,
>> and further reducing the "effective" number of active simulation bodies
>> might further blur the performance edge of a GPU-based tool. However,
>> letting the simulation cover a longer simulated time using fewer time
>> steps should always help.
>>
>> I feel the best approach is to dynamically select the time step size.
>> If you know that during certain periods of the simulation everything is
>> relatively "dormant", then you can use a large step size during those
>> periods, via the method *UpdateStepSize*. You can change it back using
>> the same method if you believe a collision that requires fine time
>> steps to resolve is about to happen.
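>>
>> Schematically, something like this (a rough sketch only; the step
>> sizes and the "dormant"/"impact" durations are placeholders for
>> whatever your problem calls for):
>>
>> DEMSim.UpdateStepSize(5e-6)      # dormant period: large steps are fine
>> DEMSim.DoDynamics(dormant_time)
>> DEMSim.UpdateStepSize(5e-7)      # a hard collision is coming: refine
>> DEMSim.DoDynamics(impact_time)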
>>
>> If you still wish to "relax" a subset of the clumps in the simulation,
>> then perhaps family-based magic is the way to go. If you believe some
>> clumps are effectively fixed in place during a period, then you can,
>> again, freeze them using the approach I discussed above. This indeed
>> saves time, because those clumps will simply not have contacts among
>> themselves. You could also massage the material associated with a
>> subset of the clumps using the method *SetFamilyClumpMaterial*.
>> However, I have to mention that different material properties hardly
>> make any impact on computational efficiency. Soft materials with more
>> damping could allow for a more lenient time step size, but the step
>> size is still determined by the "harshest" contact that you have to
>> resolve.
>>
>> The ultimate tool is of course the custom force model. If you can
>> design a model that is fast to solve, accurate enough for you, and that
>> potentially resolves different parts of the simulation domain
>> differently as you wished, that is probably the best. For a start, if
>> you do not need friction, then try calling
>> *UseFrictionlessHertzianModel()* before system initialization to use
>> the frictionless Hertzian contact model. You can develop even cheaper
>> and more specific models after that.
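>>
>> That switch is a one-liner; the only requirement is that it precedes
>> initialization (a minimal placement sketch, with the rest of the setup
>> as in your script):
>>
>> DEMSim.UseFrictionlessHertzianModel()  # must come before Initialize()
>> DEMSim.Initialize()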
>>
>> Thank you,
>> Ruochun
>>
>> On Friday, February 2, 2024 at 11:31:02 PM UTC+8 [email protected]
>> wrote:
>>
>>> Hello Ruochun,
>>>
>>> Thank you for your answer.
>>>
>>> That makes a lot of sense, especially since, in my case, I know how
>>> many I need from the beginning. Your proposed method is quite smart;
>>> I will try to implement it, run some tests, and come back here to
>>> report the difference.
>>>
>>> Something else I was wondering: is there a way to somehow "relax" the
>>> problem in some parts of the geometry? The bottom of the geometry
>>> will not see large velocities or strong changes once a few spheres
>>> have covered it, and the same applies to the layers above later in
>>> the simulation. If that is possible somehow, I expect it to be a
>>> large time saver as well.
>>>
>>> Thank you,
>>> Yves
>>>
>>> On Thursday, February 1, 2024 at 3:35:11 AM UTC-5 Ruochun Zhang wrote:
>>>
>>>> Hi Yves,
>>>>
>>>> I only had a brief look at the script. So what you need is to add
>>>> more spherical particles into the simulation, one by one, and I
>>>> assume you need to do this thousands of times.
>>>>
>>>> The problem is that adding clumps, that is, calling *UpdateClumps()*,
>>>> is not designed to be done too frequently; it is really for adding a
>>>> big batch of clumps. When you call it, you need to sync the threads
>>>> (perhaps the cost of one round of contact detection), then the
>>>> system goes through a process similar to initialization (no
>>>> just-in-time compilation, but still a lot of memory accesses).
>>>> Although I would expect it to be better than what you measured
>>>> (6.2 s), maybe you also included the time needed to advance a frame
>>>> in between; I didn't look into that much detail.
>>>>
>>>> In any case, it's much better to get rid of adding clumps. If you
>>>> know how many you will eventually have to add, then initialize the
>>>> system with all of them in, but frozen (in a family that is fixed
>>>> and has contacts disabled with all other families). Track these
>>>> clumps using a tracker (or several trackers, if you want). Then,
>>>> each time you need to add a clump, use this tracker to move one
>>>> clump in this family (using an offset, starting from offset 0, then
>>>> moving on to 1, 2, ... each time) into a different family, so that
>>>> it becomes an "active" simulation object. Potentially, you can
>>>> SetPos this clump before activating it. This should be much more
>>>> efficient, as a known-sized simulation should be. As for material
>>>> properties, I don't think they have significant effects here.
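>>>>
>>>> In pseudo-Python, the activation step would look roughly like this
>>>> (a sketch only; batch, drop_pos, active_family and the index i are
>>>> placeholders):
>>>>
>>>> tracker = DEMSim.Track(batch)  # batch = the pre-added, frozen clumps
>>>> # to "add" the i-th clump: position it, then move it to an active family
>>>> tracker.SetPos(drop_pos, offset=i)
>>>> tracker.SetFamily(active_family, offset=i)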
>>>>
>>>> Let me know if there is any difficulty implementing it,
>>>> Ruochun
>>>>
>>>> On Wednesday, January 31, 2024 at 1:27:17 AM UTC+8 [email protected]
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am working on a problem in DEME-Engine that involves dropping one
>>>>> sphere at a time into a geometry from its top. The geometry can
>>>>> have several hundred thousand spheres poured into it, so I need
>>>>> something efficient. The constraint is that I always have to drop
>>>>> a sphere with zero velocity from the same spot.
>>>>>
>>>>> The problem I have is that it is very slow.
>>>>>
>>>>> I made the attached example, where I fast-forward to 50,000 spheres
>>>>> in the geometry, then drop spheres one by one. When measuring the
>>>>> performance (see the attached log), I obtain something like 6.2
>>>>> seconds per drop. The overhead I measured when starting from 0 was
>>>>> ~0.2 s, so that gives 6/50000 = 120e-6 s per already-present sphere
>>>>> per drop. Even with the step size adjusted perfectly for each drop,
>>>>> filling the geometry with, say, 500,000 spheres would take around
>>>>> 6 months of computation.
>>>>>
>>>>> Therefore, I am writing to find out whether:
>>>>>
>>>>> 1. Something is wrong in my script.
>>>>> 2. Some values can be safely relaxed. The Young's modulus and other
>>>>> sphere parameters were taken from a paper, so I would prefer not to
>>>>> touch them. The time step already seems fairly large in my example.
>>>>> 3. There are techniques that could lower the computational cost of
>>>>> this kind of common problem.
>>>>>
>>>>> Thank you!