Ruochun,

Thank you for your answer.

dt is quite large compared to the step size length, so your estimate is 
correct. 
I will run a simple simulation to get you the CollaborationStats and 
TimingStats (see the sketch below of how I plan to collect them). 
I can always simplify the case, but that would give a much less realistic 
packing distribution: if I, for example, drop a column instead of a single 
sphere, the spheres would hit the packed bed at the wrong speed. I could 
also drop a disk, but that would drastically change the distribution too, 
especially at the top, where a cone should form.
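
For reference, a rough sketch of how I plan to collect those numbers once 
the simple case has run. I am assuming the ShowThreadCollaborationStats() 
and ShowTimingStats() calls from the C++ demos are exposed the same way in 
the Python bindings, so please correct me if the names differ:

# Run a representative stretch of the simulation first
DEMSim.DoDynamics(dt)
# Assumed stats calls (names taken from the C++ demos)
DEMSim.ShowThreadCollaborationStats()  # e.g. how often dT waits on kT
DEMSim.ShowTimingStats()               # time spent per sub-task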

You understood my poor definition of scaling correctly; I apologize for the 
confusion.
Since I want to recirculate the spheres many times once they are dropped 
("deleting" them and creating new ones to mimic that behavior), the scaling 
matters a lot: if I have 500,000 spheres that I want to recirculate 10 
times, dropping them one by one, the performance penalty of introducing 
5,000,000 spheres instead of 500,000 is huge.
Restarting the calculation for every pass would be quite a headache as 
well, and would still lead to a simulation of 1,000,000 spheres.
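
To make the recirculation idea concrete, here is a rough sketch of what I 
have in mind, reusing each sphere's tracker offset instead of ever adding 
new clumps. It relies on the same SetPos/SetVel calls as the snippet quoted 
below; the recirculate() helper is just mine for illustration, and I am not 
sure yet whether moving an already-settled, in-contact sphere this way is 
safe:

def recirculate(i):
    # Teleport sphere i back to the drop point with zero velocity
    sphere['tracker'].SetPos(pos=[0, 0, World['height'] - sphere['radius']],
                             offset=i)
    sphere['tracker'].SetVel(vel=[0, 0, 0], offset=i)

for pass_no in range(10):                # 10 recirculation passes
    for i in range(sphere['N_target']):  # reuse existing offsets
        recirculate(i)
        DEMSim.DoDynamics(dt)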

Thank you,
Yves


On Friday, March 22, 2024 at 5:29:35 AM UTC-4 Ruochun Zhang wrote:

> Hi Yves,
>
> The fact that you have to create inactive particles at dummy locations, 
> and that the scaling is not ideal, could both be remedied once a mechanism 
> that fully freezes a family (not just disabling contacts and fixing 
> positions) is implemented. As I said last time, I haven't had the time to 
> do that. But with what we have now, you are doing this correctly.
>
> To understand how the performance can be improved, we have to understand 
> the time spent on each sub-task. There are a few possibilities that I can 
> think of just by looking at the script:
> 1. If *dt* is extremely small (close to the step size length), then 
> perhaps most of the runtime is spent on changing families. We can do very 
> little if the physics of the problem calls for that.
> 2. But I guess *dt* is in fact some meaningful amount of time (say 
> 0.05s). In that case, the first concern is that adding spheres one by one 
> makes for an extremely long simulation, thousands of seconds. What we can 
> do to improve such a long simulation is limited, since there is a limit on 
> how much we can optimize the step size. You should first think about 
> whether you need to do this at all, or at least make sure you do it only 
> once and save the simulation state for future use afterwards. Then, in 
> this case, the majority of the runtime should be in *DoDynamics(dt)*, and 
> we need to know the load balance (CollaborationStats and TimingStats). It 
> might be that contact detection is so heavy, while the number of contacts 
> is so low, that dT is essentially always waiting for kT. The first 50k or 
> so particles do not scale anyway (because a GPU is used); past that, the 
> scaling is probably affected more by the load balance.
> 3. I assume that by scaling you mean something like: right now there are 
> 100k particles in the simulation, and having 400k yet-to-be-inserted 
> particles at the same time makes the simulation run roughly twice as slow 
> as having 150k yet-to-be-inserted particles. Let's make sure we are saying 
> the same thing, because obviously the entire simulation scales with 
> *N_target*, since you have that many loop iterations.
>
> Thank you,
> Ruochun
>
> On Tuesday, March 19, 2024 at 10:39:42 PM UTC+8 [email protected] 
> wrote:
>
>> Hi Ruochun,
>>
>> Based on that discussion, I developed a routine that creates the target 
>> number of spheres (say, 500,000), puts them into an "idle" family which 
>> should not interact with the environment, and then converts them one by 
>> one to fill the core. 
>> Here is a short snippet of an example:
>>
>> # Dummy input ('material' and 'dt' are assumed to be defined elsewhere in
>> # the full script; 'mass' is a placeholder value added here so the snippet
>> # is self-contained)
>> import numpy as np
>>
>> World = {'half_width': 5, 'height': 20, 'family': 0}
>> sphere = {'radius': 0.05, 'mass': 0.5, 'family': 1, 'idle_family': 2,
>>           'N_target': 500000}
>> sphere['template'] = DEMSim.LoadSphereType(sphere['mass'], sphere['radius'],
>>                                            material)
>>
>> # Set an idle family which does not interact with the environment
>> DEMSim.DisableFamilyOutput(sphere['idle_family'])
>> DEMSim.SetFamilyFixed(sphere['idle_family'])
>> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'], sphere['idle_family'])
>> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'], sphere['family'])
>> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'], World['family'])
>>
>> # Add all spheres before initializing, at dispersed dummy positions
>> dummy_pos = [[np.random.uniform(-World['half_width'] + sphere['radius'],
>>                                 World['half_width'] - sphere['radius']),
>>               np.random.uniform(-World['half_width'] + sphere['radius'],
>>                                 World['half_width'] - sphere['radius']),
>>               0.] for _ in range(sphere['N_target'])]
>> dummy_vel = [[0., 0., 0.]] * sphere['N_target']
>> sphere['object'] = DEMSim.AddClumps(sphere['template'], dummy_pos)
>> sphere['object'].SetVel(dummy_vel)
>> sphere['object'].SetFamilies([sphere['idle_family']] * sphere['N_target'])
>> sphere['tracker'] = DEMSim.Track(sphere['object'])
>>
>> # Initialize
>> sphere['N_inserted'] = 0
>> DEMSim.Initialize()
>>
>> # Run and insert spheres at the top of the geometry when possible
>> while sphere['N_inserted'] < sphere['N_target']:
>>     if can_insert():  # user-defined check that the drop spot is clear
>>         i = sphere['N_inserted']
>>         # Move the next idle sphere to the drop point and activate it
>>         sphere['tracker'].SetPos(pos=[0, 0, World['height'] - sphere['radius']],
>>                                  offset=i)
>>         sphere['tracker'].SetVel(vel=[0, 0, 0], offset=i)
>>         sphere['tracker'].SetFamily(fam_num=sphere['family'], offset=i)
>>         sphere['N_inserted'] += 1
>>     DEMSim.DoDynamics(dt)
>>
>> As you can see, I have to put the spheres at dispersed dummy positions, 
>> otherwise I get an error about having too many spheres in the same bin. 
>> Is this the right way to do it in my case?
>> Time-wise, each step is faster than before (thanks to the removal of 
>> UpdateClumps) but still quite slow, and it scales roughly linearly with 
>> the number of target spheres rather than with the number of currently 
>> inserted spheres.
>> Am I doing this wrong?
>>
>> Thank you
>>
>> On Saturday, February 3, 2024 at 7:21:11 AM UTC-5 Ruochun Zhang wrote:
>>
>>> Hi Yves,
>>>
>>> In terms of further simplifying the computation, my understanding is 
>>> that if the scale of your simulation is around 50,000 or 100,000 particles, 
>>> then saving time by partially "relaxing" the simulation domain is probably 
>>> not necessary. This is because the number of bodies is low anyway, and 
>>> reducing the "effective" number of active simulation bodies even further 
>>> might blur the performance edge of a GPU-based tool. However, letting the 
>>> simulation cover a longer physical time using fewer time steps should 
>>> always help.
>>>
>>> I feel the best approach is to select the time step size dynamically. If 
>>> you know that during certain periods of the simulation everything is 
>>> relatively "dormant", then you can use a larger time step size during 
>>> those periods, using the method *UpdateStepSize*. You can change it back 
>>> with the same method when you believe a collision that requires fine time 
>>> steps to resolve is about to happen.
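>>>
>>> For example, a minimal sketch of what I mean (the step sizes and the 
>>> expecting_impact() check below are placeholders for illustration, not 
>>> recommendations):
>>>
>>> # Coarse step while the bed is dormant, fine step around impacts
>>> if expecting_impact():           # user-defined heuristic, placeholder
>>>     DEMSim.UpdateStepSize(1e-6)  # fine step to resolve the collision
>>> else:
>>>     DEMSim.UpdateStepSize(1e-5)  # coarser step while things are dormant
>>> DEMSim.DoDynamics(dt)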
>>>
>>> If you still wish to "relax" a subset of the clumps in the simulation, 
>>> then perhaps family-based magics are the way to go. If you believe some 
>>> clumps are effectively fixed in place during a period, then you can again 
>>> freeze them using the approach I discussed above. This indeed saves time 
>>> because those clumps will simply not have contacts among themselves. You 
>>> could also massage the material associated with a subset of the clumps 
>>> using the method *SetFamilyClumpMaterial*. However, I have to mention 
>>> that different material properties hardly make any impact on computational 
>>> efficiency. Soft materials with more damping could allow for a more lenient 
>>> time step size selection, but the step size is still determined by the 
>>> "harshest" contact that you have to resolve.
>>>
>>> The ultimate tool is of course the custom force model. If you can design 
>>> a model that is fast to solve and accurate enough for you, and that 
>>> potentially resolves different parts of the simulation domain differently, 
>>> as you wished, that is probably the best. For a start, if you do not need 
>>> friction, try calling *UseFrictionlessHertzianModel()* before 
>>> system initialization to use the frictionless Hertzian contact model. You 
>>> can then develop even cheaper and more specific models after that.
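>>>
>>> A minimal example of where that call goes (nothing beyond what is 
>>> described above is assumed):
>>>
>>> DEMSim.UseFrictionlessHertzianModel()  # before system initialization
>>> DEMSim.Initialize()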
>>>
>>> Thank you,
>>> Ruochun
>>> On Friday, February 2, 2024 at 11:31:02 PM UTC+8 [email protected] 
>>> wrote:
>>>
>>>> Hello Ruochun,
>>>>
>>>> Thank you for your answer.
>>>>
>>>> That makes a lot of sense, especially since, in my case, I know how 
>>>> many I need from the beginning.
>>>> Your proposed method is quite smart; I will try to implement it.  I 
>>>> will run some tests and come back here to report the difference.
>>>>
>>>> Something else I was wondering: is there any way to somewhat "relax" 
>>>> the problem in some parts of the geometry? The bottom of the geometry 
>>>> will not see large velocities or strong changes once a few spheres have 
>>>> covered it, and the same applies to the layers above later in the 
>>>> simulation.
>>>> If that is somehow possible, I expect it to be a large time saver as 
>>>> well.
>>>>
>>>> Thank you,
>>>> Yves
>>>>
>>>> On Thursday, February 1, 2024 at 3:35:11 AM UTC-5 Ruochun Zhang wrote:
>>>>
>>>>> Hi Yves,
>>>>>
>>>>> I only had a brief look at the script. So what you need is to add 
>>>>> more spherical particles into the simulation, one by one, and I assume 
>>>>> you need to do this thousands of times.
>>>>>
>>>>> The problem is that adding clumps, i.e. *UpdateClumps()*, is not 
>>>>> designed to be called very frequently; it is really meant for adding a 
>>>>> big batch of clumps. When you call it, you need to sync the threads 
>>>>> (costing perhaps one round of contact detection), then the system goes 
>>>>> through a process similar to initialization (no just-in-time 
>>>>> compilation, but still a lot of memory accesses). Although I would 
>>>>> expect it to be better than what you measured (6.2s), maybe you also 
>>>>> included the time needed to advance a frame in between---I didn't look 
>>>>> into it in that much detail.
>>>>>
>>>>> In any case, it's much better to get rid of adding clumps. If you know 
>>>>> how many you will have to add eventually, then initialize the system with 
>>>>> them in, but frozen (in a family that is fixed and has contacts disabled 
>>>>> with all other families). Track these clumps using a tracker (or more 
>>>>> trackers, if you want). Then each time you need to add a clump, use this 
>>>>> tracker to change a clump in this family (using offset, starting from 
>>>>> offset 0, then moving on to 1, 2... each time) to be in a different 
>>>>> family 
>>>>> so it becomes an "active" simulation object. Potentially, you can SetPos 
>>>>> this clump before activating it. This should be much more efficient, as a 
>>>>> known-sized simulation should be. As for material properties, I don't 
>>>>> think 
>>>>> they have significant effects here.
>>>>>
>>>>> Let me know if there is any difficulty implementing it,
>>>>> Ruochun
>>>>>
>>>>> On Wednesday, January 31, 2024 at 1:27:17 AM UTC+8 
>>>>> [email protected] wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am working on a problem which involves dropping one sphere at a 
>>>>>> time into a geometry from its top in DEME-Engine. The geometry can 
>>>>>> have multiple hundreds of thousands of spheres poured into it, so I 
>>>>>> need something efficient. The constraint is that I always have to drop 
>>>>>> the sphere with zero velocity from the same spot.
>>>>>>
>>>>>> The problem I have is that it is very slow.
>>>>>>
>>>>>> I made an example (attached), where I fast-forward to 50,000 spheres 
>>>>>> in the geometry and then drop them one by one. When measuring the 
>>>>>> performance (see the attached log), I obtain something like 6.2 
>>>>>> seconds per drop. The overhead I measured when starting from 0 was 
>>>>>> ~0.2 s, so this gives 6/50000 = 120e-6 s/sphere. Even if I adjust the 
>>>>>> step size perfectly for each drop, that means that filling the 
>>>>>> geometry with, say, 500,000 spheres would take me around 6 months of 
>>>>>> computation.
>>>>>>
>>>>>> Therefore, I am writing to see whether:
>>>>>>
>>>>>>    1. Something is wrong with my script. 
>>>>>>    2. Some values can be safely relaxed. The Young's modulus and the 
>>>>>>    other sphere parameters were taken from a paper, so I would prefer 
>>>>>>    not to touch them. The time step already seems fairly high in my 
>>>>>>    example.
>>>>>>    3. There are techniques that could be applied to lower the 
>>>>>>    computational cost of this kind of common problem.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>
