Let's see what the timing stats look like. And the physics is your call: if inserting spheres one by one is what you need, then even if it amounts to a 1000 s simulation, you probably have to brute-force through it.

That said, in your case I think the best approach is a hybrid one: call UpdateClumps() each time with a small batch, say 1000 spheres. All 1000 spheres in a batch start in a dormant family and are then activated one by one; once they are all active, you add another batch of 1000. This should scale better (theoretically it scales with the number of currently alive clumps), and since each batch is fairly large, you do not call UpdateClumps() that often, so it is not too expensive on that end either. Of course, if the load balance is indeed very bad, this alone probably does not resolve everything.
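Roughly, the insertion loop could look like the sketch below. I have not tested this, so treat it as an illustration of the idea rather than working code: it reuses the DEMSim, sphere, World, can_insert, dt and np objects from your script further down, and batch_size plus the offset/counter bookkeeping are just illustrative names for you to adapt.

batch_size = 1000          # spheres appended per UpdateClumps() call
n_inserted = 0             # spheres activated so far
n_left_in_batch = 0        # dormant spheres remaining in the current batch
offset = 0                 # index of the next dormant sphere within the batch
tracker = None

while n_inserted < sphere['N_target']:
    # When the current dormant batch is used up, append a new one.
    if n_left_in_batch == 0:
        n_new = min(batch_size, sphere['N_target'] - n_inserted)
        # Dispersed dummy positions (same trick as in your script) so the
        # dormant spheres do not overcrowd a single bin.
        dummy_pos = [[np.random.uniform(-World['half_width'] + sphere['radius'],
                                        World['half_width'] - sphere['radius']),
                      np.random.uniform(-World['half_width'] + sphere['radius'],
                                        World['half_width'] - sphere['radius']),
                      0.0] for _ in range(n_new)]
        batch = DEMSim.AddClumps(sphere['template'], dummy_pos)
        batch.SetFamilies([sphere['idle_family']] * n_new)
        DEMSim.UpdateClumps()          # one thread sync per batch, not per sphere
        tracker = DEMSim.Track(batch)  # track only the newest batch
        n_left_in_batch = n_new
        offset = 0

    # Activate dormant spheres one by one, exactly as you do now.
    if can_insert():
        tracker.SetPos(pos=[0, 0, World['height'] - sphere['radius']], offset=offset)
        tracker.SetVel(vel=[0, 0, 0], offset=offset)
        tracker.SetFamily(fam_num=sphere['family'], offset=offset)
        offset += 1
        n_left_in_batch -= 1
        n_inserted += 1

    DEMSim.DoDynamics(dt)

The family setup (fixing the idle family and disabling its contacts with everything) is unchanged from your script and still only needs to happen once before Initialize(); the only difference is that the clumps themselves now arrive in batches instead of all at once.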
Thank you,
Ruochun

On Friday, March 22, 2024 at 10:12:27 PM UTC+8 [email protected] wrote:

> Ruochun,
>
> Thank you for your answer.
>
> dt is quite large compared to the step size, so your estimate is correct. I will run a simple simulation to get you the CollaborationStats and TimingStats.
> I can always simplify the case, but it would lead to a much less realistic distribution if, for example, I pour a column instead of dropping spheres one by one, since we would then have the wrong speed when colliding with the packed bed. I could also make a disk, but it would drastically change the distribution, especially at the top where a cone should form.
>
> You got my poor definition of scaling right, I apologize for the confusion. Since I want to recirculate the spheres once dropped, and do that many times ("deleting" spheres and creating new ones to mimic that behavior), if I have 500,000 spheres that I want to recirculate 10 times, dropping them one by one, the performance hit from introducing 5,000,000 spheres instead of 500,000 is huge. Restarting the calculation every time would be quite a headache as well, and would still lead to a simulation of 1,000,000 spheres.
>
> Thank you,
> Yves
>
> On Friday, March 22, 2024 at 5:29:35 AM UTC-4 Ruochun Zhang wrote:
>
>> Hi Yves,
>>
>> The fact that you have to create inactive particles at dummy locations, and that the scaling is not ideal, could both be remedied once a mechanism that fully freezes a family (rather than just disabling contacts and fixing positions) is implemented. Like I said last time, I haven't had the time to do that. But with what we have now, you are doing this correctly.
>>
>> To understand how the performance can be improved, we have to understand the time spent on each sub-task. There are a few possibilities that I can think of just by looking at the script:
>> 1. If *dt* is extremely small (close to the step size), then perhaps most of the runtime is spent on changing families. We can do very little about that if the physics of the problem calls for it.
>> 2. But I guess *dt* is in fact some meaningful amount of time (say 0.05 s). In that case, the first concern is that by adding spheres one by one, you end up with an extremely long simulation, thousands of seconds. What we can do to improve such a long simulation is limited, since there is a limit on how much we can optimize the step size. You should first think about whether you need to do this at all, or at least make sure you do it only once and save the simulation state for future use afterwards. Then, in this case, the majority of the runtime should be in *DoDynamics(dt)*, and we need to know the load balance (CollaborationStats and TimingStats). It might be that contact detection is heavy while the number of contacts is low, so dT is essentially always waiting for kT.
>> The first 50k or so particles do not really scale anyway (the GPU is not saturated below that size); past that, the scaling is probably affected more by the load balance.
>> 3. I assume that by scaling you mean, say, right now there are 100k particles in the simulation, and having 400k yet-to-be-inserted particles at the same time makes the simulation run roughly twice as slow as having 150k yet-to-be-inserted particles. Let's make sure we are saying the same thing, because, well, the entire simulation obviously scales with *N_target* since you have that many loop iterations.
>>
>> Thank you,
>> Ruochun
>>
>> On Tuesday, March 19, 2024 at 10:39:42 PM UTC+8 [email protected] wrote:
>>
>>> Hi Ruochun,
>>>
>>> Based on that discussion, I developed a routine that creates the target number of spheres (say, 500,000), puts them into an "idle" family which should not interact with the environment, and then converts them one by one to fill the core.
>>> Here is a simple snippet of an example:
>>>
>>> # Dummy input
>>> World = {'half_width': 5, 'height': 20, 'family': 0}
>>> sphere = {'radius': 0.05, 'mass': 1.0, 'family': 1, 'idle_family': 2, 'N_target': 500000}  # dummy mass value
>>> sphere['template'] = DEMSim.LoadSphereType(sphere['mass'], sphere['radius'], material)
>>>
>>> # Set an idle family which does not interact with the environment
>>> DEMSim.DisableFamilyOutput(sphere['idle_family'])
>>> DEMSim.SetFamilyFixed(sphere['idle_family'])
>>> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'], sphere['idle_family'])
>>> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'], sphere['family'])
>>> DEMSim.DisableContactBetweenFamilies(sphere['idle_family'], World['family'])
>>>
>>> # Add all spheres (at dispersed dummy positions) before initializing
>>> dummy_pos = [[np.random.uniform(-World['half_width'] + sphere['radius'], World['half_width'] - sphere['radius']),
>>>               np.random.uniform(-World['half_width'] + sphere['radius'], World['half_width'] - sphere['radius']),
>>>               0] for _ in range(sphere['N_target'])]
>>> dummy_vel = [[0., 0., 0.]] * sphere['N_target']
>>> sphere['object'] = DEMSim.AddClumps(sphere['template'], dummy_pos)
>>> sphere['object'].SetVel(dummy_vel)
>>> sphere['object'].SetFamilies([sphere['idle_family']] * sphere['N_target'])
>>> sphere['tracker'] = DEMSim.Track(sphere['object'])
>>>
>>> # Initialize
>>> sphere['N_inserted'] = 0
>>> DEMSim.Initialize()
>>>
>>> # Run and insert spheres at the top of the geometry when possible
>>> while sphere['N_inserted'] < sphere['N_target']:
>>>     if can_insert():
>>>         sphere['tracker'].SetPos(pos=[0, 0, World['height'] - sphere['radius']], offset=sphere['N_inserted'])
>>>         sphere['tracker'].SetVel(vel=[0, 0, 0], offset=sphere['N_inserted'])
>>>         sphere['tracker'].SetFamily(fam_num=sphere['family'], offset=sphere['N_inserted'])
>>>         sphere['N_inserted'] += 1
>>>     DEMSim.DoDynamics(dt)
>>>
>>> As you can see, I have to put the spheres at dispersed dummy positions, otherwise I get an error about having too many spheres in the same bin; is that the right way to do it in my case?
>>> Then, time-wise, each step is faster than before (thanks to the removal of UpdateClumps) but still quite slow, and it scales roughly linearly with the number of target spheres rather than with the number of currently inserted spheres.
>>> Am I doing this wrong?
>>>
>>> Thank you
>>>
>>> On Saturday, February 3, 2024 at 7:21:11 AM UTC-5 Ruochun Zhang wrote:
>>>
>>>> Hi Yves,
>>>>
>>>> In terms of further simplifying the computation, my understanding is that if the scale of your simulation is around 50,000 or 100,000 particles, then saving time by partially "relaxing" the simulation domain is probably not necessary. This is because the number of bodies is low anyway, and further reducing the "effective" number of active simulation bodies might further blur the performance edge of a GPU-based tool. However, letting the simulation cover a longer simulated time using fewer time steps should always help.
>>>>
>>>> I feel the best approach is to dynamically select the time step size. If you know that during certain periods of the simulation everything is relatively "dormant", then you can use a large time step size during them, using the method *UpdateStepSize*. You can change it back with the same method if you believe a collision that requires fine time steps to resolve is about to happen.
>>>>
>>>> If you still wish to "relax" a subset of the clumps in the simulation, then perhaps family-based magic is the way to go. If you believe some clumps are effectively fixed in place during a period, then you can, again, freeze them using the approach I discussed above. This does save time, because those clumps will simply not have contacts among themselves. You could also massage the material associated with a subset of the clumps using the method *SetFamilyClumpMaterial*. However, I have to mention that different material properties hardly make any impact on computational efficiency. Soft materials with more damping could allow for a more lenient time step size, but the step size is still determined by the "harshest" contact that you have to resolve.
>>>>
>>>> The ultimate tool is of course the custom force model. If you can design a model that is fast to solve, accurate enough for your purposes, and potentially resolves different parts of the simulation domain differently like you wished, that is probably the best. For a start, if you do not need friction, try calling *UseFrictionlessHertzianModel()* before system initialization to use the frictionless Hertzian contact model. You can develop even cheaper and more specific models after that.
>>>>
>>>> Thank you,
>>>> Ruochun
>>>>
>>>> On Friday, February 2, 2024 at 11:31:02 PM UTC+8 [email protected] wrote:
>>>>
>>>>> Hello Ruochun,
>>>>>
>>>>> Thank you for your answer.
>>>>>
>>>>> That makes a lot of sense, especially since, in my case, I know from the beginning how many spheres I need.
>>>>> Your proposed method is quite smart; I will try to implement it, run some tests, and come back here to report the difference.
>>>>>
>>>>> Something else I was wondering: is there any way to kind of "relax" the problem in some parts of the geometry? The bottom of the geometry will not see large velocities or strong changes once a few spheres have covered it, and the same applies to the layers above later in the simulation. If that is possible somehow, I expect it to be a large time saver as well.
>>>>>
>>>>> Thank you,
>>>>> Yves
>>>>>
>>>>> On Thursday, February 1, 2024 at 3:35:11 AM UTC-5 Ruochun Zhang wrote:
>>>>>
>>>>>> Hi Yves,
>>>>>>
>>>>>> I only had a brief look at the script.
>>>>>> So what you need is to add more spherical particles into the simulation, one by one, and I assume you need to do this thousands of times.
>>>>>>
>>>>>> The problem is that adding clumps, or rather *UpdateClumps()*, is not designed to be called very frequently; it is really meant for adding a big batch of clumps. When you call it, you need to sync the threads (perhaps the cost of one round of contact detection), and then the system goes through a process similar to initialization (no just-in-time compilation, but still a lot of memory accesses). Although I would expect it to be better than what you measured (6.2 s), maybe you also included the time needed to advance a frame in between; I didn't look into that much detail.
>>>>>>
>>>>>> In any case, it is much better to get rid of adding clumps. If you know how many you will have to add eventually, then initialize the system with all of them in, but frozen (in a family that is fixed and has contacts disabled with all other families). Track these clumps using a tracker (or more trackers, if you want). Then each time you need to add a clump, use this tracker to change a clump in this family (using offset, starting from offset 0, then moving on to 1, 2... each time) to a different family so it becomes an "active" simulation object. Potentially, you can SetPos this clump before activating it. This should be much more efficient, as a known-sized simulation should be. As for material properties, I don't think they have significant effects here.
>>>>>>
>>>>>> Let me know if there is any difficulty implementing it,
>>>>>> Ruochun
>>>>>>
>>>>>> On Wednesday, January 31, 2024 at 1:27:17 AM UTC+8 [email protected] wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am working on a problem which involves dropping one sphere at a time into a geometry from its top in DEME-Engine. The geometry can have several hundred thousand spheres poured into it, so I need something efficient. The constraint is that I always have to drop the sphere with zero velocity from the same spot.
>>>>>>>
>>>>>>> The problem I have is that it is very slow.
>>>>>>>
>>>>>>> I made the attached example, where I fast-forward to 50,000 spheres in the geometry and then drop spheres one by one. When measuring the performance (see the attached log), I obtain something like 6.2 seconds per drop. The overhead I measured when starting from 0 was ~0.2 s, so it gives 6/50000 = 120e-6 s/sphere. Even if I adjust the step size perfectly for each drop, that means filling the geometry with, say, 500,000 spheres would take me around 6 months of computation.
>>>>>>>
>>>>>>> Therefore, I am writing to see if:
>>>>>>>
>>>>>>> 1. Something is wrong with my script.
>>>>>>> 2. Some values can be safely relaxed. The Young's modulus and other sphere parameters were taken from a paper, so I would prefer not to touch them. The time step already seems fairly large in my example.
>>>>>>> 3. There are techniques that could be applied to lower the computational cost for this kind of common problem.
>>>>>>>
>>>>>>> Thank you!
