Hi Mohammad,

If it is still working fine, then that is good, and I can explain the new 
fix. For an end user, however, it is not a big concern. 

There is a step in the domain discretization that is implemented by running 
two essentially identical kernels back to back: one to determine how much 
memory to allocate and one to actually fill that memory. The code used 
to rely on those two kernels producing the same result, and in most cases 
they do. However, when triangles are involved, there is something like a 
1 in 10 billion to 1 in 100 billion chance that they do not agree, 
presumably because of the order in which the floating-point arithmetic is 
executed. In the end, this can leave one integer slot holding an arbitrary 
value, and that slot is then used as an index, causing a segmentation 
fault. It is subtle, and I did learn something in the process of chasing 
it down.

Thank you,
Ruochun

On Sunday, January 22, 2023 at 7:40:03 PM UTC-6 [email protected] wrote:

> Hi Ruochun, 
>
> Thank you for your reply. 
>
> I would like to confirm that I did not see an error saying "a bin exceeding 
> the maximum allowance of 256" in the error report.
> In addition, when I ran multiple tests, it seemed that the simulation 
> failed at different points in the simulation. 
>
> I have pulled from the new repository and it seems that everything is 
> working fine right now. 
>
> Thank you so much, 
>
> On Sunday, January 22, 2023 at 1:38:15 PM UTC-7 Ruochun Zhang wrote:
>
>> Hi Mohammad,
>>
>> First I would like you to make sure that you did not see something like 
>> "geometries in a bin exceeding maximum allowance of 256" in the error 
>> report. If you did not see it, then I'd like you to confirm that the crash 
>> happens a while into the simulation, and may occur at different time spots 
>> across several simulations.
>>
>> If you can confirm these two, then it's probably a good time to have this 
>> discussion... I'll ask you to try this: Pull the latest repo, which 
>> contains a stability fix that I pushed today. Build that and run your 
>> script (if you work on your own branch it'd be the best since you can just 
>> merge the main into it). Please build it using a new and empty build 
>> folder, or alternatively, you can remove the kernel directory in your 
>> old build directory and then build it again there. Let's see if it helps, 
>> and if it does I can explain what happened.
>>
>> About the update frequency: when you see what you described, it means 
>> you can increase the update frequency. If you make the bins very small, 
>> kT's workload is high, so a larger CDFreq number is needed. The reported 
>> frequency can be 1 or 2 higher than the limit you define, simply because 
>> of how I calculate it. When you see that, you know dT waits for kT from 
>> time to time in the simulation (this can also be confirmed in the final 
>> collaboration stats, but I guess you don't get to see them if it 
>> crashes), and you should increase that number to improve efficiency. None 
>> of this should be a problem once the solver gains the ability to adjust 
>> the frequency by itself.
>>
>> Thank you,
>> Ruochun
>>
>> On Sunday, January 22, 2023 at 11:01:43 AM UTC-6 [email protected] 
>> wrote:
>>
>>> Hi, 
>>>
>>> This is a DEME-related question. 
>>>
>>> I have been running into a problem where my simulation crashes after 
>>> being normal for a while. The error I get is the following:
>>> //////////////////////////////////////////////////
>>> -------- Simulation crashed potentially due to too many geometries in a 
>>> bin --------
>>> Right now, the dT reported (by user specification or by calculation) max 
>>> velocity is 0.133465
>>> The contact margin thickness is 9.35108e-06
>>> If the velocity is extremely large, then the simulation probably 
>>> diverged due to encountering large particle velocities, and decreasing the 
>>> step size could help.
>>> If the velocity is fair but the margin is large compared to particle 
>>> sizes, then perhaps too many contact geometries are in one bin, and 
>>> decreasing the step size, update frequency or the bin size could help.
>>> If they are both fair and you do not see "exceeding maximum allowance" 
>>> reports before the crash, then it is probably not too many geometries in a 
>>> bin and it crashed for other reasons.
>>>
>>> terminate called after throwing an instance of 'std::runtime_error'
>>>   what():  GPU Assertion: an illegal memory access was encountered. This 
>>> happened in /DEM-Engine/src/algorithms/DEMCubContactDetection.cu:
>>>
>>> ////////////////////////////////////////////////////////////////
>>>
>>> I have tried to reduce my simulation bin size to as small as 
>>> 0.5*particle radius. I have also tried to reduce/increase other parameters, 
>>> such as the update frequency and safety multiplier, but the simulation 
>>> still crashes after being normal for a while (I have a video that I could 
>>> share with you via email if you like). In addition, I have tried to 
>>> greatly reduce the time step size (to 4e-7), but that did not seem to 
>>> work. Also, I have reduced my mesh to have a Total num of triangles: 6790, 
>>> which I do not think is really large. I have attached my sim file for your 
>>> reference. 
>>>
>>> In addition, I have tried to use a different material from one of your 
>>> demos with the same time step but I still seem to have the same problem. 
>>>
>>> Also, one thing that I noticed: every time I increase the CD update 
>>> frequency value, the simulation reports a higher value of Average steps per 
>>> dynamic update. For example, when I set my update frequency to 15, the 
>>> simulation reports the Average steps per dynamic update: 16.94662. In 
>>> addition, when I increase my CD update to 20, the simulation reports the 
>>> Average steps per dynamic update: 21.997. Is that how it is supposed to be?
>>>
>>> Thank you in advance for your help, 
>>>
