I think the problem is due to the size of my database, which has 44,000
records. When I tested with reduced subsets (the first 10,000 and the first
20,000 records), the routine ran normally.
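
A rough estimate may show why the record count matters. The sketch below
assumes that DBSCAN stores the complete eps-neighborhood of every point
before clustering, that in the worst case every point is a neighbor of every
other point (for example when eps is large relative to data scaled to the
0-1 range), and that each stored neighbor index takes 8 bytes. The function
name is hypothetical and the result is an upper bound, not a measurement.

```python
def worst_case_neighbor_bytes(n_samples, bytes_per_index=8):
    """Upper bound on memory needed to store every point's neighbor list.

    Assumes each of the n_samples points has all n_samples points within
    eps, and that each neighbor index occupies bytes_per_index bytes.
    """
    return n_samples * n_samples * bytes_per_index


for n in (10_000, 20_000, 44_000):
    gib = worst_case_neighbor_bytes(n) / 2**30
    print(f"{n:>6} records: up to {gib:.1f} GiB")
```

Under these assumptions the bound grows with the square of the record count:
about 0.7 GiB for 10,000 records, 3.0 GiB for 20,000 and 14.4 GiB for 44,000,
which would be consistent with the smaller subsets running normally.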

You asked me to check memory usage while running the DBSCAN routine, but I
do not know how to do that (if I did, I would have done it already).
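
One possible way to measure memory from inside Python is the standard
library's tracemalloc module. The sketch below is only an illustration: the
helper name run_with_peak_memory and the stand-in function build_list are
hypothetical, and tracemalloc only sees allocations made through Python's
allocators, so memory allocated by native code may be missed. For a view of
the whole process, the Windows Task Manager can be watched while the routine
runs.

```python
import tracemalloc


def run_with_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_list(n):
    # Hypothetical stand-in for an expensive call such as a clustering fit.
    return [float(i) for i in range(n)]


values, peak_bytes = run_with_peak_memory(build_list, 100_000)
print(f"peak traced memory: {peak_bytes / 2**20:.1f} MiB")
```

Starting with a small subset of the records and increasing it step by step
would show how the peak grows before the machine starts to freeze.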

I think the routine is not prepared to handle this much data. The problem is
that my computer freezes, so I cannot analyze the failure. I have tried to
find out whether any change helps (such as changing the routine's
parameters), but every attempt with a lot of data (about 40,000 records)
still produces the error.

I believe the package's routines omit exception handling to improve
performance. So I suggest providing a test version that shows a proper
message when an error occurs.
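
A caller-side approximation of such a test version could look like the
sketch below. It assumes the failure surfaces as a Python MemoryError; a
freeze caused by the operating system swapping would not be caught this way.
The names run_with_clear_error and simulated_fit are hypothetical and not
part of any package.

```python
def run_with_clear_error(func, *args, **kwargs):
    """Call func; return (result, None), or (None, message) on MemoryError."""
    try:
        return func(*args, **kwargs), None
    except MemoryError:
        name = getattr(func, "__name__", "routine")
        return None, f"{name} ran out of memory; try fewer records or a smaller eps"


def simulated_fit():
    # Hypothetical stand-in that fails the way an oversized allocation would.
    raise MemoryError


result, message = run_with_clear_error(simulated_fit)
print(message)
```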

To summarize: 1) How do I check the computer's memory while the routine is
running? 2) I suggest developing test versions of routines that may hit a
memory error.

Regards,
Mauricio Reis

2018-05-13 5:34 GMT-03:00 Roman Yurchak <rth.yurc...@gmail.com>:

> Could you please check memory usage while running DBSCAN to make sure
> freezing is due to running out of memory and not to something else?
> Which parameters do you run DBSCAN with? Changing algorithm, leaf_size
> parameters and ensuring n_jobs=1 could help.
>
> Assuming eps is reasonable, I think it shouldn't be an issue to run DBSCAN
> on L2 normalized data: using the default euclidean metric, this should
> produce somewhat similar results to clustering not normalized data with
> metric='cosine'.
>
> On 13/05/18 00:20, Andrew Nystrom wrote:
>
>> If you’re l2 norming your data, you’re making it live on the surface of a
>> hypersphere. That surface will have a high density of points and may not
>> have areas of low density, in which case the entire surface could be
>> recognized as a single cluster if epsilon is high enough and min neighbors
>> is low enough. I’d suggest not using l2 norm with DBSCAN.
>> On Sat, May 12, 2018 at 7:27 AM Mauricio Reis <rei...@gmail.com> wrote:
>>
>>     The DBScan "fit" method (in scikit-learn v0.19.1) is freezing my
>>     computer without any warning message!
>>
>>     I am using WinPython 3.6.5 64 bit.
>>
>>     The method works normally with the original data, but freezes when I
>>     use the normalized data (between 0 and 1).
>>
>>     What should I do?
>>
>>     Regards,
>>     Mauricio Reis
>>     _______________________________________________
>>     scikit-learn mailing list
>>     scikit-learn@python.org <mailto:scikit-learn@python.org>
>>     https://mail.python.org/mailman/listinfo/scikit-learn
