> So I suggest that there is a test version that shows a proper message when an 
> error occurs.

I think the freezing you see is operating-system specific, and it would take 
some odd workarounds to detect at which RAM usage a given combination of 
machine and operating system might freeze (e.g., I have never seen my system 
freeze when it runs out of RAM, since it has a pretty swift SSD, though the 
sklearn process may then take a very long time to finish). Plus, scikit-learn 
would need to know in advance how much memory it is going to use and 
constantly check how much is currently available (which depends on other apps 
and the OS kernel), which wouldn't be feasible. 

I am not sure if this helps (it depends on where the memory-usage bottleneck 
is), but it might help to pass a sparse (CSR) array instead of a dense one to 
the .fit() method. Another thing to try would be to pre-compute the distances 
and pass them to the .fit() method after initializing the DBSCAN object with 
metric='precomputed'.
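
A minimal sketch of combining the two suggestions: pre-compute only the 
distances within eps as a sparse CSR matrix and pass that to DBSCAN with 
metric='precomputed'. The toy data, the eps value, and min_samples are 
assumptions chosen for illustration, not values from this thread.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy data (made up for illustration): two tight, well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.05, size=(50, 2)),
])

eps = 0.5  # assumed value; tune it for real data

# Sparse CSR matrix that stores only the pairwise distances <= eps.
nn = NearestNeighbors(radius=eps).fit(X)
distances = nn.radius_neighbors_graph(X, mode="distance")

# Cluster on the precomputed sparse distances...
labels_sparse = DBSCAN(
    eps=eps, min_samples=5, metric="precomputed"
).fit_predict(distances)

# ...and, for comparison, directly on the raw features.
labels_dense = DBSCAN(eps=eps, min_samples=5).fit_predict(X)

print("clusters found:", len(set(labels_sparse) - {-1}))
```

This only saves memory if eps is small relative to the spread of the data. If 
most records fall within eps of each other, the "sparse" matrix is nearly 
dense and the memory problem remains.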

Best,
Sebastian

> On May 13, 2018, at 7:23 PM, Mauricio Reis <rei...@gmail.com> wrote:
> 
> I think the problem is due to the size of my database, which has 44,000 
> records. When I ran a database test with reduced sizes (10,000 and 20,000 
> first records), the routine ran normally.
> 
> You ask me to check the memory while running the DBScan routine, but I do not 
> know how to do that (if I did, I would have done that already).
> 
> I think the routine is not ready to work with too much data. The problem is 
> that my computer freezes and I cannot analyze the case. I've tried to figure 
> out if any changes work (like changing routine parameters), but all 
> alternatives with lots of data (about 40,000 records) cause an error.
> 
> I believe that package routines have no exception handling to improve 
> performance. So I suggest that there is a test version that shows a proper 
> message when an error occurs.
> 
> To summarize: 1) How to check the memory of the computer during the execution 
> of the routine? 2) I suggest developing test versions of routines that may 
> have a memory error.
> 
> Att.,
> Mauricio Reis
> 
> 2018-05-13 5:34 GMT-03:00 Roman Yurchak <rth.yurc...@gmail.com>:
> Could you please check memory usage while running DBSCAN to make sure 
> freezing is due to running out of memory and not to something else?
> Which parameters do you run DBSCAN with? Changing algorithm, leaf_size 
> parameters and ensuring n_jobs=1 could help.
> 
> Assuming eps is reasonable, I think it shouldn't be an issue to run DBSCAN on 
> L2 normalized data: using the default euclidean metric, this should produce 
> somewhat similar results to clustering not normalized data with 
> metric='cosine'.
> 
> On 13/05/18 00:20, Andrew Nystrom wrote:
> If you’re l2 norming your data, you’re making it live on the surface of a 
> hypersphere. That surface will have a high density of points and may not have 
> areas of low density, in which case the entire surface could be recognized as 
> a single cluster if epsilon is high enough and min neighbors is low enough. 
> I’d suggest not using l2 norm with DBSCAN.
> On Sat, May 12, 2018 at 7:27 AM Mauricio Reis <rei...@gmail.com 
> <mailto:rei...@gmail.com>> wrote:
> 
>     The DBSCAN "fit" method (in scikit-learn v0.19.1) is freezing my
>     computer without any warning message!
> 
>     I am using WinPython 3.6.5 64 bit.
> 
>     The method works normally with the original data, but freezes when I
>     use the normalized data (between 0 and 1).
> 
>     What should I do?
> 
>     Att.,
>     Mauricio Reis
>     _______________________________________________
>     scikit-learn mailing list
>     scikit-learn@python.org <mailto:scikit-learn@python.org>
>     https://mail.python.org/mailman/listinfo/scikit-learn
