I think the problem is due to the size of my database, which has 44,000 records. When I tested the routine on reduced subsets (the first 10,000 and the first 20,000 records), it ran normally.
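As a rough back-of-the-envelope sketch of why size alone could matter: if eps is large relative to the scale of the data, nearly every point becomes a neighbor of every other point, and storing those neighborhoods grows with the square of the number of records. The function name below is hypothetical, and the 8 bytes per stored neighbor index is an assumption, not a measured figure for scikit-learn.

```python
def worst_case_neighbor_gib(n_records, bytes_per_index=8):
    """Upper bound in GiB if every point is stored as a neighbor of every point.

    Assumption: one index of `bytes_per_index` bytes per (point, neighbor) pair.
    """
    return n_records * n_records * bytes_per_index / 1024**3


# The three sizes mentioned in this message.
for n in (10_000, 20_000, 44_000):
    print(f"{n:>6} records: up to {worst_case_neighbor_gib(n):.1f} GiB")
```

Under these assumptions the bound is roughly 0.7 GiB, 3.0 GiB and 14.4 GiB respectively. That would be consistent with the smaller subsets running and the full set freezing, but it is only an upper bound and does not prove that memory is the cause.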
You asked me to check the memory while running the DBSCAN routine, but I do not know how to do that (if I did, I would have done so already). I think the routine is not ready to work with this much data. The problem is that my computer freezes, so I cannot analyze the case. I have tried to find a change that works (such as changing the routine's parameters), but every alternative with a lot of data (about 40,000 records) produces an error. I believe the package routines have no exception handling in order to improve performance, so I suggest providing a test version that shows a proper message when an error occurs.

To summarize:

1) How can I check the computer's memory during the execution of the routine?
2) I suggest developing test versions of routines that may run into a memory error.

Att.,
Mauricio Reis

2018-05-13 5:34 GMT-03:00 Roman Yurchak <rth.yurc...@gmail.com>:

> Could you please check memory usage while running DBSCAN to make sure
> freezing is due to running out of memory and not to something else?
> Which parameters do you run DBSCAN with? Changing algorithm, leaf_size
> parameters and ensuring n_jobs=1 could help.
>
> Assuming eps is reasonable, I think it shouldn't be an issue to run DBSCAN
> on L2 normalized data: using the default euclidean metric, this should
> produce somewhat similar results to clustering not normalized data with
> metric='cosine'.
>
> On 13/05/18 00:20, Andrew Nystrom wrote:
>
>> If you’re l2 norming your data, you’re making it live on the surface of a
>> hypersphere. That surface will have a high density of points and may not
>> have areas of low density, in which case the entire surface could be
>> recognized as a single cluster if epsilon is high enough and min neighbors
>> is low enough. I’d suggest not using l2 norm with DBSCAN.
>>
>> On Sat, May 12, 2018 at 7:27 AM Mauricio Reis <rei...@gmail.com> wrote:
>>
>>     The DBScan "fit" method (in scikit-learn v0.19.1) is freezing my
>>     computer without any warning message!
>>
>>     I am using WinPython 3.6.5 64 bit.
>>
>>     The method works normally with the original data, but freezes when I
>>     use the normalized data (between 0 and 1).
>>
>>     What should I do?
>>
>>     Att.,
>>     Mauricio Reis
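On question 1 above, one possible way to check memory from inside Python is the standard-library tracemalloc module. The sketch below is a minimal illustration, not a definitive recipe: the helper names peak_memory_mib and allocate_block are hypothetical, and allocate_block is only a stand-in workload so that the snippet runs without scikit-learn installed. tracemalloc only sees allocations made through Python's allocator (recent NumPy versions report their array buffers to it), so memory allocated directly by compiled extension code may not be counted.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises (e.g. MemoryError).
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def allocate_block(n_bytes):
    """Stand-in workload (hypothetical): allocate n_bytes to have something to measure."""
    return bytearray(n_bytes)


if __name__ == "__main__":
    # Measure on growing sizes first, rather than going straight to the full run.
    for n in (1_000_000, 4_000_000):
        _, peak = peak_memory_mib(allocate_block, n)
        print(f"{n} bytes requested, peak traced memory: {peak:.2f} MiB")
```

To apply this to the case in this thread, the stand-in could be replaced by something like `peak_memory_mib(lambda: DBSCAN(eps=eps).fit(X[:n]))` for n = 10,000 and then 20,000, where X and eps are whatever is already in use. Watching how the peak grows between those two sizes gives an estimate for 44,000 records without having to freeze the machine. Keeping the operating system's own monitor (Task Manager on Windows) open during the run is a simpler cross-check.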
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn