I read a large dataset into memory and it takes about 2GB of RAM (I have 4GB of RAM). The size reported by sys.getsizeof(train_X) is *63963248*.
And I evaluate the clustering with GridSearchCV as below:

import numpy as np
from sklearn import cluster, metrics
from sklearn.model_selection import GridSearchCV

def grid_search_clu(X):
    def cv_scorer(estimator, X):
        estimator.fit(X)
        # Use labels_ when the estimator exposes them, otherwise fall back to predict()
        cluster_labels = estimator.labels_ if hasattr(estimator, 'labels_') else estimator.predict(X)
        num_labels = len(set(cluster_labels))
        num_samples = len(X)
        if num_labels == 1 or num_labels == num_samples:
            return -1
        else:
            # GridSearchCV maximizes the score, so negate Davies-Bouldin (lower is better)
            return -metrics.davies_bouldin_score(X, cluster_labels)

    m = cluster.Birch(n_clusters=None, compute_labels=True)
    m_param = {'branching_factor': range(10, 60, 10),
               'threshold': np.arange(0.1, 0.6, 0.1).round(decimals=3)}
    # cv=[(slice(None), slice(None))] trains and scores on the full data (no real CV split)
    clf = GridSearchCV(m, m_param, cv=[(slice(None), slice(None))],
                       scoring=cv_scorer, verbose=1, n_jobs=1,
                       return_train_score=False).fit(X)
    return clf

This raises a MemoryError. What should I do to solve it? Adjust the parameters' ranges? Thanks.
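One idea I have considered (I am not sure it is the right fix) is to drop GridSearchCV and loop over the grid manually, feeding the data to Birch with partial_fit in chunks so the full matrix is never duplicated inside fit(). The sketch below assumes this approach; the helper name score_birch_incrementally and the chunk_size value are only placeholders.

import numpy as np
from sklearn.cluster import Birch
from sklearn import metrics

def score_birch_incrementally(X, branching_factor, threshold, chunk_size=10000):
    # Build the CF-tree incrementally; only one chunk of X is processed at a time.
    model = Birch(n_clusters=None, compute_labels=False,
                  branching_factor=branching_factor, threshold=threshold)
    for start in range(0, len(X), chunk_size):
        model.partial_fit(X[start:start + chunk_size])
    labels = model.predict(X)
    # Degenerate clusterings get the same -1 penalty as in cv_scorer above
    if len(set(labels)) == 1 or len(set(labels)) == len(X):
        return -1
    return -metrics.davies_bouldin_score(X, labels)

# Manual grid search over the same parameter ranges, keeping the best score
best = max(
    ((bf, th, score_birch_incrementally(train_X, bf, th))
     for bf in range(10, 60, 10)
     for th in np.arange(0.1, 0.6, 0.1).round(decimals=3)),
    key=lambda t: t[2])

Would something like this avoid the MemoryError, or is there a better way?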