Hi Olivier,

Thanks for the info. I will follow it from now on. The details of the traceback are given below:
----------Full traceback---------------
Fitting 3 folds for each of 10 candidates, totalling 30 fits
C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-19-321b410b10ad> in <module>()
     18
     19
---> 20 random_search_sg.fit(scaled_data, labels)
     21
     22 print("RandomizedSearchCV took %.2f seconds for %d candidates"

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in fit(self, X, y)
   1023                                           self.n_iter,
   1024                                           random_state=self.random_state)
-> 1025         return self._fit(X, y, sampled_params)

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in _fit(self, X, y, parameter_iterable)
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
    574                 for train, test in cv)
    575

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
    601
    602         with self._lock:
--> 603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __init__(self, iterator_slice)
    125
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in <genexpr>(.0)
    567             pre_dispatch=pre_dispatch
    568         )(
--> 569                 delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                         train, test, self.verbose, parameters,
    571                                         self.fit_params, return_parameters=True,

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in __iter__(self)
    250                         + " For exhaustive searches, use GridSearchCV.")
    251             for i in sample_without_replacement(grid_size, self.n_iter,
--> 252                                                 random_state=rnd):
    253                 yield param_grid[i]
    254

sklearn\utils\_random.pyx in sklearn.utils._random.sample_without_replacement (sklearn\utils\_random.c:3975)()

OverflowError: Python int too large to convert to C long
-------------------End of traceback-----------------------------

The shapes of scaled_data and labels are (772330, 15) and (772330,). I tried passing scaled_data both as a CSR matrix and as a NumPy array.

By the way, when I fit the classifier on its own (without *RandomizedSearchCV*), it works fine on the same dataset:

---------------------------Code below runs fine-------------------------------------
import xgboost as xgb
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Fixed hyperparameters for a single XGBoost classifier (no search involved).
params_c = {
    'n_estimators': 310,
    'learning_rate': 0.1,
    'min_child_weight': 5,
    'gamma': 0,
    'max_delta_step': 14,
    'max_depth': 5,
    'subsample': 1,
    'colsample_bytree': 1,
    'colsample_bylevel': 1,
    'reg_lambda': 1,
    'reg_alpha': 0,
    'scale_pos_weight': 1,
    'objective': 'binary:logistic',
    'silent': False,
}

c = xgb.XGBClassifier(**params_c)

# Hold out part of the data, fit on the rest, and inspect the confusion matrix.
X_train, X_test, y_train, y_test = train_test_split(scaled_data, labels)
c.fit(X_train, y_train)
y_pred = c.predict(X_test)
cm3 = confusion_matrix(y_test, y_pred)
print(cm3)
---------End of code that runs fine --------------------

On Wed, Apr 19, 2017 at 4:45 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:

> Please provide the full traceback. Without it it's impossible to tell
> whether the problem is in scikit-learn or xgboost.
>
> Also, please provide a minimal reproduction script as explained in:
>
> http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage
>
> --
> Olivier
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
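
A possible reading of the traceback above: the OverflowError is raised inside sample_without_replacement(grid_size, self.n_iter, ...), which suggests that the total number of parameter combinations (grid_size) does not fit in a C long. On Windows a C long is 32 bits even in a 64-bit Python build, so the limit would be 2**31 - 1. The sketch below only illustrates that arithmetic. The search space in it is hypothetical (the actual param_distributions are not shown in this message), and grid_size() is an illustrative helper, not a scikit-learn function.

```python
from functools import reduce
from operator import mul

# Largest value a 32-bit signed C long can hold. On Windows, C long is
# 32 bits even when Python itself is a 64-bit build.
C_LONG_MAX_32 = 2**31 - 1


def grid_size(param_lists):
    """Count the distinct combinations in a dict of parameter lists.

    Illustrative helper (not part of scikit-learn): the count is the
    product of the lengths of all value lists.
    """
    return reduce(mul, (len(values) for values in param_lists.values()), 1)


# Hypothetical search space, NOT the one used in the failing run.
param_lists = {
    "n_estimators": list(range(100, 1100)),               # 1000 values
    "max_depth": list(range(1, 11)),                      # 10 values
    "min_child_weight": list(range(1, 11)),               # 10 values
    "learning_rate": [i / 1000 for i in range(1, 501)],   # 500 values
    "subsample": [i / 100 for i in range(50, 101)],       # 51 values
}

size = grid_size(param_lists)
print("grid size:", size)
print("exceeds 32-bit C long:", size > C_LONG_MAX_32)
```

If the real search space is built from long lists like these, checking its size this way before calling fit would show whether it crosses that limit. Shrinking the lists is one thing that might be worth trying.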