Thanks, Piotr, for your feedback! I did look into sparkit-learn yesterday but couldn't find that it contains a RandomForestClassifier. I would need to ask my customer to download it for me, as I don't have permission to do that myself. Could you please help me confirm whether sparkit-learn provides the following methods (corresponding to scikit-learn):
1. sklearn.ensemble -> RandomForestClassifier
2. sklearn.cross_validation -> StratifiedKFold
3. sklearn.cross_validation -> train_test_split

Is there a URL for sparkit-learn, similar to the one for scikit-learn, where all the methods are listed?

I have figured out that sparkit-learn needs to be downloaded from https://pypi.python.org/pypi/sparkit-learn, but apart from that, does anything else need to be downloaded? I just wanted to check before asking my customer, as otherwise it would be a bit embarrassing.

Thanks again!

Cheers,
Debu

On Fri, Dec 9, 2016 at 3:37 PM, Piotr Bialecki <piotr.biale...@hotmail.de> wrote:

> Hi Debu,
>
> I have not worked with pyspark yet and cannot resolve your error,
> but have you tried out sparkit-learn?
> https://github.com/lensacom/sparkit-learn
>
> It seems to be a package combining pyspark with sklearn and it also has a
> RandomForest and other classifiers:
> (SparkRandomForestClassifier,
> https://github.com/lensacom/sparkit-learn/blob/master/splearn/ensemble/__init__.py)
>
> Greets,
> Piotr
>
> On 09.12.2016 10:56, Debabrata Ghosh wrote:
>
> Hi Piotr,
> Yes, I did use n_jobs = -1 as well. But the code didn't run successfully.
> On my output screen, I got the following message instead of the
> JoblibMemoryError:
>
> 16/12/08 22:12:26 INFO YarnExtensionServices: In shutdown hook for org.apache.spark.scheduler.cluster.YarnExtensionServices$$anon$1@176b071d
> 16/12/08 22:12:26 INFO YarnHistoryService: Shutting down: pushing out 0 events
> 16/12/08 22:12:26 INFO YarnHistoryService: Event handler thread stopping the service
> 16/12/08 22:12:26 INFO YarnHistoryService: Stopping dequeue service, final queue size is 0
> 16/12/08 22:12:26 INFO YarnHistoryService: Stopped: Service History Service in state History Service: STOPPED endpoint=http://servername.com:8188/ws/v1/timeline/; bonded to ATS=false; listening=true; batchSize=3; flush count=17; current queue size=0; total number queued=52, processed=50; post failures=0;
> 16/12/08 22:12:26 INFO SparkContext: Invoking stop() from shutdown hook
> 16/12/08 22:12:26 INFO YarnHistoryService: History service stopped; ignoring queued event : [1481256746854]: SparkListenerApplicationEnd(1481256746854)
>
> Just to give you some background: I am executing the scikit-learn
> RandomForestClassifier using the pyspark command. I don't understand what
> has gone wrong while using n_jobs = -1, or why the program is suddenly
> shutting down certain services. Could you please suggest a remedy, as I
> have been given the task of running this via pyspark itself.
>
> Thanks in advance!
>
> Cheers,
> Debu
>
> On Fri, Dec 9, 2016 at 2:48 PM, Piotr Bialecki <piotr.biale...@hotmail.de> wrote:
>
>> Hi Debu,
>>
>> it seems that you run out of memory.
>> Try using fewer processes.
>> I don't think that n_jobs = 1000 will perform as you wish.
>>
>> Setting n_jobs to -1 uses the number of cores in your system.
>>
>> Greets,
>> Piotr
>>
>> On 09.12.2016 08:16, Debabrata Ghosh wrote:
>>
>> Hi All,
>>
>> Greetings!
>>
>> I am getting a JoblibMemoryError while executing scikit-learn
>> RandomForestClassifier code. Here is my algorithm in short:
>>
>> from sklearn.ensemble import RandomForestClassifier
>> from sklearn.cross_validation import train_test_split
>> import pandas as pd
>> import numpy as np
>>
>> clf = RandomForestClassifier(n_estimators=5000, n_jobs=1000)
>> clf.fit(p_input_features_train,p_input_labels_train)
>>
>> The dataframe p_input_features contains 134 columns (features) and 5
>> million rows (observations). The exact *error message* is given below:
>>
>> Executing Random Forest Classifier
>> Traceback (most recent call last):
>>   File "/home/user/rf_fold.py", line 43, in <module>
>>     clf.fit(p_features_train,p_labels_train)
>>   File "/var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 290, in fit
>>     for i, t in enumerate(trees))
>>   File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 810, in __call__
>>     self.retrieve()
>>   File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 757, in retrieve
>>     raise exception
>> sklearn.externals.joblib.my_exceptions.JoblibMemoryError: JoblibMemoryError
>> ___________________________________________________________________________
>> Multiprocessing exception:
>> ...........................................................................
>> /var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestClassifier(bootstrap=True, class_wei...te=None, verbose=0,
>>             warm_start=False), X=array([[ 0. , 0. , 0. , .... 0. , 0. ]], dtype=float32), y=array([[ 0.],
>>        [ 0.],
>>        [ 0.],
>>        ...,
>>        [ 0.],
>>        [ 0.],
>>        [ 0.]]), sample_weight=None)
>>     285         trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
>>     286                          backend="threading")(
>>     287             delayed(_parallel_build_trees)(
>>     288                 t, self, X, y, sample_weight, i, len(trees),
>>     289                 verbose=self.verbose, class_weight=self.class_weight)
>> --> 290             for i, t in enumerate(trees))
>>         i = 4999
>>     291
>>     292         # Collect newly grown trees
>>     293         self.estimators_.extend(trees)
>>     294
>> ...........................................................................
>>
>> Please can you help me to identify a possible resolution to this.
>>
>> Thanks,
>> Debu
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
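
For what it's worth, here is a rough back-of-the-envelope sketch of the two points raised in the thread: how much memory the input array alone takes (134 float32 columns by 5 million rows, as reported above), and how a requested n_jobs could be mapped to a sensible worker count. The helper names feature_matrix_bytes and capped_n_jobs are hypothetical and not part of scikit-learn or joblib. The handling of negative values follows joblib's documented convention (n_cpus + 1 + n_jobs); capping positive values at the core count is only an assumption about what is reasonable here, not something the library does for you. The estimate ignores the memory used by the fitted trees and by per-worker temporaries, which may well dominate with 5000 estimators.

```python
import os


def feature_matrix_bytes(n_rows, n_cols, itemsize=4):
    """Bytes needed for a dense n_rows x n_cols array.

    itemsize=4 corresponds to float32, the dtype shown in the traceback.
    """
    return n_rows * n_cols * itemsize


def capped_n_jobs(requested, n_cores=None):
    """Hypothetical helper: map a requested n_jobs to a worker count.

    Negative values use joblib's convention (-1 = all cores, -2 = all but
    one). Positive values are capped at the core count, which is an
    assumption made for this sketch.
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    if requested == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if requested < 0:
        return max(n_cores + 1 + requested, 1)
    return min(requested, n_cores)


if __name__ == "__main__":
    size = feature_matrix_bytes(5 * 10**6, 134)
    print("input array: %.2f GB" % (size / 1e9))  # input array: 2.68 GB
    print("n_jobs=1000 on this machine ->", capped_n_jobs(1000))
    print("n_jobs=-1 on this machine   ->", capped_n_jobs(-1))
```

So the features alone need roughly 2.68 GB before any tree is grown, which suggests trying fewer estimators and a worker count no larger than the number of cores before anything else.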
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn