Hi Debu,

I have not worked with PySpark yet and cannot resolve your error, but have you tried sparkit-learn? https://github.com/lensacom/sparkit-learn

It seems to be a package combining PySpark with sklearn, and it also has a random forest and other classifiers (SparkRandomForestClassifier, https://github.com/lensacom/sparkit-learn/blob/master/splearn/ensemble/__init__.py).

Greets,
Piotr

On 09.12.2016 10:56, Debabrata Ghosh wrote:

Hi Piotr,

Yes, I did use n_jobs = -1 as well, but the code didn't run successfully. On my output screen I got the following messages instead of the JoblibMemoryError:

    16/12/08 22:12:26 INFO YarnExtensionServices: In shutdown hook for org.apache.spark.scheduler.cluster.YarnExtensionServices$$anon$1@176b071d
    16/12/08 22:12:26 INFO YarnHistoryService: Shutting down: pushing out 0 events
    16/12/08 22:12:26 INFO YarnHistoryService: Event handler thread stopping the service
    16/12/08 22:12:26 INFO YarnHistoryService: Stopping dequeue service, final queue size is 0
    16/12/08 22:12:26 INFO YarnHistoryService: Stopped: Service History Service in state History Service: STOPPED endpoint=http://servername.com:8188/ws/v1/timeline/; bonded to ATS=false; listening=true; batchSize=3; flush count=17; current queue size=0; total number queued=52, processed=50; post failures=0;
    16/12/08 22:12:26 INFO SparkContext: Invoking stop() from shutdown hook
    16/12/08 22:12:26 INFO YarnHistoryService: History service stopped; ignoring queued event : [1481256746854]: SparkListenerApplicationEnd(1481256746854)

Just to give you some background: I am executing the scikit-learn random forest classifier via a PySpark command. I don't understand what went wrong when using n_jobs = -1 such that the program suddenly shut down these services. Can you please suggest a remedy, as I have been given the task of running this via PySpark itself?

Thanks in advance!
Cheers,
Debu

On Fri, Dec 9, 2016 at 2:48 PM, Piotr Bialecki <piotr.biale...@hotmail.de> wrote:

Hi Debu,

it seems that you ran out of memory. Try using fewer processes; I don't think that n_jobs = 1000 will perform as you wish. Setting n_jobs to -1 uses the number of cores in your system.

Greets,
Piotr

On 09.12.2016 08:16, Debabrata Ghosh wrote:

Hi All,

Greetings! I am getting a JoblibMemoryError while executing scikit-learn RandomForestClassifier code. Here is my algorithm in short:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cross_validation import train_test_split
    import pandas as pd
    import numpy as np

    clf = RandomForestClassifier(n_estimators=5000, n_jobs=1000)
    clf.fit(p_input_features_train, p_input_labels_train)

The dataframe p_input_features contains 134 columns (features) and 5 million rows (observations). The exact error message is given below:

    Executing Random Forest Classifier
    Traceback (most recent call last):
      File "/home/user/rf_fold.py", line 43, in <module>
        clf.fit(p_features_train,p_labels_train)
      File "/var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 290, in fit
        for i, t in enumerate(trees))
      File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 810, in __call__
        self.retrieve()
      File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 757, in retrieve
        raise exception
    sklearn.externals.joblib.my_exceptions.JoblibMemoryError: JoblibMemoryError
    ___________________________________________________________________________
    Multiprocessing exception:
    ...........................................................................
    /var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestClassifier(bootstrap=True, class_wei...te=None, verbose=0, warm_start=False), X=array([[ 0., 0., 0., ..., 0., 0.]], dtype=float32), y=array([[ 0.], [ 0.], [ 0.], ..., [ 0.], [ 0.], [ 0.]]), sample_weight=None)
        285     trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
        286                      backend="threading")(
        287         delayed(_parallel_build_trees)(
        288             t, self, X, y, sample_weight, i, len(trees),
        289             verbose=self.verbose, class_weight=self.class_weight)
    --> 290         for i, t in enumerate(trees))
            i = 4999
        291
        292     # Collect newly grown trees
        293     self.estimators_.extend(trees)
        294
    ...........................................................................

Can you please help me identify a possible resolution?

Thanks,
Debu

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
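For scale: the feature matrix described in the thread (5 million rows by 134 features, stored as float32 per the traceback) is already a few gigabytes before a single tree is grown, and 5000 trees plus worker overhead multiply that further. A quick sketch of the arithmetic:

```python
# Rough footprint of the feature matrix from the thread:
# 5 million rows x 134 features, float32 (4 bytes per value).
rows, cols = 5_000_000, 134
bytes_per_value = 4  # float32
total_gib = rows * cols * bytes_per_value / 1024**3
print(f"feature matrix alone: {total_gib:.2f} GiB")  # ~2.50 GiB, before copies or trees
```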
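Following Piotr's advice, a minimal sketch of the same classifier with n_jobs=-1 (one worker per core) and a more modest n_estimators. The dataset here is a small synthetic stand-in, not Debu's actual frame; also note that in current scikit-learn, train_test_split lives in sklearn.model_selection (sklearn.cross_validation was deprecated and later removed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small synthetic stand-in for the real 5M x 134 frame; sizes are illustrative only.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_jobs=-1 parallelizes over all CPU cores; a modest n_estimators bounds memory use.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```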