[Scikit-learn-general] load_svmlight_file value error

2016-02-12 Thread Gunjan Dewan
Hi all, I am using the following dataset from Kaggle (train.csv): https://www.kaggle.com/c/lshtc/data. The dataset is in libSVM format. However, while trying to load it using load_svmlight_file, I get the following error: File "_svmlight_format.pyx", line 72, in sklearn.datasets._svmlight_format._

Re: [Scikit-learn-general] load_svmlight_file value error

2016-02-12 Thread Mathieu Blondel
Hi Gunjan, Apparently the dataset is multi-label, so you need to use the multilabel=True option. http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html Mathieu On Fri, Feb 12, 2016 at 10:04 PM, Gunjan Dewan wrote: > Hi all, > > I am using the following datas
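A minimal sketch of the multilabel=True option. It assumes a tiny made-up file held in an `io.BytesIO` buffer rather than the real train.csv, with comma-separated labels (no spaces) followed by `index:value` pairs. With the option set, `y` comes back as a sequence of label tuples rather than a flat array.

```python
import io

from sklearn.datasets import load_svmlight_file

# Two toy samples (made up for illustration): the first has labels 1 and 2,
# the second only label 3. Labels are comma-separated with no spaces.
data = b"1,2 1:0.5 3:1.0\n3 2:2.0\n"

# multilabel=True returns y as a list of label tuples.
# zero_based=False says feature indices in the file start at 1.
X, y = load_svmlight_file(io.BytesIO(data), multilabel=True, zero_based=False)

print(X.shape)  # sparse CSR matrix, one row per line of the file
print(y)
```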

Re: [Scikit-learn-general] load_svmlight_file value error

2016-02-12 Thread Gunjan Dewan
Hi Mathieu, Thanks a lot for the help. But even after setting the multilabel option, it still raises a ValueError: File "_svmlight_format.pyx", line 67, in sklearn.datasets._svmlight_format._load_svmlight_file (sklearn\datasets\_svmlight_format.c:2055) ValueError: could not convert string to f

[Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread muhammad waseem
Hi, I am trying to fit my model using regression trees, but the problem is that it consumes a lot of RAM, which makes my code unresponsive. From looking at different forums and platforms, I think this is a common problem. I was wondering how you free up memory, or what the best ways are to run the fittin

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread Sebastian Raschka
Hi, Waseem, I think lowering the value of n_jobs would help; as far as I know, each process gets a copy of the data? I just stumbled upon spark-sklearn a few days ago; maybe that could help as well: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html When I understand
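As a sketch of what lowering n_jobs looks like in practice (assuming a grid search over regression trees on synthetic data, since the original dataset is not shown), the search below runs only two workers at a time. `pre_dispatch` is a standard `GridSearchCV` parameter that limits how many jobs are queued at once; the scikit-learn documentation describes reducing it as a way to avoid an explosion of memory consumption.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real dataset.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X[:, 0] + 0.1 * rng.randn(200)

# Fewer workers means fewer simultaneous fits and, with process-based
# backends, fewer per-worker copies of any data that is not memmapped.
# pre_dispatch="n_jobs" queues only as many jobs as there are workers.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=3,
    n_jobs=2,
    pre_dispatch="n_jobs",
)
search.fit(X, y)
print(search.best_params_)
```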

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread Manoj Kumar
Hi Sebastian, This is true, but only if the data is smaller than 1M. Above that size, it is memmapped to a temp folder and shared by all processes ( https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping ). You can try varying the "max_nbytes" parameter wherever
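To see the memmapping described here, the sketch below calls `joblib.Parallel` directly with its `max_nbytes` and `mmap_mode` parameters. The array and the `column_mean` helper are made up for illustration. The array is deliberately a little larger than the default `1M` threshold (which joblib interprets as 1024**2 bytes), so worker processes should receive a read-only memory map instead of their own pickled copy.

```python
import numpy as np
from joblib import Parallel, delayed


def column_mean(X, j):
    # Hypothetical helper. In a worker process, X arrives either as a pickled
    # ndarray copy or as a read-only np.memmap backed by a temp-folder file.
    return float(X[:, j].mean()), isinstance(X, np.memmap)


# 1000 * 200 float64 values = 1,600,000 bytes, above the 1M threshold.
X = np.ones((1000, 200))
print(X.nbytes)

# Arrays larger than max_nbytes are dumped to disk once and memory-mapped
# by every worker; mmap_mode="r" opens those maps read-only.
results = Parallel(n_jobs=2, max_nbytes="1M", mmap_mode="r")(
    delayed(column_mean)(X, j) for j in range(4)
)
print(results)
```

Raising `max_nbytes` (or passing `None` to disable memmapping) trades shared memory for per-worker copies; the computed results are the same either way.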

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread muhammad waseem
Hi Sebastian and Manoj, @Manoj: What value should the max_nbytes parameter have, and will it affect the results and the time it takes to run cross_validation, grid_search, etc.? Thanks Kindest Regards Waseem On Fri, Feb 12, 2016 at 4:42 PM, Sebastian Raschka wrote: > Hi, Waseem, > I think lowering

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread muhammad waseem
Hi Sebastian and Manoj, @Manoj: What value should the max_nbytes parameter have, and will it affect the results and the time it takes to run cross_validation, grid_search, etc.? @Sebastian: Will the Spark implementation also improve memory use, or just CPU use? Thanks Kindest Regards On Fri, Fe

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread Sebastian Raschka
Thanks for the note, Manoj, I didn't know that! @muhammad So if there's no duplication of data across processes, I guess that you would also run into trouble with n_jobs=1. But just to make sure that data duplication is not the issue, could you try running it with n_jobs=1? In this case,

Re: [Scikit-learn-general] Using Typed MemoryViews for Numpy Arrays

2016-02-12 Thread mahesh ravishankar
Thanks Jacob V. and Jacob S. I have forked scikit-learn into my github and will start making my changes to my branch. I will send a code-review once I am done. Mahesh On Thu, Feb 11, 2016 at 11:18 AM, Jacob Vanderplas < jake...@cs.washington.edu> wrote: > Thanks Mahesh, > That particular code wa

Re: [Scikit-learn-general] Using Typed MemoryViews for Numpy Arrays

2016-02-12 Thread Jacob Schreiber
I would be interested in knowing if using typed memoryviews did not decrease performance. Please ping me once you have results! On Fri, Feb 12, 2016 at 11:04 AM, mahesh ravishankar < mahesh.ravishan...@gmail.com> wrote: > Thanks Jacob V. and Jacob S. > I have forked scikit-learn into my github an

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread Manoj Kumar
Hi, That would depend on the size of the original dataset. But I think you should try Sebastian's suggestion first, to check whether the real issue is data duplication. On Fri, Feb 12, 2016 at 12:29 PM, muhammad waseem wrote: > Hi Sebastian and Manoj, > @Manoj: What should be the value of

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread muhammad waseem
@Sebastian: I tried n_jobs=10 (out of 12 in total), and it still caused the same problem. I could try running it with n_jobs=1, but it would be so slow that it would take ages to complete. The machine has 32GB of RAM, and it started using swap memory after consuming all of the RAM. Is there a way t

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread Sebastian Raschka
I'd suggest trying n_jobs=1 and checking whether swap memory is used (you don't have to run it to completion). If this runs fine without swap, we can work further from there. > On Feb 12, 2016, at 2:57 PM, muhammad waseem wrote: > > @Sebastian: I tried with n_jobs=10 (total is
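A rough way to quantify the n_jobs=1 check is to compare the process's peak memory before and after fitting. This is a sketch under several assumptions: a Unix system (the `resource` module is not available on Windows), synthetic data standing in for the real dataset, and `RandomForestRegressor` as the regression-tree model. It reports peak resident memory, not swap usage directly, and `RUSAGE_SELF` covers only the current process, which is enough here because n_jobs=1 keeps all the work in that process.

```python
import resource
import sys

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 ** 2)
    return peak / 1024


# Synthetic stand-in for the real dataset.
rng = np.random.RandomState(0)
X = rng.rand(2000, 20)
y = X[:, 0] + 0.1 * rng.randn(2000)

before = peak_rss_mb()
# n_jobs=1: a single worker, so any memory blow-up cannot come from
# duplicating the data across workers.
model = RandomForestRegressor(n_estimators=20, n_jobs=1, random_state=0).fit(X, y)
after = peak_rss_mb()
print(f"peak RSS before fit: {before:.1f} MB, after fit: {after:.1f} MB")
```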

Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

2016-02-12 Thread Jacob Schreiber
I don't think the data is copied for tree-based classifiers. They use the threading backend, so all threads should share the same memory. On Fri, Feb 12, 2016 at 12:32 PM, Sebastian Raschka wrote: > I'd suggest trying n_jobs=1 and check if swap memory is used (you don't > have to run it until co
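A small sketch of the shared-memory point: with joblib's thread-based backend (requested here through `prefer="threads"`), every task receives the very same array object, so nothing is copied. The array and the `shares_data` helper are illustrative only; scikit-learn's forest estimators should request threads from joblib in a similar way when n_jobs > 1.

```python
import numpy as np
from joblib import Parallel, delayed

X = np.zeros((1000, 10))


def shares_data(arr):
    # Hypothetical helper. With a thread-based backend the argument is the
    # very same object as the global X, so no copy of the array exists.
    return arr is X and np.shares_memory(arr, X)


# prefer="threads" asks joblib for its threading backend instead of
# separate worker processes.
flags = Parallel(n_jobs=4, prefer="threads")(
    delayed(shares_data)(X) for _ in range(4)
)
print(flags)
```

Threads avoid copies, but note that, unlike the forests themselves, a grid search or cross-validation wrapped around the model may still run its fits in separate processes.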

Re: [Scikit-learn-general] load_svmlight_file value error

2016-02-12 Thread Mathieu Blondel
It seems like our svmlight reader doesn't support spaces between labels: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/_svmlight_format.pyx#L71 Could you report an issue on GitHub? In the meantime, you can write a small Python script that deletes the space between lab
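A minimal version of such a script, assuming the only problem is whitespace after the commas in the label list and that commas never appear in the `index:value` part of a line. The helper names and the sample line are hypothetical, and the file is processed line by line so that a large file never has to fit in memory.

```python
import re

# Matches a comma followed by one or more spaces or tabs, e.g. the ", " in
# "10, 20, 30 1:1". Assumption: commas only appear in the label list.
_SPACED_COMMA = re.compile(r",[ \t]+")


def fix_label_spacing(line):
    """Return the line with the whitespace after label commas removed."""
    return _SPACED_COMMA.sub(",", line)


def fix_file(src_path, dst_path):
    """Stream src_path into dst_path one line at a time (low memory use)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fix_label_spacing(line))


print(fix_label_spacing("10, 20, 30 1:1 2:4\n"))
```

After running something like `fix_file("train.csv", "train_fixed.csv")` (the output name is arbitrary), the cleaned file can be passed to `load_svmlight_file(..., multilabel=True)`.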

Re: [Scikit-learn-general] load_svmlight_file value error

2016-02-12 Thread Gunjan Dewan
I'll do that. Thanks a lot. Gunjan On Sat, Feb 13, 2016 at 6:04 AM, Mathieu Blondel wrote: > It seems like our svmlight reader doesn't support spaces between labels: > > https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/_svmlight_format.pyx#L71 > > Could you report an is