Re: java.io.IOException: No space left on device

2015-04-29 Thread selim namsi
Sorry, I included the log messages when creating the thread at
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-td22702.html
but I forgot that raw messages are not sent in emails.

So this is the log related to the error :

15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_0 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_0 locally
15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_1 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_1 locally
15/04/29 02:49:13 WARN MemoryStore: Not enough space to cache rdd_19_1
in memory! (computed 1106.0 MB so far)
15/04/29 02:49:13 INFO MemoryStore: Memory use = 234.0 MB (blocks) +
2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage
limit = 3.1 GB.
15/04/29 02:49:13 WARN CacheManager: Persisting partition rdd_19_1 to
disk instead.
15/04/29 02:49:28 WARN MemoryStore: Not enough space to cache rdd_19_0
in memory! (computed 1745.7 MB so far)
15/04/29 02:49:28 INFO MemoryStore: Memory use = 234.0 MB (blocks) +
2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage
limit = 3.1 GB.
15/04/29 02:49:28 WARN CacheManager: Persisting partition rdd_19_0 to
disk instead.
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_0 failed
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_1 failed
15/04/29 03:56:12 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 7)
java.io.IOException: No space left on device

It seems that the partitions rdd_19_0 and rdd_19_1 together need 2.9 GB.
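Checking the arithmetic from the MemoryStore lines above (a plain Scala sketch; the numbers are copied from the log):

```scala
// Numbers from the MemoryStore log lines above, all in MB.
val blocks    = 234.0        // already-cached blocks
val scratch   = 2.6 * 1024   // unroll scratch shared by the 2 task threads
val limit     = 3.1 * 1024   // storage memory limit
val partition = 1106.0       // size computed so far for rdd_19_1

// Free storage memory vs. what the partition needs so far:
val free = limit - (blocks + scratch)   // ~278 MB
println(f"free = $free%.1f MB, partition so far = $partition%.1f MB")
// free < partition, so the partition is persisted to disk instead,
// which matches the "Persisting partition ... to disk" warnings.
```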

Thanks


On Wed, Apr 29, 2015 at 12:34 PM Dean Wampler  wrote:

> Makes sense. "/" is where /tmp would be. However, 230G should be plenty of
> space. If you have INFO logging turned on (set in
> $SPARK_HOME/conf/log4j.properties), you'll see messages about saving data
> to disk that will list sizes. The web console also has some summary
> information about this.
>
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Apr 29, 2015 at 6:25 AM, selim namsi 
> wrote:
>
>> This is the output of df -h so as you can see I'm using only one disk
>> mounted on /
>>
>> df -h
>> Filesystem  Size  Used Avail Use% Mounted on
>> /dev/sda8       276G   34G  229G  13% /
>> none            4.0K     0  4.0K   0% /sys/fs/cgroup
>> udev            7.8G  4.0K  7.8G   1% /dev
>> tmpfs           1.6G  1.4M  1.6G   1% /run
>> none            5.0M     0  5.0M   0% /run/lock
>> none            7.8G   37M  7.8G   1% /run/shm
>> none            100M   40K  100M   1% /run/user
>> /dev/sda1   496M   55M  442M  11% /boot/efi
>>
>> Also, while the program was running, I noticed that the Use% of the disk
>> partition mounted on "/" was growing very fast
>>
>> On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle 
>> wrote:
>>
>>> Do you have multiple disks? Maybe your work directory is not in the
>>> right disk?
>>>
>>> On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using Spark (1.3.1) MLlib to run the random forest algorithm on
>>>> tf-idf output; the training data is a file containing 156060 records
>>>> (size 8.1 MB).
>>>>
>>>> The problem is that when Spark tries to persist a partition in memory
>>>> and there is not enough memory, the partition is persisted to disk, and
>>>> despite having 229G of free disk space, I got "No space left on device".
>>>>
>>>> This is how I'm running the program:
>>>>
>>>> ./spark-submit --class com.custom.sentimentAnalysis.MainPipeline
>>>> --master
>>>> local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv
>>>> testData.tsv
>>>>
>>>> And this is a part of the log:
>>>>
>>>>
>>>>
>>>> If you need more information, please let me know.
>>>> Thanks
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>


Re: java.io.IOException: No space left on device

2015-04-29 Thread selim namsi
This is the output of df -h so as you can see I'm using only one disk
mounted on /

df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda8       276G   34G  229G  13% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.8G  4.0K  7.8G   1% /dev
tmpfs           1.6G  1.4M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.8G   37M  7.8G   1% /run/shm
none            100M   40K  100M   1% /run/user
/dev/sda1   496M   55M  442M  11% /boot/efi

Also, while the program was running, I noticed that the Use% of the disk
partition mounted on "/" was growing very fast
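To confirm where the space is actually going while the job runs, I can watch Spark's scratch directories (a sketch; spark.local.dir defaults to /tmp, and the glob patterns below are only examples):

```shell
# Sketch: check the usual Spark scratch locations while the job runs.
# spark.local.dir defaults to /tmp; the directory patterns are examples.
du -sh /tmp/spark-* /tmp/blockmgr-* 2>/dev/null || true
# And the overall usage of the root filesystem:
df -h /
```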

On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle 
wrote:

> Do you have multiple disks? Maybe your work directory is not in the right
> disk?
>
> On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi 
> wrote:
>
>> Hi,
>>
>> I'm using Spark (1.3.1) MLlib to run the random forest algorithm on tf-idf
>> output; the training data is a file containing 156060 records (size 8.1 MB).
>>
>> The problem is that when Spark tries to persist a partition in memory and
>> there is not enough memory, the partition is persisted to disk, and despite
>> having 229G of free disk space, I got "No space left on device".
>>
>> This is how I'm running the program:
>>
>> ./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master
>> local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv
>> testData.tsv
>>
>> And this is a part of the log:
>>
>>
>>
>> If you need more information, please let me know.
>> Thanks
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


java.io.IOException: No space left on device

2015-04-29 Thread Selim Namsi
Hi,

I'm using Spark (1.3.1) MLlib to run the random forest algorithm on tf-idf
output; the training data is a file containing 156060 records (size 8.1 MB).

The problem is that when Spark tries to persist a partition in memory and there
is not enough memory, the partition is persisted to disk, and despite having
229G of free disk space, I got "No space left on device".

This is how I'm running the program:

./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master
local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv
testData.tsv
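By the way, from the Spark configuration docs it looks like shuffle and spill files go to spark.local.dir (which defaults to /tmp), so it could be pointed at a directory with more room. A sketch of the same command with that option added (the path is only an example):

```shell
# Sketch: redirect Spark's scratch space (shuffle/spill files) to a
# directory on a disk with more room. The path below is only an example.
./spark-submit --class com.custom.sentimentAnalysis.MainPipeline \
  --master local[2] --driver-memory 5g \
  --conf spark.local.dir=/path/to/big/disk/spark-tmp \
  ml_pipeline.jar labeledTrainData.tsv testData.tsv
```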

And this is a part of the log:



If you need more information, please let me know.
Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Multiclass classification using Ml logisticRegression

2015-04-29 Thread selim namsi
Thank you for your answer!
Yes I would like to work on it.

Selim

On Mon, Apr 27, 2015 at 5:23 AM Joseph Bradley 
wrote:

> Unfortunately, the Pipelines API doesn't have multiclass logistic
> regression yet, only binary.  It's really a matter of modifying the current
> implementation; I just added a JIRA for it:
> https://issues.apache.org/jira/browse/SPARK-7159
>
> You'll need to use the old LogisticRegression API to do multiclass for
> now, until that JIRA gets completed.  (If you're interested in doing it,
> let me know via the JIRA!)
>
> Joseph
>
> On Fri, Apr 24, 2015 at 3:26 AM, Selim Namsi 
> wrote:
>
>> Hi,
>>
>> I just started using the Spark ML pipeline to implement a multiclass
>> classifier using LogisticRegressionWithLBFGS (which accepts the number of
>> classes as a parameter). I followed the Pipeline example in the ML guide and
>> used the LogisticRegression class, which calls the
>> LogisticRegressionWithLBFGS class:
>>
>> val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
>>
>> the problem is that LogisticRegression doesn't take numClasses as a
>> parameter
>>
>> Any idea how to solve this problem?
>>
>> Thanks
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Multiclass-classification-using-Ml-logisticRegression-tp22644.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


Multiclass classification using Ml logisticRegression

2015-04-24 Thread Selim Namsi
Hi,

I just started using the Spark ML pipeline to implement a multiclass classifier
using LogisticRegressionWithLBFGS (which accepts the number of classes as a
parameter). I followed the Pipeline example in the ML guide and used the
LogisticRegression class, which calls the LogisticRegressionWithLBFGS class:

val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

the problem is that LogisticRegression doesn't take numClasses as a parameter
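For context, the older MLlib API does expose the number of classes; what I'm looking for is the equivalent in the new Pipeline API. A sketch of the old API (untested; `training` is assumed to be an RDD[LabeledPoint] prepared elsewhere, and 10 classes is just an example):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Sketch: the old MLlib API takes the number of classes directly.
def trainMulticlass(training: RDD[LabeledPoint]) =
  new LogisticRegressionWithLBFGS()
    .setNumClasses(10)   // example value; set to the real number of classes
    .run(training)       // returns a LogisticRegressionModel
```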

Any idea how to solve this problem?

Thanks 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Multiclass-classification-using-Ml-logisticRegression-tp22644.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org