Re: java.io.IOException: No space left on device
Sorry, I put the log messages in when creating the thread at
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-td22702.html
but I forgot that raw messages are not sent in emails. This is the log related to the error:

15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_0 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_0 locally
15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_1 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_1 locally
15/04/29 02:49:13 WARN MemoryStore: Not enough space to cache rdd_19_1 in memory! (computed 1106.0 MB so far)
15/04/29 02:49:13 INFO MemoryStore: Memory use = 234.0 MB (blocks) + 2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage limit = 3.1 GB.
15/04/29 02:49:13 WARN CacheManager: Persisting partition rdd_19_1 to disk instead.
15/04/29 02:49:28 WARN MemoryStore: Not enough space to cache rdd_19_0 in memory! (computed 1745.7 MB so far)
15/04/29 02:49:28 INFO MemoryStore: Memory use = 234.0 MB (blocks) + 2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage limit = 3.1 GB.
15/04/29 02:49:28 WARN CacheManager: Persisting partition rdd_19_0 to disk instead.
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_0 failed
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_1 failed
15/04/29 03:56:12 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 7)
java.io.IOException: No space left on device

It seems that the partitions rdd_19_0 and rdd_19_1 together need 2.9 GB.

Thanks

On Wed, Apr 29, 2015 at 12:34 PM Dean Wampler wrote:
> Makes sense. "/" is where /tmp would be. However, 230G should be plenty of
> space. If you have INFO logging turned on (set in
> $SPARK_HOME/conf/log4j.properties), you'll see messages about saving data
> to disk that will list sizes. The web console also has some summary
> information about this.
>
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Apr 29, 2015 at 6:25 AM, selim namsi wrote:
>
>> This is the output of df -h; as you can see, I'm using only one disk,
>> mounted on /:
>>
>> df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sda8       276G   34G  229G  13% /
>> none            4.0K     0  4.0K   0% /sys/fs/cgroup
>> udev            7.8G  4.0K  7.8G   1% /dev
>> tmpfs           1.6G  1.4M  1.6G   1% /run
>> none            5.0M     0  5.0M   0% /run/lock
>> none            7.8G   37M  7.8G   1% /run/shm
>> none            100M   40K  100M   1% /run/user
>> /dev/sda1       496M   55M  442M  11% /boot/efi
>>
>> Also, while the program was running, I noticed that the Use% of the
>> partition mounted on "/" was growing very fast.
>>
>> On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle wrote:
>>
>>> Do you have multiple disks? Maybe your work directory is not on the
>>> right disk?
>>>
>>> On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using Spark (1.3.1) MLlib to run the random forest algorithm on
>>>> tf-idf output; the training data is a file containing 156060 records
>>>> (size 8.1M).
>>>>
>>>> The problem is that when Spark tries to persist a partition in memory
>>>> and there is not enough memory, the partition is persisted to disk, and
>>>> despite having 229G of free disk space, I get "No space left on
>>>> device".
>>>>
>>>> This is how I'm running the program:
>>>>
>>>> ./spark-submit --class com.custom.sentimentAnalysis.MainPipeline
>>>> --master local[2] --driver-memory 5g ml_pipeline.jar
>>>> labeledTrainData.tsv testData.tsv
>>>>
>>>> And this is a part of the log:
>>>>
>>>>
>>>>
>>>> If you need more information, please let me know.
>>>> Thanks
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
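Since everything in the thread points at the root partition filling up, the usual fix is to move Spark's scratch space (shuffle files and partitions spilled by persist-to-disk) off /tmp via the spark.local.dir setting. A minimal sketch, assuming a hypothetical second disk mounted at /data; the path and app name are assumptions, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: redirect Spark's scratch space away from the default /tmp.
// "/data/spark-tmp" is a hypothetical path; adjust to your disk layout.
// In local mode this must be set before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("SentimentAnalysis")
  .setMaster("local[2]")
  // A comma-separated list of directories spreads scratch I/O
  // across several disks.
  .set("spark.local.dir", "/data/spark-tmp")
val sc = new SparkContext(conf)
```

The same setting can also be passed to the spark-submit command shown earlier, via --conf spark.local.dir=/data/spark-tmp, without changing the jar.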
Re: java.io.IOException: No space left on device
This is the output of df -h; as you can see, I'm using only one disk, mounted on /:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8       276G   34G  229G  13% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.8G  4.0K  7.8G   1% /dev
tmpfs           1.6G  1.4M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.8G   37M  7.8G   1% /run/shm
none            100M   40K  100M   1% /run/user
/dev/sda1       496M   55M  442M  11% /boot/efi

Also, while the program was running, I noticed that the Use% of the partition mounted on "/" was growing very fast.

On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle wrote:
> Do you have multiple disks? Maybe your work directory is not on the right
> disk?
>
> On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi wrote:
>
>> Hi,
>>
>> I'm using Spark (1.3.1) MLlib to run the random forest algorithm on
>> tf-idf output; the training data is a file containing 156060 records
>> (size 8.1M).
>>
>> The problem is that when Spark tries to persist a partition in memory and
>> there is not enough memory, the partition is persisted to disk, and
>> despite having 229G of free disk space, I get "No space left on device".
>>
>> This is how I'm running the program:
>>
>> ./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master
>> local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv
>> testData.tsv
>>
>> And this is a part of the log:
>>
>>
>>
>> If you need more information, please let me know.
>> Thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
java.io.IOException: No space left on device
Hi,

I'm using Spark (1.3.1) MLlib to run the random forest algorithm on tf-idf output; the training data is a file containing 156060 records (size 8.1M).

The problem is that when Spark tries to persist a partition in memory and there is not enough memory, the partition is persisted to disk, and despite having 229G of free disk space, I get "No space left on device".

This is how I'm running the program:

./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv testData.tsv

And this is a part of the log:



If you need more information, please let me know.
Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
Re: Multiclass classification using Ml logisticRegression
Thank you for your answer! Yes, I would like to work on it.

Selim

On Mon, Apr 27, 2015 at 5:23 AM Joseph Bradley wrote:
> Unfortunately, the Pipelines API doesn't have multiclass logistic
> regression yet, only binary. It's really a matter of modifying the current
> implementation; I just added a JIRA for it:
> https://issues.apache.org/jira/browse/SPARK-7159
>
> You'll need to use the old LogisticRegression API to do multiclass for
> now, until that JIRA gets completed. (If you're interested in doing it,
> let me know via the JIRA!)
>
> Joseph
>
> On Fri, Apr 24, 2015 at 3:26 AM, Selim Namsi wrote:
>
>> Hi,
>>
>> I just started using the Spark ML pipeline to implement a multiclass
>> classifier using LogisticRegressionWithLBFGS (which accepts the number of
>> classes as a parameter). I followed the Pipeline example in the ML guide
>> and used the LogisticRegression class, which calls the
>> LogisticRegressionWithLBFGS class:
>>
>> val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
>>
>> The problem is that LogisticRegression doesn't take numClasses as a
>> parameter.
>>
>> Any idea how to solve this problem?
>>
>> Thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Multiclass-classification-using-Ml-logisticRegression-tp22644.html
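For reference, the old mllib API that Joseph points to can be sketched as below. This is a minimal, hedged example: the toy 3-class data and app name are placeholders standing in for the real tf-idf LabeledPoints, not anything from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val sc = new SparkContext(
  new SparkConf().setAppName("multiclass-lr").setMaster("local[2]"))

// Toy 3-class training set; in the real pipeline these LabeledPoints
// would carry the tf-idf feature vectors.
val training = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0)),
  LabeledPoint(2.0, Vectors.dense(1.0, 1.0))))

// Unlike ml.classification.LogisticRegression, the mllib version
// exposes the class count directly via setNumClasses.
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(3)
  .run(training)

// Predict the label of a single feature vector.
val prediction = model.predict(Vectors.dense(1.0, 0.0))
sc.stop()
```

Once SPARK-7159 lands, the same thing should be expressible inside a Pipeline without dropping down to the mllib API.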
Multiclass classification using Ml logisticRegression
Hi,

I just started using the Spark ML pipeline to implement a multiclass classifier using LogisticRegressionWithLBFGS (which accepts the number of classes as a parameter). I followed the Pipeline example in the ML guide and used the LogisticRegression class, which calls the LogisticRegressionWithLBFGS class:

val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

The problem is that LogisticRegression doesn't take numClasses as a parameter.

Any idea how to solve this problem?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Multiclass-classification-using-Ml-logisticRegression-tp22644.html