How to disable input split
Is it possible to disable input splitting if the input is already small?
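There is no single switch that disables splitting, but for a small input you can ask for one partition at read time, or coalesce afterwards. A minimal Scala sketch (the path and app name are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SinglePartitionRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("single-split"))

    // minPartitions = 1 hints that one split is enough; for a file smaller
    // than one HDFS block this normally yields a single partition.
    val small = sc.textFile("hdfs:///data/small.txt", minPartitions = 1)

    // Alternatively, collapse however many splits were created:
    val one = small.coalesce(1)

    println(one.getNumPartitions)
    sc.stop()
  }
}
```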
Re: input split size
Thanks, Andrew. What about reading from a local file system?

On Fri, Oct 17, 2014 at 5:38 PM, Andrew Ash and...@andrewash.com wrote:
When reading out of HDFS it's the HDFS block size.

On Fri, Oct 17, 2014 at 5:27 PM, Larry Liu larryli...@gmail.com wrote:
What is the default input split size? How can I change it?
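The split size can be influenced from the Spark side when reading. A sketch under the assumption that `textFile` goes through Hadoop's old `mapred` input format (path is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SplitSizeDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-demo"))

    // Ask for at least 8 partitions; Spark may still align splits to
    // HDFS block boundaries.
    val rdd = sc.textFile("hdfs:///data/input.txt", minPartitions = 8)

    // To force fewer, larger splits, raise the minimum split size before
    // reading. textFile uses the old mapred API, so the legacy key should
    // apply (assumption; the mapreduce.* key is the newer equivalent):
    sc.hadoopConfiguration.setLong("mapred.min.split.size",
      256L * 1024 * 1024)

    println(rdd.getNumPartitions)
    sc.stop()
  }
}
```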
wordcount job slow while input from NFS mount
A word-count job on a roughly 1 GB text file takes 1 hour when the input comes from an NFS mount; the same job took 30 seconds reading from the local file system. Is there any tuning required for NFS-mounted input? Thanks, Larry
Re: wordcount job slow while input from NFS mount
Hi Matei,

Thanks for your response. I tried copying the 1 GB file from NFS and it took 10 seconds. The NFS mount is on a LAN, and the NFS server runs on the same server that Spark runs on, so the mount is local to the same bare-metal machine.

Larry

On Wed, Dec 17, 2014 at 11:42 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
The problem is very likely NFS, not Spark. What kind of network is it mounted over? You can also test the performance of your NFS by copying a file from it to a local disk or to /dev/null and seeing how many bytes per second it can copy.

Matei

On Dec 17, 2014, at 9:38 AM, Larryliu larryli...@gmail.com wrote:
A word-count job on a roughly 1 GB text file takes 1 hour when the input comes from an NFS mount; the same job took 30 seconds from the local file system. Is there any tuning required for NFS-mounted input?

Thanks
Larry

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/wordcount-job-slow-while-input-from-NFS-mount-tp20747.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: wordcount job slow while input from NFS mount
Thanks, Matei. I will give it a try.

Larry

On Wed, Dec 17, 2014 at 1:01 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
I see, you may have something else configured weirdly then. You should look at CPU and disk utilization while your Spark job is reading from NFS and, if you see high CPU use, run jstack to see where the process is spending time. Also make sure Spark's local work directories (spark.local.dir) are not on NFS. They shouldn't be by default; that should be /tmp.

Matei
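Matei's throughput check can be scripted. This sketch creates a local test file and measures sequential read speed with `dd`; the path is a stand-in, so point `SRC` at a file on the actual NFS mount to test the real thing:

```shell
# Stand-in for a file on the NFS mount; replace with e.g. /mnt/nfs/bigfile
SRC=/tmp/nfs_throughput_test

# Create a 64 MB test file
dd if=/dev/zero of="$SRC" bs=1M count=64 2>/dev/null

# Time a sequential read to /dev/null; dd reports throughput on stderr.
# A healthy local/LAN mount should show hundreds of MB/s or more.
dd if="$SRC" of=/dev/null bs=1M

rm -f "$SRC"
```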
Re: How to use more executors
Will SPARK-1706 be included in the next release?

On Wed, Jan 21, 2015 at 2:50 PM, Ted Yu yuzhih...@gmail.com wrote:
Please see SPARK-1706

On Wed, Jan 21, 2015 at 2:43 PM, Larry Liu larryli...@gmail.com wrote:
I tried to submit a job with --conf spark.cores.max=6 or --total-executor-cores 6 on a standalone cluster, but I don't see more than one executor on each worker. How can I use multiple executors per worker when submitting jobs?

Thanks
larry
How to use more executors
I tried to submit a job with --conf spark.cores.max=6 or --total-executor-cores 6 on a standalone cluster, but I don't see more than one executor on each worker. How can I use multiple executors per worker when submitting jobs? Thanks, larry
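SPARK-1706 (multiple executors per worker in standalone mode) landed in Spark 1.4. With that support, capping the cores each executor may use lets the scheduler place several executors on one worker. A sketch; the master URL and jar name are hypothetical:

```shell
# With spark.cores.max=6 and 2 cores per executor, the scheduler can
# launch up to 3 executors, possibly several on one worker (Spark >= 1.4).
# Before SPARK-1706, standalone mode ran at most one executor per worker
# per application.
spark-submit \
  --master spark://master:7077 \
  --conf spark.cores.max=6 \
  --conf spark.executor.cores=2 \
  myapp.jar
```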
where storagelevel DISK_ONLY persists RDD to
I would like to persist an RDD to HDFS or an NFS mount. How can I change the location?
Re: where storagelevel DISK_ONLY persists RDD to
Hi Charles,

Thanks for your reply. Is it possible to persist an RDD to HDFS? What is the default location when persisting an RDD with storage level DISK_ONLY?

On Sun, Jan 25, 2015 at 6:26 AM, Charles Feduke charles.fed...@gmail.com wrote:
I think you want to instead use `.saveAsSequenceFile` to save an RDD to someplace like HDFS or NFS if you are attempting to interoperate with another system, such as Hadoop. `.persist` is for keeping the contents of an RDD around so future uses of that particular RDD don't need to recalculate its composite parts.

On Sun, Jan 25, 2015 at 3:36 AM, Larry Liu larryli...@gmail.com wrote:
I would like to persist an RDD to HDFS or an NFS mount. How can I change the location?
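A sketch of the distinction (paths and app name are hypothetical): `persist(DISK_ONLY)` writes block files under the directories named by `spark.local.dir` on each executor, /tmp by default, and does not take a path; putting the data itself onto HDFS requires an explicit save:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistVsSave {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("persist-demo")
      // DISK_ONLY block files land under these local dirs, not in HDFS:
      .set("spark.local.dir", "/mnt/scratch/spark")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("hdfs:///data/input.txt")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Cached to local scratch space, per executor, for reuse:
    counts.persist(StorageLevel.DISK_ONLY)

    // Actually written to HDFS, readable by other systems:
    counts.saveAsTextFile("hdfs:///user/larry/wordcounts")

    sc.stop()
  }
}
```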
Re: Shuffle to HDFS
Hi Jerry,

Thanks for your reply. The reason I ask is that in Hadoop, mapper intermediate (shuffle) output is stored in HDFS. I believe Spark's default shuffle location is /tmp.

Larry

On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai saisai.s...@intel.com wrote:
Hi Larry,

I don't think Spark's current shuffle can support HDFS as a shuffle output. Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? This will severely increase the shuffle time.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Sunday, January 25, 2015 4:45 PM
To: u...@spark.incubator.apache.org
Subject: Shuffle to HDFS

How can I change the shuffle output location to HDFS or NFS?
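There is no setting that redirects shuffle files to HDFS; what can be moved is the local scratch location, via `spark.local.dir`. A sketch of the relevant entry in spark-defaults.conf (the directories are hypothetical):

```
# Comma-separated list of local directories for shuffle and spill files
# (default: /tmp). These should be on fast local disks; pointing them at
# an NFS mount will severely slow the shuffle.
spark.local.dir    /mnt/disk1/spark,/mnt/disk2/spark
```

The same value can be passed per job with `--conf spark.local.dir=...` on spark-submit.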