Dear Spark developers,
What happens if an RDD does not fit into memory and cache does not work in the
code below? Will all previous iterations be repeated at each new iteration
within the iterative RDD update (as described below)?
Also, could you clarify regarding DataFrame and GC overhead: does setting
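For reference, the iterative-update pattern being asked about usually looks something like the sketch below (all names and the checkpoint path are placeholders; Spark 1.x APIs assumed). Without persist(), each iteration's RDD is recomputed from the start of the lineage every time it is materialized; persisting each iteration and periodically checkpointing to truncate the lineage avoids that, and MEMORY_AND_DISK spills blocks that do not fit in memory instead of dropping them.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object IterativeUpdateSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("iterative-sketch").setMaster("local[2]"))
    sc.setCheckpointDir("/tmp/spark-checkpoints") // hypothetical path

    var rdd = sc.parallelize(1 to 1000000).map(_.toLong)
    for (i <- 1 to 10) {
      val updated = rdd.map(_ + 1)
      // Spill partitions that do not fit in memory rather than dropping them
      updated.persist(StorageLevel.MEMORY_AND_DISK)
      // Periodically truncate the lineage so earlier iterations are never replayed
      if (i % 5 == 0) updated.checkpoint()
      updated.count() // materialize before unpersisting the previous iteration
      rdd.unpersist()
      rdd = updated
    }
    sc.stop()
  }
}
```

The key design point is unpersisting the previous iteration only after the new one has been materialized, so no recomputation is triggered in between.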
Given the following code, which just reads from S3 and then saves files to S3:
val inputFileName: String = "s3n://input/file/path"
val outputFileName: String = "s3n://output/file/path"
val conf = new SparkConf().setAppName(this.getClass.getName).setMaster("local[4]")
val
Patching Hadoop's build will fix this long term, but not until Hadoop 2.7.2.
I think just adding the openstack JAR to the Spark classpath should be enough
to pick this up, which the --jars option can do with ease.
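As a concrete illustration of the --jars approach (the JAR path, version, and application class below are assumptions; substitute the hadoop-openstack artifact that matches your Hadoop build):

```shell
# Ship the hadoop-openstack JAR to both the driver and executor classpaths.
# Path, version, and main class are placeholders.
spark-submit \
  --jars /path/to/hadoop-openstack-2.7.1.jar \
  --class com.example.MyApp \
  myapp.jar
```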
On that topic, one thing I would like to see (knowing what it takes to get
azure
To move this forward, I think one of two things needs to happen:
1. Move this guidance to the wiki. It seems that the people gathered here
believe that resolves the issue. Done.
2. Put disclaimers on the current downloads page. This may resolve the
issue, but then we bring it up on the right mailing
You can raise a JIRA ticket for the feature and start working on it;
once done, you can send a pull request with the code changes.
Thanks
Best Regards
On Wed, Jul 15, 2015 at 7:30 PM, Joel Zambrano jo...@microsoft.com wrote:
Thanks Akhil! For the one where I change the rest client, how
Hi, some time ago we found that it's better to use the Kryo serializer instead
of the Java one.
So we turned it on and use it everywhere.
I have pretty complex objects, which I can't change. Previously my algorithm was
building such objects and then storing them in external storage. It was
not
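Enabling Kryo as described above is typically just a matter of configuration; a minimal sketch (the app name and the registered class are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kryo-sketch")
  // Switch from the default Java serializer to Kryo
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// Registering the classes you shuffle or cache keeps Kryo's output compact,
// since Kryo then writes a class ID instead of the full class name.
// MyComplexObject is a placeholder for your own type.
conf.registerKryoClasses(Array(classOf[MyComplexObject]))
```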
Can you provide a bit more information, such as:
- the release of Spark you use
- a snippet of your Spark SQL query
Thanks
On Thu, Jul 16, 2015 at 5:31 AM, nipun ibnipu...@gmail.com wrote:
I have a dataframe. I register it as a temp table and run a Spark SQL query
on it to get another dataframe. Now
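For context, the pattern described reads roughly as follows in Spark 1.x (matching the thread's era; the input file, table name, and column names are all assumptions):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(
  new SparkConf().setAppName("sql-sketch").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

// Hypothetical input; any DataFrame works here
val df = sqlContext.read.json("people.json")

// Register the DataFrame as a temp table and query it with Spark SQL
df.registerTempTable("people")
val result = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")
// result is itself a DataFrame, so it can be registered and queried again
```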
I have tested on another PC which has 8 CPU cores.
But it hangs when the default parallelism level is 4 or more, e.g.
sparkConf.setMaster("local[*]")
local[1] ~ local[3] work well.
4 is the mysterious boundary.
It seems that I am not the only one who has encountered this problem:
Hi Burak,
If I change the code as you suggested then it fails with (given that blockSize
is 1):
“org.apache.spark.SparkException: The MatrixBlock at (3, 3) has dimensions
different than rowsPerBlock: 2, and colsPerBlock: 2. Blocks on the
right and bottom edges can have smaller
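For anyone hitting the same validation error: only blocks on the right and bottom edges of a BlockMatrix may be smaller than rowsPerBlock x colsPerBlock, so an interior block such as (3, 3) must match the declared block size exactly. Building the matrix via toBlockMatrix guarantees uniformly sized blocks; a sketch with made-up data (the 8x8 shape and entries are assumptions):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

val sc = new SparkContext(
  new SparkConf().setAppName("blockmatrix-sketch").setMaster("local[2]"))

// Hypothetical sparse 8x8 matrix defined by coordinate entries
val entries = sc.parallelize(Seq(
  MatrixEntry(0, 0, 1.0), MatrixEntry(3, 5, 2.0), MatrixEntry(7, 7, 3.0)))
val coordMat = new CoordinateMatrix(entries, 8, 8)

// toBlockMatrix cuts the matrix into uniform 2x2 blocks;
// only the right and bottom edge blocks are allowed to be smaller
val blockMat = coordMat.toBlockMatrix(2, 2)
blockMat.validate() // throws the SparkException quoted above on a size mismatch
```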