Hi Spark devs,
I was looking into the memory usage of shuffle, and one annoying thing about
the default compression codec (LZF) is that the implementation we use
allocates buffers pretty generously. I did a simple experiment and found
that creating 1000 LZFOutputStreams allocated 198,976,424 bytes
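For reference, a minimal sketch of that kind of experiment (measurement
approach assumed here; Runtime-based heap deltas are approximate and numbers
will vary by JVM and compress-lzf version):

    import java.io.ByteArrayOutputStream
    import com.ning.compress.lzf.LZFOutputStream

    object LzfFootprint {
      // Rough used-heap reading; GC first so the delta mostly reflects live buffers.
      def usedHeap(): Long = {
        System.gc()
        val rt = Runtime.getRuntime
        rt.totalMemory() - rt.freeMemory()
      }

      def main(args: Array[String]): Unit = {
        val before = usedHeap()
        // Keep strong references so each stream's internal buffers stay live.
        val streams = Array.fill(1000)(new LZFOutputStream(new ByteArrayOutputStream))
        val after = usedHeap()
        println(s"~${after - before} bytes held by ${streams.length} streams")
      }
    }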
1. The first error I met was a serialVersionUID mismatch in
ExecuterStatus
I resolved it by explicitly declaring a serialVersionUID in
ExecuterStatus.scala and recompiling branch-0.1-jdbc
I don't think there is a class in Spark named ExecuterStatus (sic) ...
or ExecutorStatus. Is
Ah, sorry, sorry.
It's ExecutorState under the deploy package
On Monday, July 14, 2014, Patrick Wendell pwend...@gmail.com wrote:
1. The first error I met was a serialVersionUID mismatch in
ExecuterStatus
I resolved it by explicitly declaring a serialVersionUID in
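For concreteness, a minimal sketch of the kind of fix described (the class
name here is illustrative; the real ExecutorState is an enumeration-style
object under org.apache.spark.deploy):

    // Pin the serial version UID explicitly so both sides of the wire
    // agree even when built by different compiler versions.
    @SerialVersionUID(1L)
    class ExecutorStateHolder(val state: String) extends Serializable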
We tried a lower block size for LZF, but it barfed all over the place.
Snappy was the way to go for our jobs.
Regards,
Mridul
On Mon, Jul 14, 2014 at 12:31 PM, Reynold Xin r...@databricks.com wrote:
Hi Spark devs,
I was looking into the memory usage of shuffle and one annoying thing is
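A sketch of the switch Mridul describes (Snappy instead of LZF), as it
looked in that era of Spark (full codec class name assumed; later releases
accept short aliases):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")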
Hi all,
I've been evaluating YourKit and would like to profile the heap and CPU usage
of certain tests from the Spark test suite. In particular, I'm very interested
in tracking heap usage by allocation site. Unfortunately, I get a lot of
crashes running Spark tests with profiling (and thus
I haven't seen issues using the JVM's own tools (jstack, jmap, hprof and such),
so maybe there's a problem in YourKit or in your release of the JVM. Otherwise
I'd suggest increasing the heap size of the unit tests a bit (you can do this
in the SBT build file). Maybe they are very close to full
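For example, bumping the test heap in the sbt build could look like this (a
sketch; exact placement depends on the build definition, and javaOptions
only take effect for forked test JVMs):

    // Fork the test JVM so the options below actually apply, then raise the heap.
    fork in Test := true
    javaOptions in Test += "-Xmx4g"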
Yeah, sadly this dependency was introduced when someone consolidated the
logging infrastructure. However, the dependency should be very small and
thus easy to remove, and I would like catalyst to be usable outside of
Spark. A pull request to make this possible would be welcome.
Ideally, we'd
Hi all, just wanted to give a heads up that we're seeing a reproducible
deadlock with Spark 1.0.1 on Hadoop 2.3.0-mr1-cdh5.0.2
If jira is a better place for this, apologies in advance - figured talking
about it on the mailing list was friendlier than randomly (re)opening jira
tickets.
I know Gary had
Yeah, I'd just add a spark-util that has these things.
Matei
On Jul 14, 2014, at 1:04 PM, Michael Armbrust mich...@databricks.com wrote:
Yeah, sadly this dependency was introduced when someone consolidated the
logging infrastructure. However, the dependency should be very small and
thus
Hey Cody,
This jstack output seems truncated; would you mind giving the entire
stack trace? For the second thread, for instance, we can't see where
the lock is being acquired.
- Patrick
On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger
cody.koenin...@mediacrossing.com wrote:
Hi all, just wanted to give
Thanks, Matei; I have also had some success with jmap and friends and will
probably just stick with them!
best,
wb
- Original Message -
From: Matei Zaharia matei.zaha...@gmail.com
To: dev@spark.apache.org
Sent: Monday, July 14, 2014 1:02:04 PM
Subject: Re: Profiling Spark tests
The full jstack would still be useful, but our current working theory is
that this is due to the fact that Configuration#loadDefaults goes through
every Configuration object that was ever created (via
Configuration.REGISTRY) and locks it, thus introducing a dependency from
new Configuration to
Hi Aaron, I'm not sure if synchronizing on an arbitrary lock object would
help. I suspect we will start seeing the ConcurrentModificationException
again. The right fix has gone into Hadoop through HADOOP-10456. Unfortunately, I
don't have any bright ideas on how to synchronize this at the Spark level
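For concreteness, the kind of Spark-level synchronization being debated
looks roughly like this (a sketch; the lock object and factory are invented
names, and whether this actually suffices is exactly the open question above):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.JobConf

    object ConfFactory {
      // One global lock so Configuration's shared REGISTRY is never
      // mutated by two threads constructing configurations at once.
      private val CONF_LOCK = new Object

      def newJobConf(base: Configuration): JobConf = CONF_LOCK.synchronized {
        new JobConf(base)
      }
    }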
Just a comment from the peanut gallery, but these buffers are a real
PITA for us as well. Probably 75% of our non-user-error job failures
are related to them.
Just naively, what about not doing compression on the fly? E.g. during
the shuffle just write straight to disk, uncompressed?
For us, we
Stephen,
Often the shuffle is bound by writes to disk, so even if disks have enough
space to store the uncompressed data, the shuffle can complete faster by
writing less data.
Reynold,
This isn't a big help in the short term, but if we switch to a sort-based
shuffle, we'll only need a single
You can actually turn off shuffle compression by setting spark.shuffle.compress
to false. Try that out, there will still be some buffers for the various
OutputStreams, but they should be smaller.
Matei
On Jul 14, 2014, at 3:30 PM, Stephen Haberman stephen.haber...@gmail.com
wrote:
Just a
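The setting Matei mentions, as a one-line sketch (shuffle spill and
broadcast compression are governed by separate flags):

    import org.apache.spark.SparkConf

    val conf = new SparkConf().set("spark.shuffle.compress", "false")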
I just wanted to send out a quick note about a change in the handling of
strings when loading / storing data using parquet and Spark SQL. Before,
Spark SQL did not support binary data in Parquet, so all binary blobs were
implicitly treated as Strings. 9fe693
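For anyone depending on the old behavior, a hedged sketch of the
compatibility switch (flag name and setConf API as found in later Spark SQL
releases; sqlContext is an existing SQLContext):

    // Read Parquet binary columns back as Strings, as before this change.
    sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")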
Hey Nishkam,
Aaron's fix should prevent two concurrent accesses to getJobConf (and
the Hadoop code therein). But if there is code elsewhere that tries to
mutate the configuration, then I could see how we might still have the
ConcurrentModificationException.
I looked at your patch for
We use the Hadoop configuration inside our code executing on Spark, as we
need to list out files in the path. Maybe that is why the issue is exposed for us.
On Mon, Jul 14, 2014 at 6:57 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey Nishkam,
Aaron's fix should prevent two concurrent accesses
Copying Jon here since he worked on the lzf library at Ning.
Jon - any comments on this topic?
On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
You can actually turn off shuffle compression by setting
spark.shuffle.compress to false. Try that out, there will
We'll try to run a build tomorrow AM.
On Mon, Jul 14, 2014 at 7:22 PM, Patrick Wendell pwend...@gmail.com wrote:
Andrew and Gary,
Would you guys be able to test
https://github.com/apache/spark/pull/1409/files and see if it solves
your problem?
- Patrick
On Mon, Jul 14, 2014 at 4:18 PM,
The patch won't solve the problem where two people try to add a
configuration option at the same time, but I think there is currently an
issue where two people can try to initialize the Configuration at the same
time and still run into a ConcurrentModificationException. This at least
reduces
Maybe we could try LZ4 [1], which has better performance and a smaller footprint
than LZF and Snappy. In fast scan mode, its performance is 1.5 - 2x
higher than LZF [2], and the memory used is 10x smaller than LZF (16 KB vs 190 KB).
[1] https://github.com/jpountz/lz4-java
[2]
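As a sketch of what lz4-java offers (block size and file name arbitrary
here; Spark would wire a codec in through its CompressionCodec abstraction
rather than by hand):

    import java.io.FileOutputStream
    import net.jpountz.lz4.LZ4BlockOutputStream

    // 32 KB blocks: a far smaller per-stream footprint than LZF's defaults.
    val out = new LZ4BlockOutputStream(new FileOutputStream("shuffle.part"), 32 * 1024)
    out.write(Array[Byte](1, 2, 3))
    out.close()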
I have a clean clone of the Spark master repository, and I generated the
IntelliJ project file by sbt gen-idea as usual. There are two issues
after merging SPARK-1776 (read dependencies from Maven).
1) After SPARK-1776, sbt gen-idea will download the dependencies from the
internet, even those jars
Is the held memory due to just instantiating the LZFOutputStream? If so,
I'm surprised and would consider that a bug.
I suspect the held memory may be due to a SoftReference - memory will be
released with enough memory pressure.
Finally, is it necessary to keep 1000 (or more) decoders active?
One of the core problems here is the number of open streams we have, which
is (# cores * # reduce partitions), which can easily climb into the tens of
thousands for large jobs. This is a more general problem that we are
planning on fixing for our largest shuffles, as even moderate buffer sizes
can
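To put illustrative numbers on that: 16 cores per node times 2,000 reduce
partitions is 32,000 concurrently open streams per node; at roughly 190 KB
of LZF buffer per stream, that is on the order of 6 GB of buffer space
before any shuffle data is even written.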
- Original Message -
From: Aaron Davidson ilike...@gmail.com
To: dev@spark.apache.org
Sent: Monday, July 14, 2014 5:21:10 PM
Subject: Re: Profiling Spark tests with YourKit (or something else)
Out of curiosity, what problems are you seeing with Utils.getCallSite?
Aaron, if I enable
Would you mind filing a JIRA for this? That does sound like something bogus
happening on the JVM/YourKit level, but this sort of diagnosis is
sufficiently important that we should be resilient against it.
On Mon, Jul 14, 2014 at 6:01 PM, Will Benton wi...@redhat.com wrote:
- Original
Just launched an EC2 cluster from git hash
9fe693b5b6ed6af34ee1e800ab89c8a11991ea38. Calling take() on an RDD
accessing data in S3 yields the following error output.
I understand that NoClassDefFoundError errors may mean something in the
deployment was messed up. Is that correct? When I launch a
Hi Cody,
I met this issue a few days ago and posted a PR for it
(https://github.com/apache/spark/pull/1385).
It's very strange that if I synchronize on conf it will deadlock, but it is
fine when I synchronize on initLocalJobConfFuncOpt.
Here's the entire jstack output.
On Mon, Jul 14, 2014 at 4:44 PM,
I resolved the issue by setting up an internal Maven repository containing the
Spark 1.0.1 jar compiled from branch-0.1-jdbc, and replacing the dependency on
the central repository with our own repository.
I believe there should be some more lightweight way
Best,
--
Nan Zhu
On Monday, July
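The sbt analogue of that workaround, as a sketch (repository URL
hypothetical):

    resolvers += "internal-releases" at "https://repo.example.com/releases"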
Sure thing:
https://issues.apache.org/jira/browse/SPARK-2486
https://github.com/apache/spark/pull/1413
best,
wb
- Original Message -
From: Aaron Davidson ilike...@gmail.com
To: dev@spark.apache.org
Sent: Monday, July 14, 2014 8:38:16 PM
Subject: Re: Profiling Spark tests with
I'm not sure either of those PRs will fix the concurrent adds to
Configuration issue I observed. I've got a stack trace and writeup I'll
share in an hour or two (traveling today).
On Jul 14, 2014 9:50 PM, scwf wangf...@huawei.com wrote:
Hi Cody,
I met this issue a few days ago and posted a PR for
Andrew, is your issue also a regression from 1.0.0 to 1.0.1? The
immediate priority is addressing regressions between these two
releases.
On Mon, Jul 14, 2014 at 9:05 PM, Andrew Ash and...@andrewash.com wrote:
I'm not sure either of those PRs will fix the concurrent adds to
Configuration issue I
Adding new build modules is pretty high overhead, so if this is a case
where a small amount of duplicated code could get rid of the
dependency, that could also be a good short-term option.
- Patrick
On Mon, Jul 14, 2014 at 2:15 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Yeah, I'd just add
I don't believe mine is a regression. But it is related to thread safety on
Hadoop Configuration objects. Should I start a new thread?
On Jul 15, 2014 12:55 AM, Patrick Wendell pwend...@gmail.com wrote:
Andrew, is your issue also a regression from 1.0.0 to 1.0.1? The
immediate priority is
This one is typically due to a mismatch between the Hadoop versions --
i.e., Spark is compiled against 1.0.4 but is running with 2.3.0 in the
classpath, or something like that. Not certain why you're seeing this with
spark-ec2, but I'm assuming this is related to the issues you posted in a
My guess is that this is related to
https://issues.apache.org/jira/browse/SPARK-2471 where the S3 library gets
excluded from the SBT assembly jar. I am not sure if the assembly jar used
in EC2 is generated using SBT though.
Shivaram
On Mon, Jul 14, 2014 at 10:02 PM, Aaron Davidson
Yeah - this is likely caused by SPARK-2471.
On Mon, Jul 14, 2014 at 10:11 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
My guess is that this is related to
https://issues.apache.org/jira/browse/SPARK-2471 where the S3 library gets
excluded from the SBT assembly jar. I am not sure
Hey Andrew,
Yeah, that would be preferable. Definitely worth investigating both,
but the regression is more pressing at the moment.
- Patrick
On Mon, Jul 14, 2014 at 10:02 PM, Andrew Ash and...@andrewash.com wrote:
I don't believe mine is a regression. But it is related to thread safety on
Okie doke--added myself as a watcher on that issue.
On a related note, what are the thoughts on automatically spinning up/down
EC2 clusters and running tests against them? It would probably be way too
cumbersome to do that for every build, but perhaps on some schedule it
could help validate that