Actually, I ran into a similar issue when doing groupByKey and then count when the
shuffle size is big, e.g. 1 TB.
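For reference, here is the pattern sketched on local Scala collections (hypothetical data and names, not the RDD API): groupBy materializes the full value list per key before counting, which is what makes the equivalent groupByKey shuffle so heavy, while a running per-key fold only ships counts.

```scala
// Local-collections analogue of the groupByKey-then-count pattern.
// heavyCount mirrors groupByKey: all values per key are materialized first.
// lightCount mirrors a reduceByKey-style pre-aggregation: only counts flow.
object ShufflePatternSketch {
  def heavyCount[K](pairs: Seq[(K, Int)]): Int =
    pairs.groupBy(_._1).size // builds full value lists before counting keys

  def lightCount[K](pairs: Seq[(K, Int)]): Int =
    pairs.foldLeft(Map.empty[K, Long]) { case (acc, (k, _)) =>
      acc.updated(k, acc.getOrElse(k, 0L) + 1L) // running count only
    }.size
}
```

Both return the number of distinct keys; only the intermediate state differs, which is where the shuffle pressure comes from.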
Thanks.
Zhan Zhang
Sent from my iPhone
On Sep 21, 2014, at 10:56 PM, Nishkam Ravi nr...@cloudera.com wrote:
Thanks for the quick follow up Reynold and Patrick. Tried a run with
Hey all. We also ran into the same problem Nishkam described, in almost the
same big-data setting. We fixed the fetch failures by increasing the timeout
for acks in the driver:
set("spark.core.connection.ack.wait.timeout", "600") // 10-minute timeout
for acks between nodes
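As a self-contained sketch against the Spark 1.1-era SparkConf API (the value is passed as a string and is interpreted in seconds):

```scala
import org.apache.spark.SparkConf

// Raise the connection ack timeout so long GC pauses during a large
// shuffle are not mistaken for dead executors. Value is in seconds.
val conf = new SparkConf()
  .setAppName("LargeShuffleJob") // app name is illustrative
  .set("spark.core.connection.ack.wait.timeout", "600")
```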
Cheers, Christoph
2014-09-22
I've run into this with large shuffles - I assumed that there was
contention for memory between the shuffle output files and the JVM.
Whenever we start getting these fetch failures, it corresponds with high
load on the machines the blocks are being fetched from, and in some cases
complete
Forwarding to the dev mailing list for help.
From: Haopu Wang
Sent: September 22, 2014 16:35
To: u...@spark.apache.org
Subject: Spark SQL 1.1.0: NPE when join two cached table
I have two data sets and want to join them on their first fields. Sample data are
below:
data set
If you think it is necessary to fix, I would like to resubmit that PR (it seems to
have some conflicts with the current DAGScheduler).
My suggestion is to make it an option on the accumulator, e.g. some algorithms
use accumulators for result calculation and need a deterministic
accumulator,
MapReduce counters do not count duplications. In MapReduce, if a task
needs to be re-run, the value of the counter from the second task
overwrites the value from the first task.
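The overwrite-per-task semantics can be sketched in plain Scala (all names here are hypothetical, not Spark or MapReduce API): each task attempt reports its total, and a re-run simply replaces the earlier attempt's contribution instead of adding to it.

```scala
import scala.collection.mutable

// Sketch of a "deterministic" accumulator: keep the latest reported total
// per task, so a retried task overwrites its earlier contribution rather
// than double counting (the MapReduce-counter behavior described above).
class DeterministicAccumulator {
  private val perTask = mutable.Map.empty[Int, Long]

  // A task reports its *total* for this attempt; a retry overwrites.
  def report(taskId: Int, total: Long): Unit = perTask(taskId) = total

  def value: Long = perTask.values.sum
}
```

With a naive `+=` accumulator, a retried task that reports 5 twice would contribute 10; here it contributes 5 regardless of how many attempts ran.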
-Sandy
On Mon, Sep 22, 2014 at 4:55 AM, Nan Zhu zhunanmcg...@gmail.com wrote:
If you think it is necessary to fix,
Unfortunately we were somewhat rushed to get things working again and did
not keep the exact stacktraces, but one of the issues we saw was similar to
that reported in
https://issues.apache.org/jira/browse/SPARK-3032
We also saw FAILED_TO_UNCOMPRESS errors from snappy when reading the
shuffle
Hi Marcelo,
Interested to hear the approach to be taken. Shading guava itself seems
extreme, but that might make sense.
Gary
On Sat, Sep 20, 2014 at 9:38 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hmm, looks like the hack to maintain backwards compatibility in the
Java API didn't work
Hi:
I notice the scalatest-maven-plugin sets the SPARK_CLASSPATH environment
variable for testing. But in SparkConf.scala, this is deprecated in Spark
1.0+.
So what is this variable for? Should we just remove it?
--
Ye Xianjin
Sent with Sparrow
I see, thanks for pointing this out
--
Nan Zhu
On Monday, September 22, 2014 at 12:08 PM, Sandy Ryza wrote:
MapReduce counters do not count duplications. In MapReduce, if a task needs
to be re-run, the value of the counter from the second task overwrites the
value from the first
Hi Cody,
I'm still writing a test to make sure I understood exactly what's
going on here, but from looking at the stack trace, it seems like the
newer Guava library is picking up the Optional class from the Spark
assembly.
Could you try one of the options that put the user's classpath before
the
Hi Cody,
There are currently no concrete plans for adding buckets to Spark SQL, but
thats mostly due to lack of resources / demand for this feature. Adding
full support is probably a fair amount of work since you'd have to make
changes throughout parsing/optimization/execution. That said, there
We're using Mesos, is there a reasonable expectation that
spark.files.userClassPathFirst will actually work?
On Mon, Sep 22, 2014 at 1:42 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Cody,
I'm still writing a test to make sure I understood exactly what's
going on here, but from looking
Hmmm, a quick look at the code indicates this should work for
executors, but not for the driver... (maybe this deserves a bug being
filed, if there isn't one already?)
If it's feasible for you, you could remove the Optional.class file
from the Spark assembly you're using.
On Mon, Sep 22, 2014 at
We've worked around it for the time being by excluding guava from the transitive
dependencies in the job assembly and specifying the same guava 14 version
that Spark is using. Obviously things break whenever a guava 15 / 16
feature is used at runtime, so a long-term solution is needed.
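In build.sbt terms, the workaround might look like the sketch below. The library coordinates are placeholders; only the guava exclusion and the guava 14 pin reflect the actual workaround described.

```scala
// build.sbt sketch (illustrative coordinates): drop Guava pulled in
// transitively by a job dependency, then pin the Guava 14 that Spark 1.1 uses.
libraryDependencies += ("com.example" %% "some-lib" % "1.0")
  .exclude("com.google.guava", "guava")
libraryDependencies += "com.google.guava" % "guava" % "14.0.1"
```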
On Mon, Sep
FYI I filed SPARK-3647 to track the fix (some people internally have
bumped into this also).
On Mon, Sep 22, 2014 at 1:28 PM, Cody Koeninger c...@koeninger.org wrote:
We've worked around it for the meantime by excluding guava from transitive
dependencies in the job assembly and specifying the
After commit 8856c3d8 switched from gzip to snappy as default parquet
compression codec, I'm seeing the following when trying to read parquet
files saved using the new default (same schema and roughly same size as
files that were previously working):
java.lang.OutOfMemoryError: Direct buffer
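If gzip-compressed files were working before, one workaround sketch is to set the codec back explicitly (property name as of Spark SQL 1.1; this sidesteps rather than fixes the direct-buffer pressure from snappy):

```scala
// Revert Parquet output to the previous default codec for this session.
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")
```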
Hey Cody,
In terms of Spark 1.1.1 - we wouldn't change a default value in a point
release. Changing this default is slotted for 1.2.0:
https://issues.apache.org/jira/browse/SPARK-3280
- Patrick
On Mon, Sep 22, 2014 at 9:08 AM, Cody Koeninger c...@koeninger.org wrote:
Unfortunately we were