. Specifically, entries
>>> in this feed are test failures which a) occurred in the last week, b) were
>>> not part of a build which had 20 or more failed tests, and c) were not
>>> observed to fail during the previous week (i.e. no failures from [2 w
(1) I'm pretty hesitant to merge these larger changes, even if they're
feature flagged, because:
(a) For some of these changes, it's not obvious that they'll always
improve performance. e.g., for SPARK-14649, it's possible that the tasks
that got re-started (and temporarily are running in two
Hi all,
I've noticed the Spark tests getting increasingly flaky -- it seems more
common than not now that the tests need to be re-run at least once on PRs
before they pass. This is both annoying and problematic because it makes
it harder to tell when a PR is introducing new flakiness.
To try to
> >> 08 builds
> >> 16 builds.gc <--- failures
> >>
> >> it's also happening across all workers at about the same rate.
> >>
> >> and best of all, there seems to be no pattern to which tests are
> >> failing (different each time). i'll look a l
> > what to do next.
> >
> > On Tue, Jan 3, 2017 at 6:49 PM, shane knapp <skn...@berkeley.edu> wrote:
> >> nope, no changes to jenkins in the past few months. ganglia graphs
> >> show higher, but not worrying, memory usage on the workers when the
> >
I believe that these two were indeed originally related. In the old
hash-based shuffle, we wrote objects out immediately to disk as they were
generated by an RDD's iterator. On the other hand, with the original
version of the new sort-based shuffle, Spark buffered a bunch of objects
before
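To make the contrast concrete, here's a rough sketch of the two write
strategies. This is illustrative Scala only, not Spark's actual shuffle
code; the function names and record types are invented for the example.

    import java.io.{BufferedOutputStream, FileOutputStream, ObjectOutputStream}

    // Hash-style: stream each record straight to its partition's file as the
    // RDD's iterator produces it.
    def hashStyleWrite(records: Iterator[(Int, Array[Byte])], numPartitions: Int): Unit = {
      val writers = Array.tabulate(numPartitions) { p =>
        new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(s"part-$p")))
      }
      records.foreach { case (key, value) => writers(key % numPartitions).writeObject(value) }
      writers.foreach(_.close())
    }

    // Sort-style (original version): buffer records in memory first, order
    // them by partition, then write a single file.
    def sortStyleWrite(records: Iterator[(Int, Array[Byte])], numPartitions: Int): Unit = {
      val buffered = records.toArray.sortBy { case (key, _) => key % numPartitions }
      val out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream("merged-output")))
      buffered.foreach { case (_, value) => out.writeObject(value) }
      out.close()
    }

The hash-style version needs one open file (and write buffer) per partition,
while the sort-style version trades that for memory to buffer records.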
Hi all,
I noticed that when the JVM for an executor fails in Standalone mode, we
have two duplicate code paths that handle the failure: one via Akka, and
the second via the Worker/ExecutorRunner:
via Akka:
(1) CoarseGrainedSchedulerBackend is notified that the remote Akka endpoint
is
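The message is cut off here, but roughly, the two paths look like this
(a sketch based only on the class names mentioned; the details past the
truncation are assumptions and vary by Spark version):

    // Path 1 (via Akka): the driver side notices the remote endpoint died.
    //   CoarseGrainedSchedulerBackend sees the disassociation event
    //     -> marks the executor lost and fails its running tasks
    //
    // Path 2 (via Worker/ExecutorRunner): the worker side notices the JVM exit.
    //   ExecutorRunner observes the process's exit code
    //     -> Worker reports ExecutorStateChanged to the Master
    //     -> the driver is (again) told the executor was lost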
Here’s how the shuffle works. This explains what happens for a single
task; this will happen in parallel for each task running on the machine,
and as Imran said, Spark runs up to “numCores” tasks concurrently on each
machine. There's also an answer to the original question about why CPU use
is
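As a purely illustrative picture of the “numCores” concurrency (generic
Scala, not Spark's scheduler code):

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}

    object TaskSlots {
      // One slot per core: at most numCores tasks run at once on the machine.
      val numCores: Int = Runtime.getRuntime.availableProcessors()
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutor(Executors.newFixedThreadPool(numCores))

      // Each submitted task performs the per-task shuffle steps described above.
      def runTask(taskIndex: Int): Future[Unit] = Future {
        // ... fetch shuffle input, compute, write shuffle output ...
      }
    }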
. This should be preserved for reference somewhere
searchable.
-Gerard.
On Fri, Jun 12, 2015 at 1:19 AM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:
Here’s how the shuffle works. This explains what happens for a single
task; this will happen in parallel for each task running on the machine
Hi Alexander,
The stack trace is a little misleading here: all of the time is spent in
MemoryStore, but that's because MemoryStore is unrolling an iterator (note
the iterator.next() call) so that it can be stored in memory. Essentially
all of the computation for the tasks happens as part of that
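A tiny standalone illustration of why the time lands there (generic Scala,
not Spark's MemoryStore):

    object LazyUnroll {
      def expensiveFunction(x: Int): Int = x * x // stands in for the task's real work

      def main(args: Array[String]): Unit = {
        // Iterators are lazy: building this pipeline does no work yet.
        val computed = (1 to 1000000).iterator.map(expensiveFunction)

        // "Unrolling" the iterator into memory is what actually runs the
        // computation, so a profiler attributes all the time to this loop.
        val buffer = scala.collection.mutable.ArrayBuffer.empty[Int]
        while (computed.hasNext) {
          buffer += computed.next()
        }
        println(buffer.length)
      }
    }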
Did you see the longer descriptions under the Learn More link?
Developer
This track will present technical deep dive content across a wide range of
advanced/basic topics.
Data Science
This track will focus on the practice of data science using Spark. Sessions
should cover innovative techniques,
There's a JIRA tracking this here:
https://issues.apache.org/jira/browse/SPARK-2387
On Mon, Feb 2, 2015 at 9:48 PM, Xuelin Cao xuelincao2...@gmail.com wrote:
In hadoop MR, there is an option *mapred.reduce.slowstart.completed.maps*
which can be used to start reducer stage when X% mappers are
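For reference, the Hadoop option takes a fraction of completed map tasks;
setting it looks like this (0.50 is just an example value):

    import org.apache.hadoop.conf.Configuration

    // Start the reduce stage once half of the map tasks have finished.
    val conf = new Configuration()
    conf.set("mapred.reduce.slowstart.completed.maps", "0.50")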
+1 to Patrick's proposal of strong LGTM semantics. On past projects, I've
heard the semantics of LGTM expressed as "I've looked at this thoroughly
and take as much ownership as if I wrote the patch myself." My
understanding is that this is the level of review we expect for all patches
that
Hi all,
I've noticed a bunch of times lately where a pull request ends up pretty
different from the original, and the title /
description never get updated. Because the pull request title and
description are used as the commit message, the incorrect description lives
on
Hi,
Shivaram and I stumbled across this problem a few weeks ago, and AFAIK
there is no nice solution. We worked around it by avoiding jobs with tasks
that have two locality levels.
To fix this problem, we really need to fix the underlying problem in the
scheduling code, which
Hi Mridul,
In the case Shivaram and I saw, and based on my understanding of Ma chong's
description, I don't think that completely fixes the problem.
To be very concrete, suppose your job has two tasks, t1 and t2, and they
each have input data (in HDFS) on h1 and h2, respectively, and that h1 and
+1 (binding)
I see this as a way to increase transparency and efficiency around a
process that already informally exists, with benefits to both new
contributors and committers. For new contributors, it makes clear who they
should ping about a pending patch. For committers, it's a good reference
Hi Nick,
This hasn't yet been directly supported by Spark because of a lack of
demand. The last time I ran a throughput test on the default Spark
scheduler (~1 year ago, so this may have changed), it could launch
approximately 1500 tasks / second. If, for example, you have a cluster of
100
On Fri, Nov 7, 2014 at 6:20 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
If, for example, you have a cluster of 100 machines, this means the
scheduler can launch 150 tasks per machine per second.
Did you mean 15 tasks per machine per second here? Or alternatively, 10
machines?
I don't have much more info than what Shivaram said. My sense is that,
over time, task launch overhead with Spark has slowly grown as Spark
supports more and more functionality. However, I haven't seen it be as
high as the 100ms Michael quoted (maybe this was for jobs with tasks that
have much
Hi Nick,
No -- we're doing a much more constrained thing of just trying to get
things set up to easily run TPC-DS on SparkSQL (which involves generating
the data, storing it in HDFS, getting all the queries in the right format,
etc.).
Cloudera does have a repo here:
There's been an effort in the AMPLab at Berkeley to set up a shared
codebase that makes it easy to run TPC-DS on SparkSQL, since it's something
we do frequently in the lab to evaluate new research. Based on this
thread, it sounds like making this more widely-available is something that
would be
Are you guys sure this is a bug? In the task scheduler, we keep two
identifiers for each task: the index, which uniquely identifies the
computation+partition, and the taskId, which is unique across all tasks
for that Spark context (See ).
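In other words (a simplified sketch, not Spark's actual classes):

    // The two identifiers the task scheduler keeps for each task.
    case class TaskIds(
      index: Int,  // identifies the computation + partition; shared by re-attempts
      taskId: Long // unique across all tasks for a given Spark context
    )

A re-attempt of a failed or speculated task gets a fresh taskId but keeps
the same index.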
On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:
Are you guys sure this is a bug? In the task scheduler, we keep two
identifiers for each task: the index, which uniquely identifies the
computation+partition, and the taskId which is unique across all tasks
Reynold you're totally right, as discussed offline -- I didn't think about
the limit use case when I wrote this. Sandy, is it easy to fix this as
part of your patch to use StatisticsData? If not, I can fix it in a
separate patch.
On Sat, Jul 26, 2014 at 12:12 PM, Reynold Xin
Hi Karthik,
The resourceOffer() method is invoked from a class implementing the
SchedulerBackend interface; in the case of a standalone cluster, it's
invoked from a CoarseGrainedSchedulerBackend (in the makeOffers() method).
If you look in TaskSchedulerImpl.submitTasks(), it calls
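Pieced together from that description, the rough call flow is (simplified;
exact names and signatures vary across Spark versions and are partly
assumptions):

    // TaskSchedulerImpl.submitTasks(taskSet)
    //   -> backend.reviveOffers()                      // SchedulerBackend interface
    //   -> CoarseGrainedSchedulerBackend.makeOffers()  // builds the resource offers
    //   -> TaskSchedulerImpl.resourceOffers(offers)    // which calls
    //        TaskSetManager.resourceOffer(...) for each task set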
Hi all,
I've been doing a bunch of performance measurement of Spark and, as part of
doing this, added metrics that record the average CPU utilization, disk
throughput and utilization for each block device, and network throughput
while each task is running. These metrics are collected by reading
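The message is truncated here, but counters like these are typically
sampled from Linux's /proc; the sketch below shows that general approach
and is an assumption about the patch, not its actual code:

    import scala.io.Source

    // Read the aggregate CPU counters ("jiffies") from /proc/stat.
    def cpuJiffies(): Array[Long] =
      Source.fromFile("/proc/stat").getLines()
        .find(_.startsWith("cpu "))
        .map(_.trim.split("\\s+").tail.map(_.toLong))
        .getOrElse(Array.empty[Long])

    // Utilization over an interval: 1 - (idle delta / total delta).
    // Field 4 (index 3) of the cpu line is idle time.
    def utilization(before: Array[Long], after: Array[Long]): Double = {
      val total = after.sum - before.sum
      val idle = after(3) - before(3)
      if (total == 0) 0.0 else 1.0 - idle.toDouble / total
    }

Sampling once when the task starts and once when it ends gives the average
utilization while the task was running.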
Git history to the rescue! It seems to have been added by Matei way back
in July 2012:
https://github.com/apache/spark/commit/5d1a887bed8423bd6c25660910d18d91880e01fe
and then was removed a few months later (replaced by RUNNING) by the same
Mr. Zaharia:
Hi all,
I had some trouble compiling an application (Shark) against Spark 1.0,
where Shark had a runtime exception (at the bottom of this message) because
it couldn't find the javax.servlet classes. SBT seemed to have trouble
downloading the servlet APIs that are dependencies of Jetty (used by
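A commonly suggested workaround for this class of problem (an assumption
here, since the message is cut off) is to declare the servlet API as a
direct dependency so SBT resolves it explicitly:

    // In build.sbt; the artifact/version shown are an example, not
    // necessarily what Shark ended up using.
    libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"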
Hi all,
The InputStreamsSuite seems to have some serious flakiness issues -- I've
seen the file input stream test fail many times, and now I'm seeing some
actor input stream test failures (
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13846/consoleFull)
on what I think is an
I don't think the blacklisting is a priority and the CPUS_PER_TASK issue
was still broken after this patch (so broken that I'm convinced no one
actually uses this feature!!), so agree with TD's sentiment that this
shouldn't go into 0.9.1.
On Tue, Mar 25, 2014 at 10:23 PM, Tathagata Das