> Looking at https://github.com/apache/spark/pull/1222/files ,
> the following change may have caused what Stephen described:
>
> + if (!fileSystem.isDirectory(new Path(logBaseDir))) {
>
> When there is no scheme associated with logBaseDir, a local path
> should be assumed.
Yes, that looks right.
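A minimal Python sketch of the fallback being discussed (pure Python, no Hadoop; the helper name is hypothetical, and `logBaseDir` comes from the quoted diff): if the configured path carries no URI scheme, treat it as a local filesystem path rather than passing it to the cluster filesystem.

```python
from urllib.parse import urlparse

def resolve_log_dir(log_base_dir: str) -> str:
    """Hypothetical helper: if the path has no URI scheme,
    assume it refers to the local filesystem."""
    parsed = urlparse(log_base_dir)
    if parsed.scheme:
        return log_base_dir            # e.g. hdfs://nn:8020/logs
    return "file:" + log_base_dir      # scheme-less: assume local path

print(resolve_log_dir("/tmp/spark-events"))
print(resolve_log_dir("hdfs://nn:8020/logs"))
```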
Krishna Sankar wrote:
> Stephen,
> Scala 2.11 worked fine for me. I did the dev change and then
> compiled. I'm not using it in production, but I go back and forth
> between 2.10 & 2.11. Cheers
>
>
> On Wed, Jan 28, 2015 at 12:18 PM, Stephen Haberman wrote:
Hey,
I recently compiled Spark master against scala-2.11 (by running
the dev/change-versions script), but when I run spark-shell,
it looks like the "sc" variable is missing.
Is this a known/unknown issue? Are others successfully using
Spark with scala-2.11, and specifically spark-shell?
It is po
Hi Shixiong,
> The Iterable from cogroup is a CompactBuffer, which is already
> materialized; it's not a lazy Iterable. So Spark still cannot handle
> skewed data where some key has too many values to fit into
> memory.
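A small pure-Python analogy for the distinction being confirmed here (no Spark; the names are illustrative, not Spark's code): grouping that materializes every value per key holds the whole group in memory at once, while iterator-based grouping streams a key's values one at a time.

```python
from itertools import groupby
from operator import itemgetter

pairs = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]

# Materialized grouping (analogous to CompactBuffer): each key's
# values are collected into an in-memory list before use, so a
# heavily skewed key means a huge list for that key.
materialized = {}
for k, v in pairs:
    materialized.setdefault(k, []).append(v)

# Lazy grouping over sorted input: values for a key are consumed
# one at a time from an iterator, never all held at once.
def lazy_group_sums(pairs):
    for k, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield k, sum(v for _, v in group)

print(materialized["a"])            # [1, 2, 4]
print(dict(lazy_group_sums(pairs)))
```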
Cool, thanks for the confirmation.
- Stephen
> It wasn't so much the cogroup that was optimized here, but what is
> done to the result of cogroup.
Right.
> Yes, it was a matter of not materializing the entire result of a
> flatMap-like function after the cogroup, since this will accept just
> an Iterator (actually, TraversableOnce).
Yeah.
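A Python sketch of the optimization being described (illustrative only, not Spark's implementation): when the post-cogroup function accepts an iterator (the TraversableOnce analogue) and yields its results lazily, nothing in the pipeline forces the full flatMap result to be materialized.

```python
def flatmap_like(values_iter):
    """Accepts any iterable/iterator and yields results lazily
    instead of building the full output list up front."""
    for v in values_iter:
        yield v * 10

# A generator standing in for one key's cogrouped values; the
# pipeline consumes it element by element, never materializing
# the whole intermediate sequence.
stream = (i for i in range(5))
total = sum(flatmap_like(stream))
print(total)   # 0 + 10 + 20 + 30 + 40 = 100
```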
Hey,
I saw this commit go by, and find it fairly fascinating:
https://github.com/apache/spark/commit/c233ab3d8d75a33495298964fe73dbf7dd8fe305
For two reasons: 1) we have a report that bogs down exactly in
a .join with lots of elements, so I'm glad to see the fix, but more
interesting, I think