Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-31 Thread Stephen Haberman
> Looking at https://github.com/apache/spark/pull/1222/files ,
> the following change may have caused what Stephen described:
>
> + if (!fileSystem.isDirectory(new Path(logBaseDir))) {
>
> When there is no schema associated with logBaseDir, local path
> should be assumed.

Yes, that looks right.
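The behaviour suggested in the quote, treating a `logBaseDir` that has no scheme as a local path, can be sketched with plain `java.net.URI` and no Hadoop dependency. `LogDirResolution` and `resolveLogDir` are hypothetical names for illustration, not Spark's actual fix; the sketch assumes POSIX-style paths (a Windows path with backslashes would not parse as a URI).

```scala
import java.io.File
import java.net.URI

// Hypothetical helper for illustration only; not part of Spark's API.
object LogDirResolution {
  /** Returns a URI for `dir`, assuming the local filesystem when no scheme is given. */
  def resolveLogDir(dir: String): URI = {
    val uri = new URI(dir)
    if (uri.getScheme != null) {
      uri // explicit scheme such as hdfs:// or s3n://: keep it as-is
    } else {
      // No scheme: interpret as a local path and make it absolute (file:/...)
      new File(dir).getAbsoluteFile.toURI
    }
  }
}
```

So `resolveLogDir("/tmp/spark-events")` yields a `file:` URI, while `resolveLogDir("hdfs://namenode:8020/logs")` keeps its `hdfs` scheme.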

Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-30 Thread Stephen Haberman
2:27:17 -0800 Krishna Sankar wrote:
> Stephen,
> Scala 2.11 worked fine for me. Did the dev change and then
> compile. Not using in production, but I go back and forth
> between 2.10 & 2.11.
> Cheers
>
> On Wed, Jan 28, 2015 at 12:18 PM, Stephen Haberman <
> stephen.

spark-shell working in scala-2.11

2015-01-28 Thread Stephen Haberman
Hey, I recently compiled Spark master against scala-2.11 (by running the dev/change-versions script), but when I run spark-shell, it looks like the "sc" variable is missing. Is this a known/unknown issue? Are others successfully using Spark with scala-2.11, and specifically spark-shell? It is po

Re: recent join/iterator fix

2014-12-29 Thread Stephen Haberman
Hi Shixiong,

> The Iterable from cogroup is CompactBuffer, which is already
> materialized. It's not a lazy Iterable. So now Spark cannot handle
> skewed data that some key has too many values that cannot be fit into
> the memory.

Cool, thanks for the confirmation.

- Stephen

Re: recent join/iterator fix

2014-12-29 Thread Stephen Haberman
> It wasn't so much the cogroup that was optimized here, but what is
> done to the result of cogroup.

Right.

> Yes, it was a matter of not materializing the entire result of a
> flatMap-like function after the cogroup, since this will accept just
> an Iterator (actually, TraversableOnce).

Yeah.
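The distinction drawn in this exchange, where each key's values coming out of cogroup are already materialized but the joined output does not have to be, can be sketched with plain Scala collections. `LazyJoin`, `joinValues` and `joinValuesEager` are hypothetical names for illustration; this is not the code from the commit discussed in the thread.

```scala
// Hypothetical illustration in plain Scala collections, not Spark internals.
object LazyJoin {
  /** Pairs every left value with every right value for one key, on demand.
    * Both inputs are already-materialized Iterables (as a CompactBuffer from
    * cogroup is); only the cross product is produced lazily via an Iterator.
    */
  def joinValues[V, W](left: Iterable[V], right: Iterable[W]): Iterator[(V, W)] =
    for (v <- left.iterator; w <- right.iterator) yield (v, w)

  /** Contrast: builds the entire cross product in memory before returning. */
  def joinValuesEager[V, W](left: Iterable[V], right: Iterable[W]): List[(V, W)] =
    (for (v <- left; w <- right) yield (v, w)).toList
}
```

Because `joinValues` returns an `Iterator`, which a flatMap-like function accepting `TraversableOnce` can consume directly, `joinValues(1 to 100000, 1 to 100000).take(3)` produces only three pairs rather than the full 10^10-element product that `joinValuesEager` would try to build. As the reply above notes, the input side is still materialized, so a heavily skewed key can still exhaust memory.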

recent join/iterator fix

2014-12-28 Thread Stephen Haberman
Hey, I saw this commit go by, and find it fairly fascinating:

https://github.com/apache/spark/commit/c233ab3d8d75a33495298964fe73dbf7dd8fe305

For two reasons: 1) we have a report that is bogging down exactly in a .join with lots of elements, so, glad to see the fix, but, more interesting I think