Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Ted Yu
HADOOP-10456 is fixed in Hadoop 2.4.1. Does this mean that synchronization on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for Hadoop 2.4.1? Cheers On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell wrote: > The most important issue in this release is actually an amendment to > a

Re: GraphX graph partitioning strategy

2014-07-25 Thread Larry Xiao
On 7/26/14, 4:03 AM, Ankur Dave wrote: Oops, the code should be: val unpartitionedGraph: Graph[Int, Int] = ... val numPartitions: Int = 128 def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ... // Get the triplets using GraphX, then use Spark to repartition them val partitionedEdges

Re: Kryo Issue on Spark 1.0.1, Mesos 0.18.2

2014-07-25 Thread Gary Malouf
Maybe this is me misunderstanding the Spark system property behavior, but I'm not clear why the class being loaded ends up having '/' rather than '.' in its fully qualified name. When I tested this out locally, the '/' characters were preventing the class from being loaded. On Fri, Jul 25, 2014 at 2:27 PM
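
A minimal sketch of the symptom described above, assuming the registrator is resolved through something like Class.forName (the thread does not show Spark's actual loading code). The class and method names here (RegistratorNameCheck, toBinaryName, canLoad) are hypothetical, and java/util/ArrayList stands in for the real registrator class. Class.forName expects a dot-separated binary name, so the slash-separated form is rejected even though the class exists:

```java
final class RegistratorNameCheck {
    /** Converts a slash-separated (internal JVM) name to a dot-separated binary name. */
    static String toBinaryName(String name) {
        return name.replace('/', '.');
    }

    /** Returns true if Class.forName can resolve the given name, false otherwise. */
    static boolean canLoad(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // A slash-separated name ends up here even when the class is on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the registrator class name; any class on the classpath works.
        String slashed = "java/util/ArrayList";
        System.out.println(slashed + " loadable: " + canLoad(slashed));
        String dotted = toBinaryName(slashed);
        System.out.println(dotted + " loadable: " + canLoad(dotted));
    }
}
```

If the property value really does arrive with slashes, normalizing it before loading is one possible workaround; finding where the dots were replaced in the first place would be the actual fix.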

Re: Suggestion for SPARK-1825

2014-07-25 Thread Patrick Wendell
Yeah I agree reflection is the best solution. Whenever we do reflection we should clearly document in the code which YARN API version corresponds to which code path. I'm guessing since YARN is adding new features... we'll just have to do this over time. - Patrick On Fri, Jul 25, 2014 at 3:35 PM,

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Nicholas Chammas
OK, thanks for the clarification. On Friday, July 25, 2014, Michael Armbrust wrote: > That query is looking at "Fix Version" not "Target Version". The fact that > the first one is still open is only because the bug is not resolved in > master. It is fixed in 1.0.2. The second one is partially fixe

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Patrick Wendell
The most important issue in this release is actually an amendment to an earlier fix. The original fix caused a deadlock which was a regression from 1.0.0->1.0.1: Issue: https://issues.apache.org/jira/browse/SPARK-1097 1.0.1 Fix: https://github.com/apache/spark/pull/1273/files (had a deadlock) 1

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Michael Armbrust
That query is looking at "Fix Version" not "Target Version". The fact that the first one is still open is only because the bug is not resolved in master. It is fixed in 1.0.2. The second one is partially fixed in 1.0.2, but is not worth blocking the release for. On Fri, Jul 25, 2014 at 4:23 PM

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Nicholas Chammas
TD, there are a couple of unresolved issues slated for 1.0.2. Should they be edited somehow? On Fri, Jul 25, 2014 at 7:08 PM, Ta

[VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Tathagata Das
Please vote on releasing the following candidate as Apache Spark version 1.0.2. This release fixes a number of bugs in Spark 1.0.1. Some of the notable ones are - SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for SPARK-1199. The fix was reverted for 1.0.2. - SPARK-2576: NoClassDef

Re: Suggestion for SPARK-1825

2014-07-25 Thread Reynold Xin
Actually reflection is probably a better, lighter weight process for this. An extra project brings more overhead for something simple. On Fri, Jul 25, 2014 at 3:09 PM, Colin McCabe wrote: > So, I'm leaning more towards using reflection for this. Maven profiles > could work, but it's tough s

Re: Suggestion for SPARK-1825

2014-07-25 Thread Colin McCabe
So, I'm leaning more towards using reflection for this. Maven profiles could work, but it's tough since we have new stuff coming in in 2.4, 2.5, etc. and the number of profiles will multiply quickly if we have to do it that way. Reflection is the approach HBase took in a similar situation. best
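
The reflection approach discussed in this thread can be sketched generically as follows. This is an illustration under stated assumptions, not the code proposed for SPARK-1825: OptionalApi, findMethod, and callOrElse are hypothetical names, and String.length stands in for a Hadoop or YARN method that may or may not exist in the version on the classpath. The idea is to look the method up by name at runtime and fall back when the running version does not provide it:

```java
import java.lang.reflect.Method;
import java.util.Optional;

final class OptionalApi {
    /** Looks up a public no-argument method; empty if this runtime's class lacks it. */
    static Optional<Method> findMethod(Class<?> cls, String name) {
        try {
            return Optional.of(cls.getMethod(name));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    /**
     * Calls the named method if it exists on the target, otherwise returns the fallback.
     * In real code, document here which API version introduced the method.
     */
    static Object callOrElse(Object target, String name, Object fallback) {
        Optional<Method> method = findMethod(target.getClass(), name);
        if (!method.isPresent()) {
            return fallback; // code path for versions without the method
        }
        try {
            return method.get().invoke(target); // code path for versions with the method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call to " + name + " failed", e);
        }
    }

    public static void main(String[] args) {
        // "length" exists on String; "methodFromNewerVersion" is a made-up name.
        System.out.println(callOrElse("abc", "length", -1));
        System.out.println(callOrElse("abc", "methodFromNewerVersion", -1));
    }
}
```

Compared with one Maven profile per supported version, this keeps a single build, at the cost of losing compile-time checking on the reflected calls.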

Re: GraphX graph partitioning strategy

2014-07-25 Thread Ankur Dave
Oops, the code should be: val unpartitionedGraph: Graph[Int, Int] = ... val numPartitions: Int = 128 def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ... // Get the triplets using GraphX, then use Spark to repartition them val partitionedEdges = unpartitionedGraph.triplets .map(e =

Re: GraphX graph partitioning strategy

2014-07-25 Thread Ankur Dave
Hi Larry, GraphX's graph constructor leaves the edges in their original partitions by default. To support arbitrary multipass graph partitioning, one idea is to take advantage of that by partitioning the graph externally to GraphX (though possibly using information from GraphX such as the degrees)

Kryo Issue on Spark 1.0.1, Mesos 0.18.2

2014-07-25 Thread Gary Malouf
After upgrading to Spark 1.0.1 from 0.9.1 everything seemed to be going well. Looking at the Mesos slave logs, I noticed: ERROR KryoSerializer: Failed to run spark.kryo.registrator java.lang.ClassNotFoundException: com/mediacrossing/verrazano/kryo/MxDataRegistrator My spark-env.sh has the follow

Re: Suggestion for SPARK-1825

2014-07-25 Thread Colin McCabe
I have a similar issue with SPARK-1767. There are basically three ways to resolve the issue: 1. Use reflection to access classes newer than 0.21 (or whatever the oldest version of Hadoop is that Spark supports) 2. Add a build variant (in Maven this would be a profile) that deals with this. 3. Aut

Re: Configuring Spark Memory

2014-07-25 Thread John Omernik
So this is good information for standalone mode, but how is memory distributed within Mesos? There's coarse-grained mode, where the executor stays active, and there's fine-grained mode, where it appears each task is its own process in Mesos. How do memory allocations work in these cases? Thanks! On Thu,