[GitHub] spark issue #14342: [SPARK-16685] Remove audit-release scripts.

2016-07-25 Thread pwendell
Github user pwendell commented on the issue:

https://github.com/apache/spark/pull/14342
  
LGTM - I added these and I think they are dead code right now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10359] Enumerate dependencies in a file...

2015-12-25 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/10461#discussion_r48445168
  
--- Diff: dev/deps/spark-deps-hadoop-2.4 ---
@@ -0,0 +1,185 @@
+JavaEWAH-0.3.2.jar
--- End diff --

Yes, these are automatically generated, so it's not a huge maintenance cost. 
I think having them in the repo is good so people can have a definitive 
reference for what dependencies exist in which package of Spark.





[GitHub] spark pull request: [SPARK-11808] Remove Bagel.

2015-12-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/10395#issuecomment-166069156
  
LGTM (I downloaded your PR and did some grepping to make sure there are no 
references). One other thing that occurred to me is that someone could easily 
create a package with this if they want to continue using it in Spark 2.0+, or 
just copy-paste the source code.





[GitHub] spark pull request: Update branch-1.6 for 1.6.0 release

2015-12-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/10317#issuecomment-164923664
  
LGTM





[GitHub] spark pull request: Fix thread pools that cannot cache tasks in Wo...

2015-12-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/10108#issuecomment-161418651
  
Hey, @marmbrus is actually managing the RC - it just has my name on it 
because some automated tooling uses my account. Ping @marmbrus.





[GitHub] spark pull request: [SPARK-12101][Core]Fix thread pools that canno...

2015-12-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/10108#issuecomment-161419476
  
BTW @srowen, some protocol for announcing these is probably a good idea to 
avoid races. I think we haven't suffered from races in the past, but mostly out 
of luck.





[GitHub] spark pull request: [SPARK-3580][CORE] Add Consistent Method To Ge...

2015-12-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9767#issuecomment-161203603
  
Yeah I think it's fine to pull in - but do it quickly because an RC will go 
out very soon!





[GitHub] spark pull request: [SPARK-11903] Remove --skip-java-test

2015-11-23 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9924#issuecomment-159118994
  
Jenkins, retest this please. This LGTM - I think it's good to simply remove 
it. 





[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...

2015-11-22 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1297#issuecomment-158851260
  
@josephlijia this feature has moved into a Spark package. If you want to 
file an issue, it's best to report it here:

https://github.com/amplab/spark-indexedrdd





[GitHub] spark pull request: [SPARK-11732] Removes some MiMa false positive...

2015-11-17 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9697#issuecomment-157469133
  
Yep, this LGTM





[GitHub] spark pull request: [SPARK-11081] Shade Jersey and javax.rs.ws

2015-11-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9615#issuecomment-155962272
  
Regarding testing, the best way is to inspect the contents of the jar to 
make sure that the shaded version is inlined. If Spark code uses the shaded 
dependency directly, you can also use `javap` to inspect the byte code and make 
sure that the references are to the shaded versions of jersey rather than the 
real one.
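The jar inspection described above can be sketched as a small script. This is a minimal illustration, not Spark's release tooling: the jar path and the relocated package prefixes used below are hypothetical, chosen only to show the idea that a shaded jar should contain classes under the relocated prefix and none under the original one.

```python
import zipfile


def class_entries(jar_path, package_prefix):
    """Return the .class entries under a given package prefix inside a jar.

    A correctly shaded jar should have entries under the relocated prefix
    and no entries under the original (unshaded) prefix.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist()
                if name.startswith(package_prefix) and name.endswith(".class")]


# Illustrative check (jar name and prefixes are assumptions):
#   class_entries("spark-assembly.jar", "com/sun/jersey/") should be empty,
#   while the relocated prefix should be non-empty.
```

For the second check described above, running `javap -c` on individual classes in the jar and grepping the disassembly for the original package name confirms whether bytecode references point at the relocated package.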

@mccheah given the comments by @vanzin, can you say more about the specific 
incompatibility you are facing? It would be good to make sure that if we shade 
something it's due to a known incompatibility between some set of versions. 





[GitHub] spark pull request: [SPARK-11081] Shade Jersey and javax.rs.ws

2015-11-10 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9615#discussion_r44503873
  
--- Diff: pom.xml ---
@@ -2165,6 +2166,9 @@
   org.eclipse.jetty:jetty-security
   org.eclipse.jetty:jetty-util
   org.eclipse.jetty:jetty-server
+  com.sun.jersey:jersey-core
+  com.sun.jersey:jersey-json
+  com.sun.jersey:jersey-server
--- End diff --

Hey @mccheah - if you look at jetty-server as an example, there are other 
build changes related to shading that you haven't done here:

1. In the root pom, they should be marked as provided.
2. In core/pom.xml they need to be added in includeArtifactIds.
3. They may also need to be listed in other poms as well due to some 
compiler bugs.

I'd just go through and look everywhere in the source code where you see 
`jetty-server` and do the same for these 3 artifacts. If it's still not working 
after that, let me know and I can take a look.
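Step 1 above would look something like the following in the root pom.xml, mirroring how the existing jetty entries are declared. This is a sketch only: the exact version property and surrounding layout are assumptions, not the actual Spark pom.

```xml
<!-- Root pom.xml sketch: declare each artifact to be shaded with
     provided scope, as with the existing jetty-server entry.
     ${jersey.version} is an illustrative property name. -->
<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-server</artifactId>
  <version>${jersey.version}</version>
  <scope>provided</scope>
</dependency>
```

The same pattern repeats for `jersey-core` and `jersey-json`, and the three artifact IDs also go into `includeArtifactIds` in core/pom.xml per step 2.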





[GitHub] spark pull request: [SPARK-6152] Use shaded ASM5 to support closur...

2015-11-09 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9512#issuecomment-155334662
  
This looks fine (i.e. LGTM). However, we could also look into actually 
shading asm ourselves in our published artifacts, similar to how we now shade 
jetty and other things. I'm fine just using this already-shaded one too, 
though; it seems to do no harm and will allow us to work with Java 8. 





[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...

2015-11-09 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9503#issuecomment-155341829
  
I only reviewed the build changes, but they look good to me.





[GitHub] spark pull request: [SPARK-7841][BUILD] Stop using retrieveManaged...

2015-11-09 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9575#issuecomment-155341204
  
It's hard for me to rule out that there is _no_ other reason lib_managed is 
used at present. I audited all the uses of it I could find in the codebase and 
it appears they all relate to the DataNucleus jars. So LGTM.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-05 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r44055168
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.dstream
+
+import scala.reflect.ClassTag
+
+import org.apache.spark._
+import org.apache.spark.rdd.{EmptyRDD, RDD}
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.rdd.{TrackStateRDD, TrackStateRDDRecord}
+
+
+abstract class EmittedRecordsDStream[K: ClassTag, V: ClassTag, S: 
ClassTag, T: ClassTag](
+ssc: StreamingContext) extends DStream[T](ssc) {
+
+  def stateSnapshots(): DStream[(K, S)]
--- End diff --

snapshotStream?





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43980418
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
 ---
@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
 )
   }
 
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](
--- End diff --

Can you add the docs? It would make it easier to review the public APIs here.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43980438
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/TrackStateSpec.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{HashPartitioner, Partitioner}
+import org.apache.spark.api.java.JavaPairRDD
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * Abstract class having all the specifications of 
DStream.trackStateByKey().
+ * Use the `TrackStateSpec.create()` or `TrackStateSpec.create()` to 
create instances of this class.
+ *
+ * {{{
+ *TrackStateSpec(trackingFunction)// in Scala
+ *TrackStateSpec.create(trackingFunction) // in Java
+ * }}}
+ */
+sealed abstract class TrackStateSpec[K: ClassTag, V: ClassTag, S: 
ClassTag, T: ClassTag]
--- End diff --

I would prefer to just call this `StateSpec`.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43980731
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.dstream
+
+import scala.reflect.ClassTag
+
+import org.apache.spark._
+import org.apache.spark.rdd.{EmptyRDD, RDD}
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.rdd.{TrackStateRDD, TrackStateRDDRecord}
+
+
+abstract class EmittedRecordsDStream[K: ClassTag, V: ClassTag, S: 
ClassTag, T: ClassTag](
--- End diff --

could this be called `EmittedStateDStream`? I don't think the term "Record" 
has clear semantics here. It might be good to tie it back to the terms already 
defined (i.e. State).





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43980502
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
 ---
@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
 )
   }
 
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](
--- End diff --

Especially if there can be a doc that describes (K, V, S, T) and what their 
semantics are.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43981556
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala 
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming
+
+/**
+ * Abstract class for getting and updating the tracked state in the 
`trackStateByKey` operation of
+ * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair 
DStream]] and
+ * [[org.apache.spark.streaming.api.java.JavaPairDStream]].
+ * {{{
+ *
+ * }}}
+ */
+sealed abstract class State[S] {
+
+  /** Whether the state already exists */
+  def exists(): Boolean
+
+  /**
+   * Get the state if it exists, otherwise wise it will throw an exception.
+   * Check with `exists()` whether the state exists or not before calling 
`get()`.
+   */
+  def get(): S
+
+  /**
+   * Update the state with a new value. Note that you cannot update the 
state if the state is
+   * timing out (that is, `isTimingOut() return true`, or if the state has 
already been removed by
+   * `remove()`.
+   */
+  def update(newState: S): Unit
+
+  /** Remove the state if it exists. */
+  def remove(): Unit
--- End diff --

Should this be `delete` or `destroy`? Not sure if we have used similar 
terminology elsewhere. Also it would be good to state the semantics of calling 
this.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43982348
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
 ---
@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
 )
   }
 
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](
+spec: TrackStateSpec[K, V, S, T]): EmittedRecordsDStream[K, V, S, T] = 
{
+new EmittedRecordsDStreamImpl[K, V, S, T](
+  new TrackStateDStream[K, V, S, T](
+self,
+spec.asInstanceOf[TrackStateSpecImpl[K, V, S, T]]
--- End diff --

This cast is a little weird. Can you just have a single class 
`TrackStateSpec` that has both getters and setters?





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43981921
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala 
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming
+
+/**
+ * Abstract class for getting and updating the tracked state in the 
`trackStateByKey` operation of
+ * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair 
DStream]] and
+ * [[org.apache.spark.streaming.api.java.JavaPairDStream]].
+ * {{{
+ *
+ * }}}
+ */
+sealed abstract class State[S] {
+
+  /** Whether the state already exists */
+  def exists(): Boolean
+
+  /**
+   * Get the state if it exists, otherwise wise it will throw an exception.
+   * Check with `exists()` whether the state exists or not before calling 
`get()`.
+   */
+  def get(): S
+
+  /**
+   * Update the state with a new value. Note that you cannot update the 
state if the state is
+   * timing out (that is, `isTimingOut() return true`, or if the state has 
already been removed by
+   * `remove()`.
+   */
+  def update(newState: S): Unit
+
+  /** Remove the state if it exists. */
+  def remove(): Unit
--- End diff --

BTW I have no strong feeling here other than that it should match existing 
terminology, if we have any.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9256#issuecomment-153975680
  
I took a broad pass on the public APIs and left comments throughout, 
mostly around naming rather than core structure. What would really be helpful 
for me is filling in the documentation on the key public classes `State`, 
`StateSpec`, `trackStateByKey`, and `stateSnapshots()` to make sure I fully 
understand their semantics. I can take another pass once that is done, but the 
high-level approach seems good to me.





[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43981603
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala 
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming
+
+/**
+ * Abstract class for getting and updating the tracked state in the 
`trackStateByKey` operation of
+ * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair 
DStream]] and
+ * [[org.apache.spark.streaming.api.java.JavaPairDStream]].
+ * {{{
+ *
+ * }}}
+ */
+sealed abstract class State[S] {
+
+  /** Whether the state already exists */
+  def exists(): Boolean
+
+  /**
+   * Get the state if it exists, otherwise wise it will throw an exception.
+   * Check with `exists()` whether the state exists or not before calling 
`get()`.
+   */
+  def get(): S
+
+  /**
+   * Update the state with a new value. Note that you cannot update the 
state if the state is
+   * timing out (that is, `isTimingOut() return true`, or if the state has 
already been removed by
+   * `remove()`.
+   */
+  def update(newState: S): Unit
--- End diff --

Minor nit: prefer `def update(state: S): Unit`, since the semantics of 
`update` already imply that the value is new.
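For illustration, here is a minimal in-memory sketch of the interface in the diff above (the `InMemoryState` class is hypothetical; the real implementation lives inside the `trackStateByKey` machinery and is not public API):

```scala
// Hypothetical concrete State, purely to illustrate the exists/get/update
// contract discussed above. Not Spark's actual implementation.
sealed abstract class State[S] {
  def exists(): Boolean
  def get(): S
  def update(state: S): Unit
}

class InMemoryState[S] extends State[S] {
  private var value: Option[S] = None
  override def exists(): Boolean = value.isDefined
  override def get(): S =
    // get() throws if the state does not exist, per the scaladoc
    value.getOrElse(throw new NoSuchElementException("State does not exist"))
  override def update(state: S): Unit = value = Some(state)
}

val s = new InMemoryState[Int]
assert(!s.exists())
s.update(1)
assert(s.exists() && s.get() == 1)
```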



[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43981909
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.dstream
+
+import scala.reflect.ClassTag
+
+import org.apache.spark._
+import org.apache.spark.rdd.{EmptyRDD, RDD}
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.rdd.{TrackStateRDD, TrackStateRDDRecord}
+
+
+abstract class EmittedRecordsDStream[K: ClassTag, V: ClassTag, S: 
ClassTag, T: ClassTag](
--- End diff --

I see - I guess it depends on what you define as state. I think of `S` as 
"stored state" and `T` as "emitted state". Maybe that's off? I think 
`EmittedDStream` could be okay; it's a bit awkward, but I think still better 
than adding a new term.



[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43982371
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
 ---
@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
 )
   }
 
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](
+spec: TrackStateSpec[K, V, S, T]): EmittedRecordsDStream[K, V, S, T] = 
{
+new EmittedRecordsDStreamImpl[K, V, S, T](
+  new TrackStateDStream[K, V, S, T](
+self,
+spec.asInstanceOf[TrackStateSpecImpl[K, V, S, T]]
--- End diff --

It doesn't seem like a big deal if the getters are public.



[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43982459
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
 ---
@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
 )
   }
 
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](
+spec: TrackStateSpec[K, V, S, T]): EmittedRecordsDStream[K, V, S, T] = 
{
+new EmittedRecordsDStreamImpl[K, V, S, T](
+  new TrackStateDStream[K, V, S, T](
+self,
+spec.asInstanceOf[TrackStateSpecImpl[K, V, S, T]]
--- End diff --

Seems like this is what we did with `DataFrameReader`, which follows a similar 
pattern:


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L47
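For reference, here is a sketch of the pattern being discussed: a sealed abstract class as the public API, with a concrete impl whose getters are reached via an internal downcast, as in the `asInstanceOf[TrackStateSpecImpl]` cast above. The names `Spec`/`SpecImpl` are hypothetical, not Spark's actual classes.

```scala
// Public API: a sealed abstract class exposing only the builder methods.
sealed abstract class Spec {
  def timeout(ms: Long): this.type
}

// Implementation: holds the builder state; getters are public but only
// reached by internal code that downcasts, mirroring DataFrameReader.
class SpecImpl extends Spec {
  private var timeoutMs: Long = -1L
  def getTimeoutMs: Long = timeoutMs
  override def timeout(ms: Long): this.type = { timeoutMs = ms; this }
}

// Internal consumer performs the downcast, as in the PR snippet above.
def readTimeout(spec: Spec): Long = spec.asInstanceOf[SpecImpl].getTimeoutMs

val spec = (new SpecImpl).timeout(5000L)
assert(readTimeout(spec) == 5000L)
```

Because the class is sealed, users cannot supply their own subclass, so the downcast inside the framework is safe.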



[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...

2015-11-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9256#discussion_r43982032
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/TrackStateSpec.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{HashPartitioner, Partitioner}
+import org.apache.spark.api.java.JavaPairRDD
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * Abstract class holding all the specifications of `DStream.trackStateByKey()`.
+ * Use `TrackStateSpec()` (Scala) or `TrackStateSpec.create()` (Java) to
+ * create instances of this class.
+ *
+ * {{{
+ *TrackStateSpec(trackingFunction)// in Scala
+ *TrackStateSpec.create(trackingFunction) // in Java
+ * }}}
+ */
+sealed abstract class TrackStateSpec[K: ClassTag, V: ClassTag, S: 
ClassTag, T: ClassTag]
+  extends Serializable {
+
+  def initialState(rdd: RDD[(K, S)]): this.type
+  def initialState(javaPairRDD: JavaPairRDD[K, S]): this.type
+
+  def numPartitions(numPartitions: Int): this.type
+  def partitioner(partitioner: Partitioner): this.type
+
+  def timeout(interval: Duration): this.type
--- End diff --

A doc here to precisely define timeouts would be really helpful.
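One possible precise wording is sketched below; the exact timeout semantics here are an assumption on my part and would need to match the implementation:

```scala
/**
 * Set the duration after which the state of an idle key will be removed.
 * A key is considered idle if it receives no new data for `interval`.
 * When the timeout fires, the tracking function is invoked one final time
 * for that key with `isTimingOut()` returning true, after which the state
 * is removed and can no longer be updated.
 */
def timeout(interval: Duration): this.type
```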



[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-11-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-153469556
  
Jenkins, test this please.



[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-11-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-153259490
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-9926] [SPARK-10340] [SQL] Use S3 bulk l...

2015-11-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8512#discussion_r43717724
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkS3Util.scala ---
@@ -0,0 +1,336 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.net.URI
+import java.util
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import com.amazonaws.{AmazonClientException, AmazonServiceException, 
ClientConfiguration, Protocol}
+import com.amazonaws.auth.{AWSCredentialsProvider, BasicAWSCredentials, 
InstanceProfileCredentialsProvider, STSAssumeRoleSessionCredentialsProvider}
+import com.amazonaws.internal.StaticCredentialsProvider
+import com.amazonaws.services.s3.AmazonS3Client
+import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, 
S3ObjectSummary}
+
+import com.google.common.annotations.VisibleForTesting
+import com.google.common.base.{Preconditions, Strings}
+import com.google.common.cache.{Cache, CacheBuilder}
+import com.google.common.collect.AbstractSequentialIterator
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, GlobPattern, Path, PathFilter}
+import org.apache.hadoop.fs.s3.S3Credentials
+import org.apache.hadoop.io.compress.{CompressionCodecFactory, 
SplittableCompressionCodec}
+import org.apache.hadoop.mapred.{FileInputFormat, FileSplit, InputSplit, 
JobConf}
+
+import org.apache.spark.{Logging, SparkEnv}
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.util.Utils
+
+/**
+ * :: DeveloperApi ::
+ * Contains util methods to interact with S3 from Spark.
+ */
+@DeveloperApi
+object SparkS3Util extends Logging {
--- End diff --

Shouldn't this just be private[spark]? 



[GitHub] spark pull request: [SPARK-9926] [SPARK-10340] [SQL] Use S3 bulk l...

2015-11-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8512#discussion_r43717704
  
--- Diff: core/pom.xml ---
@@ -40,6 +40,11 @@
   ${avro.mapred.classifier}
 
 
+  com.amazonaws
+  aws-java-sdk
+  ${aws.java.sdk.version}
+
+
--- End diff --

I took a quick look and unfortunately this has more than 50 transitive 
dependencies (jackson, joda-time, apache http client) that are likely to cause 
conflicts. I don't think we can merge this until we look into this more deeply. 
Can we use a narrower dependency, for instance only the S3 SDK? Even then we'll 
still have many potential conflicts, but it would at least reduce the amount of 
auditing we need to do.
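If narrowing is feasible, an sbt-style sketch of what depending on only the S3 client might look like (the artifact coordinates and version below are assumptions; per-service artifacts exist only for AWS SDK versions that publish them, roughly 1.9.x onward):

```scala
// build.sbt sketch (assumed coordinates): pull in only the S3 service
// client rather than the whole SDK. aws-java-sdk-core (http client,
// jackson, joda-time) still comes along transitively, but the ~50 other
// service clients do not.
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.10.77"
```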



[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-11-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-153252094
  
Jenkins, test this please.



[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-11-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-153252357
  
Jenkins, test this please.



[GitHub] spark pull request: [test-maven][test-hadoop1.0][SPARK-11236][CORE...

2015-11-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9395#issuecomment-153203911
  
Hey @calvinjia, seems okay to merge since this is just triggering other 
failures. Can you explain more, though, how this works around the MiMa issue - 
the patch seems the same as #9204. Did something change between Tachyon 0.8.0 
and 0.8.1?



[GitHub] spark pull request: [test-maven][test-hadoop1.0][SPARK-11236][CORE...

2015-11-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9395#issuecomment-153208299
  
Got it - thanks, I can merge it then.



[GitHub] spark pull request: [test-hadoop1.0][SPARK-11236][CORE] Update Tac...

2015-11-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9395#issuecomment-152892982
  
Jenkins, test this please.



[GitHub] spark pull request: [SPARK-11236][CORE] Update Tachyon dependency ...

2015-11-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9395#issuecomment-152891750
  
@calvinjia can you add "[test-hadoop1.0]" to the title of this PR and then 
retest it? That will run the tests with hadoop 1. See more info here:

https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools



[GitHub] spark pull request: [test-hadoop1.0][SPARK-11236][CORE] Update Tac...

2015-11-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9395#issuecomment-152923143
  
Hm - can you also try adding [test-maven]? It might be better to test with
Maven.

On Sun, Nov 1, 2015 at 9:40 PM, Calvin Jia <notificati...@github.com> wrote:

> @yhuai <https://github.com/yhuai> Thanks for the retest. I'm not sure if
> this will go away by re-running or if there is something up with
> Jenkins/Spark master branch. It seems like the current Spark-Master-SBT
> build is not happy on hadoop1.0 for the same reason. (See:
> 
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3911/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/consoleFull
> )
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/9395#issuecomment-152919342>.
>




[GitHub] spark pull request: [test-maven][test-hadoop1.0][SPARK-11236][CORE...

2015-11-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9395#issuecomment-152929052
  
Jenkins, test this please.



[GitHub] spark pull request: [SPARK-11236][CORE] Update Tachyon dependency ...

2015-10-29 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9204#issuecomment-152415593
  
@haoyuan hey HY - can you not merge build-related patches without asking 
for feedback from one of the build maintainers (me or @srowen)? This patch 
makes changes to Spark's dependency graph that need to be audited carefully 
because they affect all users. There is discussion of the maintainer/review 
process here:


https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers

I did a post hoc review and it appears this does not change the contents of 
the assembly jar. So I think it is okay.

Separately, it would be good to spin Tachyon support out into a package so 
these changes do not need to go through the upstream review process.



[GitHub] spark pull request: SPARK-2533 - Add locality levels (with tasks c...

2015-10-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9117#issuecomment-149127284
  
Yeah this was my thought - could we provide a summary on the stage page 
rather than in the stage index? I do see how an aggregated summary is 
significantly more useful than just mentally aggregating based on the task 
table. Doing this on the stage page would avoid adding more columns to the 
index view, and this column could get complicated if stages have many locality 
levels in play. It might be nice to see an alternative patch that does that.



[GitHub] spark pull request: SPARK-2533 - Add locality levels (with tasks c...

2015-10-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9117#issuecomment-149125876
  
ping @kayousterhout - to me it's the usual issue: sure, it's useful for some 
cases, but is it worth putting on the index page? It could be better to just 
put a locality summary on the stage page itself, if one doesn't already exist.



[GitHub] spark pull request: [SPARK-11110] [Build] Remove transient annotat...

2015-10-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9126#issuecomment-148231844
  
ping @marmbrus for any thoughts. But I think removing them makes sense. If 
someone makes it a `val` later, they will have to reason about whether it 
should be transient or not, as we would for any new field.



[GitHub] spark pull request: SPARK-1537 publisher-side code and tests

2015-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8744#discussion_r41951480
  
--- Diff: yarn/pom.xml ---
@@ -164,6 +164,92 @@
  
   
 
+  
+
--- End diff --

Can this say "The YARN application server..."? There is already a different 
component in Spark called the history server.



[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147493632
  
I see the underlying problem posed in the JIRA - it's difficult to assess 
duration since it currently includes the time spent waiting on dependent 
stages. However, this patch doesn't seem like the obvious way to fix that. I 
think there are some alternatives that would make more sense:

1. Re-define duration so that it's only defined starting when the first 
task in a stage launches (some concerns here about changing semantics, though).
2. Add a new field that represents the time spent servicing the stage 
"service time" (?)
3. Add a new field that represents the time spent queuing before any tasks 
launched "queue time" (?)

Those all seem like better ways to address the issue in the JIRA. Showing the 
max task time seems indirect, and it's not always helpful, since the max task 
time doesn't have a simple relationship with "duration" as desired here... for 
instance, the max task could be pretty short while the stage's duration is 
still really long.

/cc @rxin @kayousterhout for any thoughts.
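A toy sketch (with hypothetical field names) of how the three candidate metrics would relate, and why a short max task time can coexist with a long duration:

```scala
// Hypothetical stage timestamps, in milliseconds since stage submission.
case class StageTimes(
    submissionTime: Long,    // stage submitted to the scheduler
    firstTaskLaunched: Long, // first task actually starts running
    completionTime: Long) {  // last task finishes
  def duration: Long = completionTime - submissionTime       // current behavior
  def serviceTime: Long = completionTime - firstTaskLaunched // alternative 2
  def queueTime: Long = firstTaskLaunched - submissionTime   // alternative 3
}

// A stage that queued for 800ms and ran tasks for only 200ms: even if the
// max task took just 50ms, the reported duration is still 1000ms.
val t = StageTimes(0L, 800L, 1000L)
assert(t.duration == 1000L)
assert(t.serviceTime == 200L)
assert(t.queueTime == 800L)
```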



[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146293403
  
SGTM.

On Wed, Oct 7, 2015 at 11:55 AM, Reynold Xin <notificati...@github.com>
wrote:

> (and data size should include all the data, including spilled)
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/8931#issuecomment-146292851>.
>




[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41426589
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, 
"execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

Also, it's weird to say "total" in some places but not in others. For 
instance, records in and out are also totals, but they don't say "total".



[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146285580
  
I think we want to have concise and consistent names here across all the 
metrics. Here is my proposal for naming:

```
input rows
output rows
spilled data
memory
task memory (max)
task spilled data (max)
```

I think the word "peak" is not necessary because I assume you report the 
peak memory over the lifetime of a task. I think the word "total" is not 
necessary because these are accumulated values and can be assumed to be total 
unless otherwise stated.



[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146288685
  
The reason I like accumulated memory is that it's something that should be 
roughly constant over multiple runs of a workload, so people can get a sense of 
how much data they are buffering during execution. The max and median will 
depend a lot on how tasks are scheduled, etc., so they don't give someone a 
great idea of how they can change their query or data to get memory under 
control. It's just like how in Hadoop you can see the total input size for a 
job. These totals are often really helpful.



[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41426412
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, 
"execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

Hey, can you give a more precise definition here of what this means? I think 
the word "Peak" is throwing me off and maybe we could delete it; if you say 
"memory" I will assume you mean the maximum amount of memory a task is using 
over its lifetime. I think on this one it might be best to just discuss it 
briefly in person.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146289643
  
BTW - one alternative would be to create an accumulator that tracks max, 
min, median, and total and then have it display nicely in two lines. For 
instance:

```
memory total (min,med,max):
10GB (1MB,100MB,1GB)
```
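
The display above can be read as one summary metric rendered on two lines. A 
minimal, self-contained Python sketch of that aggregation and formatting 
(hypothetical helper names; not Spark's actual accumulator or `SQLMetric` code):

```python
def format_bytes(n):
    """Render a byte count with binary-prefix units (1MB == 1024**2 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n)
    i = 0
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    return "%.0f%s" % (value, units[i])

def summarize_task_memory(task_bytes):
    """Return the two-line 'total (min,med,max)' display for per-task values."""
    if not task_bytes:
        raise ValueError("need at least one task value")
    ordered = sorted(task_bytes)
    total = sum(ordered)
    median = ordered[len(ordered) // 2]
    return "memory total (min,med,max):\n%s (%s,%s,%s)" % (
        format_bytes(total),
        format_bytes(ordered[0]),
        format_bytes(median),
        format_bytes(ordered[-1]),
    )
```

For example, tasks that peaked at 1MB, 100MB, and 1GB would render roughly as 
the display quoted above.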





[GitHub] spark pull request: [SPARK-10833] [BUILD] Inline, organize BSD/MIT...

2015-09-28 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8919#issuecomment-143837144
  
Hey Sean - looks good to me, but can't claim to be nearly as deep as you 
are on this stuff!





[GitHub] spark pull request: Update branch-1.5 for 1.5.1 release.

2015-09-23 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8890#issuecomment-142783383
  
@rxin I think you should upgrade R/pkg/DESCRIPTION - otherwise LGTM.





[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.

2015-09-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8350#issuecomment-140562180
  
No problem - I should have caught it earlier. Hopefully people didn't spin
too many cycles on this.

On Tue, Sep 15, 2015 at 12:56 AM, Sean Owen <notificati...@github.com>
wrote:

> I think that's reasonable since the only failure mode this patch should
> cause is MiMa failure and that passed now. Thanks @pwendell
> <https://github.com/pwendell> for reminding me about the previousVersion
> thing, had totally overlooked how that worked.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/8350#issuecomment-140312928>.
>






[GitHub] spark pull request: [SPARK-10511][BUILD] Reset git repository befo...

2015-09-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8774#issuecomment-140508440
  
This is perfect, thanks. LGTM. Jenkins, test this please.





[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.

2015-09-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8350#issuecomment-140213492
  
I think if you update previousVersion in MimaBuild.scala many of these 
should go away. I'm happy to look at the error output after doing that.





[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.

2015-09-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8350#issuecomment-140231316
  
I tested it locally, you'll need something like this:

```
   version match {
+    case v if v.startsWith("1.6") =>
+      Seq(
+        MimaBuild.excludeSparkPackage("deploy"),
+        // These are needed if checking against the sbt build, since they are
+        // part of the maven-generated artifacts in 1.3.
+        excludePackage("org.spark-project.jetty"),
+        MimaBuild.excludeSparkPackage("unused"),
+        ProblemFilters.exclude[MissingClassProblem](
+          "org.apache.spark.sql.execution.datasources.DefaultSource")
+      )
+
```





[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.

2015-09-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8350#issuecomment-140286995
  
Hm - this is a test file that is in the source folder, so MIMA is
complaining about changes. The error message gives you a filter you can add
to the MIMA file to ignore it.

On Mon, Sep 14, 2015 at 10:29 PM, Apache Spark QA <notificati...@github.com>
wrote:

> Test build #1754 has finished
> 
<https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1754/console>
> for PR 8350 at commit ce5b5bb
> 
<https://github.com/apache/spark/commit/ce5b5bbe2efbc92df5105d3072ee103ccc8e3e36>
> .
>
>- This patch *fails MiMa tests*.
>- This patch merges cleanly.
>- This patch adds the following public classes *(experimental)*:
>   - class MinMaxScaler(JavaEstimator, HasInputCol, HasOutputCol):
>   - class MinMaxScalerModel(JavaModel):
>   - case class Stddev(child: Expression) extends StddevAgg(child)
>   - case class StddevPop(child: Expression) extends StddevAgg(child)
>   - case class StddevSamp(child: Expression) extends StddevAgg(child)
>   - abstract class StddevAgg(child: Expression) extends
>   AlgebraicAggregate
>   - abstract class StddevAgg1(child: Expression) extends
>   UnaryExpression with PartialAggregate1
>   - case class Stddev(child: Expression) extends StddevAgg1(child)
>   - case class StddevPop(child: Expression) extends StddevAgg1(child)
>   - case class StddevSamp(child: Expression) extends StddevAgg1(child)
>   - case class ComputePartialStd(child: Expression) extends
>   UnaryExpression with AggregateExpression1
>   - case class ComputePartialStdFunction (
>   - case class MergePartialStd(
>   - case class MergePartialStdFunction(
>   - case class StddevFunction(
>   - case class IntersectNode(conf: SQLConf, left: LocalNode, right:
>   LocalNode)
>   - case class SampleNode(
>   - case class TakeOrderedAndProjectNode(
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/8350#issuecomment-140286013>.
>






[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...

2015-09-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8437#issuecomment-140288092
  
LGTM - I've modified that script recently enough to be familiar with how it 
works. This seems like a good approach and could be useful for us in the future.





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8570#issuecomment-137170215
  
For the details link - can you use the standard drop down icon we use in 
other places?





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8570#issuecomment-137221455
  
Yes - I mean the triangle image used there and in a few other places (such 
as showing more metrics).





[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-30 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/7878#discussion_r38283921
  
--- Diff: dev/run-tests.py ---
@@ -227,11 +228,32 @@ def build_spark_documentation():
 os.chdir(SPARK_HOME)
 
 
+def get_zinc_port():
+    """
+    Get a randomized port on which to start Zinc
+    """
+    return random.randrange(3030, 4030)
--- End diff --

This logic is identical to that hard-coded in the bash scripts that run the 
maven builds; in the past they've never failed for this reason.





[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-08-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-136268566
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-08-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-136255396
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-08-30 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/8531

[SPARK-10359] Enumerate Spark's dependencies in a file and diff against it 
for new pull requests

DON'T MERGE ME - TESTING ON JENKINS

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark dependency-audits

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8531


commit 9cf442df2c04291ae1a7df567658b490eaa46708
Author: Patrick Wendell patr...@databricks.com
Date:   2015-08-28T22:43:25Z

Adding build test module







[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-136250200
  
Thanks for looking at this @vanzin. I do agree it would be a lot nicer to 
base things on comments, but because the comment stream isn't available as 
metadata on Jenkins, that's a huge amount of additional work that IMO is best 
left to an extension if someone is feeling interested. Given that, it sounds 
like you are okay merging this for now as is, then looking at evolving it more later.





[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...

2015-08-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8531#issuecomment-136271740
  
Jenkins, test this please.





[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-28 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135871143
  
Okay this seems to be passing now - any thoughts @JoshRosen or @vanzin? IMO 
this would be really nice since we can test build changes with either build, 
before they make it into spark.





[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-28 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135870939
  
Jenkins, test this please.





[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135582563
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-10004] [shuffle] Perform auth checks wh...

2015-08-27 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8218#issuecomment-135637236
  
ping @aarondav 





[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135635256
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-135144791
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-135144862
  
K - I just sent a hotfix to up the timeout.

On Wed, Aug 26, 2015 at 9:51 AM, Marcelo Vanzin notificati...@github.com
wrote:

 I'm working on fixing the root cause of the timeouts (running unnecessary
 tests). If you think it would be beneficial to just bump the timeout right
 now, please just send a PR for that; I'm pretty confident that this PR 
does
 not make the timeout issue any worse.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-135105812.







[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-134867614
  
Marcelo you will need to change the timeout in the code itself for it to
increase from 175
On Aug 25, 2015 11:24 PM, UCB AMPLab notificati...@github.com wrote:

 Test FAILed.
 Refer to this link for build results (access rights to CI server needed):
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41584/
 Test FAILed.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-134855001.







[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-135105365
  
I still don't understand - why not fix the issue in a separate PR, get this
passing and then merge this? It will then benefit the other PR's also that
are facing this issue.

On Wed, Aug 26, 2015 at 9:46 AM, Marcelo Vanzin notificati...@github.com
wrote:

 but this one seems to be timing out with certainty every time

 Often, but not deterministically every time. It seems to time out as often
 as any other PR that needs to run all tests (I've already cc'ed you on at
 least another one that times out just as often).

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-135104616.







[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-135103392
  
Okay then - I would do that separately, get this PR passing, then merge it.
It is not good to merge a PR that deterministically fails jenkins. Have we
done that in the recent past? I saw a few other PR's that hit an occasional
timeout, but this one seems to be timing out with certainty every time.

On Wed, Aug 26, 2015 at 9:15 AM, Marcelo Vanzin notificati...@github.com
wrote:

 you will need to change the timeout in the code

 Yes but I don't want to do that as part of this change, since they're
 unrelated things. All tests have been passing, the timeouts are unrelated
 to the PR.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-135082936.







[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-135200381
  
Great - looks good!
On Aug 26, 2015 3:53 PM, Marcelo Vanzin notificati...@github.com wrote:

 Yay!

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-135199475.







[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-25 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-134683004
  
Does this PR increase test time in some way? Just wondering why this would
consistently timeout when others don't.

On Tue, Aug 25, 2015 at 10:43 AM, Marcelo Vanzin notificati...@github.com
wrote:

 175m is starting to look really low. the scala/java unit tests took 143m
 to run. anyway, retest this please.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-134682606.







[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-25 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-134732350
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-24 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-134422382
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...

2015-08-24 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7629#issuecomment-134422378
  
Yeah sounds good - might be good to let it run one more time just to be
sure it's not affecting jenkins somehow.

On Mon, Aug 24, 2015 at 4:40 PM, Marcelo Vanzin notificati...@github.com
wrote:

 Seems like all tests passed, no idea why jenkins thinks they timed out.

 AFAICT, this is good to do. @pwendell https://github.com/pwendell ?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/7629#issuecomment-134414895.







[GitHub] spark pull request: [SPARK-6196] [BUILD] Remove MapR profiles in f...

2015-08-24 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8338#issuecomment-134449642
  
Yes this LGTM - these are outdated and I don't even think MapR is advising 
their customers to use these.  They are asking people to use hadoop-provided, 
which was created to simplify using Spark with different versions.





[GitHub] spark pull request: SPARK-7726: Add import so Scaladoc doesn't fai...

2015-08-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8095#issuecomment-129950381
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7411#issuecomment-129950164
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7411#issuecomment-129954901
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7411#issuecomment-129976266
  
test






[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-130156023
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7411#issuecomment-130155218
  
Okay will merge this - I've been keeping things in a separate repo and it's 
much better to have it in the upstream in case others want to modify it.





[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-10 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7411#issuecomment-129708771
  
Jenkins, retest this please.





[GitHub] spark pull request: SPARK-7726 Add import so Scaladoc doesn't fail...

2015-08-10 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/8095

SPARK-7726 Add import so Scaladoc doesn't fail.

This is another import needed so Scala 2.11 doc generation doesn't fail.
See SPARK-7726 for more detail. I tested this locally and the 2.11
install goes from failing to succeeding with this patch.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark scaladoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8095


commit 5eba40b908824faa85ac324365f9c3374bbb1f0f
Author: Patrick Wendell patr...@databricks.com
Date:   2015-08-11T05:34:53Z

SPARK-7726 Add import so Scaladoc doesn't fail.

This is another import needed so Scala 2.11 doc generation doesn't fail.
See SPARK-7726 for more detail.







[GitHub] spark pull request: SPARK-7726: Add import so Scaladoc doesn't fai...

2015-08-10 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8095#issuecomment-129713711
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-10 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/7411#discussion_r36684757
  
--- Diff: dev/create-release/release-build.sh ---
@@ -0,0 +1,320 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+function exit_with_usage {
+  cat << EOF
+usage: release-build.sh package|docs|publish-snapshot|publish-release
+Creates build deliverables from a Spark commit.
+
+Top level targets are
+  package: Create binary packages and copy them to people.apache
+  docs: Build docs and copy them to people.apache
+  publish-snapshot: Publish snapshot release to Apache snapshots
+  publish-release: Publish a release to Apache release repo
+
+All other inputs are environment variables
+
+GIT_REF - Release tag or commit to build from
+SPARK_VERSION - Release identifier used when publishing
+SPARK_PACKAGE_VERSION - Release identifier in top level package directory
+REMOTE_PARENT_DIR - Parent in which to create doc or release builds.
+REMOTE_PARENT_MAX_LENGTH - If set, parent directory will be cleaned to only
+ have this number of subdirectories (by deleting old ones). WARNING: This 
deletes data.
+
+ASF_USERNAME - Username of ASF committer account
+ASF_PASSWORD - Password of ASF committer account
+ASF_RSA_KEY - RSA private key file for ASF committer account
+
+GPG_KEY - GPG key used to sign release artifacts
+GPG_PASSPHRASE - Passphrase for GPG key
+EOF
+  exit 1
+}
+
+set -e
+
+if [ $# -eq 0 ]; then
+  exit_with_usage
+fi
+
+if [[ $@ == *help* ]]; then
+  exit_with_usage
+fi
+
+for env in ASF_USERNAME ASF_RSA_KEY GPG_PASSPHRASE GPG_KEY; do
+  if [ -z "${!env}" ]; then
+    echo "ERROR: $env must be set to run this script"
+    exit_with_usage
+  fi
+done
+
+# Commit ref to checkout when building
+GIT_REF=${GIT_REF:-master}
+
+# Destination directory parent on remote server
+REMOTE_PARENT_DIR=${REMOTE_PARENT_DIR:-/home/$ASF_USERNAME/public_html}
+
+SSH="ssh -o StrictHostKeyChecking=no -i $ASF_RSA_KEY"
+GPG="gpg --no-tty --batch"
+NEXUS_ROOT=https://repository.apache.org/service/local/staging
+NEXUS_PROFILE=d63f592e7eac0 # Profile for Spark staging uploads
+BASE_DIR=$(pwd)
+
+PUBLISH_PROFILES="-Pyarn -Phive -Phadoop-2.2"
+PUBLISH_PROFILES="$PUBLISH_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl"
+
+rm -rf spark
+git clone https://git-wip-us.apache.org/repos/asf/spark.git
+cd spark
+git checkout $GIT_REF
+git_hash=`git rev-parse --short HEAD`
+echo "Checked out Spark git hash $git_hash"
+
+if [ -z "$SPARK_VERSION" ]; then
+  SPARK_VERSION=$(mvn help:evaluate -Dexpression=project.version \
+    | grep -v INFO | grep -v WARNING | grep -v Download)
+fi
+
+if [ -z "$SPARK_PACKAGE_VERSION" ]; then
+  SPARK_PACKAGE_VERSION="${SPARK_VERSION}-$(date +%Y_%m_%d_%H_%M)-${git_hash}"
+fi
+
+DEST_DIR_NAME=spark-$SPARK_PACKAGE_VERSION
+USER_HOST=$asf_usern...@people.apache.org
+
+rm .gitignore
+rm -rf .git
+cd ..
+
+if [ -n "$REMOTE_PARENT_MAX_LENGTH" ]; then
+  old_dirs=$($SSH $USER_HOST ls -t $REMOTE_PARENT_DIR | tail -n +$REMOTE_PARENT_MAX_LENGTH)
+  for old_dir in $old_dirs; do
+    echo "Removing directory: $old_dir"
+    $SSH $USER_HOST rm -r $REMOTE_PARENT_DIR/$old_dir
+  done
+fi
+
+if [[ "$1" == "package" ]]; then
+  # Source and binary tarballs
+  echo "Packaging release tarballs"
+  cp -r spark spark-$SPARK_VERSION
+  tar cvzf spark-$SPARK_VERSION.tgz spark-$SPARK_VERSION
+  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour --output spark-$SPARK_VERSION.tgz.asc \
+    --detach-sig spark-$SPARK_VERSION.tgz
+  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md MD5 spark-$SPARK_VERSION.tgz > \
+    spark-$SPARK_VERSION.tgz.md5
+  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
+    SHA512 spark-$SPARK_VERSION.tgz > spark-$SPARK_VERSION.tgz.sha
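The quoted release-build.sh is driven entirely by environment variables and bails out early when a required one is missing. A runnable sketch of that guard pattern, using bash indirect expansion (the function name `check_required` is hypothetical; the script inlines this loop):

```shell
#!/usr/bin/env bash
# Sketch of the required-variable guard from the quoted script.
# ${!name} expands the variable whose name is stored in `name`.
check_required() {
  local name
  for name in "$@"; do
    if [ -z "${!name}" ]; then
      echo "ERROR: $name must be set"
      return 1
    fi
  done
  return 0
}

ASF_USERNAME=alice
check_required ASF_USERNAME && echo "all set"
check_required SOME_UNSET_VAR || echo "aborting"
```

Failing fast like this is preferable to discovering a missing credential halfway through a release build.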

[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-10 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/7411#discussion_r36684821
  
--- Diff: dev/create-release/release-build.sh ---

[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...

2015-08-10 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7411#issuecomment-129605050
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-9593] [SQL] Fixes Hadoop shims loading

2015-08-05 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7929#issuecomment-127889147
  
LGTM - feel free to merge, as it is really taking a toll on our tests right 
now.





[GitHub] spark pull request: [SPARK-9593] [SQL] Fixes Hadoop shims loading

2015-08-05 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/7929#discussion_r36273191
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -62,6 +64,52 @@ private[hive] class ClientWrapper(
   extends ClientInterface
   with Logging {
 
+  overrideHadoopShims()
+
+  // !! HACK ALERT !!
+  //
+  // This method is a surgical fix for Hadoop version 2.0.0-mr1-cdh4.1.1, 
which is used by Spark EC2
+  // scripts.  We should remove this after upgrading Spark EC2 scripts to 
some more recent Hadoop
+  // version in the future.
+  //
+  // Internally, Hive `ShimLoader` tries to load different versions of 
Hadoop shims by checking
+  // version information gathered from Hadoop jar files.  If the major 
version number is 1,
+  // `Hadoop20SShims` will be loaded.  Otherwise, if the major version 
number is 2, `Hadoop23Shims`
+  // will be chosen.
+  //
+  // However, part of APIs in Hadoop 2.0.x and 2.1.x versions were in flux 
due to historical
+  // reasons. So 2.0.0-mr1-cdh4.1.1 is actually more Hadoop-1-like and 
should be used together with
--- End diff --

Yeah I agree the comment is slightly wrong. I think CDH4 named the release 
with "mr1" because they took the upstream 2.0.X release but then packaged it with 
the older (pre-YARN) version of MR. So this comment could be improved or just 
made shorter.

In terms of covering other Hadoop 2.0.x distributions - as far as I know, no 
one other than Cloudera ever really distributed this. I am pretty hesitant to 
make any assumptions about what other Hadoop 2.0.x distributions might contain, 
because that in general was not a time of API stability for Hadoop and there 
was generally variance around APIs. So my feeling was to just cover the one case 
we do distribute binary builds for (the cdh4 distribution).

My main feeling was: we should make this work for the cdh4 version that we 
do provide binary builds for, but not go crazy trying to hypothesize about 
other one-off Hadoop versions that were packaged around that time, if any exist.

I do agree, though, that the comment could be made more succinct and accurate.
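The misclassification being discussed can be sketched in shell (this is hypothetical illustration, not Hive's actual ShimLoader, which inspects the Hadoop jars on the classpath): dispatching on the major version alone sends 2.0.0-mr1-cdh4.1.1 to the Hadoop 2 shims even though that release is Hadoop-1-like.

```shell
# Hypothetical sketch of major-version shim dispatch (not Hive's real code).
pick_shims() {
  local version="$1" major
  major="${version%%.*}"          # everything before the first dot
  case "$major" in
    1) echo "Hadoop20SShims" ;;
    2) echo "Hadoop23Shims" ;;
    *) echo "unknown" ;;
  esac
}

pick_shims 1.2.1                 # Hadoop20SShims
pick_shims 2.4.0                 # Hadoop23Shims
pick_shims 2.0.0-mr1-cdh4.1.1    # Hadoop23Shims, though this release is Hadoop-1-like
```

The last case is exactly why the patch overrides the loaded shims for that one CDH4 version.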





[GitHub] spark pull request: [SPARK-9593] [SQL] Fixes Hadoop shims loading

2015-08-05 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7929#issuecomment-127890665
  
Do feel free to get the comment hashed out with @srowen. My time zone is 
approaching bedtime, so I have to sign off. It would be nice to get something 
of this nature in soon because of the test issues.




