[GitHub] spark issue #21255: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21255
  
@huaxingao, that's absolutely fine.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90496/testReport)**
 for PR 21291 at commit 
[`015e2ad`](https://github.com/apache/spark/commit/015e2ad739e5ad7fe6d1d1ef3c919661d8ac3d29).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90488/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21230
  
**[Test build #90488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90488/testReport)**
 for PR 21230 at commit 
[`953cd7a`](https://github.com/apache/spark/commit/953cd7a823f487291ac389eeb9d98219bf447ebf).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21291
  
Changed `outputPartitioning` changes executed plans.

E.g. in `WholeStageCodegenSuite`, a query like 
`spark.range(3).groupBy("id").count().orderBy("id")`. Its executed plan changes 
from

```
*(3) Sort [id#22L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(id#22L ASC NULLS FIRST, 5)
   +- *(2) HashAggregate(keys=[id#22L], functions=[count(1)], 
output=[id#22L, count#26L])
  +- Exchange hashpartitioning(id#22L, 5)
 +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], 
output=[id#22L, count#31L])
+- *(1) Range (0, 3, step=1, splits=2)
```
to

```
*(1) Sort [id#22L ASC NULLS FIRST], true, 0
+- *(1) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, 
count#26L])
   +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], 
output=[id#22L, count#31L])
  +- *(1) Range (0, 3, step=1, splits=2)
```

I will update related tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3128/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90495/testReport)**
 for PR 21291 at commit 
[`6039094`](https://github.com/apache/spark/commit/6039094a3af35319e6ea10f27c761936be883b79).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21255: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-10 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/21255
  
@felixcheung @HyukjinKwon @viirya 
I am thinking of closing this one and open a new PR if no objection. It's 
messy to resolve the conflicts because I have quite a few patches. Sorry for my 
carelessness and for any inconvenience.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21291
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21106: [SPARK-23711][SQL][WIP] Add fallback logic for Un...

2018-05-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21106#discussion_r187520079
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodegenObjectFactory.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.codehaus.commons.compiler.CompileException
+import org.codehaus.janino.InternalCompilerException
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Utils
+
+/**
+ * Catches compile error during code generation.
+ */
+object CodegenError {
+  def unapply(throwable: Throwable): Option[Exception] = throwable match {
+case e: InternalCompilerException => Some(e)
+case e: CompileException => Some(e)
+case _ => None
+  }
+}
+
+/**
+ * Defines values for `SQLConf` config of fallback mode. Use for test only.
+ */
+object CodegenObjectFactoryMode extends Enumeration {
+  val AUTO, CODEGEN_ONLY, NO_CODEGEN = Value
+
+  def currentMode: CodegenObjectFactoryMode.Value = {
+// If we weren't on task execution, accesses that config.
+if (TaskContext.get == null) {
+  val config = SQLConf.get.getConf(SQLConf.CODEGEN_FACTORY_MODE)
+  CodegenObjectFactoryMode.withName(config)
+} else {
+  CodegenObjectFactoryMode.AUTO
+}
+  }
+}
+
+/**
+ * A factory which can be used to create objects that have both codegen 
and interpreted
+ * implementations. This tries to create codegen object first, if any 
compile error happens,
+ * it fallbacks to interpreted version.
+ */
+abstract class CodegenObjectFactory[IN, OUT] {
--- End diff --

any better name? How about `CodeGeneratorWithInterpretedFallback` and make 
it extends `CodeGenerator`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21106: [SPARK-23711][SQL][WIP] Add fallback logic for Un...

2018-05-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21106#discussion_r187519505
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
 ---
@@ -87,12 +87,11 @@ class InterpretedUnsafeProjection(expressions: 
Array[Expression]) extends Unsafe
 /**
  * Helper functions for creating an [[InterpretedUnsafeProjection]].
  */
-object InterpretedUnsafeProjection extends UnsafeProjectionCreator {
-
+object InterpretedUnsafeProjection {
   /**
* Returns an [[UnsafeProjection]] for given sequence of bound 
Expressions.
*/
-  override protected def createProjection(exprs: Seq[Expression]): 
UnsafeProjection = {
+  protected[sql] def createProjection(exprs: Seq[Expression]): 
UnsafeProjection = {
--- End diff --

we are inside `object`, if we want to expose it, just mark it as `public`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90487/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90487/testReport)**
 for PR 21291 at commit 
[`6039094`](https://github.com/apache/spark/commit/6039094a3af35319e6ea10f27c761936be883b79).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3127/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21118
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21118
  
**[Test build #90494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90494/testReport)**
 for PR 21118 at commit 
[`43d5893`](https://github.com/apache/spark/commit/43d5893de8413c152faad748f4951ffb3311b689).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21255: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3126/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21255: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21255
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression...

2018-05-10 Thread fangshil
Github user fangshil commented on a diff in the pull request:

https://github.com/apache/spark/pull/21276#discussion_r187518454
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2715,6 +2715,66 @@ private[spark] object Utils extends Logging {
 HashCodes.fromBytes(secretBytes).toString()
   }
 
+  /**
+   * A safer version than scala obj's getClass.getSimpleName and 
Utils.getFormattedClassName
+   * which may throw Malformed class name error.
+   * This method mimicks scalatest's getSimpleNameOfAnObjectsClass.
+   */
+  def getSimpleName(fullyQualifiedName: String): String = {
--- End diff --

a if statement to determine if the name is good or not may not be stable 
and comprehensive - how about we use try-catch here and only use the 
alternative if obj.getClass.getSimpleName throws the Malformed class name error


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21118
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21255: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21255
  
**[Test build #90493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90493/testReport)**
 for PR 21255 at commit 
[`675726c`](https://github.com/apache/spark/commit/675726c326f2dd201e94f0def0b395befb87d303).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21129
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21106
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21129
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3125/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21106
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90490/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21106
  
**[Test build #90490 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90490/testReport)**
 for PR 21106 at commit 
[`4c04d14`](https://github.com/apache/spark/commit/4c04d140d59d3cf7f11b6eae91896124eeb48496).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21129
  
**[Test build #90492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90492/testReport)**
 for PR 21129 at commit 
[`b970ecf`](https://github.com/apache/spark/commit/b970ecf2f673cfc0f4033747997b04bc5444006f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-10 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16578
  
BTW I’ve been and am currently traveling with a busy itinerary. I 
haven’t started work on this and probably won’t get to work on it until 
Monday at the very earliest.

> On May 5, 2018, at 8:32 AM, Xiao Li  wrote:
> 
> Yeah. That is fine. Will try to review the relevant PRs ASAP. Please ping 
me. Thanks again!
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.
> 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21129
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21129
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90491/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21129
  
**[Test build #90491 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90491/testReport)**
 for PR 21129 at commit 
[`46f4436`](https://github.com/apache/spark/commit/46f443619a7f47fd19da8002fbe2761b405c8e9a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21129
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3124/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21129
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-05-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17086
  
LGTM. @jkbradley @mengxr Would you mind take a look ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21129
  
**[Test build #90491 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90491/testReport)**
 for PR 21129 at commit 
[`46f4436`](https://github.com/apache/spark/commit/46f443619a7f47fd19da8002fbe2761b405c8e9a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21255: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21255#discussion_r187515108
  
--- Diff: R/pkg/R/functions.R ---
@@ -1256,7 +1259,7 @@ setMethod("quarter",
 #' @details
 #' \code{reverse}: Reverses the string column and returns it as a new 
string column.
--- End diff --

Don't we need to update this doc that adds the fact it supports array now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21255: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21255#discussion_r187515152
  
--- Diff: R/pkg/R/functions.R ---
@@ -2047,18 +2050,8 @@ setMethod("countDistinct",
 #' \code{concat}: Concatenates multiple input columns together into a 
single column.
 #' If all inputs are binary, concat returns an output as binary. 
Otherwise, it returns as string.
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR

2018-05-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21278#discussion_r187514621
  
--- Diff: R/pkg/R/client.R ---
@@ -60,13 +60,48 @@ generateSparkSubmitArgs <- function(args, sparkHome, 
jars, sparkSubmitOpts, pack
   combinedArgs
 }
 
+checkJavaVersion <- function() {
+  javaBin <- "java"
+  javaHome <- Sys.getenv("JAVA_HOME")
+  javaReqs <- utils::packageDescription(utils::packageName(), 
fields=c("SystemRequirements"))
+  sparkJavaVersion <- as.numeric(tail(strsplit(javaReqs, "[(=)]")[[1]], n 
= 1L))
+  if (javaHome != "") {
+javaBin <- file.path(javaHome, "bin", javaBin)
--- End diff --

ah ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21298: [SPARK-24198][SparkR][SQL] Adding slice function ...

2018-05-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21298#discussion_r187514450
  
--- Diff: R/pkg/R/functions.R ---
@@ -3124,6 +3125,23 @@ setMethod("size",
 column(jc)
   })
 
+#' @details
+#' \code{slice}: Returns an array containing all the elements in x from 
the index start
+#' (or starting from the end if start is negative) with the specified 
length.
+#'
+#' @rdname column_collection_functions
+#' @param start an index indicating the first element occuring in the 
result.
+#' @param length a number of consecutive elements choosen to the result.
+#'
+#' @aliases slice slice,Column-method
+#' @note slice since 2.4.0
+setMethod("slice",
+signature(x = "Column"),
--- End diff --

please fix the ident - see "size" for example


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21106
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21106
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90485/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21106
  
**[Test build #90485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90485/testReport)**
 for PR 21106 at commit 
[`1f5cc17`](https://github.com/apache/spark/commit/1f5cc17741ef0feed5894fbafd21c36a44bc5373).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21163
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90489/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21163
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21163
  
**[Test build #90489 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90489/testReport)**
 for PR 21163 at commit 
[`3ccf94e`](https://github.com/apache/spark/commit/3ccf94e412a061277b58a56fa9b6cbdd6bdf08ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21290: [SPARK-24241][Submit]Do not fail fast when dynamic resou...

2018-05-10 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/21290
  
cc @srowen 
The last code change seems tp be  related to you, plz help to review, thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21106
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3123/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21106
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21106
  
Any more comments on this? @hvanhovell @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21106
  
**[Test build #90490 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90490/testReport)**
 for PR 21106 at commit 
[`4c04d14`](https://github.com/apache/spark/commit/4c04d140d59d3cf7f11b6eae91896124eeb48496).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-10 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r187507940
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -93,6 +94,10 @@ private[spark] class EventLoggingListener(
   // Visible for tests only.
   private[scheduler] val logPath = getLogPath(logBaseDir, appId, 
appAttemptId, compressionCodecName)
 
+  // map of live stages, to peak executor metrics for the stage
+  private val liveStageExecutorMetrics = mutable.HashMap[(Int, Int),
--- End diff --

This is tracking peak metric values for executors for each stage, so that 
the peak values for the stage can be dumped at stage end. The purpose is to 
reduce the amount of logging, to only number of stages * number of executors 
ExecutorMetricsUpdate events.

I originally tried logging for new peak values, resetting when a new stage 
begins -- this is simpler, but can lead to more events being logged.

Having stage level information is useful for users trying to identify which 
stages are more memory intensive. This information could be useful they are 
trying to reduce the amount of memory used, since they would know which stages 
(and the relevant code) to focus on.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-10 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r187507139
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/PeakExecutorMetrics.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.executor.ExecutorMetrics
+import org.apache.spark.status.api.v1.PeakMemoryMetrics
+
+/**
+ * Records the peak values for executor level metrics. If 
jvmUsedHeapMemory is -1, then no
+ * values have been recorded yet.
+ */
+private[spark] class PeakExecutorMetrics {
--- End diff --

I got some errors when trying to add methods to ExecutorMetrics. I don't 
remember the details, but can try this again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-10 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r187506958
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +179,27 @@ private[spark] class EventLoggingListener(
 
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit 
= {
+// log the peak executor metrics for the stage, for each executor
+val accumUpdates = new ArrayBuffer[(Long, Int, Int, 
Seq[AccumulableInfo])]()
+val executorMap = liveStageExecutorMetrics.remove(
--- End diff --

Yes, it's safer to clean up earlier attempts -- I can add some code to 
iterate through earlier attemptIDs.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-10 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r187506780
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +179,27 @@ private[spark] class EventLoggingListener(
 
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit 
= {
+// log the peak executor metrics for the stage, for each executor
+val accumUpdates = new ArrayBuffer[(Long, Int, Int, 
Seq[AccumulableInfo])]()
+val executorMap = liveStageExecutorMetrics.remove(
+  (event.stageInfo.stageId, event.stageInfo.attemptNumber()))
+executorMap.foreach {
+  executorEntry => {
+for ((executorId, peakExecutorMetrics) <- executorEntry) {
--- End diff --

The for loop (line 187) is going through the hashmap entries of executorId 
to peakExecutorMetrics, so there are multiple values. Could you please provide 
more detail for how "case (executorId, peakExecutorMetrics) =>" would work? If 
the for loop is OK, then I can add some comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21163
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21163
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3122/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3121/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21163
  
**[Test build #90489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90489/testReport)**
 for PR 21163 at commit 
[`3ccf94e`](https://github.com/apache/spark/commit/3ccf94e412a061277b58a56fa9b6cbdd6bdf08ba).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3120/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/21163
  
Jenkins, test this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90487/testReport)**
 for PR 21291 at commit 
[`6039094`](https://github.com/apache/spark/commit/6039094a3af35319e6ea10f27c761936be883b79).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21230
  
**[Test build #90488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90488/testReport)**
 for PR 21230 at commit 
[`953cd7a`](https://github.com/apache/spark/commit/953cd7a823f487291ac389eeb9d98219bf447ebf).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21291
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21106
  
I agree. It is what I saw when I've tried to change it locally. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-05-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21106
  
I took a deeper look, changing to case class needs a lot of changes, let's 
follow your previous approach.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...

2018-05-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r187502412
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
@@ -345,6 +345,16 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
 
   override val output: Seq[Attribute] = range.output
 
+  override def outputOrdering: Seq[SortOrder] = range.outputOrdering
+
+  override def outputPartitioning: Partitioning = {
+if (numSlices == 1) {
+  SinglePartition
--- End diff --

`SinglePartition` is better


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19183: [SPARK-21960][Streaming] Spark Streaming Dynamic Allocat...

2018-05-10 Thread karth295
Github user karth295 commented on the issue:

https://github.com/apache/spark/pull/19183
  
@sansagara go for it -- it'll be a few days until I'll have time to look at 
this again. I'll close my PR if/when you make a new one :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21291
  
retest this please.

On Fri, May 11, 2018, 10:33 AM UCB AMPLab  wrote:

> Test FAILed.
> Refer to this link for build results (access rights to CI server needed):
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90484/
> Test FAILed.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90484/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90484 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90484/testReport)**
 for PR 21291 at commit 
[`6039094`](https://github.com/apache/spark/commit/6039094a3af35319e6ea10f27c761936be883b79).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90486/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #90486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90486/testReport)**
 for PR 21153 at commit 
[`ce84137`](https://github.com/apache/spark/commit/ce841372b76fe3263462b1f51ebfda26e098f8f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-10 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r187501194
  
--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Creates a heartbeat thread which will call the specified 
reportHeartbeat function at
+ * intervals of intervalMs.
+ *
+ * @param reportHeartbeat the heartbeat reporting function to call.
+ * @param intervalMs the interval between heartbeats.
+ */
+private[spark] class Heartbeater(reportHeartbeat: () => Unit, intervalMs: 
Long) {
+  // Executor for the heartbeat task
+  private val heartbeater = 
ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")
--- End diff --

Yes, this could be for the driver as well, so could be misleading. I can 
change to "heartbeater".


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-10 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r187501157
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1753,9 +1766,21 @@ class DAGScheduler(
 messageScheduler.shutdownNow()
 eventProcessLoop.stop()
 taskScheduler.stop()
+heartbeater.stop()
+  }
+
+  /** Reports heartbeat metrics for the driver. */
+  private def reportHeartBeat(): Unit = {
--- End diff --

With cluster mode, including YARN, there isn't a local executor, so the 
metrics for the driver would not be collected. Perhaps this could be modified 
to skip this step for local mode.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21296#discussion_r187500240
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -267,7 +267,7 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils with Te
 .options(Map("header" -> "true", "mode" -> "dropmalformed"))
 .load(testFile(carsFile))
 
-  assert(cars.select("year").collect().size === 2)
+  assert(cars.collect().size === 2)
--- End diff --

This changes behaviour and it's intendedly parsed to keep the backword 
compatibility. There was an issue about the different number of counts. I think 
you are basically saying `cars.select("year").collect().size` and 
`cars.collect().size` are different and they are correct, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3119/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21296#discussion_r187499921
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -73,11 +64,24 @@ class UnivocityParser(
   // Each input token is placed in each output row's position by mapping 
these. In this case,
   //
   //   output row - ["A", 2]
-  private val valueConverters: Array[ValueConverter] =
-schema.map(f => makeConverter(f.name, f.dataType, f.nullable, 
options)).toArray
+  private val valueConverters: Array[ValueConverter] = {
+requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, 
options)).toArray
+  }
 
-  private val tokenIndexArr: Array[Int] = {
-requiredSchema.map(f => schema.indexOf(f)).toArray
+  private val tokenizer = {
+val parserSetting = options.asParserSettings
+if (requiredSchema.length < schema.length) {
+  val tokenIndexArr = requiredSchema.map(f => 
java.lang.Integer.valueOf(schema.indexOf(f)))
+  parserSetting.selectIndexes(tokenIndexArr: _*)
--- End diff --

I think I tried this locally but I didn't submit a PR since the improvement 
was trivial and a test was broken fwiw.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21296
  
mind fixing the PR title? sounds we have never implemented the "column 
pruning" before in CSV.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21296#discussion_r187499797
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.{Column, Row, SparkSession}
+import org.apache.spark.sql.functions.lit
+import org.apache.spark.sql.types._
+import org.apache.spark.util.{Benchmark, Utils}
+
+/**
+ * Benchmark to measure CSV read/write performance.
+ * To run this:
+ *  spark-submit --class  --jars 
+ */
+object CSVBenchmarks {
+  val conf = new SparkConf()
+
+  val spark = SparkSession.builder
+.master("local[1]")
+.appName("benchmark-csv-datasource")
+.config(conf)
+.getOrCreate()
+  import spark.implicits._
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def multiColumnsBenchmark(rowsNum: Int): Unit = {
+val colsNum = 1000
+val benchmark = new Benchmark(s"Wide rows with $colsNum columns", 
rowsNum)
+
+withTempPath { path =>
+  val fields = Seq.tabulate(colsNum)(i => StructField(s"col$i", 
IntegerType))
+  val schema = StructType(fields)
+  val values = (0 until colsNum).map(i => i.toString).mkString(",")
+  val columnNames = schema.fieldNames
+
+  spark.range(rowsNum)
+.select(Seq.tabulate(colsNum)(i => lit(i).as(s"col$i")): _*)
+.write.option("header", true)
+.csv(path.getAbsolutePath)
+
+  val ds = spark.read.schema(schema).csv(path.getAbsolutePath)
+
+  benchmark.addCase(s"Select $colsNum columns", 3) { _ =>
+ds.select("*").filter((row: Row) => true).count()
+  }
+  val cols100 = columnNames.take(100).map(Column(_))
+  benchmark.addCase(s"Select 100 columns", 3) { _ =>
+ds.select(cols100: _*).filter((row: Row) => true).count()
+  }
+  benchmark.addCase(s"Select one column", 3) { _ =>
+ds.select($"col1").filter((row: Row) => true).count()
+  }
+
+  /*
+  Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
+
+  Wide rows with 1000 columns: Best/Avg Time(ms)Rate(M/s)  
 Per Row(ns)   Relative
+  

+  Select 1000 columns 76910 / 78065  0.0   
76909.8   1.0X
+  Select 100 columns  28625 / 32884  0.0   
28625.1   2.7X
+  Select one column   22498 / 22669  0.0   
22497.8   3.4X
--- End diff --

BTW, I think we are already doing the column pruning by avoiding casting 
cost which is relatively expensive comparing to the parsing logic.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21294


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21298
  
@mn-mikke, let's do this with few more functions together next time per the 
suggestions from @felixcheung and @shivaram who're PMCs used to R side better 
than me.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #90486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90486/testReport)**
 for PR 21153 at commit 
[`ce84137`](https://github.com/apache/spark/commit/ce841372b76fe3263462b1f51ebfda26e098f8f3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21153
  
Done. Can you try again?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21190: [SPARK-22938][SQL][followup] Assert that SQLConf....

2018-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21190


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21294: [SPARK-24197][SparkR][SQL] Adding array_sort function to...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21294
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21297: [SPARK-24246][SQL]Improve AnalysisException by setting t...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21297
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21297: [SPARK-24246][SQL]Improve AnalysisException by setting t...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90479/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21297: [SPARK-24246][SQL]Improve AnalysisException by setting t...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21297
  
**[Test build #90479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90479/testReport)**
 for PR 21297 at commit 
[`cd0dbcf`](https://github.com/apache/spark/commit/cd0dbcf0add48a0426b08c6ba4fdd5e5489a367b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21190
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21118
  
LGTM, although as you mentioned I think it'd definitely be valuable to 
follow up and understand why some operators insist on UnsafeRow even though 
this isn't what SparkPlan declares as the row type.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21118
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21118
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90481/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   >