[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-01-24 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16690
  
Jenkins, ok to test





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71935 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71935/testReport)**
 for PR 16605 at commit 
[`f20de2c`](https://github.com/apache/spark/commit/f20de2c126e691183399b323a1b8abd4e50812eb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r97406162
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1218,7 +1225,9 @@ class DAGScheduler(
 logInfo("Resubmitting " + shuffleStage + " (" + 
shuffleStage.name +
   ") because some of its tasks had failed: " +
   shuffleStage.findMissingPartitions().mkString(", "))
-submitStage(shuffleStage)
+if (noActiveTaskSetManager) {
--- End diff --

Shouldn't this condition go into the surrounding `if (!shuffleStage.isAvailable)`? The logInfo is very confusing in this case otherwise.
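
A rough sketch of the shape this comment seems to be suggesting, reusing the names from the diff above; this is illustrative only, not the actual patch:

```scala
// Illustrative only: fold the new check into the surrounding branch, so the
// "Resubmitting ..." message is logged only when the stage is actually resubmitted.
if (!shuffleStage.isAvailable && noActiveTaskSetManager) {
  logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
    ") because some of its tasks had failed: " +
    shuffleStage.findMissingPartitions().mkString(", "))
  submitStage(shuffleStage)
}
```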





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r97586026
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1193,7 +1193,14 @@ class DAGScheduler(
 }
 
 if (runningStages.contains(shuffleStage) && 
shuffleStage.pendingPartitions.isEmpty) {
-  markStageAsFinished(shuffleStage)
+  val noActiveTaskSetManager =
+taskScheduler.rootPool == null ||
+  !taskScheduler.rootPool.getSortedTaskSetQueue.exists {
+tsm => tsm.stageId == stageId && !tsm.isZombie
+  }
+  if (shuffleStage.isAvailable || noActiveTaskSetManager) {
+markStageAsFinished(shuffleStage)
+  }
--- End diff --

I have to admit, though this passes all the tests, it is really confusing to me. I only somewhat understand why your original version didn't work and why this should be used instead. Perhaps some more commenting here would help? The condition under which you call `markStageAsFinished` seems very broad, so it's probably worth a comment on the case when you do *not* (and perhaps even a `logInfo` in an `else` branch). The discrepancy between pendingPartitions and availableOutputs is also surprising -- perhaps that deserves extra comments on `Stage` explaining how the meanings of the two differ.
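
A hypothetical sketch of the extra commenting and `else`-branch logging being asked for here, reusing the names from the diff above (not the actual patch):

```scala
if (shuffleStage.isAvailable || noActiveTaskSetManager) {
  // Safe to finish: either all map outputs are registered, or no non-zombie
  // TaskSetManager is still running tasks for this stage.
  markStageAsFinished(shuffleStage)
} else {
  // Some map outputs are still missing and an active TaskSetManager exists,
  // so keep the stage running instead of finishing or resubmitting it.
  logInfo(s"Not marking $shuffleStage as finished yet: pendingPartitions is " +
    "empty but some map outputs are missing and its task set is still active.")
}
```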





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r97417513
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala 
---
@@ -648,4 +648,69 @@ class BasicSchedulerIntegrationSuite extends 
SchedulerIntegrationSuite[SingleCor
 }
 assertDataStructuresEmpty(noFailure = false)
   }
+
+  testScheduler("[SPARK-19263] DAGScheduler shouldn't resubmit active 
taskSet.") {
+val a = new MockRDD(sc, 2, Nil)
+val b = shuffle(2, a)
+val shuffleId = b.shuffleDeps.head.shuffleId
+
+def runBackend(): Unit = {
+  val (taskDescription, task) = backend.beginTask()
+  task.stageId match {
+// ShuffleMapTask
+case 0 =>
+  val stageAttempt = task.stageAttemptId
+  val partitionId = task.partitionId
+  (stageAttempt, partitionId) match {
+case (0, 0) =>
+  val fetchFailed = FetchFailed(
+DAGSchedulerSuite.makeBlockManagerId("hostA"), shuffleId, 
0, 0, "ignored")
+  backend.taskFailed(taskDescription, fetchFailed)
+case (0, 1) =>
+  // Wait until stage resubmission caused by FetchFailed is 
finished.
+  
waitUntilConditionBecomeTrue(taskScheduler.runningTaskSets.size==2, 5000,
+"Wait until stage is resubmitted caused by fetch failed")
+
+  // Task(stageAttempt=0, partition=1) will be bogus, because 
both two
+  // tasks(stageAttempt=0, partition=0, 1) run on hostA.
+  // Pending partitions are (0, 1) after stage resubmission,
+  // then change to be 0 after this bogus task.
+  backend.taskSuccess(taskDescription, 
DAGSchedulerSuite.makeMapStatus("hostA", 2))
+case (1, 1) =>
+  // Wait long enough until Success of task(stageAttempt=1 and 
partition=0)
+  // is handled by DAGScheduler.
+  Thread.sleep(5000)
+  // Task(stageAttempt=1 and partition=0) will cause stage 
resubmission,
+  // because shuffleStage.pendingPartitions.isEmpty,
+  // but shuffleStage.isAvailable is false.
+  backend.taskSuccess(taskDescription, 
DAGSchedulerSuite.makeMapStatus("hostB", 2))
+case _ =>
+  backend.taskSuccess(taskDescription, 
DAGSchedulerSuite.makeMapStatus("hostB", 2))
+  }
+// ResultTask
+case 1 => backend.taskSuccess(taskDescription, 10)
+  }
+}
+
+withBackend(runBackend _) {
+  val jobFuture = submit(b, (0 until 2).toArray)
+  val duration = Duration(15, SECONDS)
+  awaitJobTermination(jobFuture, duration)
+}
+assert(results === (0 until 2).map { _ -> 10}.toMap)
+  }
+
+  def waitUntilConditionBecomeTrue(condition: => Boolean, timeout: Long, 
msg: String): Unit = {
--- End diff --

nit: rename to `waitForCondition` (maybe irrelevant given other comments)
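
For reference, a minimal sketch of what such a polling helper could look like (hypothetical; the helper's actual body is not shown in this diff):

```scala
// Hypothetical polling helper: re-check the condition until it holds or the timeout expires.
def waitForCondition(condition: => Boolean, timeoutMs: Long, msg: String): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (!condition) {
    if (System.currentTimeMillis() > deadline) {
      throw new java.util.concurrent.TimeoutException(msg)
    }
    Thread.sleep(10)
  }
}
```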





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-24 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r97417399
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala 
---
@@ -648,4 +648,69 @@ class BasicSchedulerIntegrationSuite extends 
SchedulerIntegrationSuite[SingleCor
 }
 assertDataStructuresEmpty(noFailure = false)
   }
+
+  testScheduler("[SPARK-19263] DAGScheduler shouldn't resubmit active 
taskSet.") {
+val a = new MockRDD(sc, 2, Nil)
+val b = shuffle(2, a)
+val shuffleId = b.shuffleDeps.head.shuffleId
+
+def runBackend(): Unit = {
+  val (taskDescription, task) = backend.beginTask()
+  task.stageId match {
+// ShuffleMapTask
+case 0 =>
+  val stageAttempt = task.stageAttemptId
+  val partitionId = task.partitionId
+  (stageAttempt, partitionId) match {
+case (0, 0) =>
+  val fetchFailed = FetchFailed(
+DAGSchedulerSuite.makeBlockManagerId("hostA"), shuffleId, 
0, 0, "ignored")
+  backend.taskFailed(taskDescription, fetchFailed)
+case (0, 1) =>
+  // Wait until stage resubmission caused by FetchFailed is 
finished.
+  
waitUntilConditionBecomeTrue(taskScheduler.runningTaskSets.size==2, 5000,
+"Wait until stage is resubmitted caused by fetch failed")
+
+  // Task(stageAttempt=0, partition=1) will be bogus, because 
both two
+  // tasks(stageAttempt=0, partition=0, 1) run on hostA.
+  // Pending partitions are (0, 1) after stage resubmission,
+  // then change to be 0 after this bogus task.
+  backend.taskSuccess(taskDescription, 
DAGSchedulerSuite.makeMapStatus("hostA", 2))
+case (1, 1) =>
+  // Wait long enough until Success of task(stageAttempt=1 and 
partition=0)
+  // is handled by DAGScheduler.
+  Thread.sleep(5000)
--- End diff --

Hmm, this is a nuisance. I don't see any good way to get rid of this sleep... but now that I think about it, why can't you do this in `DAGSchedulerSuite`? It seems like this can be entirely contained to the `DAGScheduler` and doesn't require tricky interactions with other parts of the scheduler. (I'm sorry I pointed you in the wrong direction earlier -- I thought perhaps you had tried to copy the examples of `DAGSchedulerSuite` but there was some reason you couldn't.)





[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16661
  
**[Test build #71937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71937/testReport)**
 for PR 16661 at commit 
[`5672d13`](https://github.com/apache/spark/commit/5672d1345f661665f521fd1dd4410313ef3ab554).





[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16693
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16693
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71933/
Test PASSed.





[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16693
  
**[Test build #71933 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71933/testReport)**
 for PR 16693 at commit 
[`db00cf9`](https://github.com/apache/spark/commit/db00cf9061b2ad4263671f5ca9252642a091ee45).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2017-01-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16329#discussion_r97580048
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java
 ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql;
+
+// $example on:typed_custom_aggregation$
+import java.io.Serializable;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.TypedColumn;
+import org.apache.spark.sql.expressions.Aggregator;
+// $example off:typed_custom_aggregation$
+
+public class JavaUserDefinedTypedAggregation {
+
+  // $example on:typed_custom_aggregation$
+  public static class Employee implements Serializable {
+private String name;
+private long salary;
+
+// Constructors, getters, setters...
+// $example off:typed_custom_aggregation$
+public String getName() {
+  return name;
+}
+
+public void setName(String name) {
+  this.name = name;
+}
+
+public long getSalary() {
+  return salary;
+}
+
+public void setSalary(long salary) {
+  this.salary = salary;
+}
+// $example on:typed_custom_aggregation$
+  }
+
+  public static class Average implements Serializable  {
+private long sum;
+private long count;
+
+// Constructors, getters, setters...
+// $example off:typed_custom_aggregation$
+public Average() {
+}
+
+public Average(long sum, long count) {
+  this.sum = sum;
+  this.count = count;
+}
+
+public long getSum() {
+  return sum;
+}
+
+public void setSum(long sum) {
+  this.sum = sum;
+}
+
+public long getCount() {
+  return count;
+}
+
+public void setCount(long count) {
+  this.count = count;
+}
+// $example on:typed_custom_aggregation$
+  }
+
+  public static class MyAverage extends Aggregator {
+// A zero value for this aggregation. Should satisfy the property that 
any b + zero = b
+public Average zero() {
--- End diff --

Is this meant to be `MyAverage`?
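
For context, a minimal Scala sketch of the typed `Aggregator` contract this Java example mirrors; note that `zero` returns the buffer type (`Average`), and the `Aggregator[Employee, Average, Double]` type parameters are an assumption here since the quoted diff shows a raw `Aggregator`:

```scala
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

case class Employee(name: String, salary: Long)
case class Average(var sum: Long, var count: Long)

// Sketch only: IN = Employee, BUF = Average, OUT = Double.
object MyAverage extends Aggregator[Employee, Average, Double] {
  // Zero value of the aggregation buffer; merging it into any buffer leaves the buffer unchanged.
  def zero: Average = Average(0L, 0L)
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  def bufferEncoder: Encoder[Average] = Encoders.product[Average]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```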





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread gmoehler
Github user gmoehler commented on the issue:

https://github.com/apache/spark/pull/16660
  
@viirya Which comment are you referring to? I thought I had included all of them ;-)





[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71936 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71936/testReport)**
 for PR 16677 at commit 
[`9d4cadb`](https://github.com/apache/spark/commit/9d4cadb782afcba52b8081402f5dd89cb0a27ae5).





[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16269
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71932/
Test PASSed.





[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16269
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16269
  
**[Test build #71932 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71932/testReport)**
 for PR 16269 at commit 
[`48535aa`](https://github.com/apache/spark/commit/48535aae6be613c28f900e408f073a5eb7ef76cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HiveFileFormat(fileSinkConf: FileSinkDesc)`





[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

2017-01-24 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16661#discussion_r97571098
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
 @Since("2.0.0")
 object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
 
+  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue 
*/
--- End diff --

Nit: ```Integer.MaxValue``` is not a standard name; it should be ```Int.MaxValue``` in Scala or ```Integer.MAX_VALUE``` in Java.





[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

2017-01-24 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16661#discussion_r97570174
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
 @Since("2.0.0")
 object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
 
+  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue 
*/
+  private[clustering] val MAX_NUM_FEATURES = 46000
--- End diff --

+1 @srowen It's better to use the real max. 
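
A minimal sketch of what deriving "the real max" from `Int.MaxValue` could look like (hypothetical; not necessarily what the PR ends up using):

```scala
// Largest numFeatures such that numFeatures * numFeatures <= Int.MaxValue (evaluates to 46340).
private[clustering] val MAX_NUM_FEATURES: Int = math.sqrt(Int.MaxValue).toInt
```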





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71934/testReport)**
 for PR 16605 at commit 
[`c16b121`](https://github.com/apache/spark/commit/c16b121247394374fd6066309e1b7309b981eabb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #11867: [SPARK-14049] [CORE] Add functionality in spark h...

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11867





[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-24 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/11867
  
thanks @paragpc 
merged to master





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71935 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71935/testReport)**
 for PR 16605 at commit 
[`f20de2c`](https://github.com/apache/spark/commit/f20de2c126e691183399b323a1b8abd4e50812eb).





[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2017-01-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14918
  
I do not agree with this change either, for the same reason as in 
https://github.com/apache/spark/pull/14918#issuecomment-250882422.





[GitHub] spark issue #16614: [SPARK-19260] Spaces or "%20" in path parameter are not ...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16614
  
**[Test build #3549 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3549/testReport)**
 for PR 16614 at commit 
[`23834a6`](https://github.com/apache/spark/commit/23834a6ec99ac7e8a8df9095ce523de0ede80aea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71934/
Test FAILed.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71934/testReport)**
 for PR 16605 at commit 
[`c16b121`](https://github.com/apache/spark/commit/c16b121247394374fd6066309e1b7309b981eabb).





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97556081
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -475,6 +1164,45 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
   }
 
+  test("to_unix_timestamp with session local timezone") {
--- End diff --

The problem is that, except for this suite, all the changes you made to the tests just fix existing tests to fit the timezone handling. You add all the new tests in this suite as end-to-end tests, which is not good. We should add new tests in `DateTimeUtilsSuite` as unit tests.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97554829
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -475,6 +1164,45 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
   }
 
+  test("to_unix_timestamp with session local timezone") {
--- End diff --

I don't think we need to add tests in this file at all. We should improve `DateTimeUtilsSuite` to make sure the newly added methods work well with different timezones, e.g. `getHours`, `daysToMillions`, etc. Then make sure these timezone-aware expressions call the newly added methods in `DateTimeUtils` that take a timezone parameter (we can remove the old versions that don't take a timezone parameter after we finish handling partition values).

This suite is an end-to-end test, and it would be very tedious to test all the changed expressions here; we should write more low-level tests in `DateTimeUtilsSuite`.
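
A hypothetical illustration of the kind of low-level, timezone-parameterized check being asked for; the code below uses plain `java.util.Calendar` inside a ScalaTest suite and is illustrative only, not the actual `DateTimeUtils` API added by this PR:

```scala
import java.util.{Calendar, TimeZone}

// Illustrative only: check the conversion logic directly against several timezones,
// rather than going end-to-end through SQL in DateFunctionsSuite.
test("hour extraction respects the given timezone") {
  val utcMillis = 1485270000000L  // 2017-01-24 15:00:00 UTC
  for ((tz, expectedHour) <- Seq("UTC" -> 15, "America/Los_Angeles" -> 7, "Asia/Tokyo" -> 0)) {
    val cal = Calendar.getInstance(TimeZone.getTimeZone(tz))
    cal.setTimeInMillis(utcMillis)
    assert(cal.get(Calendar.HOUR_OF_DAY) === expectedHour)
  }
}
```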





[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71931/
Test FAILed.





[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71931 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71931/testReport)**
 for PR 16677 at commit 
[`4fb5e40`](https://github.com/apache/spark/commit/4fb5e40d6aa77dafc0eb715730f5048a74d461d6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class FakePartitioning(orgPartition: Partitioning, numPartitions: 
Int) extends Partitioning `
  * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode with CodegenSupport `
  * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode `





[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16693
  
**[Test build #71933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71933/testReport)**
 for PR 16693 at commit 
[`db00cf9`](https://github.com/apache/spark/commit/db00cf9061b2ad4263671f5ca9252642a091ee45).





[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16693
  
cc @gatorsmile @windpiger 





[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16693

[SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/16552 , `CreateHiveTableAsSelectCommand` became very similar to `CreateDataSourceTableAsSelectCommand`, and we can simplify it further by only creating the table in the table-does-not-exist branch.

This PR also adds Hive provider checking in the DataStream reader/writer, which was missed in #16552
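
A rough sketch of the control flow described above (names are hypothetical; this is not the actual command implementation):

```scala
// Hypothetical shape: the table is created only in the table-does-not-exist branch.
def run(tableExists: Boolean, createTable: () => Unit, insertData: () => Unit): Unit = {
  if (tableExists) {
    // Table already exists: just write the query result into it.
    insertData()
  } else {
    // Table-not-exist branch: the only place where the table gets created.
    createTable()
    insertData()
  }
}
```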

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16693


commit db00cf9061b2ad4263671f5ca9252642a091ee45
Author: Wenchen Fan 
Date:   2017-01-24T13:35:03Z

simplify CreateHiveTableAsSelectCommand







[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16660
  
LGTM except one minor comment.





[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16660#discussion_r97545280
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala ---
@@ -71,7 +73,104 @@ object UDT {
 
 }
 
+// object and classes to test SPARK-19311
+
+// Trait/Interface for base type
+@SQLUserDefinedType(udt = classOf[ExampleBaseTypeUDT])
+sealed trait IExampleBaseType extends Serializable {
+  def field: Int
+}
+
+// Trait/Interface for derived type
+@SQLUserDefinedType(udt = classOf[ExampleSubTypeUDT])
+sealed trait IExampleSubType extends IExampleBaseType
+
+// a base class
+class ExampleBaseClass(override val field: Int) extends IExampleBaseType {
+  override def toString: String = field.toString
--- End diff --

@gmoehler I think we don't need `toString`?





[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16691
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71930/
Test FAILed.





[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16691
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16691
  
**[Test build #71930 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71930/testReport)**
 for PR 16691 at commit 
[`0c24291`](https://github.com/apache/spark/commit/0c24291b2738d2c71b59decc60b9e33524b8f84d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16269
  
**[Test build #71932 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71932/testReport)**
 for PR 16269 at commit 
[`48535aa`](https://github.com/apache/spark/commit/48535aae6be613c28f900e408f073a5eb7ef76cb).





[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16606





[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16606
  
thanks, merging to master!





[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16552





[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71928/
Test PASSed.





[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16680
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16552
  
thanks, merging to master!





[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16668#discussion_r97537116
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3406,3 +3406,28 @@ setMethod("randomSplit",
 }
 sapply(sdfs, dataFrame)
   })
+
+#' getNumPartitions
+#'
+#' Return the number of partitions
+#' Note: in order to compute the number of partition the SparkDataFrame 
has to be converted into a
+#' RDD temporarily internally.
+#'
+#' @param x A SparkDataFrame
+#' @family SparkDataFrame functions
+#' @aliases getNumPartitions,SparkDataFrame-method
+#' @rdname getNumPartitions
+#' @name getNumPartitions
+#' @export
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df <- createDataFrame(cars, numPartitions = 2)
+#' getNumPartitions(df)
+#' }
+#' @note getNumPartitions since 2.1.1
+setMethod("getNumPartitions",
+  signature(x = "SparkDataFrame"),
+  function(x) {
+getNumPartitionsRDD(toRDD(x))
--- End diff --

Maybe we can add this slow implementation to Spark 2.1 and improve it in Spark 2.2.





[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #71928 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71928/testReport)**
 for PR 16680 at commit 
[`0f7b9b8`](https://github.com/apache/spark/commit/0f7b9b8b17f79c83c920682f000d7c4eb4cda291).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...

2017-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15880
  
I have updated the PR title and description, and added a release_notes 
label in the ticket.





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16660
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71929/
Test PASSed.





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16660
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16660
  
**[Test build #71929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71929/testReport)**
 for PR 16660 at commit 
[`7aed9a4`](https://github.com/apache/spark/commit/7aed9a4fada263785ce1d81acb31073ef7a401fd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ExampleBaseTypeUDT extends UserDefinedType[IExampleBaseType] `





[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread gmoehler
Github user gmoehler commented on a diff in the pull request:

https://github.com/apache/spark/pull/16660#discussion_r97534883
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala ---
@@ -194,4 +293,35 @@ class UserDefinedTypeSuite extends QueryTest with 
SharedSQLContext with ParquetT
 // call `collect` to make sure this query can pass analysis.
 pointsRDD.as[MyLabeledPoint].map(_.copy(label = 2.0)).collect()
   }
+
+  test("SPARK-19311: UDFs disregard UDT type hierarchy") {
+UDTRegistration.register(classOf[IExampleBaseType].getName,
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...

2017-01-24 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15837
  
Closing this, as the alternative #16659 has been merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15837: [SPARK-18395][SQL] Evaluate common subexpression ...

2017-01-24 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/15837


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71931/testReport)**
 for PR 16677 at commit 
[`4fb5e40`](https://github.com/apache/spark/commit/4fb5e40d6aa77dafc0eb715730f5048a74d461d6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16692: [SPARK-19335] Introduce UPSERT feature to SPARK

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16692
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16614: [SPARK-19260] Spaces or "%20" in path parameter are not ...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16614
  
**[Test build #3549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3549/testReport)**
 for PR 16614 at commit 
[`23834a6`](https://github.com/apache/spark/commit/23834a6ec99ac7e8a8df9095ce523de0ede80aea).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16692: [SPARK-19335] Introduce UPSERT feature to SPARK

2017-01-24 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/16692

[SPARK-19335] Introduce UPSERT feature to SPARK 

## What changes were proposed in this pull request?

This PR proposes to add UPSERT support to Spark through DataFrameWriter's JDBC 
data source options.

For example, if mytable2 in a MySQL database has a unique constraint on column 
c1 and the user tries to append a DataFrame to it, the write fails with a 
unique constraint violation:

`val df = Seq((1,4)).toDF("c1","c2")`
`val url = "jdbc:mysql://9.30.167.220:3306/mydb"`
`df.write.mode(org.apache.spark.sql.SaveMode.Append).option("user","kevin").option("password","kevin").jdbc(url,"mytable2",new java.util.Properties())`

With this feature, the user can set the UPSERT options to write the DataFrame 
into the MySQL table:

`df.write.mode(org.apache.spark.sql.SaveMode.Append).option("upsert",true).option("upsertUpdateColumn","c1").option("user","kevin").option("password","kevin").jdbc(url,"mytable2",new java.util.Properties())`

Here is the design doc.
[UPSERT DESIGN 
DOC](https://drive.google.com/open?id=1IoafDm78v7ATP-npKbTaw2_dTFiXplCd9NzEe8CBH6E)


## How was this patch tested?

Local test: ran the test cases from spark-shell against MySQL and PostgreSQL 
databases.
Test case: added test cases to the existing suites, including the Docker 
integration suite.
Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark upsert2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16692


commit 3b44c5978bd44db986621d3e8511e9165b66926b
Author: Kevin Yu 
Date:   2016-04-20T18:06:30Z

adding testcase

commit 18b4a31c687b264b50aa5f5a74455956911f738a
Author: Kevin Yu 
Date:   2016-04-22T21:48:00Z

Merge remote-tracking branch 'upstream/master'

commit 4f4d1c8f2801b1e662304ab2b33351173e71b427
Author: Kevin Yu 
Date:   2016-04-23T16:50:19Z

Merge remote-tracking branch 'upstream/master'
get latest code from upstream

commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e
Author: Kevin Yu 
Date:   2016-04-23T22:20:53Z

Merge remote-tracking branch 'upstream/master'
adding trim characters support

commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8
Author: Kevin Yu 
Date:   2016-04-25T20:24:33Z

Merge remote-tracking branch 'upstream/master'
get latest code for pr12646

commit 196b6c66b0d55232f427c860c0e7c6876c216a67
Author: Kevin Yu 
Date:   2016-04-25T23:45:57Z

Merge remote-tracking branch 'upstream/master'
merge latest code

commit f37a01e005f3e27ae2be056462d6eb6730933ba5
Author: Kevin Yu 
Date:   2016-04-27T14:15:06Z

Merge remote-tracking branch 'upstream/master'
merge upstream/master

commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0
Author: Kevin Yu 
Date:   2016-04-30T23:49:31Z

Merge remote-tracking branch 'upstream/master'

commit bde5820a181cf84e0879038ad8c4cebac63c1e24
Author: Kevin Yu 
Date:   2016-05-04T03:52:31Z

Merge remote-tracking branch 'upstream/master'

commit 5f7cd96d495f065cd04e8e4cc58461843e45bc8d
Author: Kevin Yu 
Date:   2016-05-10T21:14:50Z

Merge remote-tracking branch 'upstream/master'

commit 893a49af0bfd153ccb59ba50b63a232660e0eada
Author: Kevin Yu 
Date:   2016-05-13T18:20:39Z

Merge remote-tracking branch 'upstream/master'

commit 4bbe1fd4a3ebd50338ccbe07dc5887fe289cd53d
Author: Kevin Yu 
Date:   2016-05-17T21:58:14Z

Merge remote-tracking branch 'upstream/master'

commit b2dd795e23c36cbbd022f07a10c0cf21c85eb421
Author: Kevin Yu 
Date:   2016-05-18T06:37:13Z

Merge remote-tracking branch 'upstream/master'

commit 8c3e5da458dbff397ed60fcb68f2a46d87ab7ba4
Author: Kevin Yu 
Date:   2016-05-18T16:18:16Z

Merge remote-tracking branch 'upstream/master'

commit a0eaa408e847fbdc3ac5b26348588ee0a1e276c7
Author: Kevin Yu 
Date:   2016-05-19T04:28:20Z

Merge remote-tracking branch 'upstream/master'

commit d03c940ed89795fa7fe1d1e9f511363b22cdf19d
Author: Kevin Yu 
Date:   2016-05-19T21:24:33Z

Merge remote-tracking branch 'upstream/master'

commit d728d5e002082e571ac47292226eb8b2614f479f
Author: Kevin Yu 
Date:   2016-05-24T20:32:57Z

Merge remote-tracking branch 'upstream/master'

commit ea104ddfbf7d180ed1bc53dd9a1005010264aa1f
Author: Kevin Yu 
Date:   2016-05-25T22:52:57Z

Merge remote-tracking branch 'upstream/master'

commit 6ab1215b781ad0cccf1752f3a625b4e4e371c38e
Author: Kevin Yu 
Date:   2016-05-27T17:18:46Z

Merge remote-tracking branch 'upstream/master'

commit 0c566533705331697eb

[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

2017-01-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16661#discussion_r97530588
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
 @Since("2.0.0")
 object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
 
+  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue 
*/
+  private[clustering] val MAX_NUM_FEATURES = 46000
--- End diff --

Is floor(sqrt(2^31 - 1)) = 46340 more accurate? Or is there overhead that 
prevents this from being achievable? I know it's a corner case, but if 46000 is 
just "about" the real max, let's use the real max.
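
For reference, a quick sketch (mine, not from the PR) of where the exact bound comes from:

```scala
// Exact bound: the largest n such that n * n still fits in a signed 32-bit Int.
val exactMax = math.floor(math.sqrt(Int.MaxValue.toDouble)).toInt   // 46340
assert(exactMax.toLong * exactMax <= Int.MaxValue)                  // 46340^2 = 2147395600
assert((exactMax + 1).toLong * (exactMax + 1) > Int.MaxValue)       // 46341^2 would overflow
```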


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12380: [SPARK-14623][ML]add label binarizer

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/12380
  
Isn't this just one-hot encoding? Spark has had this for a long time.
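
For comparison, a minimal sketch of the existing Spark ML one-hot path (assuming a DataFrame `df` with a string column "label"; not taken from this PR):

```scala
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

// Index the string labels, then one-hot encode the resulting indices.
val indexed = new StringIndexer()
  .setInputCol("label").setOutputCol("labelIndex")
  .fit(df).transform(df)
val encoded = new OneHotEncoder()
  .setInputCol("labelIndex").setOutputCol("labelVec")
  .transform(indexed)
```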


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16652
  
This is looking OK to me, but it needs a rebase (and, optionally, a squash) now 
before it can be tested again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16654
  
Sure, and classification metrics like AUC only make sense for classifiers 
that output more than just a label -- they have to output a probability or 
score of some kind. Not every metric necessarily makes sense for every model, 
and we can use class hierarchy or just argument checking to avoid applying 
metrics where nonsensical. WSSSE can't be used for k-medoids, yes. k-medoids is 
also not in Spark, AFAIK. It's still not an argument to not abstract this at 
all.
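
For a concrete example of a model-specific metric in the current API, WSSSE is already exposed on the k-means model itself (a sketch, assuming a DataFrame `df` with a "features" vector column):

```scala
import org.apache.spark.ml.clustering.KMeans

val model = new KMeans().setK(3).setFeaturesCol("features").fit(df)
// Within-set sum of squared errors; only meaningful for centroid-based models.
val wssse = model.computeCost(df)
```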


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16676: delete useless var “j”

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16676


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16658: [DOCS] Fix typo in docs

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16658


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16676: delete useless var “j”

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16676
  
Merged to master. Please read http://spark.apache.org/contributing.html for 
next time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16658: [DOCS] Fix typo in docs

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16658
  
Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16355
  
Done, and it has synced now. Merged to master/2.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16355
  
It's an apache-github sync issue:

https://github.com/apache/spark/commits/branch-2.1
is missing the latest commit from

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.1

I'll cherry-pick onto apache/branch-2.1 and push as that might also kick 
the sync to try again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15945
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71926/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15945
  
**[Test build #71926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71926/testReport)**
 for PR 15945 at commit 
[`bea519f`](https://github.com/apache/spark/commit/bea519f2ba12312ec96884c3545f74b3bc28c4a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16552
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71925/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16552
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16552
  
**[Test build #71925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71925/testReport)**
 for PR 16552 at commit 
[`59db8e4`](https://github.com/apache/spark/commit/59db8e41ec2f5a4e090af3964ce48a61936e2ef4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71924/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15880
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16606
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71922/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15880
  
**[Test build #71924 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71924/testReport)**
 for PR 15880 at commit 
[`a11f89b`](https://github.com/apache/spark/commit/a11f89bf5ed13b4061a29daf007a608314465a94).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16606
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16606
  
**[Test build #71922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71922/testReport)**
 for PR 16606 at commit 
[`72164eb`](https://github.com/apache/spark/commit/72164eb02c1b7acd836a5038fddb8bcd8225a1c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16552
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71923/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16552
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16656: [SPARK-18116][DStream] Report stream input inform...

2017-01-24 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16656#discussion_r97519002
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -536,6 +539,7 @@ abstract class DStream[T: ClassTag] (
 logDebug(s"${this.getClass().getSimpleName}.readObject used")
 ois.defaultReadObject()
 generatedRDDs = new HashMap[Time, RDD[T]]()
+recoveredReports = new HashMap[Time, StreamInputInfo]()
   }
--- End diff --

Use `recoveredReports` to hold the recovered report information. We cannot 
report to `inputInfoTracker` here, as the `jobScheduler` is not yet initialized 
at this point.
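
A minimal sketch of the idea (hypothetical, not necessarily the PR's exact code): buffer the recovered entries during deserialization and replay them once the scheduler is available.

```scala
// Hypothetical helper: report the buffered StreamInputInfo entries after the
// StreamingContext (and thus the JobScheduler) has been initialized.
private def replayRecoveredReports(): Unit = {
  recoveredReports.foreach { case (time, info) =>
    ssc.scheduler.inputInfoTracker.reportInfo(time, info)
  }
  recoveredReports.clear()
}
```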


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16552
  
**[Test build #71923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71923/testReport)**
 for PR 16552 at commit 
[`7bf5b50`](https://github.com/apache/spark/commit/7bf5b50c5cfba1ecb02b95c2fa9bb1ae7830ca99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16660#discussion_r97518422
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala ---
@@ -194,4 +293,35 @@ class UserDefinedTypeSuite extends QueryTest with 
SharedSQLContext with ParquetT
 // call `collect` to make sure this query can pass analysis.
 pointsRDD.as[MyLabeledPoint].map(_.copy(label = 2.0)).collect()
   }
+
+  test("SPARK-19311: UDFs disregard UDT type hierarchy") {
+UDTRegistration.register(classOf[IExampleBaseType].getName,
--- End diff --

Oh, if you are worried about that, we actually have `UDTRegistrationSuite` for 
test cases of `UDTRegistration`. I am fine with either `SQLUserDefinedType` or 
`UDTRegistration`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16691
  
**[Test build #71930 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71930/testReport)**
 for PR 16691 at commit 
[`0c24291`](https://github.com/apache/spark/commit/0c24291b2738d2c71b59decc60b9e33524b8f84d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16691: [SPARK-19349][DStreams] Check resource ready to a...

2017-01-24 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16691#discussion_r97518189
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala
 ---
@@ -422,16 +423,36 @@ class ReceiverTracker(ssc: StreamingContext, 
skipReceiverLaunch: Boolean = false
   }
 
   /**
-   * Run the dummy Spark job to ensure that all slaves have registered. 
This avoids all the
-   * receivers to be scheduled on the same node.
+   * Wait for executors register ready. This avoids multiple receivers to 
be scheduled
+   * on the same node. Here, we check whether all resource has been 
registered. If not,
+   * and the number of receiver is larger than the number of registered 
executors, we
+   * will give once more chance to wait for remaining executors to 
register for
+   * "spark.scheduler.maxRegisteredResourcesWaitingTime" times.
*
-   * TODO Should poll the executor number and wait for executors according 
to
-   * "spark.scheduler.minRegisteredResourcesRatio" and
-   * "spark.scheduler.maxRegisteredResourcesWaitingTime" rather than 
running a dummy job.
+   * This only occurs when set too small 
"spark.scheduler.minRegisteredResourcesRatio".
*/
-  private def runDummySparkJob(): Unit = {
+  private def checkResourceReady(): Unit = {
+val pollTime = 100
+val checkingStarted = System.currentTimeMillis()
+val onceMoreWaitingTimeMs =
+  
ssc.sparkContext.conf.getTimeAsMs("spark.scheduler.maxRegisteredResourcesWaitingTime",
 "30s")
+
--- End diff --

Here I reuse "spark.scheduler.maxRegisteredResourcesWaitingTime"; do we need a 
dedicated config for this instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16691: [SPARK-19349][DStreams] Check resource ready to a...

2017-01-24 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16691

[SPARK-19349][DStreams] Check resource ready to avoid multiple receivers to 
be scheduled on the same node.

## What changes were proposed in this pull request?

Remove the related TODO.

Currently, we can only ensure that the registered resources satisfy 
"spark.scheduler.minRegisteredResourcesRatio". But if 
"spark.scheduler.minRegisteredResourcesRatio" is set too small, receivers may 
still be scheduled onto only a few nodes. In fact, we can give one more chance 
to wait for sufficient resources so that receivers are scheduled evenly.

## How was this patch tested?

Existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19349

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16691.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16691


commit 0c24291b2738d2c71b59decc60b9e33524b8f84d
Author: uncleGen 
Date:   2017-01-24T10:37:15Z

SPARK-19349: Check resource ready to avoid multiple receivers to be 
scheduled on the same node.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16660
  
**[Test build #71929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71929/testReport)**
 for PR 16660 at commit 
[`7aed9a4`](https://github.com/apache/spark/commit/7aed9a4fada263785ce1d81acb31073ef7a401fd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread gmoehler
Github user gmoehler commented on the issue:

https://github.com/apache/spark/pull/16660
  
Thanks for the valuable (and fast!) comments - I have worked them in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread gmoehler
Github user gmoehler commented on a diff in the pull request:

https://github.com/apache/spark/pull/16660#discussion_r97512560
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala ---
@@ -194,4 +293,35 @@ class UserDefinedTypeSuite extends QueryTest with 
SharedSQLContext with ParquetT
 // call `collect` to make sure this query can pass analysis.
 pointsRDD.as[MyLabeledPoint].map(_.copy(label = 2.0)).collect()
   }
+
+  test("SPARK-19311: UDFs disregard UDT type hierarchy") {
+UDTRegistration.register(classOf[IExampleBaseType].getName,
--- End diff --

I tend to leave them but remove the `@SQLUserDefinedType` annotation, so we 
have a test that uses `UDTRegistration`.
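
A minimal sketch of what that registration-based setup might look like (the sub-type names are hypothetical; only `IExampleBaseType`/`ExampleBaseTypeUDT` appear in this diff):

```scala
// Register both the base interface and the (hypothetical) subclass with their UDTs,
// instead of annotating the classes with @SQLUserDefinedType.
UDTRegistration.register(classOf[IExampleBaseType].getName, classOf[ExampleBaseTypeUDT].getName)
UDTRegistration.register(classOf[IExampleSubType].getName, classOf[ExampleSubTypeUDT].getName)
```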


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #71928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71928/testReport)**
 for PR 16680 at commit 
[`0f7b9b8`](https://github.com/apache/spark/commit/0f7b9b8b17f79c83c920682f000d7c4eb4cda291).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16676: delete useless var “j”

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16676
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16676: delete useless var “j”

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16676
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71927/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16676: delete useless var “j”

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16676
  
**[Test build #71927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71927/testReport)**
 for PR 16676 at commit 
[`cf8211a`](https://github.com/apache/spark/commit/cf8211a0057b5cc5652414eb96bb453c0e2618fa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Sure. Shall I add the tests in pkg/inst/tests/testthat/test_sparkSQL.R?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97508643
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1136,9 +1136,17 @@ setMethod("collect",
 
   # Note that "binary" columns behave like complex types.
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != 
"binary") {
-vec <- do.call(c, col)
+valueIndex <- which(!is.na(col))
+if (length(valueIndex) > 0 && valueIndex[1] > 1) {
+  colTail <- col[-(1 : (valueIndex[1] - 1))]
+  vec <- do.call(c, colTail)
+  classVal <- class(vec)
+  vec <- c(rep(NA, valueIndex[1] - 1), vec)
+  class(vec) <- classVal
--- End diff --

Hmm, what happened here?
If you want to drop the NAs and use the rest to infer the class, you can do 
`col[!is.na(col)]`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


