date:20180715

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21556
  
**[Test build #93014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93014/testReport)** for PR 21556 at commit [`e31c201`](https://github.com/apache/spark/commit/e31c2010fa7cd8ade77691b59940108465df4b54).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaSummarizerExample `
  * `  class SerializableConfiguration(@transient var value: Configuration)`
  * `  class IncompatibleSchemaException(msg: String, ex: Throwable = null) extends Exception(msg, ex)`
  * `  case class SchemaType(dataType: DataType, nullable: Boolean)`
  * `  implicit class AvroDataFrameWriter[T](writer: DataFrameWriter[T]) `
  * `  implicit class AvroDataFrameReader(reader: DataFrameReader) `
  * `class KMeansModel (@Since("1.0.0") val clusterCenters: Array[Vector],`
  * `trait ComplexTypeMergingExpression extends Expression `
  * `case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes `
  * `abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast `
  * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21556
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93014/
Test FAILed.


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21556
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/961/
Test PASSed.


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21657
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21657
  
**[Test build #93015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93015/testReport)** for PR 21657 at commit [`81b3971`](https://github.com/apache/spark/commit/81b397140486fab7f7c2f7dcb15d5a9a62c99845).


---


[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-07-15 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21754
  
ping


---


[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/962/
Test PASSed.


---


[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20629
  
**[Test build #93016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93016/testReport)** for PR 20629 at commit [`926c353`](https://github.com/apache/spark/commit/926c35309e39b9137f6637a79f64bd22f6da84e0).


---


[GitHub] spark pull request #21740: [SPARK-18230][MLLib]Throw a better exception, if ...

2018-07-15 Thread shahidki31

Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21740#discussion_r202534384
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModelSuite.scala ---
@@ -72,6 +72,22 @@ class MatrixFactorizationModelSuite extends SparkFunSuite with MLlibTestSparkCon
 }
   }
 
+  test("invalid user and product") {
+val model = new MatrixFactorizationModel(rank, userFeatures, prodFeatures)
+assert(intercept[IllegalArgumentException]  {
--- End diff --

Thanks for the review. Done.


---


[GitHub] spark pull request #21740: [SPARK-18230][MLLib]Throw a better exception, if ...

2018-07-15 Thread shahidki31

Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21740#discussion_r202534469
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
@@ -75,10 +75,22 @@ class MatrixFactorizationModel @Since("0.8.0") (
 }
   }
 
+  /** Check for the invalid user. */
+  private def validateUser(user: Int): Unit = {
+require(userFeatures.lookup(user).nonEmpty, s"userId: $user not found in the model")
--- End diff --

I have renamed the method to `validateAndGetUser`. It checks whether the user
exists and returns the corresponding user features. The product side is
handled the same way.
Please let me know if any more changes are required.
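
A minimal sketch of the validate-and-return pattern described in this comment. It is written against a plain `Map` rather than the RDD-backed `userFeatures` of `MatrixFactorizationModel`, so the object name `FeatureLookupSketch` and the `features` parameter are hypothetical. Only the method name and the error message come from the thread.

```scala
object FeatureLookupSketch {
  // Hypothetical stand-in for the model's user features: id -> feature vector.
  type Features = Map[Int, Array[Double]]

  /** Returns the feature vector for `user`, or fails if the id is unknown. */
  def validateAndGetUser(features: Features, user: Int): Array[Double] = {
    val found = features.get(user)
    // `require` throws IllegalArgumentException when the condition is false.
    require(found.nonEmpty, s"userId: $user not found in the model")
    found.get
  }
}
```

Doing the lookup once and returning its result avoids a second lookup after validation, which is presumably the motivation for merging the two steps.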


---


[GitHub] spark pull request #21770: [SPARK-24806][SQL] Brush up generated code so tha...

2018-07-15 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21770#discussion_r202534515
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -318,7 +318,8 @@ case class SampleExec(
 v => s"""
   | $v = new $samplerClass($lowerBound, $upperBound, false);
   | $v.setSeed(${seed}L + partitionIndex);
- """.stripMargin.trim)
+ """.stripMargin.trim,
+forceInline = true)
--- End diff --

why do we need this?


---


[GitHub] spark pull request #21770: [SPARK-24806][SQL] Brush up generated code so tha...

2018-07-15 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21770#discussion_r202534499
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3758,7 +3758,10 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike
   } else {
 val arrayUnion = classOf[ArrayUnion].getName
 val et = ctx.addReferenceObj("elementTypeUnion", elementType)
-val order = ctx.addReferenceObj("orderingUnion", ordering)
+// Some data types (e.g., `BinaryType`) have anonymous classes for ordering and
+// `getCanonicalName` returns null in these classes. Therefore, we need to
+// explicitly set `className` here.
--- End diff --

nit: as we are adding this comment, shall we also mention that Janino works 
anyway, but JDK complains here?
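
For readers unfamiliar with the `getCanonicalName` behaviour mentioned in the quoted code comment, here is a small sketch in plain Scala. The object `OrderingNameSketch` and both of its methods are hypothetical. The fallback to `getName` is only an illustration of handling a missing canonical name. It is not what the PR does, since the PR sets the class name explicitly.

```scala
object OrderingNameSketch {
  /** Builds an ordering backed by an anonymous class (signed byte comparison, illustration only). */
  def anonymousOrdering(): Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    override def compare(x: Array[Byte], y: Array[Byte]): Int = {
      val n = math.min(x.length, y.length)
      var i = 0
      while (i < n) {
        val c = java.lang.Byte.compare(x(i), y(i))
        if (c != 0) return c
        i += 1
      }
      x.length - y.length
    }
  }

  /** Canonical class name when the JVM provides one, binary name otherwise. */
  def classNameFor(obj: AnyRef): String = {
    val cls = obj.getClass
    // getCanonicalName is documented to return null for anonymous classes.
    Option(cls.getCanonicalName).getOrElse(cls.getName)
  }
}
```

Calling `OrderingNameSketch.classNameFor(OrderingNameSketch.anonymousOrdering())` should return a non-null binary name. Whether such a name is acceptable in generated source depends on the compiler, which is the Janino versus JDK difference raised in this comment.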


---


[GitHub] spark pull request #21770: [SPARK-24806][SQL] Brush up generated code so tha...

2018-07-15 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21770#discussion_r202534510
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1585,6 +1585,9 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+// Resolves setters before compilation
+require(resolvedSetters.nonEmpty)
--- End diff --

why do we need to add this?


---


[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.

2018-07-15 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21748#discussion_r202534636
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -399,18 +426,18 @@ specific to Spark on Kubernetes.
   
 Path to the CA cert file for connecting to the Kubernetes API server over TLS from the driver pod when requesting
 executors. This file must be located on the submitting machine's disk, and will be uploaded to the driver pod.
-Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
+Specify this as a path as opposed to a URI (i.e. do not provide a scheme). In client mode, use
+spark.kubernetes.authenticate.caCertFile instead.
   
 
 
   spark.kubernetes.authenticate.driver.clientKeyFile
   (none)
   
 Path to the client key file for authenticating against the Kubernetes API server from the driver pod when requesting
-executors. This file must be located on the submitting machine's disk, and will be uploaded to the driver pod.
-Specify this as a path as opposed to a URI (i.e. do not provide a scheme). If this is specified, it is highly
-recommended to set up TLS for the driver submission server, as this value is sensitive information that would be
-passed to the driver pod in plaintext otherwise.
--- End diff --

why remove 
```If this is specified, it is highly
recommended to set up TLS for the driver submission server, as this 
value is sensitive information that would be
   passed to the driver pod in plaintext otherwise.```?


---


[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.

2018-07-15 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21748#discussion_r202535016
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -117,6 +117,37 @@ If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.
 spark-submit. Finally, notice that in the above example we specify a jar with a specific URI with a scheme of `local://`.
 This URI is the location of the example jar that is already in the Docker image.
 
+## Client Mode
+
+Starting with Spark 2.4.0, it is possible to run Spark applications on Kubernetes in client mode. When running a Spark
+application in client mode, a separate pod is not deployed to run the driver. When running an application in
--- End diff --

could we add a bit here after `a separate pod is not deployed to run the 
driver` to say that the client/driver could be outside k8s or in k8s/in a pod?


---


[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-15 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r202533903
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,17 +368,49 @@ private[spark] class TaskSchedulerImpl(
 // of locality levels so that it gets a chance to launch local tasks on all of them.
 // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
 for (taskSet <- sortedTaskSets) {
-  var launchedAnyTask = false
-  var launchedTaskAtCurrentMaxLocality = false
-  for (currentMaxLocality <- taskSet.myLocalityLevels) {
-do {
-  launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-  launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-} while (launchedTaskAtCurrentMaxLocality)
-  }
-  if (!launchedAnyTask) {
-taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+  if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
+// Skip the launch process.
+logInfo(s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " +
+  s"because the barrier taskSet requires ${taskSet.numTasks} slots, while the total " +
+  s"number of available slots is ${availableSlots}.")
+  } else {
+var launchedAnyTask = false
+var launchedTaskAtCurrentMaxLocality = false
+// Record all the executor IDs assigned barrier tasks on.
+val hosts = ArrayBuffer[String]()
+val taskDescs = ArrayBuffer[TaskDescription]()
+for (currentMaxLocality <- taskSet.myLocalityLevels) {
+  do {
+launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet,
+  currentMaxLocality, shuffledOffers, availableCpus, tasks, hosts, taskDescs)
+launchedAnyTask |= launchedTaskAtCurrentMaxLocality
+  } while (launchedTaskAtCurrentMaxLocality)
+}
+if (!launchedAnyTask) {
+  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+}
+if (launchedAnyTask && taskSet.isBarrier) {
+  // Check whether the barrier tasks are partially launched.
+  // TODO handle the assert failure case (that can happen when some locality requirements
+  // are not fulfilled, and we should revert the launched tasks)
+  require(taskDescs.size == taskSet.numTasks,
+s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " +
+  s"because only ${taskDescs.size} out of a total number of ${taskSet.numTasks} " +
+  "tasks got resource offers. The resource offers may have been blacklisted or " +
+  "cannot fulfill task locality requirements.")
--- End diff --

how many attempts - would it fail continuously if some hosts are 
blacklisted?


---


[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-15 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r202533477
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1386,29 +1418,90 @@ class DAGScheduler(
   )
 }
   }
-  // Mark the map whose fetch failed as broken in the map stage
-  if (mapId != -1) {
-mapOutputTracker.unregisterMapOutput(shuffleId, mapId, bmAddress)
-  }
+}
 
-  // TODO: mark the executor as failed only if there were lots of fetch failures on it
-  if (bmAddress != null) {
-val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled &&
-  unRegisterOutputOnHostOnFetchFailure) {
-  // We had a fetch failure with the external shuffle service, so we
-  // assume all shuffle data on the node is bad.
-  Some(bmAddress.host)
-} else {
-  // Unregister shuffle data just for one executor (we don't have any
-  // reason to believe shuffle data has been lost for the entire host).
-  None
+  case failure: TaskFailedReason if task.isBarrier =>
+// Also handle the task failed reasons here.
+failure match {
+  case Resubmitted =>
+logInfo("Resubmitted " + task + ", so marking it as still running")
+stage match {
+  case sms: ShuffleMapStage =>
+sms.pendingPartitions += task.partitionId
+
+  case _ =>
+assert(false, "TaskSetManagers should only send Resubmitted task statuses for " +
+  "tasks in ShuffleMapStages.")
 }
-removeExecutorAndUnregisterOutputs(
-  execId = bmAddress.executorId,
-  fileLost = true,
-  hostToUnregisterOutputs = hostToUnregisterOutputs,
-  maybeEpoch = Some(task.epoch))
+
+  case _ => // Do nothing.
+}
+
+// Always fail the current stage and retry all the tasks when a barrier task fail.
+val failedStage = stageIdToStage(task.stageId)
+logInfo(s"Marking $failedStage (${failedStage.name}) as failed due to a barrier task " +
+  "failed.")
+val message = "Stage failed because a barrier task finished unsuccessfully. " +
+  s"${failure.toErrorString}"
+try {
+  // cancelTasks will fail if a SchedulerBackend does not implement killTask
+  taskScheduler.cancelTasks(stageId, interruptThread = false)
+} catch {
+  case e: UnsupportedOperationException =>
+// Cannot continue with barrier stage if failed to cancel zombie barrier tasks.
+logInfo(s"Could not cancel tasks for stage $stageId", e)
--- End diff --

logWarning?


---


[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-15 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r202533650
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,17 +368,49 @@ private[spark] class TaskSchedulerImpl(
 // of locality levels so that it gets a chance to launch local tasks on all of them.
 // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
 for (taskSet <- sortedTaskSets) {
-  var launchedAnyTask = false
-  var launchedTaskAtCurrentMaxLocality = false
-  for (currentMaxLocality <- taskSet.myLocalityLevels) {
-do {
-  launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-  launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-} while (launchedTaskAtCurrentMaxLocality)
-  }
-  if (!launchedAnyTask) {
-taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+  if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
+// Skip the launch process.
+logInfo(s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " +
+  s"because the barrier taskSet requires ${taskSet.numTasks} slots, while the total " +
+  s"number of available slots is ${availableSlots}.")
--- End diff --

this could get starved forever? 
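
To make the starvation concern concrete, here is a small stand-alone sketch of the all-or-nothing gate from the quoted hunk. `BarrierOfferSketch`, `passesBarrierGate` and `firstLaunchRound` are hypothetical names, and the per-round slot counts are made-up inputs. The real scheduler works on resource offers rather than on a list of integers.

```scala
object BarrierOfferSketch {
  /** A barrier task set is skipped unless every task fits in the current round. */
  def passesBarrierGate(isBarrier: Boolean, availableSlots: Int, numTasks: Int): Boolean =
    !isBarrier || availableSlots >= numTasks

  /** Index of the first offer round in which a barrier task set passes the gate, if any. */
  def firstLaunchRound(slotsPerRound: Seq[Int], numTasks: Int): Option[Int] = {
    val idx = slotsPerRound.indexWhere { slots =>
      passesBarrierGate(isBarrier = true, availableSlots = slots, numTasks = numTasks)
    }
    if (idx >= 0) Some(idx) else None
  }
}
```

With four tasks and rounds offering 2, 3 and 3 slots, `firstLaunchRound` returns `None`. As long as other work keeps the free slot count below `numTasks`, the barrier stage is skipped in every round, which is the situation this comment asks about.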


---


[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-15 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r202535313
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1386,29 +1418,90 @@ class DAGScheduler(
   )
 }
   }
-  // Mark the map whose fetch failed as broken in the map stage
-  if (mapId != -1) {
-mapOutputTracker.unregisterMapOutput(shuffleId, mapId, bmAddress)
-  }
+}
 
-  // TODO: mark the executor as failed only if there were lots of fetch failures on it
-  if (bmAddress != null) {
-val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled &&
-  unRegisterOutputOnHostOnFetchFailure) {
-  // We had a fetch failure with the external shuffle service, so we
-  // assume all shuffle data on the node is bad.
-  Some(bmAddress.host)
-} else {
-  // Unregister shuffle data just for one executor (we don't have any
-  // reason to believe shuffle data has been lost for the entire host).
-  None
+  case failure: TaskFailedReason if task.isBarrier =>
+// Also handle the task failed reasons here.
+failure match {
+  case Resubmitted =>
+logInfo("Resubmitted " + task + ", so marking it as still running")
+stage match {
+  case sms: ShuffleMapStage =>
+sms.pendingPartitions += task.partitionId
+
+  case _ =>
+assert(false, "TaskSetManagers should only send Resubmitted task statuses for " +
+  "tasks in ShuffleMapStages.")
 }
-removeExecutorAndUnregisterOutputs(
-  execId = bmAddress.executorId,
-  fileLost = true,
-  hostToUnregisterOutputs = hostToUnregisterOutputs,
-  maybeEpoch = Some(task.epoch))
+
+  case _ => // Do nothing.
+}
+
+// Always fail the current stage and retry all the tasks when a barrier task fail.
+val failedStage = stageIdToStage(task.stageId)
+logInfo(s"Marking $failedStage (${failedStage.name}) as failed due to a barrier task " +
+  "failed.")
+val message = "Stage failed because a barrier task finished unsuccessfully. " +
+  s"${failure.toErrorString}"
--- End diff --

add task id of the failed barrier task? it would make it easier to root 
cause/find the error


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21556
  
retest this please


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21556
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21556
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/963/
Test PASSed.


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21556
  
**[Test build #93017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93017/testReport)** for PR 21556 at commit [`e31c201`](https://github.com/apache/spark/commit/e31c2010fa7cd8ade77691b59940108465df4b54).


---


[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20629
  
**[Test build #93016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93016/testReport)** for PR 20629 at commit [`926c353`](https://github.com/apache/spark/commit/926c35309e39b9137f6637a79f64bd22f6da84e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-15 Thread MaxGekk

Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202536141
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
   Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
 env.securityManager, hadoopConfiguration, timestamp, useCache = false)
   postEnvironmentUpdate()
+} else {
+  logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

@HyukjinKwon Our support team receives a few "bug" reports about this per
month. For now we can at least provide a link to the note. Our support
engineers need the warning itself to detect this kind of problem from the logs
of already finished jobs. Customers do not actually say in their bug reports
that files/jars weren't overwritten (that would be easier). They report
problems such as a call to a library method crashing because of an incompatible
method signature, or a class that doesn't exist. Or the final result of a Spark
job is incorrect because a config/resource file added via `addFile()` is not up
to date. With the warning, I can detect the situation from the logs and provide
a link to the docs for `addFile()`/`addJar()`.
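
The behaviour under discussion, where the first registration wins and a repeat is ignored with a warning, can be sketched without Spark as follows. `AddedPathsSketch`, `Registry`, `warnings` and the warning text are all hypothetical. `SparkContext` keeps its own bookkeeping, and the real log message is the one in the PR.

```scala
import scala.collection.concurrent.TrieMap
import scala.collection.mutable.ArrayBuffer

object AddedPathsSketch {
  final class Registry {
    private val added = TrieMap.empty[String, Long]
    // Collected instead of logged so the sketch has no logging dependency (not thread-safe).
    val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]

    /** Registers `path`; returns true if newly added, false if it was already present. */
    def add(path: String, timestamp: Long): Boolean = {
      // putIfAbsent returns None only when no earlier entry existed.
      val isNew = added.putIfAbsent(path, timestamp).isEmpty
      if (!isNew) {
        warnings += s"path $path was already added; keeping the first copy"
      }
      isNew
    }

    def timestampOf(path: String): Option[Long] = added.get(path)
  }
}
```

Adding the same path twice leaves the first timestamp in place and records one warning. A line like that in the logs of a finished job is what would let a support engineer spot the stale file.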


---


[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93016/
Test PASSed.


---


[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202536655
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows the table to use comma in
-   * data column names. Partition columns do not have such a restriction. Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows the table to use some
+   * special characters (',', ':', and ';') in data column names. Partition columns do not have
--- End diff --

ok


---


[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202536673
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2005,6 +2005,24 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("SPARK-24681 checks if nested column names do not include ',', ':', and ';'") {
--- End diff --

ok


---


[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202536743
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows the table to use comma in
-   * data column names. Partition columns do not have such a restriction. Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows the table to use some
+   * special characters (',', ':', and ';') in data column names. Partition columns do not have
+   * such a restriction. Views do not have such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: $colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+throw new AnalysisException("Cannot create a table having a nested column whose name " +
+  s"contains invalid characters (${invalidChars.map(c => s"'$c'").mkString(", ")}) " +
--- End diff --

oh..
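
The recursive check in the quoted hunk can be illustrated with a miniature schema model in plain Scala. Everything in `ColumnNameCheckSketch` is hypothetical and stands in for Spark's `StructType`/`StructField`. Unlike the hunk as quoted, this sketch collects offending names instead of throwing, and it also checks the names of struct-typed fields themselves.

```scala
object ColumnNameCheckSketch {
  // Hypothetical miniature schema model; Spark's own types are not used here.
  sealed trait DataType
  case object Leaf extends DataType
  final case class Struct(fields: Seq[Field]) extends DataType
  final case class Field(name: String, dataType: DataType)

  val invalidChars: Seq[String] = Seq(",", ":", ";")

  /** Collects every field name, at any nesting depth, that contains an invalid character. */
  def invalidNames(schema: Struct): Seq[String] = schema.fields.flatMap { f =>
    val self = if (invalidChars.exists(c => f.name.contains(c))) Seq(f.name) else Seq.empty
    val nested = f.dataType match {
      case st: Struct => invalidNames(st) // descend into nested structs
      case Leaf => Seq.empty
    }
    self ++ nested
  }
}
```

Returning the full list rather than failing on the first hit is a design choice made for this sketch only. It makes the traversal easy to test, whereas the PR raises an `AnalysisException`.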


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93018/testReport)** for PR 21711 at commit [`fa0233e`](https://github.com/apache/spark/commit/fa0233e78b48aae0caac80d74e7e6dfd061d4c5f).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/964/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93019/testReport)**
 for PR 21711 at commit 
[`9fabeef`](https://github.com/apache/spark/commit/9fabeeff2aba46ea512ad28464b1140cd59f361b).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/965/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93020/testReport)**
 for PR 21711 at commit 
[`482a0c0`](https://github.com/apache/spark/commit/482a0c0b15027c6986070c94c0bf3a967206f792).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93021/testReport)**
 for PR 21711 at commit 
[`37c9ce3`](https://github.com/apache/spark/commit/37c9ce325cc5a654b98dba72fd62eaee0539ab5a).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/966/
Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93022 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93022/testReport)**
 for PR 21711 at commit 
[`424ecba`](https://github.com/apache/spark/commit/424ecba1ea051a254491872e28e30479a48256cb).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/967/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21769
  
**[Test build #93023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93023/testReport)**
 for PR 21769 at commit 
[`bb1098f`](https://github.com/apache/spark/commit/bb1098f0143d3552d51ab5343e36819850330b81).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93024/testReport)**
 for PR 21711 at commit 
[`8a6465b`](https://github.com/apache/spark/commit/8a6465b2a62d8404820872a452682cc464cc37ad).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/968/
Test PASSed.


---


[GitHub] spark issue #21706: [SPARK-24702] Fix Unable to cast to calendar interval in...

2018-07-15 Thread dmateusp

Github user dmateusp commented on the issue:

https://github.com/apache/spark/pull/21706
  
hey @gatorsmile, sorry to bother, could you just clarify the above?


---


[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202537869
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names, 
including nested column names.
+   * Partition columns do not have such a restriction. Views do not have 
such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: 
StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a 
column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: 
$colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = 
schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+val errMsg = "Cannot create a table having a nested column 
whose name contains " +
+  s"invalid characters (${invalidChars.map(c => 
s"'$c'").mkString(", ")}) " +
--- End diff --

This red highlight is weird... the syntax looks correct to me (and 
the test passed). Do you know anything about it, @gatorsmile?
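
For reference, the recursive check in the diff above can be sketched without Spark on the classpath. This is only an illustration under stated assumptions: `Field`, `Struct`, `Leaf`, and `invalidNames` are hypothetical stand-ins (not Spark classes), and the sketch collects offending names instead of throwing `AnalysisException`. It also checks the name of a struct-typed field itself, which is a choice made for this sketch rather than something the truncated diff confirms.

```scala
object NestedNameCheck {
  // Hypothetical stand-ins for Spark's DataType / StructType / StructField.
  sealed trait DataType
  case object Leaf extends DataType
  final case class Struct(fields: Seq[Field]) extends DataType
  final case class Field(name: String, dataType: DataType)

  // The same three characters as `invalidChars` in the diff.
  val invalidChars: Seq[String] = Seq(",", ":", ";")

  // Collects every field name, at any nesting depth, that contains an invalid character.
  def invalidNames(schema: Struct): Seq[String] =
    schema.fields.flatMap { f =>
      val own = if (invalidChars.exists(c => f.name.contains(c))) Seq(f.name) else Nil
      val nested = f.dataType match {
        case st: Struct => invalidNames(st) // recurse into nested structs
        case Leaf       => Nil
      }
      own ++ nested
    }
}
```

Returning the list of bad names (rather than failing on the first one) makes the behaviour easy to assert on in a unit test.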


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93025/testReport)**
 for PR 21711 at commit 
[`b298522`](https://github.com/apache/spark/commit/b298522947fc70337131cdb6b8d0c1e6299eedd3).


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/969/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21769
  
**[Test build #93023 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93023/testReport)**
 for PR 21769 at commit 
[`bb1098f`](https://github.com/apache/spark/commit/bb1098f0143d3552d51ab5343e36819850330b81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21769
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93023/
Test PASSed.


---


[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21769
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21657
  
**[Test build #93015 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93015/testReport)**
 for PR 21657 at commit 
[`81b3971`](https://github.com/apache/spark/commit/81b397140486fab7f7c2f7dcb15d5a9a62c99845).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93015/
Test PASSed.


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21657
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric

2018-07-15 Thread dmateusp

Github user dmateusp commented on the issue:

https://github.com/apache/spark/pull/21766
  
Just checked out the PR,

```scala
scala> spark.sql("SELECT CAST(1 as NUMERIC)")
res0: org.apache.spark.sql.DataFrame = [CAST(1 AS DECIMAL(10,0)): 
decimal(10,0)]

scala> spark.sql("SELECT NUMERIC(1)")
org.apache.spark.sql.AnalysisException: Undefined function: 'NUMERIC'. This 
function is neither a registered temporary function nor a permanent function 
registered in the database 'default'.; line 1 pos 7
```

I imagine some tests could be added here:
  - 
`sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala`
  - `sql/core/src/test/resources/sql-tests/inputs/`

Do you think it's worth having a separate DataType or just have it as an 
alias?
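
To make the "alias" option concrete, here is a minimal sketch, assuming a toy type resolver: `SqlType`, `DecimalType`, `IntegerType`, and `resolve` below are hypothetical and are not the Catalyst parser. The `(10, 0)` default mirrors the `decimal(10,0)` shown in the REPL output above.

```scala
object TypeAlias {
  // Hypothetical, simplified type model; not Spark's DataType hierarchy.
  sealed trait SqlType
  final case class DecimalType(precision: Int, scale: Int) extends SqlType
  case object IntegerType extends SqlType

  // "numeric" is treated purely as another spelling of "decimal",
  // so no separate data type is introduced.
  def resolve(name: String): Option[SqlType] =
    name.toLowerCase(java.util.Locale.ROOT) match {
      case "decimal" | "numeric" => Some(DecimalType(10, 0))
      case "int" | "integer"     => Some(IntegerType)
      case _                     => None
    }
}
```

With an alias, everything downstream of the resolver only ever sees the decimal type, which is the main argument for not adding a separate DataType.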


---


[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion

2018-07-15 Thread dmateusp

Github user dmateusp commented on a diff in the pull request:

https://github.com/apache/spark/pull/21764#discussion_r202538924
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -46,7 +47,23 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
 
   protected def fixedPoint = FixedPoint(SQLConf.get.optimizerMaxIterations)
 
-  def batches: Seq[Batch] = {
+  protected def postAnalysisBatches: Seq[Batch] = {
+Batch("Eliminate Distinct", Once, EliminateDistinct) ::
+// Technically some of the rules in Finish Analysis are not optimizer 
rules and belong more
+// in the analyzer, because they are needed for correctness (e.g. 
ComputeCurrentTime).
+// However, because we also use the analyzer to canonicalized queries 
(for view definition),
--- End diff --

"to canonicalized" -> "to canonicalize" ?


---


[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion

2018-07-15 Thread dmateusp

Github user dmateusp commented on a diff in the pull request:

https://github.com/apache/spark/pull/21764#discussion_r202539342
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -175,6 +179,35 @@ abstract class Optimizer(sessionCatalog: 
SessionCatalog)
* Override to provide additional rules for the operator optimization 
batch.
*/
   def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil
+
+  override def batches: Seq[Batch] = {
+val excludedRules =
+  
SQLConf.get.optimizerExcludedRules.toSeq.flatMap(_.split(",").map(_.trim).filter(!_.isEmpty))
+val filteredOptimizationBatches = if (excludedRules.isEmpty) {
+  optimizationBatches
+} else {
+  optimizationBatches.flatMap { batch =>
+val filteredRules =
+  batch.rules.filter { rule =>
+val exclude = excludedRules.contains(rule.ruleName)
+if (exclude) {
+  logInfo(s"Optimization rule '${rule.ruleName}' is excluded 
from the optimizer.")
+}
+!exclude
+  }
+if (batch.rules == filteredRules) {
--- End diff --

My understanding is that it is written that way so that each excluded rule can be logged.
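
A minimal sketch of that filter-with-logging shape, assuming a simplified `Batch` that only carries rule names. `Batch`, `parseExcluded`, and `filterBatches` are hypothetical helpers for illustration; dropping batches that end up empty is an assumption, since the diff is cut off before that point.

```scala
object RuleExclusion {
  // Hypothetical, simplified stand-in for the optimizer's Batch.
  final case class Batch(name: String, rules: Seq[String])

  // Splits a comma-separated config value into trimmed, non-empty rule names.
  def parseExcluded(conf: Option[String]): Seq[String] =
    conf.toSeq.flatMap(_.split(",").toSeq.map(_.trim).filter(_.nonEmpty))

  // Drops excluded rules from every batch and reports each exclusion through `log`.
  // Batches left without any rule are removed (an assumption of this sketch).
  def filterBatches(
      batches: Seq[Batch],
      excluded: Seq[String],
      log: String => Unit): Seq[Batch] =
    if (excluded.isEmpty) {
      batches
    } else {
      batches.flatMap { batch =>
        val kept = batch.rules.filter { rule =>
          val exclude = excluded.contains(rule)
          if (exclude) log(s"Optimization rule '$rule' is excluded from the optimizer.")
          !exclude
        }
        if (kept.isEmpty) None else Some(batch.copy(rules = kept))
      }
    }
}
```

Doing the logging inside the `filter` predicate is what lets every excluded rule be reported exactly once, at the point where it is dropped.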


---


[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion

2018-07-15 Thread dmateusp

Github user dmateusp commented on a diff in the pull request:

https://github.com/apache/spark/pull/21764#discussion_r202539784
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -127,6 +127,14 @@ object SQLConf {
 }
   }
 
+  val OPTIMIZER_EXCLUDED_RULES = 
buildConf("spark.sql.optimizer.excludedRules")
+.doc("Configures a list of rules to be disabled in the optimizer, in 
which the rules are " +
+  "specified by their rule names and separated by comma. It is not 
guaranteed that all the " +
+  "rules in this configuration will eventually be excluded, as some 
rules are necessary " +
--- End diff --

I don't understand the optimizer at a low level (I'd be one of those users 
for whom it is a black box). Do you think it would be feasible to enumerate 
the rules that cannot be excluded? Maybe we could even log a WARNING when 
validating the config parameters if the list contains required rules.
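
A rough sketch of what such a validation could look like. `validate` is a hypothetical helper, and the set of required rules is passed in by the caller because this thread does not say which rules are actually non-excludable.

```scala
object ExclusionValidation {
  // Returns the exclusions that can be honoured and warns about the ones that cannot.
  // `required` is supplied by the caller; no real rule names are assumed here.
  def validate(
      requested: Seq[String],
      required: Set[String],
      warn: String => Unit): Seq[String] = {
    val (rejected, honoured) = requested.partition(required.contains)
    rejected.foreach(r => warn(s"Rule '$r' is required and cannot be excluded."))
    honoured
  }
}
```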


---


[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion

2018-07-15 Thread dmateusp

Github user dmateusp commented on a diff in the pull request:

https://github.com/apache/spark/pull/21764#discussion_r202539843
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerRuleExclusionSuite.scala
 ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.internal.SQLConf.OPTIMIZER_EXCLUDED_RULES
+
+
+class OptimizerRuleExclusionSuite extends PlanTest {
--- End diff --

Is there a test case for when a required rule is passed as a "to be 
excluded" rule?


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93019 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93019/testReport)**
 for PR 21711 at commit 
[`9fabeef`](https://github.com/apache/spark/commit/9fabeeff2aba46ea512ad28464b1140cd59f361b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93019/
Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93018/testReport)**
 for PR 21711 at commit 
[`fa0233e`](https://github.com/apache/spark/commit/fa0233e78b48aae0caac80d74e7e6dfd061d4c5f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93018/
Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93022 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93022/testReport)**
 for PR 21711 at commit 
[`424ecba`](https://github.com/apache/spark/commit/424ecba1ea051a254491872e28e30479a48256cb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93022/
Test FAILed.


---


[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-15 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21589
  
> AFAIK, we always have num of executor ...

Not in all cases. Databricks clients can create auto-scaling clusters (see 
https://docs.databricks.com/user-guide/clusters/sizing.html#cluster-size-and-autoscaling
). For such clusters, we cannot get the size of the cluster in terms of cores via config 
parameters. We need methods that return the current state of a cluster. Static 
configs don't work here because they lead to overloaded or underloaded 
clusters.

> ...  and then num of core per executor right?

In general, the number of cores per executor can differ. I don't think 
it is a good idea to force users to perform a complex calculation to get the number of 
cores available in a cluster.

> maybe we should have the getter factored the same way and probably named 
and described/documented similarly

@felixcheung I am not sure that our users are that interested in getting a 
list of cores per executor and calculating the total number of cores by summing 
the list. From my point of view, it would just complicate the API and the implementation.
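
For illustration, this is roughly the calculation users would have to do themselves if only a per-executor list were exposed. `ExecutorInfo` and `totalCores` are hypothetical names for this sketch, not part of the Spark API.

```scala
object ClusterCores {
  // Hypothetical snapshot of one live executor; not a Spark class.
  final case class ExecutorInfo(id: String, cores: Int)

  // Total cores currently available: the sum over whichever executors are alive
  // at the moment of the call, so the result changes as the cluster scales.
  def totalCores(executors: Seq[ExecutorInfo]): Int =
    executors.map(_.cores).sum
}
```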


---


[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...

2018-07-15 Thread liutang123

GitHub user liutang123 opened a pull request:

https://github.com/apache/spark/pull/21772

[SPARK-24809] [SQL] Serializing LongHashedRelation in executor may result 
in data error

When the join key is a long or int in a broadcast join, Spark uses 
LongHashedRelation as the broadcast value; see SPARK-14419 for details. However, if the 
broadcast value is abnormally big, the executor serializes it to disk, and data 
is lost during that serialization.
See the [flow chart](http://oi67.tinypic.com/2z5pzs7.jpg).

## What changes were proposed in this pull request?
Write the cursor instead when serializing, and set the cursor value when 
deserializing.
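
The idea can be illustrated with a toy append-only buffer. This is a sketch under stated assumptions, not the `LongHashedRelation` code: `LongBuffer`, `write`, and `read` are hypothetical, and `cursor` here is simply the number of used slots.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object CursorRoundTrip {
  // Toy append-only buffer; `cursor` is the number of slots of `page` in use.
  final class LongBuffer(var page: Array[Long], var cursor: Int) {
    def append(v: Long): Unit = {
      if (cursor == page.length) {
        page = java.util.Arrays.copyOf(page, math.max(1, page.length * 2))
      }
      page(cursor) = v
      cursor += 1
    }
    def values: Seq[Long] = page.take(cursor).toSeq
  }

  // Writes the cursor first, then only the used part of the page.
  def write(buf: LongBuffer): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(buf.cursor)
    buf.page.take(buf.cursor).foreach(v => out.writeLong(v))
    out.flush()
    bytes.toByteArray
  }

  // Reads the cursor back and restores it, so a later write() sees the same data.
  def read(data: Array[Byte]): LongBuffer = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val cursor = in.readInt()
    val page = Array.fill(cursor)(in.readLong())
    new LongBuffer(page, cursor)
  }
}
```

The case that matters is serializing an object that was itself produced by deserialization: if `read` left the cursor at its initial value, the second `write` would emit no data.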

## How was this patch tested?
Manual test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liutang123/spark SPARK-24809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21772.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21772


commit a72fe61863e119c0e902cef3054d9140b6d04f77
Author: liulijia 
Date:   2018-07-15T11:24:55Z

[SPARK-24809] [SQL] Serializing LongHashedRelation in executor may result 
in data error




---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93025/
Test FAILed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93025 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93025/testReport)**
 for PR 21711 at commit 
[`b298522`](https://github.com/apache/spark/commit/b298522947fc70337131cdb6b8d0c1e6299eedd3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21772
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-15 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21769#discussion_r202541358
  
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -680,12 +689,22 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 
   Files.createFile(new File(tempSaveDir, "non-avro").toPath)
 
-  val newDf = spark
-    .read
-    .option(AvroFileFormat.IgnoreFilesWithoutExtensionProperty, "true")
-    .avro(tempSaveDir)
+  val count = try {
--- End diff --

Nit: consider writing the `try...finally` like this:
```
  val hadoopConf = spark.sqlContext.sparkContext.hadoopConfiguration
  try {
    hadoopConf.set(AvroFileFormat.IgnoreFilesWithoutExtensionProperty, "true")
    val count = spark.read.avro(tempSaveDir).count()
    assert(count == 8)
  } finally {
    hadoopConf.unset(AvroFileFormat.IgnoreFilesWithoutExtensionProperty)
  }
```
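
The same set-then-clean-up idea can be factored into a small reusable helper. The following is only a minimal sketch and not code from the PR: `ScopedSetting` and `withSetting` are hypothetical names, and a plain mutable map stands in for the Hadoop configuration so that the example runs without Spark. Unlike the snippet above, which always unsets the key, this variant restores whatever value was there before.

```scala
import scala.collection.mutable

object ScopedSetting {
  // Hypothetical helper: sets `key` to `value` while `body` runs, then puts
  // the previous state back in `finally`, so cleanup happens even if `body`
  // throws.
  def withSetting[T](conf: mutable.Map[String, String], key: String, value: String)(body: => T): T = {
    val previous = conf.get(key) // remember the old value, if there was one
    conf(key) = value
    try {
      body
    } finally {
      previous match {
        case Some(old) => conf(key) = old   // key existed before: restore it
        case None      => conf.remove(key)  // key was absent before: unset it
      }
    }
  }
}
```

A test could then wrap the read and the assertion in `ScopedSetting.withSetting(conf, key, "true") { ... }`, so the setting cannot leak into later tests.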


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93020 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93020/testReport)**
 for PR 21711 at commit 
[`482a0c0`](https://github.com/apache/spark/commit/482a0c0b15027c6986070c94c0bf3a967206f792).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93020/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93021/testReport)**
 for PR 21711 at commit 
[`37c9ce3`](https://github.com/apache/spark/commit/37c9ce325cc5a654b98dba72fd62eaee0539ab5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93021/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21762
  
**[Test build #93026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93026/testReport)**
 for PR 21762 at commit 
[`aa5e79e`](https://github.com/apache/spark/commit/aa5e79ec67fbe0be54678a75258cb1b02cf5c9c1).


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21556
  
**[Test build #93017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93017/testReport)**
 for PR 21556 at commit 
[`e31c201`](https://github.com/apache/spark/commit/e31c2010fa7cd8ade77691b59940108465df4b54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaSummarizerExample `
  * `  class SerializableConfiguration(@transient var value: Configuration)`
  * `  class IncompatibleSchemaException(msg: String, ex: Throwable = null) extends Exception(msg, ex)`
  * `  case class SchemaType(dataType: DataType, nullable: Boolean)`
  * `  implicit class AvroDataFrameWriter[T](writer: DataFrameWriter[T]) `
  * `  implicit class AvroDataFrameReader(reader: DataFrameReader) `
  * `class KMeansModel (@Since("1.0.0") val clusterCenters: Array[Vector],`
  * `trait ComplexTypeMergingExpression extends Expression `
  * `case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes `
  * `abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast `
  * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike `


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21556
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21556
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93017/
Test PASSed.


---


[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21762
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21762
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/970/
Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21711
  
**[Test build #93024 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93024/testReport)**
 for PR 21711 at commit 
[`8a6465b`](https://github.com/apache/spark/commit/8a6465b2a62d8404820872a452682cc464cc37ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93024/
Test PASSed.


---


[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...

2018-07-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21762
  
**[Test build #93026 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93026/testReport)**
 for PR 21762 at commit 
[`aa5e79e`](https://github.com/apache/spark/commit/aa5e79ec67fbe0be54678a75258cb1b02cf5c9c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...

2018-07-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21762
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93026/
Test PASSed.


---

