[GitHub] spark pull request #20631: [SPARK-23454][SS][DOCS] Added trigger information...

2018-02-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20631#discussion_r168941119
  
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -1979,6 +2004,157 @@ which has methods that get called whenever there is a sequence of rows generated
 
 - Whenever `open` is called, `close` will also be called (unless the JVM exits due to some error). This is true even if `open` returns false. If there is any error in processing and writing the data, `close` will be called with the error. It is your responsibility to clean up state (e.g. connections, transactions, etc.) that has been created in `open` so that there are no resource leaks.
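As an aside, a minimal Scala sketch of this open/process/close contract (`openConnection` is a hypothetical helper, not part of the API):

```scala
import org.apache.spark.sql.ForeachWriter

class JdbcLikeWriter extends ForeachWriter[String] {
  var connection: java.sql.Connection = _

  override def open(partitionId: Long, version: Long): Boolean = {
    connection = openConnection()  // acquire per-partition resources
    true                           // returning false skips process() for this partition
  }

  override def process(value: String): Unit = {
    // write `value` using `connection`
  }

  // Called even when open() returned false or processing failed (errorOrNull is set),
  // which is why cleanup belongs here.
  override def close(errorOrNull: Throwable): Unit = {
    if (connection != null) connection.close()
  }

  private def openConnection(): java.sql.Connection = ???  // hypothetical helper
}
```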
 
+#### Triggers
+The trigger settings of a streaming query define the timing of streaming data processing:
+whether the query is going to be executed as a micro-batch query with a fixed batch
+interval or as a continuous processing query.
+Here are the different kinds of triggers that are supported.
+
+  unspecified (default):
+    If no trigger setting is explicitly specified, then by default the query will be
+    executed in micro-batch mode, where micro-batches will be generated as soon as
+    the previous micro-batch has completed processing.
+
+  Fixed interval micro-batches:
+    The query will be executed in micro-batch mode, where micro-batches will be kicked
+    off at the user-specified intervals.
+    - If the previous micro-batch completes within the interval, then the engine will
+      wait until the interval is over before kicking off the next micro-batch.
+    - If the previous micro-batch takes longer than the interval to complete (i.e. if
+      an interval boundary is missed), then the next micro-batch will start as soon as
+      the previous one completes (i.e., it will not wait for the next interval boundary).
+    - If no new data is available, then no micro-batch will be kicked off.
+
+  One-time micro-batch:
+    The query will execute *only one* micro-batch to process all the available data and
+    then stop on its own. This is useful in scenarios where you want to periodically
+    spin up a cluster, process everything that is available since the last period, and
+    then shut down the cluster. In some cases, this may lead to significant cost savings.
+
+  Continuous with fixed checkpoint interval (experimental):
+    The query will be executed in the new low-latency, continuous processing mode. Read
+    more about this in the Continuous Processing section below.
+
+Here are a few code examples.
+
+
+
+
+{% highlight scala %}
+import org.apache.spark.sql.streaming.Trigger
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+  .format("console")
+  .start()
+
+// ProcessingTime trigger with two-second micro-batch interval
+df.writeStream
+  .format("console")
+  .trigger(Trigger.ProcessingTime("2 seconds"))
+  .start()
+
+// One-time trigger
+df.writeStream
+  .format("console")
+  .trigger(Trigger.Once())
+  .start()
+
+// Continuous trigger with one-second checkpointing interval
+df.writeStream
+  .format("console")
+  .trigger(Trigger.Continuous("1 second"))
+  .start()
+
+{% endhighlight %}
+
+
+
+
+
+{% highlight java %}
+import org.apache.spark.sql.streaming.Trigger;
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+  .format("console")
+  .start();
+
+// ProcessingTime trigger with two-second micro-batch interval
+df.writeStream
+  .format("console")
+  .trigger(Trigger.ProcessingTime("2 seconds"))
+  .start();
+
+// One-time trigger
+df.writeStream
+  .format("console")
+  .trigger(Trigger.Once())
+  .start();
+
+// Continuous trigger with one-second checkpointing interval
+df.writeStream
+  .format("console")
+  .trigger(Trigger.Continuous("1 second"))
+  .start();
+
+{% endhighlight %}
+
+
+
+
+{% highlight python %}
+
+# Default trigger (runs micro-batch as soon as it can)
+df.writeStream \
+  .format("console") \
+  .start()
+
+# ProcessingTime trigger with two-second micro-batch interval
+df.writeStream \
+  .format("console") \
+  .trigger(processingTime='2 seconds') \
+  .start()
+
+# One-time trigger
+df.writeStream \
+  .format("console") \
+  .trigger(once=True) \
+  .start()
+
+# Continuous trigger with one-second checkpointing interval
+df.writeStream \
+  .format("console") \
+  .trigger(continuous='1 second') \
+  .start()
+
+{% endhighlight %}

[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87534/testReport)**
 for PR 20619 at commit 
[`8bd02d8`](https://github.com/apache/spark/commit/8bd02d8692708ab58e31e19a3682af3a94550369).





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/953/
Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20619
  
retest this please





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87533/
Test FAILed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87533/testReport)**
 for PR 20619 at commit 
[`8bd02d8`](https://github.com/apache/spark/commit/8bd02d8692708ab58e31e19a3682af3a94550369).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20442: [SPARK-23265][ML]Update multi-column error handling logi...

2018-02-17 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/20442
  
Thanks for the comments. I am in China now for Chinese New Year. Will 
address the comments when I get back to work on 2/21.  





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87532/
Test PASSed.





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87532/testReport)**
 for PR 20387 at commit 
[`1a603db`](https://github.com/apache/spark/commit/1a603dbe5528b447bff371d2e00abdbdee664a75).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87533/testReport)**
 for PR 20619 at commit 
[`8bd02d8`](https://github.com/apache/spark/commit/8bd02d8692708ab58e31e19a3682af3a94550369).





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/952/
Test PASSed.





[GitHub] spark pull request #20632: [SPARK-3159] added subtree pruning in the transla...

2018-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r168936300
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite {
 val (indices, values) = map.toSeq.sortBy(_._1).unzip
 Vectors.sparse(size, indices.toArray, values.toArray)
   }
+
+  /** Generate a label. */
+  private def generateLabel(rnd: Random, numClasses: Int): Double = {
+rnd.nextInt(numClasses)
+  }
+
+  /** Generate a numeric value in the range [numericMin, numericMax]. */
+  private def generateNumericValue(rnd: Random, numericMin: Double, 
numericMax: Double) : Double = {
+rnd.nextDouble() * (Math.abs(numericMax) + Math.abs(numericMin)) + 
numericMin
+  }
+
+  /** Generate a binary value. */
+  private def generateBinaryValue(rnd: Random) : Double = if 
(rnd.nextBoolean()) 1 else 0
+
+  /** Generate an array of binary values of length numBinary. */
+  private def generateBinaryArray(rnd: Random, numBinary: Int): 
Array[Double] = {
+Range.apply(0, numBinary).map(_ => generateBinaryValue(rnd)).toArray
+  }
+
+  /** Generate an array of binary values of length numNumeric in the range
+* [numericMin, numericMax]. */
+  private def generateNumericArray(rnd: Random,
+   numNumeric: Int,
+   numericMin: Double,
+   numericMax: Double) : Array[Double] = {
+Range.apply(0, numNumeric).map(_ => generateNumericValue(rnd, 
numericMin, numericMax)).toArray
+  }
+
+  /** Generate a LabeledPoint with numNumeric numeric values followed by 
numBinary binary values. */
+  private def generatePoint(rnd: Random,
+numClasses: Int,
+numNumeric: Int = 10,
+numericMin: Double = -100,
+numericMax: Double = 100,
+numBinary: Int = 10): LabeledPoint = {
+val label = generateLabel(rnd, numClasses)
+val numericArray = generateNumericArray(rnd, numNumeric, numericMin, 
numericMax)
+val binaryArray = generateBinaryArray(rnd, numBinary)
+val vector = Vectors.dense(numericArray ++ binaryArray)
+
+new LabeledPoint(label, vector)
+  }
+
+  /** Data for tree redundancy tests which produces a non-trivial tree. */
+  private def generateRedundantPoints(sc: SparkContext,
+  numClasses: Int,
+  numPoints: Int,
+  rnd: Random): RDD[LabeledPoint] = {
+sc.parallelize(Range.apply(1, numPoints)
+  .map(_ => generatePoint(rnd, numClasses = numClasses, numNumeric = 
0, numBinary = 3)))
+  }
+
+  /**
+* Returns true iff the decision tree has at least one subtree that can 
be pruned
+* (i.e., all its leaves share the same prediction).
+* @param tree the tree to be tested
+* @return true iff the decision tree has at least one subtree that can 
be pruned.
+*/
+  private def isRedundant(tree: DecisionTreeModel): Boolean = 
_isRedundant(tree.rootNode)
+
+  /**
+* Returns true iff the node has at least one subtree that can be pruned
+* (i.e., all its leaves share the same prediction).
+* @param n the node to be tested
+* @return true iff the node has at least one subtree that can be 
pruned.
+*/
+  private def _isRedundant(n: Node): Boolean = n match {
+case n: InternalNode =>
+  _isRedundant(n.leftChild) || _isRedundant(n.leftChild) || 
canBePruned(n)
+case _ => false
+  }
+
+  /**
+* Returns true iff the subtree rooted at the given node can be pruned
+* (i.e., all its leaves share the same prediction).
+* @param n the node to be tested
+* @return returns true iff the subtree rooted at the given node can be 
pruned.
+*/
+  private def canBePruned(n: Node): Boolean = n match {
+case n: InternalNode =>
+  (leafPredictions(n.leftChild) ++ leafPredictions(n.rightChild)).size 
== 1
+case _ => false
+  }
+
+  /**
+* Given a node, the method returns the set of predictions appearing in 
the subtree rooted at it.
+* @return the set of predictions appearing in the subtree rooted at 
the given node.
+*/
+  private def leafPredictions(n: Node): Set[Double] = n match {
--- End diff --

I suppose I haven't read this code thoroughly, but rather than write similar logic differently in the test, to test the original logic, why not just construct cases with known correct answers and compare?

[GitHub] spark pull request #20632: [SPARK-3159] added subtree pruning in the transla...

2018-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r168936292
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode(
 }
   }
 
+  /**
+   * Method testing whether a node is a leaf.
+   * @return true iff a node is a leaf.
+   */
+  private def isLeafNode(): Boolean = leftChild.isEmpty && 
rightChild.isEmpty
+
+  /** True iff the node should be a leaf. */
+  private lazy val shouldBeLeaf: Boolean = leafPredictions.size == 1
+
+  /**
+   * Returns the set of (leaf) predictions appearing in the subtree rooted 
at the considered node.
+   * @return the set of (leaf) predictions appearing in the subtree rooted 
at the given node.
+   */
+  private def leafPredictions: Set[Double] = {
--- End diff --

Yeah, and short-circuiting involves an `if` statement anyway, so maybe no 
value in my previous suggestion about `foreach`. 

Now this saves the whole `Set[Double]` of predictions when it's never used 
again. It's probably worth not hogging that memory. The method here doesn't 
actually need to form the set of all predictions, just decide whether there's 
more than 1. For example, in the case that there are no distinct values after 
evaluating children, no point in adding the leaf's prediction to it just to 
decide it's got 1 unique value.
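A sketch of that early-exit idea (illustrative names, not the PR's code): stop the traversal as soon as a second distinct leaf prediction is seen, instead of materializing the full `Set[Double]` for every subtree.

```scala
import scala.collection.mutable
import org.apache.spark.ml.tree.{InternalNode, LeafNode, Node}

object PruningCheck {
  // Returns true as soon as two distinct leaf predictions are found; || short-circuits,
  // so the remaining subtrees are not visited once the answer is known.
  def hasMultiplePredictions(root: Node): Boolean = {
    val seen = mutable.Set.empty[Double]
    def visit(node: Node): Boolean = node match {
      case i: InternalNode => visit(i.leftChild) || visit(i.rightChild)
      case l: LeafNode =>
        seen += l.prediction
        seen.size > 1
    }
    visit(root)
  }
}
```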





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
The final failure is irrelevant to this.
```
org.apache.spark.sql.sources.CreateTableAsSelectSuite.(It is not a test it 
is a sbt.testing.SuiteSelector)
```





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Retest this please.





[GitHub] spark pull request #20632: [SPARK-3159] added subtree pruning in the transla...

2018-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r168936195
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite {
 val (indices, values) = map.toSeq.sortBy(_._1).unzip
 Vectors.sparse(size, indices.toArray, values.toArray)
   }
+
+  /** Generate a label. */
+  private def generateLabel(rnd: Random, numClasses: Int): Double = {
+rnd.nextInt(numClasses)
+  }
+
+  /** Generate a numeric value in the range [numericMin, numericMax]. */
+  private def generateNumericValue(rnd: Random, numericMin: Double, 
numericMax: Double) : Double = {
+rnd.nextDouble() * (Math.abs(numericMax) + Math.abs(numericMin)) + 
numericMin
--- End diff --

Oh right, disregard that. I had in mind that they were ints.





[GitHub] spark pull request #20632: [SPARK-3159] added subtree pruning in the transla...

2018-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r168936124
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode(
 }
   }
 
+  /**
+   * Method testing whether a node is a leaf.
+   * @return true iff a node is a leaf.
+   */
+  private def isLeafNode(): Boolean = leftChild.isEmpty && 
rightChild.isEmpty
+
+  /** True iff the node should be a leaf. */
+  private lazy val shouldBeLeaf: Boolean = leafPredictions.size == 1
+
+  /**
+   * Returns the set of (leaf) predictions appearing in the subtree rooted 
at the considered node.
+   * @return the set of (leaf) predictions appearing in the subtree rooted 
at the given node.
+   */
+  private def leafPredictions: Set[Double] = {
+
+val predBuffer = new scala.collection.mutable.HashSet[Double]
+
+// collect the (leaf) predictions in the left subtree, if any
+if (leftChild.isDefined) {
--- End diff --

No, I'm just thinking you use `Option.foreach(...)` twice. It's a matter of 
taste I guess. I tend to prefer avoiding the `isDefined` and `get` calls for 
simple one-liners.
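For the record, the suggested shape would be something like this (a sketch against the PR's `leftChild`/`rightChild` options):

```scala
// Equivalent to the isDefined/get version, without the explicit checks:
leftChild.foreach(child => predBuffer ++= child.leafPredictions)
rightChild.foreach(child => predBuffer ++= child.leafPredictions)
```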





[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20621#discussion_r168935309
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -407,6 +407,34 @@ object PartitioningUtils {
   Literal(bigDecimal)
 }
 
+val dateTry = Try {
+  // try and parse the date, if no exception occurs this is a 
candidate to be resolved as
+  // DateType
+  DateTimeUtils.getThreadLocalDateFormat.parse(raw)
+  // SPARK-23436: Casting the string to date may still return null if 
a bad Date is provided.
+  // This can happen since DateFormat.parse  may not use the entire 
text of the given string:
+  // so if there are extra-characters after the date, it returns 
correctly.
+  // We need to check that we can cast the raw string since we later 
can use Cast to get
+  // the partition values with the right DataType (see
+  // 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning)
+  val dateValue = Cast(Literal(raw), DateType).eval()
+  // Disallow DateType if the cast returned null
+  require(dateValue != null)
+  Literal.create(dateValue, DateType)
+}
+
+val timestampTry = Try {
+  val unescapedRaw = unescapePathName(raw)
+  // try and parse the date, if no exception occurs this is a 
candidate to be resolved as
+  // TimestampType
+  
DateTimeUtils.getThreadLocalTimestampFormat(timeZone).parse(unescapedRaw)
--- End diff --

`inferPartitioning` will use `PartitioningUtils.parsePartitions` to infer the partition spec if there is no `userPartitionSchema`. It is used by `DataSource.sourceSchema`. It seems this change makes a partition directory that was previously parseable no longer parseable. Will it change the behavior of other code paths?
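The partial-parse behaviour that motivates the extra `Cast` check in the diff can be reproduced standalone (an illustration only, not Spark code):

```scala
import java.text.SimpleDateFormat

val fmt = new SimpleDateFormat("yyyy-MM-dd")
fmt.parse("2018-01-01")        // parses as expected
fmt.parse("2018-01-01-extra")  // also "succeeds": parse() stops at the first
                               // unparseable character and silently ignores the
                               // rest, so parse() alone cannot validate the string
```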





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87531/
Test PASSed.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87531/testReport)**
 for PR 20057 at commit 
[`0e646b2`](https://github.com/apache/spark/commit/0e646b20f0be1580cfb8565908c98b57ec248941).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87532/testReport)**
 for PR 20387 at commit 
[`1a603db`](https://github.com/apache/spark/commit/1a603dbe5528b447bff371d2e00abdbdee664a75).





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
Thanks for the update! Enjoy your vacation, and thanks for letting me know.





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/951/
Test PASSed.





[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-17 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/20387#discussion_r168933227
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -17,130 +17,55 @@
 
 package org.apache.spark.sql.execution.datasources.v2
 
-import org.apache.spark.sql.catalyst.expressions.{And, Attribute, 
AttributeMap, AttributeSet, Expression, NamedExpression, PredicateHelper}
-import org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, 
AttributeSet}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
 import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.datasources.DataSourceStrategy
-import org.apache.spark.sql.sources
-import org.apache.spark.sql.sources.v2.reader._
 
-/**
- * Pushes down various operators to the underlying data source for better 
performance. Operators are
- * being pushed down with a specific order. As an example, given a LIMIT 
has a FILTER child, you
- * can't push down LIMIT if FILTER is not completely pushed down. When 
both are pushed down, the
- * data source should execute FILTER before LIMIT. And required columns 
are calculated at the end,
- * because when more operators are pushed down, we may need less columns 
at Spark side.
- */
-object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with 
PredicateHelper {
-  override def apply(plan: LogicalPlan): LogicalPlan = {
-// Note that, we need to collect the target operator along with 
PROJECT node, as PROJECT may
-// appear in many places for column pruning.
-// TODO: Ideally column pruning should be implemented via a plan 
property that is propagated
-// top-down, then we can simplify the logic here and only collect 
target operators.
-val filterPushed = plan transformUp {
-  case FilterAndProject(fields, condition, r @ DataSourceV2Relation(_, 
reader)) =>
-val (candidates, nonDeterministic) =
-  splitConjunctivePredicates(condition).partition(_.deterministic)
-
-val stayUpFilters: Seq[Expression] = reader match {
-  case r: SupportsPushDownCatalystFilters =>
-r.pushCatalystFilters(candidates.toArray)
-
-  case r: SupportsPushDownFilters =>
-// A map from original Catalyst expressions to corresponding 
translated data source
-// filters. If a predicate is not in this map, it means it 
cannot be pushed down.
-val translatedMap: Map[Expression, sources.Filter] = 
candidates.flatMap { p =>
-  DataSourceStrategy.translateFilter(p).map(f => p -> f)
-}.toMap
-
-// Catalyst predicate expressions that cannot be converted to 
data source filters.
-val nonConvertiblePredicates = 
candidates.filterNot(translatedMap.contains)
-
-// Data source filters that cannot be pushed down. An 
unhandled filter means
-// the data source cannot guarantee the rows returned can pass 
the filter.
-// As a result we must return it so Spark can plan an extra 
filter operator.
-val unhandledFilters = 
r.pushFilters(translatedMap.values.toArray).toSet
-val unhandledPredicates = translatedMap.filter { case (_, f) =>
-  unhandledFilters.contains(f)
-}.keys
-
-nonConvertiblePredicates ++ unhandledPredicates
-
-  case _ => candidates
-}
-
-val filterCondition = (stayUpFilters ++ 
nonDeterministic).reduceLeftOption(And)
-val withFilter = filterCondition.map(Filter(_, r)).getOrElse(r)
-if (withFilter.output == fields) {
-  withFilter
-} else {
-  Project(fields, withFilter)
-}
-}
-
-// TODO: add more push down rules.
-
-val columnPruned = pushDownRequiredColumns(filterPushed, 
filterPushed.outputSet)
-// After column pruning, we may have redundant PROJECT nodes in the 
query plan, remove them.
-RemoveRedundantProject(columnPruned)
-  }
-
-  // TODO: nested fields pruning
-  private def pushDownRequiredColumns(
-  plan: LogicalPlan, requiredByParent: AttributeSet): LogicalPlan = {
-plan match {
-  case p @ Project(projectList, child) =>
-val required = projectList.flatMap(_.references)
-p.copy(child = pushDownRequiredColumns(child, 
AttributeSet(required)))
-
-  case f @ Filter(condition, child) =>
-val required = requiredByParent

[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87526/
Test PASSed.





[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87526/testReport)**
 for PR 20621 at commit 
[`8698f4d`](https://github.com/apache/spark/commit/8698f4de7b6e1b9453a12bc949ce8666d6322a87).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87528/
Test FAILed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87528/testReport)**
 for PR 20619 at commit 
[`8bd02d8`](https://github.com/apache/spark/commit/8bd02d8692708ab58e31e19a3682af3a94550369).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20634
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87525/
Test PASSed.





[GitHub] spark issue #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20634
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20634
  
**[Test build #87525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87525/testReport)**
 for PR 20634 at commit 
[`bde6818`](https://github.com/apache/spark/commit/bde681816bc8251010aa14fd0f4fb27b732fd061).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87527/
Test FAILed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87527/testReport)**
 for PR 20619 at commit 
[`e08d06c`](https://github.com/apache/spark/commit/e08d06c0e6c0cf23178d12baaa5eb00d55f9b456).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87531/testReport)**
 for PR 20057 at commit 
[`0e646b2`](https://github.com/apache/spark/commit/0e646b20f0be1580cfb8565908c98b57ec248941).





[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168930195
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect {
 s"SELECT 1 FROM $table LIMIT 1"
   }
 
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
   /**
-  * The SQL query used to truncate a table. For Postgres, the default 
behaviour is to
-  * also truncate any descendant tables. As this is a (possibly unwanted) 
side-effect,
-  * the Postgres dialect adds 'ONLY' to truncate only the table in question
-  * @param table The name of the table.
-  * @return The SQL query to use for truncating a table
-  */
-  override def getTruncateQuery(table: String): String = {
-s"TRUNCATE TABLE ONLY $table"
+   * The SQL query used to truncate a table. For Postgres, the default 
behaviour is to
+   * also truncate any descendant tables. As this is a (possibly unwanted) 
side-effect,
+   * the Postgres dialect adds 'ONLY' to truncate only the table in 
question
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the value of
+   *isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    cascade match {
+      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
--- End diff --

I believe you already tested this in PostgreSQL. To make sure we can use `ONLY` and `CASCADE` at the same time, could you share the result of a manual test of this with PostgreSQL tables, descendant tables, and foreign-key referenced tables here?







[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87530/testReport)**
 for PR 20057 at commit 
[`7e09f90`](https://github.com/apache/spark/commit/7e09f90ab5625d95d060949a371f03afb50b15cb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87530/
Test FAILed.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87530/testReport)**
 for PR 20057 at commit 
[`7e09f90`](https://github.com/apache/spark/commit/7e09f90ab5625d95d060949a371f03afb50b15cb).





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Thanks again @dongjoon-hyun, much appreciated 👍. I've made changes 
accordingly, and added comments in this PR where appropriate.





[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168929874
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1372,6 +1372,13 @@ the following case-insensitive options:
  This is a JDBC writer related option. When 
SaveMode.Overwrite is enabled, this option causes Spark to 
truncate an existing table instead of dropping and recreating it. This can be 
more efficient, and prevents the table metadata (e.g., indices) from being 
removed. However, it will not work in some cases, such as when the new data has 
a different schema. It defaults to false. This option applies only 
to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC
+database (PostgreSQL and Oracle at the moment), this option allows execution of a
+TRUNCATE TABLE t CASCADE. This will affect other tables, and thus
+should be used with care. This option applies only to writing. It defaults to the
+default cascading truncate behaviour of the JDBC database in question, specified by
+isCascadingTruncateTable in each JDBCDialect.
--- End diff --

Yeah, so the "problem" there is that PostgreSQL has inheritance across tables (as we discovered in [SPARK-22729](https://github.com/apache/spark/pull/19911)). To be completely transparent to users, I think the `ONLY` part of the query for PostgreSQL should also be configurable, but I think that's outside the scope of this PR. I've added a comment saying that for PostgreSQL an `ONLY` is added to prevent descendant tables from being truncated.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87529/testReport)**
 for PR 20057 at commit 
[`b79becd`](https://github.com/apache/spark/commit/b79becd188f67202dd23796456c512e544cd).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87529/
Test FAILed.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168929798
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect {
 case _ => value
   }
 
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
--- End diff --

Done





[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168929792
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala ---
@@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable(). Ignored for MsSql 
as it is unsupported.
--- End diff --

Hmm, I disagree with this; the reason I added it was to show users that even if they set `cascadeTruncate` to `true`, it doesn't take effect for some databases (like MsSQL). `isCascadingTruncateTable` only shows the default behaviour, which for all databases at the moment is false.





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87529/testReport)**
 for PR 20057 at commit 
[`b79becd`](https://github.com/apache/spark/commit/b79becd188f67202dd23796456c512e544cd).





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87528 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87528/testReport)**
 for PR 20619 at commit 
[`8bd02d8`](https://github.com/apache/spark/commit/8bd02d8692708ab58e31e19a3682af3a94550369).





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/950/
Test PASSed.





[GitHub] spark pull request #20619: [SPARK-23457][SQL] Register task completion liste...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168929352
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -395,16 +395,21 @@ class ParquetFileFormat
 
ParquetInputFormat.setFilterPredicate(hadoopAttemptContext.getConfiguration, 
pushed.get)
   }
   val taskContext = Option(TaskContext.get())
-  val parquetReader = if (enableVectorizedReader) {
+  if (enableVectorizedReader) {
 val vectorizedReader = new VectorizedParquetRecordReader(
   convertTz.orNull, enableOffHeapColumnVector && 
taskContext.isDefined, capacity)
+val iter = new RecordReaderIterator(vectorizedReader)
+// SPARK-23457 Register a task completion listener before `initialization`.
--- End diff --

Now, `SPARK-23457` is added.
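A condensed sketch of why that ordering matters (names taken from the diff above):

```scala
val iter = new RecordReaderIterator(vectorizedReader)
// Register cleanup first: if initialize() throws, the listener is already in
// place, so the task still closes the iterator and frees the reader's resources.
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
vectorizedReader.initialize(split, hadoopAttemptContext)  // may throw
```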





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Oh, @kiszk. The following really meant a `comment` in the code. Sorry, I misunderstood.
> Would it be worth to add this JIRA number in a comment as we did for ORC?





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87527/testReport)**
 for PR 20619 at commit 
[`e08d06c`](https://github.com/apache/spark/commit/e08d06c0e6c0cf23178d12baaa5eb00d55f9b456).





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/949/
Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Retest this please.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Thank you, @kiszk . I added SPARK-23390 in the PR description.
> Would it be worth to add this JIRA number in a comment as we did for ORC?





[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/948/
Test PASSed.





[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87526 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87526/testReport)**
 for PR 20621 at commit 
[`8698f4d`](https://github.com/apache/spark/commit/8698f4de7b6e1b9453a12bc949ce8666d6322a87).





[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20619
  
LGTM with one minor comment





[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Thank you for review, @mgaido91 .





[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20057
  
I finished my second round. Could you update once more?





[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168928044
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,11 +121,13 @@ abstract class JdbcDialect extends Serializable {
    * The SQL query that should be used to truncate a table. Dialects can override this method to
    * return a query that is suitable for a particular database. For PostgreSQL, for instance,
    * a different query is used to prevent "TRUNCATE" affecting other tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation
    * @return The SQL query to use for truncating a table
    */
   @Since("2.3.0")
-  def getTruncateQuery(table: String): String = {
+  def getTruncateQuery(table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
--- End diff --

For this one, don't we need to keep the original signature for backward compatibility, like the following?
```
  @Since("2.3.0")
  def getTruncateQuery(table: String): String = {
    getTruncateQuery(table, isCascadingTruncateTable)
  }

  @Since("2.4.0")
  def getTruncateQuery(table: String, cascade: Option[Boolean] = isCascadingTruncateTable): String
```
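
For background on why the extra overload matters, here is a minimal sketch of a hypothetical call site compiled against Spark 2.3.0 (the JDBC URL and table name are illustrative only):

```scala
import org.apache.spark.sql.jdbc.JdbcDialects

// A call site compiled against Spark 2.3.0 links to the one-argument
// getTruncateQuery(String). Replacing it with a two-argument method,
// even one with a default value, changes the JVM signature and would
// break that bytecode, hence the kept @Since("2.3.0") overload.
val dialect = JdbcDialects.get("jdbc:postgresql://localhost/test")
val truncateSql = dialect.getTruncateQuery("my_table")
```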


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927868
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1372,6 +1372,13 @@ the following case-insensitive options:
   This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this option allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing. It defaults to the default cascading truncate behaviour of the JDBC database in question, specified in the isCascadeTruncate in each JDBCDialect.
--- End diff --

Ur, for PostgreSQL, we are generating `TRUNCATE TABLE ONLY $table CASCADE`.
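
For reference, a minimal sketch of that kind of dialect override, assuming the two-argument `getTruncateQuery` signature proposed in this PR (the dialect name here is hypothetical; the real implementation lives in `PostgresDialect`):

```scala
import org.apache.spark.sql.jdbc.JdbcDialect

// Illustration only: ONLY limits truncation to this table, while CASCADE
// additionally truncates tables holding foreign-key references to it.
private case object ExamplePostgresLikeDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")
  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)

  override def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
    cascade match {
      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
      case _ => s"TRUNCATE TABLE ONLY $table"
    }
  }
}
```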


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927785
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala ---
@@ -64,7 +64,16 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect
     }
   }
 
-  override def getTruncateQuery(table: String): String = {
-    dialects.head.getTruncateQuery(table)
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
--- End diff --

`The JDBCOptions`?


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927699
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect {
     case _ => value
   }
 
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
--- End diff --

Could you move this function after `isCascadingTruncateTable`, like in the other files?


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927651
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect {
     case _ => value
   }
 
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
--- End diff --

ditto.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927631
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala ---
@@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for MsSql as it is unsupported.
--- End diff --

And, for all dialects, please remove it or move it consistently.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927633
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala ---
@@ -46,4 +46,17 @@ private case object MySQLDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
--- End diff --

ditto.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927608
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala ---
@@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for MsSql as it is unsupported.
--- End diff --

I think we can drop `Ignored for MsSql as it is unsupported.` here.
If you want, you can make it a function comment on `override def isCascadingTruncateTable(): Option[Boolean] = Some(false)`, as sketched below.
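
For instance, a sketch of that suggestion (the comment wording is illustrative only):

```scala
/** MsSqlServer does not support cascading truncation, hence Some(false). */
override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```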


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927597
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala ---
@@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.

`The JDBCOptions`?


---




[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-02-17 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20619
  
LGTM


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927546
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala ---
@@ -49,4 +49,17 @@ private object DB2Dialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.

`The JDBCOptions`?


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927495
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,11 +121,13 @@ abstract class JdbcDialect extends Serializable {
    * The SQL query that should be used to truncate a table. Dialects can override this method to
    * return a query that is suitable for a particular database. For PostgreSQL, for instance,
    * a different query is used to prevent "TRUNCATE" affecting other tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation
    * @return The SQL query to use for truncating a table
    */
   @Since("2.3.0")
-  def getTruncateQuery(table: String): String = {
+  def getTruncateQuery(table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
--- End diff --

@danielvdende, could you take a look at this guide? It is the style guide followed in the Apache Spark community.
- https://github.com/databricks/scala-style-guide


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927408
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala ---
@@ -41,4 +41,16 @@ private object DerbyDialect extends JdbcDialect {
       Option(JdbcType("DECIMAL(31,5)", java.sql.Types.DECIMAL))
     case _ => None
   }
+
+  /**
+  * The SQL query used to truncate a table.
+  * @param table The JDBCOptions.
+  * @param cascade Whether or not to cascade the truncation. Default value is the
+  *                value of isCascadingTruncateTable(). Ignored for Derby as it is unsupported.
+  * @return The SQL query to use for truncating a table
+  */
+  override def getTruncateQuery(table: String,
+    cascade: Option[Boolean] = isCascadingTruncateTable): String = {
--- End diff --

indentation.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927400
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -102,7 +102,12 @@ object JdbcUtils extends Logging {
     val dialect = JdbcDialects.get(options.url)
     val statement = conn.createStatement
     try {
-      statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      if (options.isCascadeTruncate.isDefined) {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table,
+          options.isCascadeTruncate))
+      } else {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      }
--- End diff --

I see.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168927406
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala ---
@@ -41,4 +41,16 @@ private object DerbyDialect extends JdbcDialect {
       Option(JdbcType("DECIMAL(31,5)", java.sql.Types.DECIMAL))
     case _ => None
   }
+
+  /**
+  * The SQL query used to truncate a table.
+
+  /**
+  * The SQL query used to truncate a table.
--- End diff --

Indentation.


---




[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...

2018-02-17 Thread stoader
Github user stoader commented on the issue:

https://github.com/apache/spark/pull/19775
  
@erikerlandson we tested this on Kubernetes using https://github.com/prometheus/pushgateway/tree/v0.3.1 and https://github.com/prometheus/pushgateway/tree/v0.4.0


---




[GitHub] spark issue #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20634
  
**[Test build #87525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87525/testReport)** for PR 20634 at commit [`bde6818`](https://github.com/apache/spark/commit/bde681816bc8251010aa14fd0f4fb27b732fd061).


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Yep, @kiszk. @mgaido91 also reported that, so I'm investigating it further.

However, that doesn't mean this approach is improper. You can see the manual test case examples in the previous ORC-related PR and in this PR. This approach definitely reduces the number of points of failure.

For the remaining issue, I think we may need a different approach in a different code path.


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listeners fi...

2018-02-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
> Would it be worth to add this JIRA number in a comment as we did for ORC?

I'll do that.


---




[GitHub] spark issue #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20634
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/947/
Test PASSed.


---




[GitHub] spark issue #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20634
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20634: [SPARK-23456][SPARK-21783] Turn on `native` ORC i...

2018-02-17 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20634

[SPARK-23456][SPARK-21783] Turn on `native` ORC impl and PPD by default

## What changes were proposed in this pull request?

Apache Spark 2.3 introduced `native` ORC support with vectorization and many fixes. However, it is not shipped as the default option. This PR enables the `native` ORC implementation and predicate pushdown by default for Apache Spark 2.4. We will improve and stabilize the ORC data source before Apache Spark 2.4, and, eventually, Apache Spark will drop the old Hive-based ORC code.
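
For readers who want to opt in ahead of this change, these are the two settings involved (a sketch; `spark` is assumed to be an existing `SparkSession`, and the authoritative defaults live in `SQLConf`):

```scala
// Equivalent opt-in on Spark 2.3, where these are not yet the defaults:
spark.conf.set("spark.sql.orc.impl", "native")          // 2.3 default: "hive"
spark.conf.set("spark.sql.orc.filterPushdown", "true")  // 2.3 default: "false"
```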

## How was this patch tested?

Pass the Jenkins with existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23456

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20634


commit bde681816bc8251010aa14fd0f4fb27b732fd061
Author: Dongjoon Hyun 
Date:   2018-02-17T16:56:31Z

[SPARK-23456][SPARK-21783] Turn on `native` ORC impl and PPD by default




---




[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20621#discussion_r168925882
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -407,6 +407,34 @@ object PartitioningUtils {
       Literal(bigDecimal)
     }
 
+    val dateTry = Try {
+      // try and parse the date, if no exception occurs this is a candidate to be resolved as
+      // DateType
+      DateTimeUtils.getThreadLocalDateFormat.parse(raw)
+      // SPARK-23436: Casting the string to date may still return null if a bad Date is provided.
+      // This can happen since DateFormat.parse may not use the entire text of the given string:
+      // so if there are extra characters after the date, it still returns successfully.
+      // We need to check that we can cast the raw string since we later use Cast to get
+      // the partition values with the right DataType (see
+      // org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning)
+      val dateValue = Cast(Literal(raw), DateType).eval()
+      // Disallow DateType if the cast returned null
+      require(dateValue != null)
+      Literal.create(dateValue, DateType)
+    }
+
+    val timestampTry = Try {
+      val unescapedRaw = unescapePathName(raw)
+      // try and parse the timestamp, if no exception occurs this is a candidate to be resolved as
+      // TimestampType
+      DateTimeUtils.getThreadLocalTimestampFormat(timeZone).parse(unescapedRaw)
--- End diff --

Sorry, I am not sure I got your question 100%. Could you elaborate a bit more, please?
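
For context, the parse-vs-cast mismatch that the SPARK-23436 comment above guards against can be reproduced in isolation. A minimal sketch, assuming the thread-local format behaves like a non-lenient `yyyy-MM-dd` `SimpleDateFormat`:

```scala
import java.text.SimpleDateFormat

val fmt = new SimpleDateFormat("yyyy-MM-dd")
fmt.setLenient(false)

// parse(String) only consumes the prefix it understands, so trailing
// garbage does not raise an exception:
fmt.parse("2018-01-01-garbage")  // succeeds, returns the date 2018-01-01

// A full-string cast to DateType evaluates to null for the same input,
// which is why the require(dateValue != null) guard is needed before
// committing to DateType.
```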


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87524/
Test FAILed.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87524/testReport)** for PR 20057 at commit [`3921368`](https://github.com/apache/spark/commit/39213688b9389e1a2aa4763a8a09c3d1ca402e6e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20621#discussion_r168925339
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -407,6 +407,34 @@ object PartitioningUtils {
       Literal(bigDecimal)
     }
 
+    val dateTry = Try {
+      // try and parse the date, if no exception occurs this is a candidate to be resolved as
+      // DateType
+      DateTimeUtils.getThreadLocalDateFormat.parse(raw)
+      // SPARK-23436: Casting the string to date may still return null if a bad Date is provided.
+      // This can happen since DateFormat.parse may not use the entire text of the given string:
+      // so if there are extra characters after the date, it still returns successfully.
+      // We need to check that we can cast the raw string since we later use Cast to get
+      // the partition values with the right DataType (see
+      // org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning)
+      val dateValue = Cast(Literal(raw), DateType).eval()
+      // Disallow DateType if the cast returned null
+      require(dateValue != null)
+      Literal.create(dateValue, DateType)
+    }
+
+    val timestampTry = Try {
+      val unescapedRaw = unescapePathName(raw)
+      // try and parse the timestamp, if no exception occurs this is a candidate to be resolved as
+      // TimestampType
+      DateTimeUtils.getThreadLocalTimestampFormat(timeZone).parse(unescapedRaw)
--- End diff --

Since this changes the behavior of `PartitioningUtils.parsePartitions`, 
doesn't it change the result of another path in `inferPartitioning`?


---




[GitHub] spark pull request #20632: [SPARK-3159] added subtree pruning in the transla...

2018-02-17 Thread asolimando
Github user asolimando commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r168925267
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite {
     val (indices, values) = map.toSeq.sortBy(_._1).unzip
     Vectors.sparse(size, indices.toArray, values.toArray)
   }
+
+  /** Generate a label. */
+  private def generateLabel(rnd: Random, numClasses: Int): Double = {
+    rnd.nextInt(numClasses)
+  }
+
+  /** Generate a numeric value in the range [numericMin, numericMax]. */
+  private def generateNumericValue(rnd: Random, numericMin: Double, numericMax: Double): Double = {
+    rnd.nextDouble() * (Math.abs(numericMax) + Math.abs(numericMin)) + numericMin
+  }
+
+  /** Generate a binary value. */
+  private def generateBinaryValue(rnd: Random): Double = if (rnd.nextBoolean()) 1 else 0
+
+  /** Generate an array of binary values of length numBinary. */
+  private def generateBinaryArray(rnd: Random, numBinary: Int): Array[Double] = {
+    Range.apply(0, numBinary).map(_ => generateBinaryValue(rnd)).toArray
+  }
+
+  /** Generate an array of numeric values of length numNumeric in the range
+    * [numericMin, numericMax]. */
+  private def generateNumericArray(
+      rnd: Random,
+      numNumeric: Int,
+      numericMin: Double,
+      numericMax: Double): Array[Double] = {
+    Range.apply(0, numNumeric).map(_ => generateNumericValue(rnd, numericMin, numericMax)).toArray
+  }
+
+  /** Generate a LabeledPoint with numNumeric numeric values followed by numBinary binary values. */
+  private def generatePoint(
+      rnd: Random,
+      numClasses: Int,
+      numNumeric: Int = 10,
+      numericMin: Double = -100,
+      numericMax: Double = 100,
+      numBinary: Int = 10): LabeledPoint = {
+    val label = generateLabel(rnd, numClasses)
+    val numericArray = generateNumericArray(rnd, numNumeric, numericMin, numericMax)
+    val binaryArray = generateBinaryArray(rnd, numBinary)
+    val vector = Vectors.dense(numericArray ++ binaryArray)
+
+    new LabeledPoint(label, vector)
+  }
+
+  /** Data for tree redundancy tests which produces a non-trivial tree. */
+  private def generateRedundantPoints(
+      sc: SparkContext,
+      numClasses: Int,
+      numPoints: Int,
+      rnd: Random): RDD[LabeledPoint] = {
+    sc.parallelize(Range.apply(1, numPoints)
+      .map(_ => generatePoint(rnd, numClasses = numClasses, numNumeric = 0, numBinary = 3)))
+  }
+
+  /**
+    * Returns true iff the decision tree has at least one subtree that can be pruned
+    * (i.e., all its leaves share the same prediction).
+    * @param tree the tree to be tested
+    * @return true iff the decision tree has at least one subtree that can be pruned.
+    */
+  private def isRedundant(tree: DecisionTreeModel): Boolean = _isRedundant(tree.rootNode)
--- End diff --

I have removed the documentation (except the @return) for all the auxiliary methods here.


---



