date:20170529

[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18141
  
**[Test build #77521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77521/testReport)** for PR 18141 at commit [`f64df52`](https://github.com/apache/spark/commit/f64df526f50cb776a7672faddeb92ef3fcb30024).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...

2017-05-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18141
  
@maropu Sure. Thanks.



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r118608662
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -828,6 +837,8 @@ class SQLConf extends Serializable with Logging {
 
   def optimizerInSetConversionThreshold: Int = getConf(OPTIMIZER_INSET_CONVERSION_THRESHOLD)
 
+  def stateStoreProviderClass: Option[String] = getConf(STATE_STORE_PROVIDER_CLASS)
--- End diff --

Also add this to `StateStoreConf` for consistency?



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014976
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -273,27 +333,34 @@ case class StreamingDeduplicateExec(
 child.execute().mapPartitionsWithStateStore(
   getStateId.checkpointLocation,
   getStateId.operatorId,
+  storeName = "default",
   getStateId.batchId,
   keyExpressions.toStructType,
   child.output.toStructType,
+  indexOrdinal = None,
   sqlContext.sessionState,
   Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
   val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
   val numOutputRows = longMetric("numOutputRows")
   val numTotalStateRows = longMetric("numTotalStateRows")
   val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+  val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+  val allRemovalsTimeMs = longMetric("allRemovalsTimeMs")
+  val commitTimeMs = longMetric("commitTimeMs")
 
   val baseIterator = watermarkPredicateForData match {
 case Some(predicate) => iter.filter(row => !predicate.eval(row))
 case None => iter
   }
 
+  val updatesStartTimeMs = System.currentTimeMillis
--- End diff --

nit: please use `nanoTime`
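
For reference, a minimal sketch of the suggested change, assuming the metric is still reported in milliseconds. `System.nanoTime` is meant for measuring elapsed time, while `System.currentTimeMillis` follows the wall clock and can jump when the clock is adjusted. The names `ElapsedTimeSketch` and `elapsedMsSince` are illustrative and not part of the PR.

```scala
import java.util.concurrent.TimeUnit

object ElapsedTimeSketch {
  /** Milliseconds elapsed since `startNs`, a value previously read from System.nanoTime(). */
  def elapsedMsSince(startNs: Long): Long =
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)

  def main(args: Array[String]): Unit = {
    // Capture the start once, do the work, then convert the delta to milliseconds.
    val updatesStartTimeNs = System.nanoTime()
    Thread.sleep(5) // stand-in for the per-row updates
    println(s"updates took ${elapsedMsSince(updatesStartTimeNs)} ms")
  }
}
```

Only the difference of two `nanoTime` readings is meaningful; the absolute value has no relation to the time of day.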




[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119016141
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -165,54 +189,88 @@ case class StateStoreSaveExec(
 child.execute().mapPartitionsWithStateStore(
   getStateId.checkpointLocation,
   getStateId.operatorId,
+  storeName = "default",
   getStateId.batchId,
   keyExpressions.toStructType,
   child.output.toStructType,
+  indexOrdinal = None,
   sqlContext.sessionState,
   Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
 val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
 val numOutputRows = longMetric("numOutputRows")
 val numTotalStateRows = longMetric("numTotalStateRows")
 val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+val allRemovalsTimeMs = longMetric("allRemovalsTimeMs")
+val commitTimeMs = longMetric("commitTimeMs")
 
 outputMode match {
   // Update and output all rows in the StateStore.
   case Some(Complete) =>
-while (iter.hasNext) {
-  val row = iter.next().asInstanceOf[UnsafeRow]
-  val key = getKey(row)
-  store.put(key.copy(), row.copy())
-  numUpdatedStateRows += 1
+allUpdatesTimeMs += timeTakenMs {
+  while (iter.hasNext) {
+val row = iter.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+store.put(key, row)
+numUpdatedStateRows += 1
+  }
+}
+allRemovalsTimeMs += 0
+commitTimeMs += timeTakenMs {
+  store.commit()
 }
-store.commit()
 numTotalStateRows += store.numKeys()
-store.iterator().map { case (k, v) =>
+store.iterator().map { case UnsafeRowPair(_, v) =>
   numOutputRows += 1
   v.asInstanceOf[InternalRow]
 }
 
   // Update and output only rows being evicted from the StateStore
+  // Assumption: watermark predicates must be non-empty if append mode is allowed
   case Some(Append) =>
-while (iter.hasNext) {
-  val row = iter.next().asInstanceOf[UnsafeRow]
-  val key = getKey(row)
-  store.put(key.copy(), row.copy())
-  numUpdatedStateRows += 1
+allUpdatesTimeMs += timeTakenMs {
+  val filteredIter = iter.filter(row => !watermarkPredicateForData.get.eval(row))
+  while (filteredIter.hasNext) {
+val row = filteredIter.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+store.put(key, row)
+numUpdatedStateRows += 1
+  }
 }
 
-// Assumption: Append mode can be done only when watermark has been specified
-store.remove(watermarkPredicateForKeys.get.eval _)
-store.commit()
+val removalStartTime = System.currentTimeMillis
+val rangeIter = store.getRange(None, None)
+
+new NextIterator[InternalRow] {
+  override protected def getNext(): InternalRow = {
+var removedValueRow: InternalRow = null
+while(rangeIter.hasNext && removedValueRow == null) {
+  val UnsafeRowPair(keyRow, valueRow) = rangeIter.next()
--- End diff --

Case class's `unapply` will create a `Tuple`. You should not use this Scala 
syntactic sugar :)
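
The concern here is per-row allocation inside a hot loop. A minimal sketch of the alternative, reading the fields directly instead of destructuring. `RowPair` is a hypothetical stand-in for `UnsafeRowPair` (with `String` and `Int` in place of rows), and whether the destructuring form really allocates depends on how the class is defined and how the compiler translates the pattern.

```scala
object DirectFieldAccessSketch {
  // Hypothetical stand-in for UnsafeRowPair; the real class holds UnsafeRows.
  final class RowPair(val key: String, val value: Int)

  /** Sums the values whose key passes `keep`, reading fields directly from each pair. */
  def sumMatching(pairs: Iterator[RowPair], keep: String => Boolean): Int = {
    var total = 0
    while (pairs.hasNext) {
      val pair = pairs.next() // plain reference, no extractor call
      if (keep(pair.key)) total += pair.value
    }
    total
  }
}
```

Direct access sidesteps the question of what the extractor does, at the cost of one extra local name.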



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r118802674
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -552,6 +552,15 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val STATE_STORE_PROVIDER_CLASS =
+buildConf("spark.sql.streaming.stateStore.providerClass")
+  .internal()
+  .doc(
+"The class used to manage state data in stateful streaming queries. This class must" +
+  "be a subclass of StateStoreProvider, and must have a zero-arg constructor.")
--- End diff --

nit: missing space before `be`.
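
A small illustration of why the space matters when a doc string is split across source lines. The strings are shortened from the diff above, and the object name is illustrative.

```scala
object DocStringSketch {
  // Without a space at the join, the two fragments run together.
  val broken: String = "This class must" + "be a subclass of StateStoreProvider."

  // With a trailing space on the first fragment the sentence reads correctly.
  val fixed: String = "This class must " + "be a subclass of StateStoreProvider."
}
```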



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014965
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -253,6 +311,8 @@ case class StateStoreSaveExec(
   override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
+
--- End diff --

nit: extra empty lines



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014982
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -304,8 +371,9 @@ case class StreamingDeduplicateExec(
   }
 
   CompletionIterator[InternalRow, Iterator[InternalRow]](result, {
-watermarkPredicateForKeys.foreach(f => store.remove(f.eval _))
-store.commit()
+allUpdatesTimeMs += System.currentTimeMillis - updatesStartTimeMs
--- End diff --

nit: please use `nanoTime`




[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r118804254
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -719,3 +745,23 @@ object ThrowingInterruptedIOException {
*/
   @volatile var createSourceLatch: CountDownLatch = null
 }
+
+class TestStateStoreProvider extends StateStoreProvider {
+
+  override def init(
+  stateStoreId: StateStoreId,
+  keySchema: StructType,
+  valueSchema: StructType,
+  indexOrdinal: Option[Int],
+  storeConfs: StateStoreConf,
+  hadoopConf: Configuration): Unit = {
+throw new Exception("Successfully instantiated")
+
--- End diff --

nit: extra empty line.



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014511
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -47,50 +44,54 @@ trait StateStore {
   /** Version of the data in this store before committing updates. */
   def version: Long
 
-  /** Get the current value of a key. */
-  def get(key: UnsafeRow): Option[UnsafeRow]
-
   /**
-   * Return an iterator of key-value pairs that satisfy a certain condition.
-   * Note that the iterator must be fail-safe towards modification to the store, that is,
-   * it must be based on the snapshot of store the time of this call, and any change made to the
-   * store while iterating through iterator should not cause the iterator to fail or have
-   * any affect on the values in the iterator.
+   * Get the current value of a non-null key.
*/
-  def filter(condition: (UnsafeRow, UnsafeRow) => Boolean): Iterator[(UnsafeRow, UnsafeRow)]
+  def get(key: UnsafeRow): UnsafeRow
 
-  /** Put a new value for a key. */
+  /**
+   * Put a new value for a non-null key. Implementations must be aware that the UnsafeRows in
+   * the params can be reused, and must make copies of the data as needed for persistence.
+   * @note put cannot be done once
--- End diff --

Could you clarify it?



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r118615627
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -61,11 +60,24 @@ trait StateStoreReader extends StatefulOperator {
 }
 
 /** An operator that writes to a StateStore. */
-trait StateStoreWriter extends StatefulOperator {
+trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
+
   override lazy val metrics = Map(
 "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
 "numTotalStateRows" -> SQLMetrics.createMetric(sparkContext, "number of total state rows"),
-"numUpdatedStateRows" -> SQLMetrics.createMetric(sparkContext, "number of updated state rows"))
+"numUpdatedStateRows" -> SQLMetrics.createMetric(sparkContext, "number of updated state rows"),
+"allUpdatesTimeMs" -> SQLMetrics.createTimingMetric(sparkContext, "total time to update rows"),
+"allRemovalsTimeMs" -> SQLMetrics.createTimingMetric(sparkContext, "total time to remove rows"),
+"commitTimeMs" -> SQLMetrics.createTimingMetric(sparkContext, "time to commit changes")
+  )
+
+  /** Records the duration of running `body` for the next query progress update. */
+  protected def timeTakenMs(body: => Unit): Long = {
+val startTime = System.currentTimeMillis
--- End diff --

nit: Use `nanoTime` instead
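
One way the helper could look with this suggestion applied, as a sketch rather than the PR's final code. It keeps the `timeTakenMs(body: => Unit): Long` shape from the diff and assumes the result should stay in milliseconds; the enclosing `TimingSketch` object is only there to make the snippet self-contained.

```scala
import java.util.concurrent.TimeUnit

object TimingSketch {
  /** Runs `body` exactly once and returns the elapsed time in milliseconds. */
  def timeTakenMs(body: => Unit): Long = {
    val startTimeNs = System.nanoTime()
    body // by-name parameter: evaluated here, between the two clock reads
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTimeNs)
  }
}
```

A call site would then read the same as in the diff, for example `commitTimeMs += timeTakenMs { store.commit() }`.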



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014924
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -165,54 +189,88 @@ case class StateStoreSaveExec(
 child.execute().mapPartitionsWithStateStore(
   getStateId.checkpointLocation,
   getStateId.operatorId,
+  storeName = "default",
   getStateId.batchId,
   keyExpressions.toStructType,
   child.output.toStructType,
+  indexOrdinal = None,
   sqlContext.sessionState,
   Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
 val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
 val numOutputRows = longMetric("numOutputRows")
 val numTotalStateRows = longMetric("numTotalStateRows")
 val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+val allRemovalsTimeMs = longMetric("allRemovalsTimeMs")
+val commitTimeMs = longMetric("commitTimeMs")
 
 outputMode match {
   // Update and output all rows in the StateStore.
   case Some(Complete) =>
-while (iter.hasNext) {
-  val row = iter.next().asInstanceOf[UnsafeRow]
-  val key = getKey(row)
-  store.put(key.copy(), row.copy())
-  numUpdatedStateRows += 1
+allUpdatesTimeMs += timeTakenMs {
+  while (iter.hasNext) {
+val row = iter.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+store.put(key, row)
+numUpdatedStateRows += 1
+  }
+}
+allRemovalsTimeMs += 0
+commitTimeMs += timeTakenMs {
+  store.commit()
 }
-store.commit()
 numTotalStateRows += store.numKeys()
-store.iterator().map { case (k, v) =>
+store.iterator().map { case UnsafeRowPair(_, v) =>
   numOutputRows += 1
   v.asInstanceOf[InternalRow]
 }
 
   // Update and output only rows being evicted from the StateStore
+  // Assumption: watermark predicates must be non-empty if append mode is allowed
   case Some(Append) =>
-while (iter.hasNext) {
-  val row = iter.next().asInstanceOf[UnsafeRow]
-  val key = getKey(row)
-  store.put(key.copy(), row.copy())
-  numUpdatedStateRows += 1
+allUpdatesTimeMs += timeTakenMs {
+  val filteredIter = iter.filter(row => !watermarkPredicateForData.get.eval(row))
+  while (filteredIter.hasNext) {
+val row = filteredIter.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+store.put(key, row)
+numUpdatedStateRows += 1
+  }
 }
 
-// Assumption: Append mode can be done only when watermark has been specified
-store.remove(watermarkPredicateForKeys.get.eval _)
-store.commit()
+val removalStartTime = System.currentTimeMillis
+val rangeIter = store.getRange(None, None)
+
+new NextIterator[InternalRow] {
+  override protected def getNext(): InternalRow = {
+var removedValueRow: InternalRow = null
+while(rangeIter.hasNext && removedValueRow == null) {
+  val UnsafeRowPair(keyRow, valueRow) = rangeIter.next()
+  if (watermarkPredicateForKeys.get.eval(keyRow)) {
+store.remove(keyRow)
+removedValueRow = valueRow
+  }
+}
+if (removedValueRow == null) {
+  finished = true
+  null
+} else {
+  removedValueRow
+}
+  }
 
-numTotalStateRows += store.numKeys()
-store.updates().filter(_.isInstanceOf[ValueRemoved]).map { removed =>
-  numOutputRows += 1
-  removed.value.asInstanceOf[InternalRow]
+  override protected def close(): Unit = {
+allRemovalsTimeMs += System.currentTimeMillis - removalStartTime
--- End diff --

nit: please use `nanoTime`



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014948
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -165,54 +189,88 @@ case class StateStoreSaveExec(
 child.execute().mapPartitionsWithStateStore(
   getStateId.checkpointLocation,
   getStateId.operatorId,
+  storeName = "default",
   getStateId.batchId,
   keyExpressions.toStructType,
   child.output.toStructType,
+  indexOrdinal = None,
   sqlContext.sessionState,
   Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
 val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
 val numOutputRows = longMetric("numOutputRows")
 val numTotalStateRows = longMetric("numTotalStateRows")
 val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+val allRemovalsTimeMs = longMetric("allRemovalsTimeMs")
+val commitTimeMs = longMetric("commitTimeMs")
 
 outputMode match {
   // Update and output all rows in the StateStore.
   case Some(Complete) =>
-while (iter.hasNext) {
-  val row = iter.next().asInstanceOf[UnsafeRow]
-  val key = getKey(row)
-  store.put(key.copy(), row.copy())
-  numUpdatedStateRows += 1
+allUpdatesTimeMs += timeTakenMs {
+  while (iter.hasNext) {
+val row = iter.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+store.put(key, row)
+numUpdatedStateRows += 1
+  }
+}
+allRemovalsTimeMs += 0
+commitTimeMs += timeTakenMs {
+  store.commit()
 }
-store.commit()
 numTotalStateRows += store.numKeys()
-store.iterator().map { case (k, v) =>
+store.iterator().map { case UnsafeRowPair(_, v) =>
   numOutputRows += 1
   v.asInstanceOf[InternalRow]
 }
 
   // Update and output only rows being evicted from the StateStore
+  // Assumption: watermark predicates must be non-empty if append mode is allowed
   case Some(Append) =>
-while (iter.hasNext) {
-  val row = iter.next().asInstanceOf[UnsafeRow]
-  val key = getKey(row)
-  store.put(key.copy(), row.copy())
-  numUpdatedStateRows += 1
+allUpdatesTimeMs += timeTakenMs {
+  val filteredIter = iter.filter(row => !watermarkPredicateForData.get.eval(row))
+  while (filteredIter.hasNext) {
+val row = filteredIter.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+store.put(key, row)
+numUpdatedStateRows += 1
+  }
 }
 
-// Assumption: Append mode can be done only when watermark has been specified
-store.remove(watermarkPredicateForKeys.get.eval _)
-store.commit()
+val removalStartTime = System.currentTimeMillis
+val rangeIter = store.getRange(None, None)
+
+new NextIterator[InternalRow] {
+  override protected def getNext(): InternalRow = {
+var removedValueRow: InternalRow = null
+while(rangeIter.hasNext && removedValueRow == null) {
+  val UnsafeRowPair(keyRow, valueRow) = rangeIter.next()
+  if (watermarkPredicateForKeys.get.eval(keyRow)) {
+store.remove(keyRow)
+removedValueRow = valueRow
+  }
+}
+if (removedValueRow == null) {
+  finished = true
+  null
+} else {
+  removedValueRow
+}
+  }
 
-numTotalStateRows += store.numKeys()
-store.updates().filter(_.isInstanceOf[ValueRemoved]).map { removed =>
-  numOutputRows += 1
-  removed.value.asInstanceOf[InternalRow]
+  override protected def close(): Unit = {
+allRemovalsTimeMs += System.currentTimeMillis - removalStartTime
+commitTimeMs += timeTakenMs { store.commit() }
+numTotalStateRows += store.numKeys()
+  }
 }
 
   // Update and output

[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119013976
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -102,28 +103,100 @@ trait StateStore {
 }
 
 
-/** Trait representing a provider of a specific version of a [[StateStore]]. */
+/**
+ * Trait representing a provider that provide [[StateStore]] instances representing
+ * versions of state data.
+ *
+ * The life cycle of a provider and its provide stores are as follows.
+ *
+ * - A StateStoreProvider is created in a executor for each unique [[StateStoreId]] when
+ *   the first batch of a streaming query is executed on the executor. All subsequent batches reuse
+ *   this provider instance until the query is stopped.
+ *
+ * - Every batch of streaming data request a specific version of the state data by invoking
+ *   `getStore(version)` which returns an instance of [[StateStore]] through which the required
+ *   version of the data can be accessed. It is the responsible of the provider to populate
+ *   this store with context information like the schema of keys and values, etc.
+ *
+ * - After the streaming query is stopped, the created provider instances are lazily disposed off.
+ */
 trait StateStoreProvider {
 
-  /** Get the store with the existing version. */
+  /**
+   * Initialize the provide with more contextual information from the SQL operator.
+   * This method will be called first after creating an instance of the StateStoreProvider by
+   * reflection.
+   *
+   * @param stateStoreId Id of the versioned StateStores that this provider will generate
+   * @param keySchema Schema of keys to be stored
+   * @param valueSchema Schema of value to be stored
+   * @param keyIndexOrdinal Optional column (represent as the ordinal of the field in keySchema) by
+   *which the StateStore implementation could index the data.
+   * @param storeConfs Configurations used by the StateStores
+   * @param hadoopConf Hadoop configuration that could be used by StateStore to save state data
+   */
+  def init(
+  stateStoreId: StateStoreId,
+  keySchema: StructType,
+  valueSchema: StructType,
+  keyIndexOrdinal: Option[Int], // for sorting the data by their keys
+  storeConfs: StateStoreConf,
+  hadoopConf: Configuration): Unit
+
+  /**
+   * Return the id of the StateStores this provider will generate.
+   * Should be the same as the one passed in init().
+   */
+  def id: StateStoreId
+
+  /** Called when the provider instance is unloaded from the executor */
+  def close(): Unit
+
+  /** Return an instance of [[StateStore]] representing state data of the given version */
   def getStore(version: Long): StateStore
 
-  /** Optional method for providers to allow for background maintenance */
+  /** Optional method for providers to allow for background maintenance (e.g. compactions) */
   def doMaintenance(): Unit = { }
 }
 
-
-/** Trait representing updates made to a [[StateStore]]. */
-sealed trait StoreUpdate {
-  def key: UnsafeRow
-  def value: UnsafeRow
+object StateStoreProvider {
+  /**
+   * Return a provider instance of the given provider class.
+   * The instance will be already initialized.
+   */
+  def instantiate(
+  providerClass: String,
+  stateStoreId: StateStoreId,
+  keySchema: StructType,
+  valueSchema: StructType,
+  indexOrdinal: Option[Int], // for sorting the data
+  storeConf: StateStoreConf,
+  hadoopConf: Configuration): StateStoreProvider = {
+val provider = Utils.getContextOrSparkClassLoader
--- End diff --

nit: Use `Utils.classForName(providerClass)`.
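
A sketch of loading a class by name and calling its zero-arg constructor in plain JVM terms. `Utils.classForName` is Spark-internal, so this stand-in calls `Class.forName` with the thread's context class loader directly; `ReflectiveLoadingSketch` and `instantiate` are hypothetical names, and `java.util.ArrayList` is used only because it is always on the classpath.

```scala
object ReflectiveLoadingSketch {
  /** Loads `className` and invokes its zero-arg constructor (hypothetical helper). */
  def instantiate[T](className: String): T = {
    // Prefer the context class loader, falling back to the loader of this object.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val cls = Class.forName(className, true, loader)
    // The cast is unchecked; a wrong type only surfaces when the instance is used.
    cls.getDeclaredConstructor().newInstance().asInstanceOf[T]
  }
}
```

Routing every lookup through one helper keeps the choice of class loader in a single place, which is presumably the point of the suggestion.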



[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-29 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r119014486
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -47,50 +44,54 @@ trait StateStore {
   /** Version of the data in this store before committing updates. */
   def version: Long
 
-  /** Get the current value of a key. */
-  def get(key: UnsafeRow): Option[UnsafeRow]
-
   /**
-   * Return an iterator of key-value pairs that satisfy a certain condition.
-   * Note that the iterator must be fail-safe towards modification to the store, that is,
-   * it must be based on the snapshot of store the time of this call, and any change made to the
-   * store while iterating through iterator should not cause the iterator to fail or have
-   * any affect on the values in the iterator.
+   * Get the current value of a non-null key.
--- End diff --

nit: please mention that `null` means key doesn't exist.
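
To make the contract concrete, a sketch of a `get` that returns `null` for a missing key and of a caller that wraps the result in `Option`. `TinyStore` is a hypothetical in-memory stand-in keyed by `String`, not the real `StateStore`.

```scala
import scala.collection.mutable

object NullableGetSketch {
  // Hypothetical in-memory stand-in for a state store.
  final class TinyStore {
    private val data = mutable.HashMap.empty[String, String]

    def put(key: String, value: String): Unit = data.update(key, value)

    /** Get the current value of a non-null key; returns null if the key does not exist. */
    def get(key: String): String = data.get(key).orNull
  }

  /** Caller-side convenience: turn the nullable result back into an Option. */
  def lookup(store: TinyStore, key: String): Option[String] = Option(store.get(key))
}
```

Stating the `null` case in the scaladoc matters because the signature alone no longer tells the caller that a missing key is possible.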



[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18142
  
**[Test build #77520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77520/testReport)** for PR 18142 at commit [`3f253f3`](https://github.com/apache/spark/commit/3f253f37b1660a6f69376986b470213c38c10cc6).



[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18142#discussion_r119014081
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -17,51 +17,74 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
-import java.lang.reflect.Modifier
+import java.util.Locale
+import javax.annotation.concurrent.GuardedBy
 
+import scala.collection.mutable
 import scala.language.existentials
 import scala.reflect.ClassTag
 import scala.util.{Failure, Success, Try}
 
 import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.FunctionIdentifier
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.FunctionBuilder
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.expressions.xml._
-import org.apache.spark.sql.catalyst.util.StringKeyHashMap
 import org.apache.spark.sql.types._
 
 
 /**
  * A catalog for looking up user defined functions, used by an [[Analyzer]].
  *
- * Note: The implementation should be thread-safe to allow concurrent access.
+ * Note:
+ *   1) The implementation should be thread-safe to allow concurrent access.
+ *   2) the database name is always case-sensitive here, callers are responsible to
+ *  format the database name w.r.t. case-sensitive config.
  */
 trait FunctionRegistry {
 
-  final def registerFunction(name: String, builder: FunctionBuilder): Unit 
= {
-registerFunction(name, new 
ExpressionInfo(builder.getClass.getCanonicalName, name), builder)
+  final def registerFunction(name: FunctionIdentifier, builder: 
FunctionBuilder): Unit = {
+val info = new ExpressionInfo(
+  builder.getClass.getCanonicalName, name.database.orNull, 
name.funcName)
+registerFunction(name, info, builder)
   }
 
-  def registerFunction(name: String, info: ExpressionInfo, builder: 
FunctionBuilder): Unit
+  def registerFunction(
+name: FunctionIdentifier,
+info: ExpressionInfo,
+builder: FunctionBuilder): Unit
+
+  /* Create or replace a temporary function. */
+  final def createOrReplaceTempFunction(name: String, builder: 
FunctionBuilder): Unit = {
--- End diff --

Since we already expose `FunctionRegistry` to the stable class 
`UDFRegistration`, I added this extra API for a helper function. 

Ideally, this function should only exist in `SessionCatalog`. 



[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18142#discussion_r119013261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1205,8 +1204,8 @@ class SessionCatalog(
 requireDbExists(dbName)
 val dbFunctions = externalCatalog.listFunctions(dbName, pattern).map { f =>
   FunctionIdentifier(f, Some(dbName)) }
-val loadedFunctions =
-  StringUtils.filterPattern(functionRegistry.listFunction(), pattern).map { f =>
+val loadedFunctions = StringUtils
+  .filterPattern(functionRegistry.listFunction().map(_.unquotedString), pattern).map { f =>
--- End diff --

This PR keeps the current behavior. However, I think it is also a bug. The 
user-specified `pattern` should not consider the database name. 
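
To illustrate the concern, here is a minimal, self-contained sketch. It is not Spark code: `FnName`, `PatternScope`, and the simplified `matches` helper are hypothetical stand-ins, and the wildcard handling only approximates what a real pattern filter might do. It contrasts matching the pattern against the flattened `db.func` string with matching it against the function name alone.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for a function identifier; not Spark's class.
final case class FnName(funcName: String, database: Option[String] = None) {
  // Flattened "db.func" form, similar in spirit to an unquoted string.
  def unquotedString: String = database.map(_ + ".").getOrElse("") + funcName
}

object PatternScope {
  // Simplified wildcard matcher (an assumption): '*' matches any run of characters.
  def matches(pattern: String, candidate: String): Boolean = {
    val regex = pattern.split("\\*", -1).map(s => Pattern.quote(s)).mkString(".*")
    candidate.matches(regex)
  }

  val functions: Seq[FnName] =
    Seq(FnName("sum"), FnName("substr", Some("db1")), FnName("max", Some("stats")))

  // Current behavior in the sketch: the database prefix takes part in matching.
  def byFlattenedString(pattern: String): Seq[FnName] =
    functions.filter(f => matches(pattern, f.unquotedString))

  // Suggested behavior: only the function name takes part in matching.
  def byFunctionName(pattern: String): Seq[FnName] =
    functions.filter(f => matches(pattern, f.funcName))

  def main(args: Array[String]): Unit = {
    println(byFlattenedString("s*").map(_.unquotedString)) // List(sum, stats.max)
    println(byFunctionName("s*").map(_.unquotedString))    // List(sum, db1.substr)
  }
}
```

With the pattern `s*`, the flattened form picks up `stats.max` only because its database name starts with `s`, and it misses `db1.substr`.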



[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18142#discussion_r119013093
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1116,8 +1115,8 @@ class SessionCatalog(
 // TODO: just make function registry take in FunctionIdentifier instead of duplicating this
 val database = name.database.orElse(Some(currentDb)).map(formatDatabaseName)
 val qualifiedName = name.copy(database = database)
-functionRegistry.lookupFunction(name.funcName)
--- End diff --

This also sounds like a bug. This line ignores the database name.



[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18142
  
**[Test build #77519 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77519/testReport)**
 for PR 18142 at commit 
[`201787f`](https://github.com/apache/spark/commit/201787f7f01cadf21ca1f9c30304aa4a26af8226).



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18134
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77515/
Test PASSed.



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18134
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18142#discussion_r119012967
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -72,39 +89,53 @@ trait FunctionRegistry {
 
 class SimpleFunctionRegistry extends FunctionRegistry {
 
-  protected val functionBuilders =
-StringKeyHashMap[(ExpressionInfo, FunctionBuilder)](caseSensitive = false)
--- End diff --

This has a bug. The database name could be case sensitive. 
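
A minimal sketch of the intended distinction, assuming a toy registry rather than the real one: `FuncKey` and `TinyRegistry` are hypothetical names, and the stored values are plain strings instead of function builders. Only the function name is lower-cased before it is used as a key; the database name is left exactly as the caller passed it.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical stand-in for a function identifier; not Spark's class.
final case class FuncKey(funcName: String, database: Option[String] = None)

// Toy registry: values are plain strings rather than function builders.
class TinyRegistry {
  private val entries = mutable.HashMap.empty[FuncKey, String]

  // Lower-case only the function name; keep the database name untouched.
  private def normalize(key: FuncKey): FuncKey =
    key.copy(funcName = key.funcName.toLowerCase(Locale.ROOT))

  def register(key: FuncKey, value: String): Unit = synchronized {
    entries.put(normalize(key), value)
  }

  def lookup(key: FuncKey): Option[String] = synchronized {
    entries.get(normalize(key))
  }
}

object TinyRegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new TinyRegistry
    registry.register(FuncKey("MyUpper", Some("Sales")), "builder-1")
    // Function name differs only by case: found.
    println(registry.lookup(FuncKey("myupper", Some("Sales")))) // Some(builder-1)
    // Database name differs by case: not found.
    println(registry.lookup(FuncKey("myupper", Some("sales")))) // None
  }
}
```

A map that lower-cases the whole key string cannot make this distinction, which is the problem with the removed line.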



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18134
  
**[Test build #77515 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77515/testReport)**
 for PR 18134 at commit 
[`69d4227`](https://github.com/apache/spark/commit/69d42278cf6eeb13415b9627cdb7019c333547fa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18142#discussion_r119012802
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -72,39 +89,53 @@ trait FunctionRegistry {
 
 class SimpleFunctionRegistry extends FunctionRegistry {
 
-  protected val functionBuilders =
-StringKeyHashMap[(ExpressionInfo, FunctionBuilder)](caseSensitive = false)
+  @GuardedBy("this")
+  private val functionBuilders =
+new mutable.HashMap[FunctionIdentifier, (ExpressionInfo, FunctionBuilder)]
+
+  // Resolution of the function name is always case insensitive, but the database name
+  // depends on the caller
+  private def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
+FunctionIdentifier(name.funcName.toLowerCase(Locale.ROOT), name.database)
+  }
 
   override def registerFunction(
-  name: String,
+  name: FunctionIdentifier,
   info: ExpressionInfo,
   builder: FunctionBuilder): Unit = synchronized {
-functionBuilders.put(name, (info, builder))
+functionBuilders.put(normalizeFuncName(name), (info, builder))
   }
 
-  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
+  override def lookupFunction(name: FunctionIdentifier, children: Seq[Expression]): Expression = {
 val func = synchronized {
-  functionBuilders.get(name).map(_._2).getOrElse {
+  functionBuilders.get(normalizeFuncName(name)).map(_._2).getOrElse {
 throw new AnalysisException(s"undefined function $name")
   }
 }
 func(children)
   }
 
-  override def listFunction(): Seq[String] = synchronized {
-functionBuilders.iterator.map(_._1).toList.sorted
--- End diff --

This `sorted` is useless.



[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...

2017-05-29 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18141
  
Better to add tests in `SQLQueryTestSuite`?



[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...

2017-05-29 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/18142

[SPARK-20918] [SQL] Use FunctionIdentifier as function identifiers in 
FunctionRegistry

### What changes were proposed in this pull request?
Currently, the unquoted string of a function identifier is used as the key in
the function registry. This can cause incorrect behavior when users use `.` in
function names. This PR uses the `FunctionIdentifier` itself as the identifier
in the function registry.
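
The ambiguity can be sketched in a few lines. This is an illustration under assumptions, not the registry code: `FuncId` is a hypothetical stand-in for the identifier class, with an `unquotedString` that simply joins the optional database and the function name with a dot.

```scala
// Hypothetical stand-in for the identifier class; not Spark's implementation.
final case class FuncId(funcName: String, database: Option[String] = None) {
  // Flattened "db.func" form, as a string-keyed registry would see it.
  def unquotedString: String = database.map(_ + ".").getOrElse("") + funcName
}

object UnquotedKeyCollision {
  // A function literally named "db1.f", with no database.
  val dotted: FuncId = FuncId("db1.f")
  // A function named "f" that lives in database "db1".
  val qualified: FuncId = FuncId("f", Some("db1"))

  def main(args: Array[String]): Unit = {
    // String keys collide: both flatten to "db1.f".
    println(dotted.unquotedString == qualified.unquotedString) // true
    // Structured keys stay distinct.
    println(dotted == qualified) // false
  }
}
```

Keying the registry on the structured identifier keeps the two entries apart, whereas a string key would let one silently overwrite the other.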

### How was this patch tested?
TODO: add extra test cases to verify the included bug fixes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark fuctionRegistry

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18142


commit a374e9f14fd2486bb8a77c24d9ff8c3aa12d7bd4
Author: Xiao Li 
Date:   2017-05-30T04:38:18Z

fix.

commit 201787f7f01cadf21ca1f9c30304aa4a26af8226
Author: Xiao Li 
Date:   2017-05-30T04:51:06Z

fix.





[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18134
  
Build finished. Test PASSed.



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18134
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77513/
Test PASSed.



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18134
  
**[Test build #77513 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77513/testReport)**
 for PR 18134 at commit 
[`dcae776`](https://github.com/apache/spark/commit/dcae77600fc0cca41d2e3e607469232f59a021af).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77512/
Test PASSed.



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18138
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18138
  
**[Test build #77512 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77512/testReport)**
 for PR 18138 at commit 
[`895f414`](https://github.com/apache/spark/commit/895f414983250a708fee46b7879de1524f01c368).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #77518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77518/testReport)**
 for PR 14158 at commit 
[`114401a`](https://github.com/apache/spark/commit/114401a630650623c7c311bf753d4422d98e1550).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77518/
Test FAILed.



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #77518 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77518/testReport)**
 for PR 14158 at commit 
[`114401a`](https://github.com/apache/spark/commit/114401a630650623c7c311bf753d4422d98e1550).



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18134
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18134
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77514/
Test PASSed.



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18134
  
**[Test build #77514 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77514/testReport)**
 for PR 18134 at commit 
[`87defdc`](https://github.com/apache/spark/commit/87defdc91a7be6ba027cc196e76a552bf47a01f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedRelation(`
  * `case class StringReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression)`



[GitHub] spark pull request #17750: [SPARK-4899][MESOS] Support for Checkpointing on ...

2017-05-29 Thread lins05

Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17750#discussion_r119009317
  
--- Diff: docs/running-on-mesos.md ---
@@ -516,6 +516,16 @@ See the [configuration page](configuration.html) for information on Spark config
 Fetcher Cache
   
 
+
+  spark.mesos.checkpoint
+  false
+  
+If set to true, the agents that are running the Spark executors will write the framework pid, executor pids and status updates to disk. 
--- End diff --

nit: the *mesos* agents



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18140
  
@felixcheung Please take a look. Thanks. 



[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18141
  
**[Test build #77517 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77517/testReport)**
 for PR 18141 at commit 
[`0a7eab0`](https://github.com/apache/spark/commit/0a7eab0e456ffc0113ec9d39618617b970922f9b).



[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...

2017-05-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18141
  
cc @JoshRosen @cloud-fan @hvanhovell @gatorsmile 



[GitHub] spark pull request #18141: [SPARK-20916][SQL] Improve error message for unal...

2017-05-29 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/18141

[SPARK-20916][SQL] Improve error message for unaliased subqueries in FROM 
clause

## What changes were proposed in this pull request?

We changed the parser to reject unaliased subqueries in the FROM clause in 
SPARK-20690. However, the error message that we now give isn't very helpful:

scala> sql("""SELECT x FROM (SELECT 1 AS x)""")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'FROM' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 9)

We should modify the parser to throw a clearer error for such queries:

scala> sql("""SELECT x FROM (SELECT 1 AS x)""")
org.apache.spark.sql.catalyst.parser.ParseException:
The unaliased subqueries in the FROM clause are not supported.(line 1, 
pos 14)

## How was this patch tested?

Modified existing tests to reflect this change.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-20916

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18141


commit 0a7eab0e456ffc0113ec9d39618617b970922f9b
Author: Liang-Chi Hsieh 
Date:   2017-05-30T03:52:47Z

Improve error message for unaliased subqueries in FROM clause.





[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77510/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #77510 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77510/testReport)**
 for PR 14158 at commit 
[`69180bd`](https://github.com/apache/spark/commit/69180bd5b5b21725ff1e498e98690bc261f079f7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18140
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18140
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77516/
Test PASSed.



[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14158#discussion_r119008402
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala ---
@@ -65,13 +65,29 @@ abstract class AbstractSqlParser extends ParserInterface with Logging {
   }
 
   /** Creates LogicalPlan for a given SQL string. */
-  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
-    astBuilder.visitSingleStatement(parser.singleStatement()) match {
-      case plan: LogicalPlan => plan
-      case _ =>
-        val position = Origin(None, None)
-        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
+  override def parsePlan(sqlText: String): LogicalPlan = {
+    val logicalPlan = parse(sqlText) { parser =>
+      astBuilder.visitSingleStatement(parser.singleStatement()) match {
+        case plan: LogicalPlan => plan
+        case _ =>
+          val position = Origin(None, None)
+          throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
+      }
+    }
+    // Record the original sql text in the top logical plan for checking in the web UI.
+    // Truncate the text to avoid downing browsers or web UI servers by running out of memory.
+    val maxLength = 1000
+    val suffix = " ... (truncated)"
+    val truncateLength = maxLength - suffix.length
+    val truncatedSqlText = {
+      if (sqlText.length <= maxLength) {
+        sqlText
+      } else {
+        sqlText.substring(0, truncateLength) + suffix
+      }
     }
+    logicalPlan.sqlText = Some(truncatedSqlText)
+    logicalPlan
--- End diff --

The solution in this PR looks intrusive to me. If we really want to store 
the original SQL text, we can add it to the QueryExecution. The value can be 
initialized when we build the QueryExecution.
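
One way to read this suggestion, as a minimal sketch and not the actual Spark change: leave the plan immutable and carry the SQL text on the object that wraps it, set once at construction. `PlanSketch` and `QueryExecutionSketch` are hypothetical stand-ins, not Spark's `LogicalPlan` or `QueryExecution`, and the `parse` parameter is an assumed placeholder for the real parser.

```scala
// Hypothetical stand-in for a logical plan; it stays immutable (no `var`).
final case class PlanSketch(description: String)

// Hypothetical stand-in for QueryExecution: the SQL text is a constructor
// argument, so it is fixed when the execution object is built.
final class QueryExecutionSketch(val plan: PlanSketch, val sqlText: Option[String])

object QueryExecutionSketch {
  // Built from a SQL string: the text is known, so record it.
  def fromSql(sqlText: String, parse: String => PlanSketch): QueryExecutionSketch =
    new QueryExecutionSketch(parse(sqlText), Some(sqlText))

  // Built from a plan directly (e.g. via a DataFrame API): no SQL text exists.
  def fromPlan(plan: PlanSketch): QueryExecutionSketch =
    new QueryExecutionSketch(plan, None)
}
```

With this shape the parser does not need to mutate the plan it returns, and plans built without SQL simply carry `None`.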



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18140
  
**[Test build #77516 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)**
 for PR 18140 at commit 
[`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14158#discussion_r119007096
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -258,6 +258,9 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
    * Refreshes (or invalidates) any metadata/data cached in the plan recursively.
    */
   def refresh(): Unit = children.foreach(_.refresh())
+
+  // Record the original sql text in the top logical plan for checking in the web UI.
+  var sqlText: Option[String] = None
--- End diff --

Using `var` for this should be avoided. 



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18140
  
**[Test build #77516 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)**
 for PR 18140 at commit 
[`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18134
  
**[Test build #77515 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77515/testReport)**
 for PR 18134 at commit 
[`69d4227`](https://github.com/apache/spark/commit/69d42278cf6eeb13415b9627cdb7019c333547fa).



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18140
  
**[Test build #77511 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)**
 for PR 18140 at commit 
[`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18140
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77511/
Test FAILed.



[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18140
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18134
  
LGTM except for one comment about the function description.



[GitHub] spark pull request #18134: [SPARK-20909][SQL] Add build-int SQL function - D...

2017-05-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18134#discussion_r119002317
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -404,6 +404,44 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
 
 // scalastyle:off line.size.limit
 @ExpressionDescription(
+  usage = "_FUNC_(date) - Returns the weekday index for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).",
--- End diff --

Since Sunday and Saturday are included, this covers more than weekdays. I'd 
suggest `Returns the day of the week ...` instead.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18122
  
LGTM



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18134
  
**[Test build #77514 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77514/testReport)**
 for PR 18134 at commit 
[`87defdc`](https://github.com/apache/spark/commit/87defdc91a7be6ba027cc196e76a552bf47a01f1).



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18122
  
@yanboliang I have moved the tests to the test file. Please let me know if 
there is anything else needed. Thanks. 



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-05-29 Thread pralabhkumar

Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
16ccbdf is successful. Please review the pull request.
@MLnick @sethah @mpjlu @srowen 



[GitHub] spark pull request #18134: [SPARK-20909][SQL] Add build-int SQL function - D...

2017-05-29 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18134#discussion_r119000601
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -402,6 +402,44 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
   }
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(date) - Returns the weekday index for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('2009-07-30');
+       5
+  """)
+// scalastyle:on line.size.limit
+case class DayOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
+
+  override def dataType: DataType = IntegerType
+
+  @transient private lazy val c = {
+    Calendar.getInstance(DateTimeUtils.getTimeZone("UTC"))
+  }
+
+  override protected def nullSafeEval(date: Any): Any = {
+    c.setTimeInMillis(date.asInstanceOf[Int] * 1000L * 3600L * 24L)
+    c.get(Calendar.DAY_OF_WEEK)
--- End diff --

This is kept consistent with [Hive's DayOfWeek](https://github.com/apache/hive/blob/59539885725a96cca4b3f0759a5b26e0d8198dc8/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfWeek.java#L55).
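
For reference, the evaluation in the diff can be tried outside Spark. This is a minimal standalone sketch, assuming the input is a day count since 1970-01-01 (as the `Int` cast in `nullSafeEval` suggests); `DayOfWeekSketch` is a hypothetical name, and `TimeZone.getTimeZone("UTC")` stands in for `DateTimeUtils.getTimeZone("UTC")`.

```scala
import java.time.LocalDate
import java.util.{Calendar, TimeZone}

object DayOfWeekSketch {
  // Converts days since the epoch to a java.util.Calendar day-of-week index
  // (1 = Sunday, 2 = Monday, ..., 7 = Saturday), evaluated in UTC.
  def dayOfWeek(daysSinceEpoch: Int): Int = {
    val c = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
    // Multiply as Long so large day counts do not overflow Int.
    c.setTimeInMillis(daysSinceEpoch * 1000L * 3600L * 24L)
    c.get(Calendar.DAY_OF_WEEK)
  }

  def main(args: Array[String]): Unit = {
    val days = LocalDate.of(2009, 7, 30).toEpochDay.toInt
    println(dayOfWeek(days)) // 2009-07-30 is a Thursday, so this prints 5
  }
}
```

The sketch creates a new `Calendar` per call for simplicity; the diff instead reuses one instance through a `@transient lazy val`.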



[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18134
  
**[Test build #77513 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77513/testReport)**
 for PR 18134 at commit 
[`dcae776`](https://github.com/apache/spark/commit/dcae77600fc0cca41d2e3e607469232f59a021af).



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18138
  
**[Test build #77512 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77512/testReport)**
 for PR 18138 at commit 
[`895f414`](https://github.com/apache/spark/commit/895f414983250a708fee46b7879de1524f01c368).



[GitHub] spark issue #18140: [ML][SparkR] SparkR supports string encoding consistent ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18140
  
**[Test build #77511 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)**
 for PR 18140 at commit 
[`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).



[GitHub] spark pull request #18140: Spark r formula

2017-05-29 Thread actuaryzhang

GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/18140

Spark r formula

## What changes were proposed in this pull request?

Add `stringIndexerOrderType` to `spark.glm` and `spark.survreg` to support 
string encoding that is consistent with default R. 
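
To illustrate why the order type matters, here is a small standalone sketch in plain Scala, not the Spark implementation. The option names `frequencyDesc` and `alphabetDesc` are assumed to match those of Spark's `StringIndexer`; `OrderTypeSketch` and `indexLabels` are hypothetical names, and breaking frequency ties alphabetically is a choice made only for this example.

```scala
object OrderTypeSketch {
  // Assigns an index to each distinct label under the given ordering.
  def indexLabels(labels: Seq[String], orderType: String): Map[String, Int] = {
    val counts: Seq[(String, Int)] =
      labels.groupBy(identity).map { case (k, v) => (k, v.size) }.toSeq
    val ordered: Seq[String] = orderType match {
      // Most frequent label first; ties broken alphabetically (sketch-only choice).
      case "frequencyDesc" => counts.sortBy { case (label, n) => (-n, label) }.map(_._1)
      // Labels in descending alphabetical order.
      case "alphabetDesc" => counts.map(_._1).sorted.reverse
      case other => throw new IllegalArgumentException(s"unsupported order type: $other")
    }
    ordered.zipWithIndex.toMap
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq("b", "a", "b", "c", "b", "a")
    println(indexLabels(labels, "frequencyDesc")) // b -> 0, a -> 1, c -> 2
    println(indexLabels(labels, "alphabetDesc"))  // c -> 0, b -> 1, a -> 2
  }
}
```

Under `alphabetDesc` the alphabetically first level ends up with the last index. If the encoder drops the last category, that level becomes the reference, which is how R treats factor levels by default.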

## How was this patch tested?
new tests 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark sparkRFormula

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18140


commit be7a0fb993ad1fbe60576cd39ca86b20d45289a6
Author: actuaryzhang 
Date:   2017-05-28T01:39:51Z

add stringIndexerOrderType to SparkR glm and test result consistency with R

commit 826e784e3bf83c3b9a84fc7d9500d15971a7ffd8
Author: actuaryzhang 
Date:   2017-05-30T01:36:39Z

add stringIndexerOrderType to survreg





[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18122
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18122
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77509/
Test PASSed.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18122
  
**[Test build #77509 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77509/testReport)**
 for PR 18122 at commit 
[`4af4b35`](https://github.com/apache/spark/commit/4af4b3500de27acb0128763be755ea8078736d60).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...

2017-05-29 Thread nblintao

Github user nblintao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14158#discussion_r11899
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala ---
@@ -50,13 +50,29 @@ abstract class AbstractSqlParser extends ParserInterface with Logging {
   }
 
   /** Creates LogicalPlan for a given SQL string. */
-  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
-    astBuilder.visitSingleStatement(parser.singleStatement()) match {
-      case plan: LogicalPlan => plan
-      case _ =>
-        val position = Origin(None, None)
-        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
+  override def parsePlan(sqlText: String): LogicalPlan = {
+    val logicalPlan = parse(sqlText) { parser =>
+      astBuilder.visitSingleStatement(parser.singleStatement()) match {
+        case plan: LogicalPlan => plan
+        case _ =>
+          val position = Origin(None, None)
+          throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
+      }
+    }
+    // Record the original sql text in the top logical plan for checking in the web UI.
+    // Truncate the text to avoid downing browsers or web UI servers by running out of memory.
+    val maxLength = 1000
+    val suffix = " ... (truncated)"
+    val truncateLength = maxLength - suffix.length
--- End diff --

I think either way is okay. Here, my intent is to keep the displayed text 
(including the suffix) within 1000 chars.
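
As a standalone sketch of that logic (the names `TruncateSketch` and `truncate` are hypothetical; the constants are copied from the diff), reserving room for the suffix keeps the result at or below `maxLength`:

```scala
object TruncateSketch {
  val MaxLength = 1000
  val Suffix = " ... (truncated)"

  // Returns the text unchanged when it fits; otherwise cuts it so that
  // the kept prefix plus the suffix is exactly maxLength characters.
  def truncate(sqlText: String, maxLength: Int = MaxLength): String = {
    require(maxLength >= Suffix.length, "maxLength must leave room for the suffix")
    if (sqlText.length <= maxLength) {
      sqlText
    } else {
      sqlText.substring(0, maxLength - Suffix.length) + Suffix
    }
  }

  def main(args: Array[String]): Unit = {
    println(truncate("SELECT 1"))          // short text is returned unchanged
    println(truncate("a" * 2000).length)   // prints 1000
  }
}
```

The alternative of cutting at `maxLength` and then appending the suffix would display up to `maxLength + suffix.length` characters.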



[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...

2017-05-29 Thread nblintao

Github user nblintao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14158#discussion_r118998724
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -326,7 +336,16 @@ private[spark] object UIUtils extends Logging {
 
 val headerRow: Seq[Node] = {
   headers.view.zipWithIndex.map { x =>
-{getHeaderContent(x._1)}
+val toolTipOption = getToolTip(x._2)
+if (toolTipOption.isEmpty) {
+  {getHeaderContent(x._1)}
+} else {
+  val toolTip = toolTipOption.get
+  // scalastyle:off line.size.limit
--- End diff --

Fixed. Thank you!



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread nblintao

Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14158
  
I have just rebased. @ajbozarth @HyukjinKwon  @gatorsmile @srowen @vanzin 



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #77510 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77510/testReport)**
 for PR 14158 at commit 
[`69180bd`](https://github.com/apache/spark/commit/69180bd5b5b21725ff1e498e98690bc261f079f7).



[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...

2017-05-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17308



[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-05-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17935
  
@JoshRosen Thanks for filing this issue. I'll look into it.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18122
  
**[Test build #77509 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77509/testReport)**
 for PR 18122 at commit 
[`4af4b35`](https://github.com/apache/spark/commit/4af4b3500de27acb0128763be755ea8078736d60).



[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...

2017-05-29 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/17308
  
LGTM. Merging to master and 2.2. Thanks!



[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-05-29 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/17935
  
I was trying to run a test case from another database which _does_ support 
unaliased subqueries in the `FROM` clause and hit a confusing parser error due 
to this patch's behavior change. While I agree that we shouldn't necessarily 
support this syntax, I think that the current error message that we're 
returning isn't very good so I've filed 
https://issues.apache.org/jira/browse/SPARK-20916 to improve it.



[GitHub] spark pull request #18132: [SPARK-8184][SQL] Add additional function descrip...

2017-05-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18132



[GitHub] spark issue #18132: [SPARK-8184][SQL] Add additional function description fo...

2017-05-29 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18132
  
Thanks - merging in master/branch-2.2.




[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118989451
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1090,6 +1090,24 @@ class SessionCatalog(
 }
   }
 
+  /** Create a temporary macro. */
+  def createTempMacro(
+  name: String,
+  info: ExpressionInfo,
+  functionBuilder: FunctionBuilder): Unit = {
+if (functionRegistry.functionExists(name)) {
+  throw new AnalysisException(s"Function $name already exists")
+}
+functionRegistry.registerFunction(name, info, functionBuilder)
+  }
+
+  /** Drop a temporary macro. */
+  def dropTempMacro(name: String, ignoreIfNotExists: Boolean): Unit = {
+if (!functionRegistry.dropMacro(name) && !ignoreIfNotExists) {
+  throw new NoSuchTempMacroException(name)
--- End diff --

```
hive>  DROP TEMPORARY MACRO max;
OK
Time taken: 0.01 seconds
hive> select max(3) from t1;
OK
3
```

After we drop the macro, the built-in function still works. That means Hive 
did not delete the original built-in function: `DROP TEMPORARY MACRO` only 
removes the macro. Once the macro with the same name is dropped, `max` 
resolves to the original built-in function again.
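
The behaviour shown in the Hive sessions in this thread (a temporary macro hides a built-in of the same name, and dropping the macro brings the built-in back) can be modelled with a two-layer lookup. This is only a toy sketch in Python, not Hive's or Spark's implementation; the class `LayeredRegistry` and its method names are hypothetical.

```python
class LayeredRegistry:
    """Toy model: temporary macros shadow built-ins without replacing them."""

    def __init__(self, builtins):
        # Built-ins are kept in their own layer and are never modified.
        self._builtins = dict(builtins)
        self._temp_macros = {}

    def create_temp_macro(self, name, fn):
        # Registering a macro does not touch the built-in layer.
        self._temp_macros[name] = fn

    def drop_temp_macro(self, name):
        # Returns True if a macro was removed, False if none existed.
        return self._temp_macros.pop(name, None) is not None

    def lookup(self, name):
        # Macros take precedence; otherwise fall back to the built-in.
        if name in self._temp_macros:
            return self._temp_macros[name]
        return self._builtins[name]


registry = LayeredRegistry({"max": lambda *args: max(args)})

registry.create_temp_macro("max", lambda x: x * x)
shadowed = registry.lookup("max")(3)      # macro is used: 3 * 3

dropped = registry.drop_temp_macro("max")
restored = registry.lookup("max")(3)      # built-in is used again
```

With this layout, dropping a macro can never delete a built-in, which matches what the Hive output suggests.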




[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118989143
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1090,6 +1090,24 @@ class SessionCatalog(
 }
   }
 
+  /** Create a temporary macro. */
+  def createTempMacro(
+  name: String,
+  info: ExpressionInfo,
+  functionBuilder: FunctionBuilder): Unit = {
+if (functionRegistry.functionExists(name)) {
--- End diff --

```
hive> create temporary macro max(x int)
> x*x;
OK
Time taken: 0.014 seconds

hive> select max(3) from t1;
OK
9
Time taken: 0.468 seconds, Fetched: 1 row(s)

hive> select max(3,4) from t1;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch 
'4': The macro max accepts exactly 1 arguments.
```

Hive overwrites the temporary function




[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118987906
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala
 ---
@@ -52,3 +52,6 @@ class NoSuchPartitionsException(db: String, table: String, specs: Seq[TableParti
 
 class NoSuchTempFunctionException(func: String)
   extends AnalysisException(s"Temporary function '$func' not found")
+
+class NoSuchTempMacroException(func: String)
--- End diff --

Please remove it. For reasons, please see the PR 
https://github.com/apache/spark/pull/17716. 



[GitHub] spark issue #17880: [SPARK-20620][TEST]Improve some unit tests for NullExpre...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77507/
Test PASSed.



[GitHub] spark issue #17880: [SPARK-20620][TEST]Improve some unit tests for NullExpre...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17880
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17880: [SPARK-20620][TEST]Improve some unit tests for NullExpre...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17880
  
**[Test build #77507 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77507/testReport)**
 for PR 17880 at commit 
[`3110f0f`](https://github.com/apache/spark/commit/3110f0f0c1a09b28a5706674ae65fd47ce48b163).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18122
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18122
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77508/
Test FAILed.



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18122
  
**[Test build #77508 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77508/testReport)**
 for PR 18122 at commit 
[`320203e`](https://github.com/apache/spark/commit/320203eeea6d7613bb091f01b170fbfa2805b2a0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SparkMLTests(ReusedPySparkTestCase):`



[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18122
  
**[Test build #77508 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77508/testReport)**
 for PR 18122 at commit 
[`320203e`](https://github.com/apache/spark/commit/320203eeea6d7613bb091f01b170fbfa2805b2a0).



[GitHub] spark issue #18139: Spark 20787 invalid years

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18139
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18139: Spark 20787 invalid years

2017-05-29 Thread rberenguel

GitHub user rberenguel opened a pull request:

https://github.com/apache/spark/pull/18139

Spark 20787 invalid years

`time.mktime` can't handle dates from 1899-100, according to the 
documentation by design. `calendar.timegm` is equivalent in shared cases, but 
can handle those years.

## What changes were proposed in this pull request?

Replace `time.mktime` with the more capable `calendar.timegm` to address cases 
like:
```python
import datetime as dt

sqlContext.createDataFrame(sc.parallelize([[dt.datetime(1899,12,31)]])).count()
```
which fail due to an internal conversion failure when the time object carries 
no timezone information. When timezone information is present, `calendar` 
was already used.
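
A minimal sketch of the `calendar.timegm` side of this, outside of Spark. The helper name `to_epoch_seconds_utc` is hypothetical and is not the code changed in this PR; it assumes the naive datetime should be read as UTC.

```python
import calendar
import datetime as dt


def to_epoch_seconds_utc(naive):
    """Hypothetical helper: read a naive datetime as UTC, return epoch seconds."""
    # calendar.timegm takes a UTC time tuple and does not depend on the
    # platform's C mktime, so years before 1900 are handled.
    return calendar.timegm(naive.utctimetuple())


old = dt.datetime(1899, 12, 31)
seconds = to_epoch_seconds_utc(old)

# Round-trip with plain timedelta arithmetic to check the value.
restored = dt.datetime(1970, 1, 1) + dt.timedelta(seconds=seconds)
```

Note that `calendar.timegm` interprets the tuple as UTC while `time.mktime` interprets it as local time, so the two agree only when the local offset is zero. This sketch does not deal with that difference.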

## How was this patch tested?

The existing test cases cover this change, since it does not change any 
existing functionality. Added a test to confirm it working in the problematic 
range.

This PR is original work from me and I license this work to the Spark 
project

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rberenguel/spark SPARK-20787-invalid-years

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18139


commit 6c0312f94e3fce2bf4d6a30055bd747be535bb0f
Author: Ruben Berenguel Montoro 
Date:   2017-05-29T15:46:21Z

SPARK-20787 time.mktime can't handle dates from 1899-100, by 
construction. calendar.timegm is equivalent in shared cases, but can handle 
those

commit d3c41b5f18971168870524ad3a5fac876859bf4b
Author: Ruben Berenguel Montoro 
Date:   2017-05-29T19:42:54Z

SPARK-20787 Technically a hack. Using gmtime everywhere does not work well 
with DST shifts. So, for timeranges that don't work well with mktime, use 
gmtime





[GitHub] spark pull request #18135: [SPARK-20907][test] Use testQuietly for test suit...

2017-05-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18135



[GitHub] spark issue #18135: [SPARK-20907][test] Use testQuietly for test suites that...

2017-05-29 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/18135
  
LGTM. Merging to master and 2.2. Thanks!



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18138
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77505/
Test FAILed.



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18138
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2017-05-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18138
  
**[Test build #77505 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77505/testReport)**
 for PR 18138 at commit 
[`3ac9fb0`](https://github.com/apache/spark/commit/3ac9fb07ef2f53315247ad12d391b1bed92319e9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


