[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18141 **[Test build #77521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77521/testReport)** for PR 18141 at commit [`f64df52`](https://github.com/apache/spark/commit/f64df526f50cb776a7672faddeb92ef3fcb30024). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18141 @maropu Sure. Thanks.
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r118608662
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -828,6 +837,8 @@ class SQLConf extends Serializable with Logging {
   def optimizerInSetConversionThreshold: Int = getConf(OPTIMIZER_INSET_CONVERSION_THRESHOLD)
+  def stateStoreProviderClass: Option[String] = getConf(STATE_STORE_PROVIDER_CLASS)
--- End diff --

Also add this to `StateStoreConf` for consistency?
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r119014976
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -273,27 +333,34 @@ case class StreamingDeduplicateExec(
     child.execute().mapPartitionsWithStateStore(
       getStateId.checkpointLocation,
       getStateId.operatorId,
+      storeName = "default",
       getStateId.batchId,
       keyExpressions.toStructType,
       child.output.toStructType,
+      indexOrdinal = None,
       sqlContext.sessionState,
       Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
       val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
       val numOutputRows = longMetric("numOutputRows")
       val numTotalStateRows = longMetric("numTotalStateRows")
       val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+      val allUpdatesTimeMs = longMetric("allUpdatesTimeMs")
+      val allRemovalsTimeMs = longMetric("allRemovalsTimeMs")
+      val commitTimeMs = longMetric("commitTimeMs")
       val baseIterator = watermarkPredicateForData match {
         case Some(predicate) => iter.filter(row => !predicate.eval(row))
         case None => iter
       }
+      val updatesStartTimeMs = System.currentTimeMillis
--- End diff --

nit: please use `nanoTime`
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18107#discussion_r119016141 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -165,54 +189,88 @@ case class StateStoreSaveExec( child.execute().mapPartitionsWithStateStore( getStateId.checkpointLocation, getStateId.operatorId, + storeName = "default", getStateId.batchId, keyExpressions.toStructType, child.output.toStructType, + indexOrdinal = None, sqlContext.sessionState, Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) => val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output) val numOutputRows = longMetric("numOutputRows") val numTotalStateRows = longMetric("numTotalStateRows") val numUpdatedStateRows = longMetric("numUpdatedStateRows") +val allUpdatesTimeMs = longMetric("allUpdatesTimeMs") +val allRemovalsTimeMs = longMetric("allRemovalsTimeMs") +val commitTimeMs = longMetric("commitTimeMs") outputMode match { // Update and output all rows in the StateStore. 
case Some(Complete) => -while (iter.hasNext) { - val row = iter.next().asInstanceOf[UnsafeRow] - val key = getKey(row) - store.put(key.copy(), row.copy()) - numUpdatedStateRows += 1 +allUpdatesTimeMs += timeTakenMs { + while (iter.hasNext) { +val row = iter.next().asInstanceOf[UnsafeRow] +val key = getKey(row) +store.put(key, row) +numUpdatedStateRows += 1 + } +} +allRemovalsTimeMs += 0 +commitTimeMs += timeTakenMs { + store.commit() } -store.commit() numTotalStateRows += store.numKeys() -store.iterator().map { case (k, v) => +store.iterator().map { case UnsafeRowPair(_, v) => numOutputRows += 1 v.asInstanceOf[InternalRow] } // Update and output only rows being evicted from the StateStore + // Assumption: watermark predicates must be non-empty if append mode is allowed case Some(Append) => -while (iter.hasNext) { - val row = iter.next().asInstanceOf[UnsafeRow] - val key = getKey(row) - store.put(key.copy(), row.copy()) - numUpdatedStateRows += 1 +allUpdatesTimeMs += timeTakenMs { + val filteredIter = iter.filter(row => !watermarkPredicateForData.get.eval(row)) + while (filteredIter.hasNext) { +val row = filteredIter.next().asInstanceOf[UnsafeRow] +val key = getKey(row) +store.put(key, row) +numUpdatedStateRows += 1 + } } -// Assumption: Append mode can be done only when watermark has been specified -store.remove(watermarkPredicateForKeys.get.eval _) -store.commit() +val removalStartTime = System.currentTimeMillis +val rangeIter = store.getRange(None, None) + +new NextIterator[InternalRow] { + override protected def getNext(): InternalRow = { +var removedValueRow: InternalRow = null +while(rangeIter.hasNext && removedValueRow == null) { + val UnsafeRowPair(keyRow, valueRow) = rangeIter.next() --- End diff -- Case class's `unapply` will create a `Tuple`. You should not use this Scala syntactic sugar :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
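A minimal sketch of the two access styles the comment above contrasts. `RowPair` is a hypothetical stand-in for the `UnsafeRowPair` in the diff, with plain strings in place of `UnsafeRow`. Whether a case-class pattern really allocates a tuple may depend on the Scala version and compiler, so this only illustrates the accessor form the reviewer appears to be asking for, not a measured difference.

```scala
object RowPairAccess {
  // Hypothetical stand-in for UnsafeRowPair; String replaces UnsafeRow.
  final case class RowPair(key: String, value: String)

  // Extractor style, as written in the diff under review.
  def valueViaPattern(pair: RowPair): String = {
    val RowPair(_, value) = pair
    value
  }

  // Accessor style: reads the field directly, with no extractor involved.
  def valueViaField(pair: RowPair): String = pair.value

  def main(args: Array[String]): Unit = {
    val pair = RowPair("k1", "v1")
    println(valueViaPattern(pair))
    println(valueViaField(pair))
  }
}
```

Both functions return the same value; in a per-row hot loop the accessor form leaves no doubt about extra allocations.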
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r118802674
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -552,6 +552,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
+  val STATE_STORE_PROVIDER_CLASS =
+    buildConf("spark.sql.streaming.stateStore.providerClass")
+      .internal()
+      .doc(
+        "The class used to manage state data in stateful streaming queries. This class must" +
+          "be a subclass of StateStoreProvider, and must have a zero-arg constructor.")
--- End diff --

nit: missing space before `be`.
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r119014965
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -253,6 +311,8 @@ case class StateStoreSaveExec(
   override def output: Seq[Attribute] = child.output
   override def outputPartitioning: Partitioning = child.outputPartitioning
+
--- End diff --

nit: extra empty lines
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r119014982
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -304,8 +371,9 @@ case class StreamingDeduplicateExec(
       }
       CompletionIterator[InternalRow, Iterator[InternalRow]](result, {
-        watermarkPredicateForKeys.foreach(f => store.remove(f.eval _))
-        store.commit()
+        allUpdatesTimeMs += System.currentTimeMillis - updatesStartTimeMs
--- End diff --

nit: please use `nanoTime`
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r118804254
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -719,3 +745,23 @@ object ThrowingInterruptedIOException {
    */
   @volatile var createSourceLatch: CountDownLatch = null
 }
+
+class TestStateStoreProvider extends StateStoreProvider {
+
+  override def init(
+      stateStoreId: StateStoreId,
+      keySchema: StructType,
+      valueSchema: StructType,
+      indexOrdinal: Option[Int],
+      storeConfs: StateStoreConf,
+      hadoopConf: Configuration): Unit = {
+    throw new Exception("Successfully instantiated")
+
--- End diff --

nit: extra empty line.
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r119014511
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -47,50 +44,54 @@ trait StateStore {
   /** Version of the data in this store before committing updates. */
   def version: Long
-  /** Get the current value of a key. */
-  def get(key: UnsafeRow): Option[UnsafeRow]
-
   /**
-   * Return an iterator of key-value pairs that satisfy a certain condition.
-   * Note that the iterator must be fail-safe towards modification to the store, that is,
-   * it must be based on the snapshot of store the time of this call, and any change made to the
-   * store while iterating through iterator should not cause the iterator to fail or have
-   * any affect on the values in the iterator.
+   * Get the current value of a non-null key.
    */
-  def filter(condition: (UnsafeRow, UnsafeRow) => Boolean): Iterator[(UnsafeRow, UnsafeRow)]
+  def get(key: UnsafeRow): UnsafeRow
-  /** Put a new value for a key. */
+  /**
+   * Put a new value for a non-null key. Implementations must be aware that the UnsafeRows in
+   * the params can be reused, and must make copies of the data as needed for persistence.
+   * @note put cannot be done once
--- End diff --

Could you clarify it?
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r118615627
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -61,11 +60,24 @@ trait StateStoreReader extends StatefulOperator {
 }
 /** An operator that writes to a StateStore. */
-trait StateStoreWriter extends StatefulOperator {
+trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
+
   override lazy val metrics = Map(
     "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
     "numTotalStateRows" -> SQLMetrics.createMetric(sparkContext, "number of total state rows"),
-    "numUpdatedStateRows" -> SQLMetrics.createMetric(sparkContext, "number of updated state rows"))
+    "numUpdatedStateRows" -> SQLMetrics.createMetric(sparkContext, "number of updated state rows"),
+    "allUpdatesTimeMs" -> SQLMetrics.createTimingMetric(sparkContext, "total time to update rows"),
+    "allRemovalsTimeMs" -> SQLMetrics.createTimingMetric(sparkContext, "total time to remove rows"),
+    "commitTimeMs" -> SQLMetrics.createTimingMetric(sparkContext, "time to commit changes")
+  )
+
+  /** Records the duration of running `body` for the next query progress update. */
+  protected def timeTakenMs(body: => Unit): Long = {
+    val startTime = System.currentTimeMillis
--- End diff --

nit: Use `nanoTime` instead
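A minimal sketch of what the `nanoTime` nit could look like for a helper shaped like `timeTakenMs`. The object name `ElapsedTiming` is an assumption for illustration and this is not the code that was eventually merged. `System.nanoTime` is meant for measuring elapsed intervals, whereas `System.currentTimeMillis` follows the wall clock and can jump if the clock is adjusted.

```scala
import java.util.concurrent.TimeUnit

object ElapsedTiming {
  /** Runs `body` and returns the elapsed time in milliseconds, measured with nanoTime. */
  def timeTakenMs(body: => Unit): Long = {
    val startNs = System.nanoTime()
    body
    // Convert only at the end so the metric keeps its millisecond unit.
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
  }

  def main(args: Array[String]): Unit = {
    val elapsedMs = timeTakenMs { Thread.sleep(20) }
    println(s"elapsed: $elapsedMs ms")
  }
}
```

The same start/stop pairing applies to the call sites that record a start time and subtract it later in a completion callback.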
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18107#discussion_r119014924 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -165,54 +189,88 @@ case class StateStoreSaveExec( child.execute().mapPartitionsWithStateStore( getStateId.checkpointLocation, getStateId.operatorId, + storeName = "default", getStateId.batchId, keyExpressions.toStructType, child.output.toStructType, + indexOrdinal = None, sqlContext.sessionState, Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) => val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output) val numOutputRows = longMetric("numOutputRows") val numTotalStateRows = longMetric("numTotalStateRows") val numUpdatedStateRows = longMetric("numUpdatedStateRows") +val allUpdatesTimeMs = longMetric("allUpdatesTimeMs") +val allRemovalsTimeMs = longMetric("allRemovalsTimeMs") +val commitTimeMs = longMetric("commitTimeMs") outputMode match { // Update and output all rows in the StateStore. 
case Some(Complete) => -while (iter.hasNext) { - val row = iter.next().asInstanceOf[UnsafeRow] - val key = getKey(row) - store.put(key.copy(), row.copy()) - numUpdatedStateRows += 1 +allUpdatesTimeMs += timeTakenMs { + while (iter.hasNext) { +val row = iter.next().asInstanceOf[UnsafeRow] +val key = getKey(row) +store.put(key, row) +numUpdatedStateRows += 1 + } +} +allRemovalsTimeMs += 0 +commitTimeMs += timeTakenMs { + store.commit() } -store.commit() numTotalStateRows += store.numKeys() -store.iterator().map { case (k, v) => +store.iterator().map { case UnsafeRowPair(_, v) => numOutputRows += 1 v.asInstanceOf[InternalRow] } // Update and output only rows being evicted from the StateStore + // Assumption: watermark predicates must be non-empty if append mode is allowed case Some(Append) => -while (iter.hasNext) { - val row = iter.next().asInstanceOf[UnsafeRow] - val key = getKey(row) - store.put(key.copy(), row.copy()) - numUpdatedStateRows += 1 +allUpdatesTimeMs += timeTakenMs { + val filteredIter = iter.filter(row => !watermarkPredicateForData.get.eval(row)) + while (filteredIter.hasNext) { +val row = filteredIter.next().asInstanceOf[UnsafeRow] +val key = getKey(row) +store.put(key, row) +numUpdatedStateRows += 1 + } } -// Assumption: Append mode can be done only when watermark has been specified -store.remove(watermarkPredicateForKeys.get.eval _) -store.commit() +val removalStartTime = System.currentTimeMillis +val rangeIter = store.getRange(None, None) + +new NextIterator[InternalRow] { + override protected def getNext(): InternalRow = { +var removedValueRow: InternalRow = null +while(rangeIter.hasNext && removedValueRow == null) { + val UnsafeRowPair(keyRow, valueRow) = rangeIter.next() + if (watermarkPredicateForKeys.get.eval(keyRow)) { +store.remove(keyRow) +removedValueRow = valueRow + } +} +if (removedValueRow == null) { + finished = true + null +} else { + removedValueRow +} + } -numTotalStateRows += store.numKeys() 
-store.updates().filter(_.isInstanceOf[ValueRemoved]).map { removed => - numOutputRows += 1 - removed.value.asInstanceOf[InternalRow] + override protected def close(): Unit = { +allRemovalsTimeMs += System.currentTimeMillis - removalStartTime --- End diff -- nit: please use `nanoTime`
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18107#discussion_r119014948 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -165,54 +189,88 @@ case class StateStoreSaveExec( child.execute().mapPartitionsWithStateStore( getStateId.checkpointLocation, getStateId.operatorId, + storeName = "default", getStateId.batchId, keyExpressions.toStructType, child.output.toStructType, + indexOrdinal = None, sqlContext.sessionState, Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) => val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output) val numOutputRows = longMetric("numOutputRows") val numTotalStateRows = longMetric("numTotalStateRows") val numUpdatedStateRows = longMetric("numUpdatedStateRows") +val allUpdatesTimeMs = longMetric("allUpdatesTimeMs") +val allRemovalsTimeMs = longMetric("allRemovalsTimeMs") +val commitTimeMs = longMetric("commitTimeMs") outputMode match { // Update and output all rows in the StateStore. 
case Some(Complete) => -while (iter.hasNext) { - val row = iter.next().asInstanceOf[UnsafeRow] - val key = getKey(row) - store.put(key.copy(), row.copy()) - numUpdatedStateRows += 1 +allUpdatesTimeMs += timeTakenMs { + while (iter.hasNext) { +val row = iter.next().asInstanceOf[UnsafeRow] +val key = getKey(row) +store.put(key, row) +numUpdatedStateRows += 1 + } +} +allRemovalsTimeMs += 0 +commitTimeMs += timeTakenMs { + store.commit() } -store.commit() numTotalStateRows += store.numKeys() -store.iterator().map { case (k, v) => +store.iterator().map { case UnsafeRowPair(_, v) => numOutputRows += 1 v.asInstanceOf[InternalRow] } // Update and output only rows being evicted from the StateStore + // Assumption: watermark predicates must be non-empty if append mode is allowed case Some(Append) => -while (iter.hasNext) { - val row = iter.next().asInstanceOf[UnsafeRow] - val key = getKey(row) - store.put(key.copy(), row.copy()) - numUpdatedStateRows += 1 +allUpdatesTimeMs += timeTakenMs { + val filteredIter = iter.filter(row => !watermarkPredicateForData.get.eval(row)) + while (filteredIter.hasNext) { +val row = filteredIter.next().asInstanceOf[UnsafeRow] +val key = getKey(row) +store.put(key, row) +numUpdatedStateRows += 1 + } } -// Assumption: Append mode can be done only when watermark has been specified -store.remove(watermarkPredicateForKeys.get.eval _) -store.commit() +val removalStartTime = System.currentTimeMillis +val rangeIter = store.getRange(None, None) + +new NextIterator[InternalRow] { + override protected def getNext(): InternalRow = { +var removedValueRow: InternalRow = null +while(rangeIter.hasNext && removedValueRow == null) { + val UnsafeRowPair(keyRow, valueRow) = rangeIter.next() + if (watermarkPredicateForKeys.get.eval(keyRow)) { +store.remove(keyRow) +removedValueRow = valueRow + } +} +if (removedValueRow == null) { + finished = true + null +} else { + removedValueRow +} + } -numTotalStateRows += store.numKeys() 
-store.updates().filter(_.isInstanceOf[ValueRemoved]).map { removed => - numOutputRows += 1 - removed.value.asInstanceOf[InternalRow] + override protected def close(): Unit = { +allRemovalsTimeMs += System.currentTimeMillis - removalStartTime +commitTimeMs += timeTakenMs { store.commit() } +numTotalStateRows += store.numKeys() + } } // Update and output
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18107#discussion_r119013976 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala --- @@ -102,28 +103,100 @@ trait StateStore { } -/** Trait representing a provider of a specific version of a [[StateStore]]. */ +/** + * Trait representing a provider that provide [[StateStore]] instances representing + * versions of state data. + * + * The life cycle of a provider and its provide stores are as follows. + * + * - A StateStoreProvider is created in a executor for each unique [[StateStoreId]] when + * the first batch of a streaming query is executed on the executor. All subsequent batches reuse + * this provider instance until the query is stopped. + * + * - Every batch of streaming data request a specific version of the state data by invoking + * `getStore(version)` which returns an instance of [[StateStore]] through which the required + * version of the data can be accessed. It is the responsible of the provider to populate + * this store with context information like the schema of keys and values, etc. + * + * - After the streaming query is stopped, the created provider instances are lazily disposed off. + */ trait StateStoreProvider { - /** Get the store with the existing version. */ + /** + * Initialize the provide with more contextual information from the SQL operator. + * This method will be called first after creating an instance of the StateStoreProvider by + * reflection. + * + * @param stateStoreId Id of the versioned StateStores that this provider will generate + * @param keySchema Schema of keys to be stored + * @param valueSchema Schema of value to be stored + * @param keyIndexOrdinal Optional column (represent as the ordinal of the field in keySchema) by + *which the StateStore implementation could index the data. 
+ * @param storeConfs Configurations used by the StateStores + * @param hadoopConf Hadoop configuration that could be used by StateStore to save state data + */ + def init( + stateStoreId: StateStoreId, + keySchema: StructType, + valueSchema: StructType, + keyIndexOrdinal: Option[Int], // for sorting the data by their keys + storeConfs: StateStoreConf, + hadoopConf: Configuration): Unit + + /** + * Return the id of the StateStores this provider will generate. + * Should be the same as the one passed in init(). + */ + def id: StateStoreId + + /** Called when the provider instance is unloaded from the executor */ + def close(): Unit + + /** Return an instance of [[StateStore]] representing state data of the given version */ def getStore(version: Long): StateStore - /** Optional method for providers to allow for background maintenance */ + /** Optional method for providers to allow for background maintenance (e.g. compactions) */ def doMaintenance(): Unit = { } } - -/** Trait representing updates made to a [[StateStore]]. */ -sealed trait StoreUpdate { - def key: UnsafeRow - def value: UnsafeRow +object StateStoreProvider { + /** + * Return a provider instance of the given provider class. + * The instance will be already initialized. + */ + def instantiate( + providerClass: String, + stateStoreId: StateStoreId, + keySchema: StructType, + valueSchema: StructType, + indexOrdinal: Option[Int], // for sorting the data + storeConf: StateStoreConf, + hadoopConf: Configuration): StateStoreProvider = { +val provider = Utils.getContextOrSparkClassLoader --- End diff -- nit: Use `Utils.classForName(providerClass)`.
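A rough sketch of loading a class by name and creating it through its zero-arg constructor, which is the step the comment above is about. `Utils.classForName` is Spark-internal, so this sketch uses only JDK calls; the helper name `instantiate` and its signature here are assumptions, and a JDK class stands in for a real `StateStoreProvider`.

```scala
object ReflectiveLoading {
  /** Loads `className`, creates an instance via its zero-arg constructor, and casts it to `expected`. */
  def instantiate[T](className: String, expected: Class[T]): T = {
    // Prefer the thread context class loader, falling back to this class's loader.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val clazz = Class.forName(className, true, loader)
    val instance = clazz.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    expected.cast(instance)
  }

  def main(args: Array[String]): Unit = {
    // java.lang.StringBuilder stands in for a provider class with a zero-arg constructor.
    val chars = instantiate("java.lang.StringBuilder", classOf[CharSequence])
    println(chars.length)
  }
}
```

In the PR itself an `init(...)` call would follow the construction; that step is omitted here because its arguments are Spark types.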
[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/18107#discussion_r119014486
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -47,50 +44,54 @@ trait StateStore {
   /** Version of the data in this store before committing updates. */
   def version: Long
-  /** Get the current value of a key. */
-  def get(key: UnsafeRow): Option[UnsafeRow]
-
   /**
-   * Return an iterator of key-value pairs that satisfy a certain condition.
-   * Note that the iterator must be fail-safe towards modification to the store, that is,
-   * it must be based on the snapshot of store the time of this call, and any change made to the
-   * store while iterating through iterator should not cause the iterator to fail or have
-   * any affect on the values in the iterator.
+   * Get the current value of a non-null key.
--- End diff --

nit: please mention that `null` means key doesn't exist.
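A small sketch of the contract the reviewer wants documented: `get` takes a non-null key and returns `null` when the key does not exist. `SimpleStore` is a hypothetical store with `String` keys and values standing in for `UnsafeRow`; it is not the `StateStore` trait from the PR.

```scala
import scala.collection.mutable

object NullableGet {
  // Hypothetical store; String replaces UnsafeRow for illustration.
  final class SimpleStore {
    private val data = mutable.HashMap.empty[String, String]

    /** Returns the current value of a non-null key, or null if the key does not exist. */
    def get(key: String): String = {
      require(key != null, "key must be non-null")
      data.getOrElse(key, null)
    }

    def put(key: String, value: String): Unit = { data(key) = value }
  }

  def main(args: Array[String]): Unit = {
    val store = new SimpleStore
    store.put("a", "1")
    // Callers that prefer Option can wrap the result at the boundary.
    println(Option(store.get("a")))
    println(Option(store.get("missing")))
  }
}
```

Returning `null` instead of `Option` avoids one wrapper allocation per lookup, which is presumably why the signature changed, but it makes the doc comment the only place where the "missing key" convention is stated.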
[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18142 **[Test build #77520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77520/testReport)** for PR 18142 at commit [`3f253f3`](https://github.com/apache/spark/commit/3f253f37b1660a6f69376986b470213c38c10cc6).
[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18142#discussion_r119014081 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -17,51 +17,74 @@ package org.apache.spark.sql.catalyst.analysis -import java.lang.reflect.Modifier +import java.util.Locale +import javax.annotation.concurrent.GuardedBy +import scala.collection.mutable import scala.language.existentials import scala.reflect.ClassTag import scala.util.{Failure, Success, Try} import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.FunctionIdentifier import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.FunctionBuilder import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate._ import org.apache.spark.sql.catalyst.expressions.xml._ -import org.apache.spark.sql.catalyst.util.StringKeyHashMap import org.apache.spark.sql.types._ /** * A catalog for looking up user defined functions, used by an [[Analyzer]]. * - * Note: The implementation should be thread-safe to allow concurrent access. + * Note: + * 1) The implementation should be thread-safe to allow concurrent access. + * 2) the database name is always case-sensitive here, callers are responsible to + * format the database name w.r.t. case-sensitive config. 
*/ trait FunctionRegistry { - final def registerFunction(name: String, builder: FunctionBuilder): Unit = { -registerFunction(name, new ExpressionInfo(builder.getClass.getCanonicalName, name), builder) + final def registerFunction(name: FunctionIdentifier, builder: FunctionBuilder): Unit = { +val info = new ExpressionInfo( + builder.getClass.getCanonicalName, name.database.orNull, name.funcName) +registerFunction(name, info, builder) } - def registerFunction(name: String, info: ExpressionInfo, builder: FunctionBuilder): Unit + def registerFunction( +name: FunctionIdentifier, +info: ExpressionInfo, +builder: FunctionBuilder): Unit + + /* Create or replace a temporary function. */ + final def createOrReplaceTempFunction(name: String, builder: FunctionBuilder): Unit = { --- End diff -- Since we already expose `FunctionRegistry` to the stable class `UDFRegistration`, I added this extra API for a helper function. Ideally, this function should only exist in `SessionCatalog`.
[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18142#discussion_r119013261

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -1205,8 +1204,8 @@ class SessionCatalog(
         requireDbExists(dbName)
         val dbFunctions = externalCatalog.listFunctions(dbName, pattern).map { f =>
           FunctionIdentifier(f, Some(dbName)) }
    -    val loadedFunctions =
    -      StringUtils.filterPattern(functionRegistry.listFunction(), pattern).map { f =>
    +    val loadedFunctions = StringUtils
    +      .filterPattern(functionRegistry.listFunction().map(_.unquotedString), pattern).map { f =>
    --- End diff --

    This PR keeps the current behavior. However, I think it is also a bug. The user-specified `pattern` should not consider the database name.
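To make the concern about `pattern` concrete, here is a minimal stand-alone sketch, not Spark's code. `QualifiedFn` is a hypothetical stand-in for `FunctionIdentifier`, and `matches` is a simplified filter (only `*` as a wildcard, case-insensitive) that is assumed for illustration rather than copied from `StringUtils.filterPattern`.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for Spark's FunctionIdentifier.
final case class QualifiedFn(funcName: String, database: Option[String] = None) {
  // "db.func" when a database is set, otherwise just "func".
  def unquotedString: String = database.map(_ + "." + funcName).getOrElse(funcName)
}

object PatternScope {
  // Simplified filter (an assumption): `*` is the only wildcard, matching ignores case.
  def matches(name: String, pattern: String): Boolean = {
    val regex = pattern.split("\\*", -1).map(s => Pattern.quote(s)).mkString(".*")
    Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(name).matches()
  }

  val loaded: Seq[QualifiedFn] = Seq(
    QualifiedFn("abs", Some("default")),
    QualifiedFn("date_add", Some("default")))

  // Matching against the unquoted string lets the database name satisfy the pattern.
  def byUnquotedString(pattern: String): Seq[String] =
    loaded.filter(f => matches(f.unquotedString, pattern)).map(_.funcName)

  // Matching against the function name alone ignores the database name.
  def byFuncName(pattern: String): Seq[String] =
    loaded.filter(f => matches(f.funcName, pattern)).map(_.funcName)

  def main(args: Array[String]): Unit = {
    println(byUnquotedString("d*"))
    println(byFuncName("d*"))
  }
}
```

With the pattern `d*`, the first filter also returns `abs`, because `default.abs` starts with `d`; the second returns only `date_add`.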
[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18142#discussion_r119013093

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -1116,8 +1115,8 @@ class SessionCatalog(
         // TODO: just make function registry take in FunctionIdentifier instead of duplicating this
         val database = name.database.orElse(Some(currentDb)).map(formatDatabaseName)
         val qualifiedName = name.copy(database = database)
    -    functionRegistry.lookupFunction(name.funcName)
    --- End diff --

    This also sounds like a bug. This line ignores the database name.
[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18142 **[Test build #77519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77519/testReport)** for PR 18142 at commit [`201787f`](https://github.com/apache/spark/commit/201787f7f01cadf21ca1f9c30304aa4a26af8226).
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18134 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77515/ Test PASSed.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18134 Merged build finished. Test PASSed.
[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18142#discussion_r119012967

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -72,39 +89,53 @@ trait FunctionRegistry {
     class SimpleFunctionRegistry extends FunctionRegistry {
    -  protected val functionBuilders =
    -    StringKeyHashMap[(ExpressionInfo, FunctionBuilder)](caseSensitive = false)
    --- End diff --

    This has a bug. The database name could be case sensitive.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18134

    **[Test build #77515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77515/testReport)** for PR 18134 at commit [`69d4227`](https://github.com/apache/spark/commit/69d42278cf6eeb13415b9627cdb7019c333547fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18142#discussion_r119012802

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -72,39 +89,53 @@ trait FunctionRegistry {
     class SimpleFunctionRegistry extends FunctionRegistry {
    -  protected val functionBuilders =
    -    StringKeyHashMap[(ExpressionInfo, FunctionBuilder)](caseSensitive = false)
    +  @GuardedBy("this")
    +  private val functionBuilders =
    +    new mutable.HashMap[FunctionIdentifier, (ExpressionInfo, FunctionBuilder)]
    +
    +  // Resolution of the function name is always case insensitive, but the database name
    +  // depends on the caller
    +  private def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
    +    FunctionIdentifier(name.funcName.toLowerCase(Locale.ROOT), name.database)
    +  }
       override def registerFunction(
    -      name: String,
    +      name: FunctionIdentifier,
           info: ExpressionInfo,
           builder: FunctionBuilder): Unit = synchronized {
    -    functionBuilders.put(name, (info, builder))
    +    functionBuilders.put(normalizeFuncName(name), (info, builder))
       }
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +  override def lookupFunction(name: FunctionIdentifier, children: Seq[Expression]): Expression = {
         val func = synchronized {
    -      functionBuilders.get(name).map(_._2).getOrElse {
    +      functionBuilders.get(normalizeFuncName(name)).map(_._2).getOrElse {
             throw new AnalysisException(s"undefined function $name")
           }
         }
         func(children)
       }
    -  override def listFunction(): Seq[String] = synchronized {
    -    functionBuilders.iterator.map(_._1).toList.sorted
    --- End diff --

    This `sorted` is useless.
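The keying scheme in the diff above can be tried without Spark on the classpath. What follows is a minimal sketch under stated assumptions, not the PR's code: `FuncId` and `MiniRegistry` are hypothetical stand-ins for `FunctionIdentifier` and `SimpleFunctionRegistry`, and builders are reduced to plain strings.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical stand-in for Spark's FunctionIdentifier.
final case class FuncId(funcName: String, database: Option[String] = None)

// Hypothetical stand-in for SimpleFunctionRegistry; values are plain strings here.
class MiniRegistry {
  private val builders = new mutable.HashMap[FuncId, String]

  // Lower-case only the function name; the database part is kept exactly as passed in.
  private def normalize(id: FuncId): FuncId =
    id.copy(funcName = id.funcName.toLowerCase(Locale.ROOT))

  def register(id: FuncId, builder: String): Unit = synchronized {
    builders.put(normalize(id), builder)
  }

  def lookup(id: FuncId): Option[String] = synchronized {
    builders.get(normalize(id))
  }
}

object MiniRegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new MiniRegistry
    registry.register(FuncId("MyUpper", Some("db1")), "upper-builder")

    // Same database, different case in the function name: found.
    println(registry.lookup(FuncId("myupper", Some("db1"))))
    // Different case in the database name: not found, the caller must format it first.
    println(registry.lookup(FuncId("myupper", Some("DB1"))))
  }
}
```

The design choice this illustrates is the one stated in the diff's comment: the registry normalizes the function name itself and leaves database-name formatting to the caller.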
[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18141 Better to add tests in `SQLQueryTestSuite`?
[GitHub] spark pull request #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as fun...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/18142

    [SPARK-20918] [SQL] Use FunctionIdentifier as function identifiers in FunctionRegistry

    ### What changes were proposed in this pull request?
    Currently, the unquoted string of a function identifier is used as the function identifier in the function registry. This can cause incorrect behavior when users use `.` in function names. This PR uses `FunctionIdentifier` as the identifier in the function registry.

    ### How was this patch tested?
    TODO: add extra test cases to verify the included bug fixes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark fuctionRegistry

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18142.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18142

commit a374e9f14fd2486bb8a77c24d9ff8c3aa12d7bd4
Author: Xiao Li
Date:   2017-05-30T04:38:18Z

    fix.

commit 201787f7f01cadf21ca1f9c30304aa4a26af8226
Author: Xiao Li
Date:   2017-05-30T04:51:06Z

    fix.
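The problem with `.` in function names can be shown in a few lines. This is a hedged sketch, not Spark's implementation: `Ident` is a hypothetical stand-in for `FunctionIdentifier`, and its `unquotedString` is assumed to join the database and function name with a dot.

```scala
// Hypothetical stand-in for Spark's FunctionIdentifier.
final case class Ident(funcName: String, database: Option[String] = None) {
  // Assumed shape of the unquoted string: "db.func" when a database is set, else "func".
  def unquotedString: String = database.map(_ + "." + funcName).getOrElse(funcName)
}

object DotCollision {
  val qualified: Ident = Ident("f", Some("a")) // function `f` in database `a`
  val dotted: Ident = Ident("a.f")             // function literally named `a.f`, no database

  // As string keys the two identifiers are indistinguishable.
  def stringKeysCollide: Boolean = qualified.unquotedString == dotted.unquotedString

  // As structured keys they remain distinct.
  def structuredKeysCollide: Boolean = qualified == dotted

  def main(args: Array[String]): Unit = {
    println(stringKeysCollide)
    println(structuredKeysCollide)
  }
}
```

A map keyed by the string form would let one of these entries overwrite the other; a map keyed by the case class keeps both.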
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18134 Build finished. Test PASSed.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18134 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77513/ Test PASSed.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18134

    **[Test build #77513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77513/testReport)** for PR 18134 at commit [`dcae776`](https://github.com/apache/spark/commit/dcae77600fc0cca41d2e3e607469232f59a021af).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77512/ Test PASSed.
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18138 Merged build finished. Test PASSed.
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18138

    **[Test build #77512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77512/testReport)** for PR 18138 at commit [`895f414`](https://github.com/apache/spark/commit/895f414983250a708fee46b7879de1524f01c368).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14158

    **[Test build #77518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77518/testReport)** for PR 14158 at commit [`114401a`](https://github.com/apache/spark/commit/114401a630650623c7c311bf753d4422d98e1550).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77518/ Test FAILed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Merged build finished. Test FAILed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #77518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77518/testReport)** for PR 14158 at commit [`114401a`](https://github.com/apache/spark/commit/114401a630650623c7c311bf753d4422d98e1550).
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18134 Merged build finished. Test PASSed.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18134 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77514/ Test PASSed.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18134

    **[Test build #77514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77514/testReport)** for PR 18134 at commit [`87defdc`](https://github.com/apache/spark/commit/87defdc91a7be6ba027cc196e76a552bf47a01f1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `case class UnresolvedRelation(`
       * `case class StringReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression)`
[GitHub] spark pull request #17750: [SPARK-4899][MESOS] Support for Checkpointing on ...
Github user lins05 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17750#discussion_r119009317

    --- Diff: docs/running-on-mesos.md ---
    @@ -516,6 +516,16 @@ See the [configuration page](configuration.html) for information on Spark config
     Fetcher Cache
    +
    +  spark.mesos.checkpoint
    +  false
    +
    +    If set to true, the agents that are running the Spark executors will write the framework pid, executor pids and status updates to disk.
    --- End diff --

    nit: the *mesos* agents
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18140 @felixcheung Please take a look. Thanks.
[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18141 **[Test build #77517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77517/testReport)** for PR 18141 at commit [`0a7eab0`](https://github.com/apache/spark/commit/0a7eab0e456ffc0113ec9d39618617b970922f9b).
[GitHub] spark issue #18141: [SPARK-20916][SQL] Improve error message for unaliased s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18141 cc @JoshRosen @cloud-fan @hvanhovell @gatorsmile
[GitHub] spark pull request #18141: [SPARK-20916][SQL] Improve error message for unal...
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/18141

    [SPARK-20916][SQL] Improve error message for unaliased subqueries in FROM clause

    ## What changes were proposed in this pull request?

    We changed the parser to reject unaliased subqueries in the FROM clause in SPARK-20690. However, the error message that we now give isn't very helpful:

        scala> sql("""SELECT x FROM (SELECT 1 AS x)""")
        org.apache.spark.sql.catalyst.parser.ParseException:
        mismatched input 'FROM' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 9)

    We should modify the parser to throw a clearer error for such queries:

        scala> sql("""SELECT x FROM (SELECT 1 AS x)""")
        org.apache.spark.sql.catalyst.parser.ParseException:
        The unaliased subqueries in the FROM clause are not supported.(line 1, pos 14)

    ## How was this patch tested?

    Modified existing tests to reflect this change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-20916

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18141.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18141

commit 0a7eab0e456ffc0113ec9d39618617b970922f9b
Author: Liang-Chi Hsieh
Date:   2017-05-30T03:52:47Z

    Improve error message for unaliased subqueries in FROM clause.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Merged build finished. Test PASSed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77510/ Test PASSed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14158

    **[Test build #77510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77510/testReport)** for PR 14158 at commit [`69180bd`](https://github.com/apache/spark/commit/69180bd5b5b21725ff1e498e98690bc261f079f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18140 Merged build finished. Test PASSed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77516/ Test PASSed.
[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14158#discussion_r119008402

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala ---
    @@ -65,13 +65,29 @@ abstract class AbstractSqlParser extends ParserInterface with Logging {
       }
     
       /** Creates LogicalPlan for a given SQL string. */
    -  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
    -    astBuilder.visitSingleStatement(parser.singleStatement()) match {
    -      case plan: LogicalPlan => plan
    -      case _ =>
    -        val position = Origin(None, None)
    -        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    +  override def parsePlan(sqlText: String): LogicalPlan = {
    +    val logicalPlan = parse(sqlText) { parser =>
    +      astBuilder.visitSingleStatement(parser.singleStatement()) match {
    +        case plan: LogicalPlan => plan
    +        case _ =>
    +          val position = Origin(None, None)
    +          throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    +      }
    +    }
    +    // Record the original sql text in the top logical plan for checking in the web UI.
    +    // Truncate the text to avoid downing browsers or web UI servers by running out of memory.
    +    val maxLength = 1000
    +    val suffix = " ... (truncated)"
    +    val truncateLength = maxLength - suffix.length
    +    val truncatedSqlText = {
    +      if (sqlText.length <= maxLength) {
    +        sqlText
    +      } else {
    +        sqlText.substring(0, truncateLength) + suffix
    +      }
         }
    +    logicalPlan.sqlText = Some(truncatedSqlText)
    +    logicalPlan
    --- End diff --

The solution in this PR looks intrusive to me. If we really want to store the original SQL text, we can add it to the QueryExecution. The value can be initialized when we build the QueryExecution.
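The suggestion to keep the SQL text outside the plan can be illustrated with a minimal sketch. `Plan`, `Execution`, and `fromSql` below are hypothetical stand-ins, not Spark's `LogicalPlan` or `QueryExecution`; the only point is that the text is supplied once at construction, so nothing is mutated after parsing.

```scala
// Hypothetical stand-ins for illustration only; Spark's real LogicalPlan and
// QueryExecution carry far more state and have different constructors.
final case class Plan(description: String)

// The SQL text is an immutable constructor field, not a var on the plan.
final case class Execution(plan: Plan, sqlText: Option[String] = None)

object Execution {
  // Builds the execution wrapper and records the originating SQL text once.
  def fromSql(sqlText: String, parse: String => Plan): Execution =
    Execution(parse(sqlText), Some(sqlText))
}

object ExecutionDemo {
  def main(args: Array[String]): Unit = {
    val exec = Execution.fromSql("SELECT 1", text => Plan(s"parsed: $text"))
    println(exec.sqlText)
  }
}
```

With this shape, executions that do not originate from a SQL string simply keep `sqlText` as `None`.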
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18140 **[Test build #77516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)** for PR 18140 at commit [`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14158#discussion_r119007096

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -258,6 +258,9 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
        * Refreshes (or invalidates) any metadata/data cached in the plan recursively.
        */
       def refresh(): Unit = children.foreach(_.refresh())
    +
    +  // Record the original sql text in the top logical plan for checking in the web UI.
    +  var sqlText: Option[String] = None
    --- End diff --

Using `var` for this should be avoided.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18140 **[Test build #77516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)** for PR 18140 at commit [`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18134 **[Test build #77515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77515/testReport)** for PR 18134 at commit [`69d4227`](https://github.com/apache/spark/commit/69d42278cf6eeb13415b9627cdb7019c333547fa).
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18140 **[Test build #77511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)** for PR 18140 at commit [`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18140 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77511/ Test FAILed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18140 Merged build finished. Test FAILed.
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18134 LGTM except for one comment about the function description.
[GitHub] spark pull request #18134: [SPARK-20909][SQL] Add build-int SQL function - D...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18134#discussion_r119002317

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -404,6 +404,44 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
     // scalastyle:off line.size.limit
     @ExpressionDescription(
    +  usage = "_FUNC_(date) - Returns the weekday index for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).",
    --- End diff --

Since Sunday and Saturday are included, this is not only weekdays. Suggest `Returns the day of the week ...`.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18122 LGTM
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18134 **[Test build #77514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77514/testReport)** for PR 18134 at commit [`87defdc`](https://github.com/apache/spark/commit/87defdc91a7be6ba027cc196e76a552bf47a01f1).
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18122 @yanboliang I have moved the tests to the test file. Please let me know if there is anything else needed. Thanks.
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 16ccbdf is successful. Please review the pull request. @MLnick @sethah @mpjlu @srowen
[GitHub] spark pull request #18134: [SPARK-20909][SQL] Add build-int SQL function - D...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18134#discussion_r119000601

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -402,6 +402,44 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
       }
     }
     
    +// scalastyle:off line.size.limit
    +@ExpressionDescription(
    +  usage = "_FUNC_(date) - Returns the weekday index for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).",
    +  extended = """
    +    Examples:
    +      > SELECT _FUNC_('2009-07-30');
    +       5
    +  """)
    +// scalastyle:on line.size.limit
    +case class DayOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
    +
    +  override def dataType: DataType = IntegerType
    +
    +  @transient private lazy val c = {
    +    Calendar.getInstance(DateTimeUtils.getTimeZone("UTC"))
    +  }
    +
    +  override protected def nullSafeEval(date: Any): Any = {
    +    c.setTimeInMillis(date.asInstanceOf[Int] * 1000L * 3600L * 24L)
    +    c.get(Calendar.DAY_OF_WEEK)
    --- End diff --

Keep pace with [Hive's DayOfWeek](https://github.com/apache/hive/blob/59539885725a96cca4b3f0759a5b26e0d8198dc8/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfWeek.java#L55).
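The day-of-week numbering discussed here (1 = Sunday through 7 = Saturday) can also be derived arithmetically from the days-since-epoch integer that the diff casts `date` to. The following is a sketch under that assumption and is not the code in the PR; `DayOfWeekSketch` and `dayOfWeek` are hypothetical names. It relies on 1970-01-01 having been a Thursday.

```scala
import java.time.LocalDate

object DayOfWeekSketch {
  // Input: days since 1970-01-01 (a Thursday), as in the diff's date.asInstanceOf[Int].
  // Output: 1 = Sunday, 2 = Monday, ..., 7 = Saturday, the same numbering as
  // java.util.Calendar.DAY_OF_WEEK. floorMod keeps pre-1970 dates in range.
  def dayOfWeek(daysSinceEpoch: Int): Int =
    Math.floorMod(daysSinceEpoch + 4, 7) + 1

  def main(args: Array[String]): Unit = {
    val days = LocalDate.of(2009, 7, 30).toEpochDay.toInt
    println(dayOfWeek(days)) // 5 (Thursday), matching the example in the diff
  }
}
```

This avoids the shared mutable `Calendar` instance, at the cost of no longer mirroring Hive's implementation line for line.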
[GitHub] spark issue #18134: [SPARK-20909][SQL] Add build-int SQL function - DAYOFWEE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18134 **[Test build #77513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77513/testReport)** for PR 18134 at commit [`dcae776`](https://github.com/apache/spark/commit/dcae77600fc0cca41d2e3e607469232f59a021af).
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18138 **[Test build #77512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77512/testReport)** for PR 18138 at commit [`895f414`](https://github.com/apache/spark/commit/895f414983250a708fee46b7879de1524f01c368).
[GitHub] spark issue #18140: [ML][SparkR] SparkR supports string encoding consistent ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18140 **[Test build #77511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)** for PR 18140 at commit [`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).
[GitHub] spark pull request #18140: Spark r formula
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/18140

Spark r formula

## What changes were proposed in this pull request?

Add `stringIndexerOrderType` to `spark.glm` and `spark.survreg` to support string encoding that is consistent with default R.

## How was this patch tested?

new tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark sparkRFormula

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18140.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18140

commit be7a0fb993ad1fbe60576cd39ca86b20d45289a6
Author: actuaryzhang
Date: 2017-05-28T01:39:51Z

    add stringIndexerOrderType to SparkR glm and test result consistency with R

commit 826e784e3bf83c3b9a84fc7d9500d15971a7ffd8
Author: actuaryzhang
Date: 2017-05-30T01:36:39Z

    add stringIndexerOrderType to survreg
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18122 Merged build finished. Test PASSed.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77509/ Test PASSed.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18122 **[Test build #77509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77509/testReport)** for PR 18122 at commit [`4af4b35`](https://github.com/apache/spark/commit/4af4b3500de27acb0128763be755ea8078736d60).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...
Github user nblintao commented on a diff in the pull request: https://github.com/apache/spark/pull/14158#discussion_r11899

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala ---
    @@ -50,13 +50,29 @@ abstract class AbstractSqlParser extends ParserInterface with Logging {
       }
     
       /** Creates LogicalPlan for a given SQL string. */
    -  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
    -    astBuilder.visitSingleStatement(parser.singleStatement()) match {
    -      case plan: LogicalPlan => plan
    -      case _ =>
    -        val position = Origin(None, None)
    -        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    +  override def parsePlan(sqlText: String): LogicalPlan = {
    +    val logicalPlan = parse(sqlText) { parser =>
    +      astBuilder.visitSingleStatement(parser.singleStatement()) match {
    +        case plan: LogicalPlan => plan
    +        case _ =>
    +          val position = Origin(None, None)
    +          throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    +      }
    +    }
    +    // Record the original sql text in the top logical plan for checking in the web UI.
    +    // Truncate the text to avoid downing browsers or web UI servers by running out of memory.
    +    val maxLength = 1000
    +    val suffix = " ... (truncated)"
    +    val truncateLength = maxLength - suffix.length
    --- End diff --

I think either way is okay. Here, I intend to keep the displayed text (including the suffix) within 1000 chars.
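The truncation rule described in this comment can be isolated as a small helper. This is a sketch based on the logic shown in the diff; the `SqlTextTruncation` object, the `truncate` name, the default parameter, and the `require` guard are assumptions added for illustration.

```scala
object SqlTextTruncation {
  val Suffix = " ... (truncated)"

  // Keeps the returned text, suffix included, within maxLength characters.
  // Text that already fits is returned unchanged.
  def truncate(sqlText: String, maxLength: Int = 1000): String = {
    require(maxLength >= Suffix.length, "maxLength must leave room for the suffix")
    if (sqlText.length <= maxLength) sqlText
    else sqlText.substring(0, maxLength - Suffix.length) + Suffix
  }

  def main(args: Array[String]): Unit = {
    println(truncate("x" * 1500).length) // 1000
    println(truncate("SELECT 1"))        // SELECT 1
  }
}
```

The alternative raised in review, cutting at `maxLength` and then appending the suffix, would produce text slightly longer than the limit; subtracting the suffix length first keeps the total bounded.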
[GitHub] spark pull request #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web ...
Github user nblintao commented on a diff in the pull request: https://github.com/apache/spark/pull/14158#discussion_r118998724

    --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
    @@ -326,7 +336,16 @@ private[spark] object UIUtils extends Logging {
       val headerRow: Seq[Node] = {
         headers.view.zipWithIndex.map { x =>
    -      {getHeaderContent(x._1)}
    +      val toolTipOption = getToolTip(x._2)
    +      if (toolTipOption.isEmpty) {
    +        {getHeaderContent(x._1)}
    +      } else {
    +        val toolTip = toolTipOption.get
    +        // scalastyle:off line.size.limit
    --- End diff --

Fixed. Thank you!
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14158 I have just rebased. @ajbozarth @HyukjinKwon @gatorsmile @srowen @vanzin
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #77510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77510/testReport)** for PR 14158 at commit [`69180bd`](https://github.com/apache/spark/commit/69180bd5b5b21725ff1e498e98690bc261f079f7).
[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17308
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17935 @JoshRosen Thanks for filing this issue. I'll look into it.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18122 **[Test build #77509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77509/testReport)** for PR 18122 at commit [`4af4b35`](https://github.com/apache/spark/commit/4af4b3500de27acb0128763be755ea8078736d60).
[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/17308 LGTM. Merging to master and 2.2. Thanks!
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/17935 I was trying to run a test case from another database which _does_ support unaliased subqueries in the `FROM` clause and hit a confusing parser error due to this patch's behavior change. While I agree that we shouldn't necessarily support this syntax, I think that the current error message that we're returning isn't very good, so I've filed https://issues.apache.org/jira/browse/SPARK-20916 to improve it.
[GitHub] spark pull request #18132: [SPARK-8184][SQL] Add additional function descrip...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18132
[GitHub] spark issue #18132: [SPARK-8184][SQL] Add additional function description fo...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18132 Thanks - merging in master/branch-2.2.
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118989451

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -1090,6 +1090,24 @@ class SessionCatalog(
         }
       }
     
    +  /** Create a temporary macro. */
    +  def createTempMacro(
    +      name: String,
    +      info: ExpressionInfo,
    +      functionBuilder: FunctionBuilder): Unit = {
    +    if (functionRegistry.functionExists(name)) {
    +      throw new AnalysisException(s"Function $name already exists")
    +    }
    +    functionRegistry.registerFunction(name, info, functionBuilder)
    +  }
    +
    +  /** Drop a temporary macro. */
    +  def dropTempMacro(name: String, ignoreIfNotExists: Boolean): Unit = {
    +    if (!functionRegistry.dropMacro(name) && !ignoreIfNotExists) {
    +      throw new NoSuchTempMacroException(name)
    --- End diff --

```
hive> DROP TEMPORARY MACRO max;
OK
Time taken: 0.01 seconds
hive> select max(3) from t1;
OK
3
```

After the macro is dropped, the existing function still works, which means the original built-in function was not deleted. `DROP TEMPORARY MACRO` does not drop built-in functions: once the macro with the same name is dropped, `max` resolves to the original built-in function again.
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118989143

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -1090,6 +1090,24 @@ class SessionCatalog(
         }
       }
     
    +  /** Create a temporary macro. */
    +  def createTempMacro(
    +      name: String,
    +      info: ExpressionInfo,
    +      functionBuilder: FunctionBuilder): Unit = {
    +    if (functionRegistry.functionExists(name)) {
    --- End diff --

```
hive> create temporary macro max(x int)
    > x*x;
OK
Time taken: 0.014 seconds
hive> select max(3) from t1;
OK
9
Time taken: 0.468 seconds, Fetched: 1 row(s)
hive> select max(3,4) from t1;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch '4': The macro max accepts exactly 1 arguments.
```

Hive overwrites the temporary function.
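The Hive behavior reported in this review thread (a temporary macro takes over a built-in name, and dropping it brings the built-in back) can be modeled as a two-layer lookup. The sketch below is a toy model only: `Registry`, `Fn`, and the method signatures are hypothetical and do not correspond to Spark's `FunctionRegistry` or to Hive's internals.

```scala
import scala.collection.mutable

object MacroRegistrySketch {
  // Toy function type for illustration: integer arguments in, integer out.
  type Fn = Seq[Int] => Int

  // Hypothetical two-layer registry: temporary macros shadow built-ins by name.
  final class Registry(builtins: Map[String, Fn]) {
    private val macros = mutable.Map.empty[String, Fn]

    // Registers the macro even if a built-in of the same name exists.
    def createTempMacro(name: String, fn: Fn): Unit = {
      macros.update(name, fn)
    }

    // Returns true if a macro was removed; built-ins are never touched.
    def dropTempMacro(name: String): Boolean = macros.remove(name).isDefined

    // Macros win over built-ins while they exist.
    def lookup(name: String): Option[Fn] =
      macros.get(name).orElse(builtins.get(name))
  }

  def main(args: Array[String]): Unit = {
    val registry = new Registry(Map("max" -> ((xs: Seq[Int]) => xs.max)))
    registry.createTempMacro("max", xs => xs.head * xs.head)
    println(registry.lookup("max").map(f => f(Seq(3)))) // Some(9)
    registry.dropTempMacro("max")
    println(registry.lookup("max").map(f => f(Seq(3)))) // Some(3)
  }
}
```

Under this model, the `functionExists` check in the diff would reject the `max` macro outright, which is where the PR's behavior differs from the Hive session shown in the thread.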
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118987906

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala ---
@@ -52,3 +52,6 @@ class NoSuchPartitionsException(db: String, table: String, specs: Seq[TableParti

 class NoSuchTempFunctionException(func: String)
   extends AnalysisException(s"Temporary function '$func' not found")
+
+class NoSuchTempMacroException(func: String)
--- End diff --

Please remove it. For the reasons, please see PR https://github.com/apache/spark/pull/17716.
[GitHub] spark issue #17880: [SPARK-20620][TEST]Improve some unit tests for NullExpre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77507/ Test PASSed.
[GitHub] spark issue #17880: [SPARK-20620][TEST]Improve some unit tests for NullExpre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17880 Merged build finished. Test PASSed.
[GitHub] spark issue #17880: [SPARK-20620][TEST]Improve some unit tests for NullExpre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17880

**[Test build #77507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77507/testReport)** for PR 17880 at commit [`3110f0f`](https://github.com/apache/spark/commit/3110f0f0c1a09b28a5706674ae65fd47ce48b163).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18122 Merged build finished. Test FAILed.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18122 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77508/ Test FAILed.
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18122

**[Test build #77508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77508/testReport)** for PR 18122 at commit [`320203e`](https://github.com/apache/spark/commit/320203eeea6d7613bb091f01b170fbfa2805b2a0).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class SparkMLTests(ReusedPySparkTestCase):`
[GitHub] spark issue #18122: [SPARK-20899][PySpark] PySpark supports stringIndexerOrd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18122 **[Test build #77508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77508/testReport)** for PR 18122 at commit [`320203e`](https://github.com/apache/spark/commit/320203eeea6d7613bb091f01b170fbfa2805b2a0).
[GitHub] spark issue #18139: Spark 20787 invalid years
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18139 Can one of the admins verify this patch?
[GitHub] spark pull request #18139: Spark 20787 invalid years
GitHub user rberenguel opened a pull request:

https://github.com/apache/spark/pull/18139

Spark 20787 invalid years

By design, according to the documentation, `time.mktime` can't handle dates from 1899-100. `calendar.timegm` is equivalent in shared cases, but can handle those years.

## What changes were proposed in this pull request?

Replace `time.mktime` with the more capable `calendar.timegm` to address cases like:

```python
import datetime as dt
sqlContext.createDataFrame(sc.parallelize([[dt.datetime(1899,12,31)]])).count()
```

which fail with an internal conversion error when there is no timezone information in the time object. When timezone information is present, `calendar` was used instead.

## How was this patch tested?

The existing test cases cover this change, since it does not change any existing functionality. Added a test to confirm that it works in the problematic range.

This PR is original work from me and I license this work to the Spark project

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rberenguel/spark SPARK-20787-invalid-years

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18139.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18139

commit 6c0312f94e3fce2bf4d6a30055bd747be535bb0f
Author: Ruben Berenguel Montoro
Date: 2017-05-29T15:46:21Z

    SPARK-20787 time.mktime can't handle dates from 1899-100, by construction. calendar.timegm is equivalent in shared cases, but can handle those

commit d3c41b5f18971168870524ad3a5fac876859bf4b
Author: Ruben Berenguel Montoro
Date: 2017-05-29T19:42:54Z

    SPARK-20787 Technically a hack. Using gmtime everywhere does not work well with DST shifts. So, for timeranges that don't work well with mktime, use gmtime
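As a rough illustration of the conversion this pull request describes, the sketch below turns a naive `datetime` from the problematic range into epoch seconds with `calendar.timegm`. The helper name `naive_to_epoch_seconds` is made up for this example and is not part of PySpark; it also assumes the naive value should be read as UTC, whereas `time.mktime` reads it as local time, so the two are not interchangeable around DST shifts (the concern raised in the second commit message).

```python
import calendar
import datetime as dt


def naive_to_epoch_seconds(value):
    """Convert a naive datetime to epoch seconds, treating it as UTC.

    Hypothetical helper for illustration only.
    """
    # utctimetuple() on a naive datetime returns its fields unchanged;
    # calendar.timegm then does plain calendar arithmetic on that tuple,
    # so years before 1900 are not a special case.
    return calendar.timegm(value.utctimetuple())


print(naive_to_epoch_seconds(dt.datetime(1899, 12, 31)))  # -2209075200
```

The printed value is 25568 days before the epoch (25568 * 86400 seconds), which can be checked independently with `(dt.datetime(1899, 12, 31) - dt.datetime(1970, 1, 1)).total_seconds()`.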
[GitHub] spark pull request #18135: [SPARK-20907][test] Use testQuietly for test suit...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18135
[GitHub] spark issue #18135: [SPARK-20907][test] Use testQuietly for test suites that...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18135 LGTM. Merging to master and 2.2. Thanks!
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18138 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77505/ Test FAILed.
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18138 Merged build finished. Test FAILed.
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18138

**[Test build #77505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77505/testReport)** for PR 18138 at commit [`3ac9fb0`](https://github.com/apache/spark/commit/3ac9fb07ef2f53315247ad12d391b1bed92319e9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.