[GitHub] spark pull request #19095: [SPARK-21886][SQL] Use SparkSession.internalCreat...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/19095 [SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create… …Dataset with LogicalRDD logical operator ## What changes were proposed in this pull request? Reusing `SparkSession.internalCreateDataFrame` wherever possible (to cut dups) ## How was this patch tested? Local build and waiting for Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-21886-internalCreateDataFrame Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19095.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19095 commit fff4b6131cfb2143af465d28ab3cde4fb6af313e Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-08-31T10:44:24Z [SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create Dataset with LogicalRDD logical operator --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19089: [SPARK-21728][core] Follow up: fix user config, auth in ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/19089 Logs are back with the change. Thanks (and don't mess it up again fixing STS :))
[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19056#discussion_r135989439 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala --- @@ -130,16 +130,7 @@ class TextSocketSource(host: String, port: Int, includeTimestamp: Boolean, sqlCo val rdd = sqlContext.sparkContext.parallelize(rawList).map( v => InternalRow(UTF8String.fromString(v._1), v._2.getTime())) --- End diff -- May I ask to replace `v` with `case...` as follows? IMHO that would make things easier to read. ``` val rdd = sqlContext.sparkContext. parallelize(rawList). map { case (v, ts) => InternalRow(UTF8String.fromString(v), ts.getTime) } ```
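[Editorial note] The readability point above can be seen outside Spark with a plain Scala sketch (stand-in data; `InternalRow` and `UTF8String` are not needed for the comparison):

```scala
// The same transformation written two ways: positional tuple accessors
// versus a `case` pattern that names the tuple fields.
val raw = Seq(("a", 1L), ("b", 2L))

// Accessor style: `_1`/`_2` give no hint what the fields mean.
val withAccessors = raw.map(v => (v._1.toUpperCase, v._2 + 1))

// Pattern style: `v` is clearly the value, `ts` the timestamp.
val withCase = raw.map { case (v, ts) => (v.toUpperCase, ts + 1) }
```

Both forms produce the same result; the `case` form documents the tuple's fields at the use site.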
[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19056#discussion_r135610992 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala --- @@ -126,16 +128,17 @@ class TextSocketSource(host: String, port: Int, includeTimestamp: Boolean, sqlCo batches.slice(sliceStart, sliceEnd) } -import sqlContext.implicits._ -val rawBatch = sqlContext.createDataset(rawList) +val rdd = sqlContext.sparkContext.parallelize(rawList).map( +v => InternalRow(UTF8String.fromString(v._1), v._2.getTime())) +val rawBatch = sqlContext.internalCreateDataFrame(rdd, schema, isStreaming = true) // Underlying MemoryStream has schema (String, Timestamp); strip out the timestamp // if requested. if (includeTimestamp) { rawBatch.toDF("value", "timestamp") --- End diff -- I think that the schema will already handle what fields are included in `rawBatch` and this `if` is no longer necessary.
[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19056#discussion_r135610632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala --- @@ -126,16 +128,17 @@ class TextSocketSource(host: String, port: Int, includeTimestamp: Boolean, sqlCo batches.slice(sliceStart, sliceEnd) } -import sqlContext.implicits._ -val rawBatch = sqlContext.createDataset(rawList) +val rdd = sqlContext.sparkContext.parallelize(rawList).map( +v => InternalRow(UTF8String.fromString(v._1), v._2.getTime())) +val rawBatch = sqlContext.internalCreateDataFrame(rdd, schema, isStreaming = true) // Underlying MemoryStream has schema (String, Timestamp); strip out the timestamp // if requested. if (includeTimestamp) { rawBatch.toDF("value", "timestamp") } else { // Strip out timestamp - rawBatch.select("_1").toDF("value") + rawBatch.select("value").toDF() --- End diff -- `toDF` is unnecessary since it's already a DataFrame.
[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19056#discussion_r135610234 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -39,6 +39,16 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) protected def fixedPoint = FixedPoint(SQLConf.get.optimizerMaxIterations) + override protected def checkInvariants( + result: LogicalPlan, + original: LogicalPlan, + rule: Rule[LogicalPlan]): Unit = { +assert( + result.isStreaming == original.isStreaming, + s"Rule ${rule.ruleName} changed isStreaming from original ${original.isStreaming}:" + --- End diff -- Space before the closing `"`?
[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18642 @zsxwing @tdas Could you review the change and let me know what you think? I'd appreciate it. Thanks.
[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18642 @zsxwing @tdas Your friendly reminder to give the change a nice review. I'd appreciate it. Thanks.
[GitHub] spark pull request #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGr...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/18642 [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWithState uses flatMapGroupsWithState ## What changes were proposed in this pull request? Refactored `KeyValueGroupedDataset.mapGroupsWithState` to use `flatMapGroupsWithState` explicitly (so it's clear that the former is almost the latter). ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark mapGroupsWithState Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18642.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18642 commit ce51466411983181b52cac07610d2bece2ffc5e8 Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-07-15T09:31:55Z [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWithState uses flatMapGroupsWithState
[GitHub] spark issue #18539: [SPARK-21313][SS] ConsoleSink's string representation
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18539 If you know how to display `ForeachWriter` that's passed in to `ForeachSink` nicely, let me know. `getClass.getName` didn't convince me and so I left it out. It'd be very helpful to see what the implementation is though.
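[Editorial note] For reference, a minimal sketch of the `getClass.getName` option the comment found unconvincing. The class names here are hypothetical stand-ins, not Spark's actual `ForeachSink`:

```scala
// Hypothetical sketch: rendering a passed-in writer via its class name.
class MyWriter
case class ForeachSinkSketch(writer: AnyRef) {
  override def toString: String =
    s"ForeachSink[${writer.getClass.getName}]"
}

val rendered = ForeachSinkSketch(new MyWriter).toString
```

For an anonymous `ForeachWriter` subclass the rendered name degrades to a compiler-generated name, which is presumably why it was left out.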
[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18509#discussion_r125874046 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala --- @@ -81,7 +81,7 @@ class EventTimeStatsAccum(protected var currentStats: EventTimeStats = EventTime case class EventTimeWatermarkExec( eventTime: Attribute, delay: CalendarInterval, -child: SparkPlan) extends SparkPlan { +child: SparkPlan) extends UnaryExecNode { --- End diff -- I disagree that the simple change requires a JIRA since it's about making the code more readable, but here it comes anyway... https://issues.apache.org/jira/browse/SPARK-21329
[GitHub] spark issue #18539: [SPARK-21313][SS] ConsoleSink's string representation
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18539 I think that `ConsoleSink` was the only one with this mysterious name. We could however have another JIRA to _somehow_ unify how options are printed out for sources and sinks. I don't think it should be a part of this (smallish) change.
[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18523 Thanks @facaiy for the changes. I wonder if the code could collect all the columns with an incorrect type in one go (rather than reporting issues column by column until a user has fixed all of them). Something along the lines of: ``` val incorrectColumns = inputColNames .map(name => (name, schema(name).dataType)) .filterNot { case (_, _: NumericType | BooleanType) => true case (_, _: VectorUDT) => true case _ => false } // report the issue with all of incorrectColumns at once ```
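[Editorial note] A self-contained sketch of that "collect all offending columns in one pass" idea, with stand-in types (the real code uses Spark's `DataType` hierarchy, which is not reproduced here):

```scala
// Stand-ins for Spark's DataType hierarchy, to keep the sketch runnable.
sealed trait DType
case object IntT extends DType
case object BoolT extends DType
case object StrT extends DType

val schema = Map("a" -> IntT, "b" -> StrT, "c" -> BoolT, "d" -> StrT)
val inputColNames = Seq("a", "b", "c", "d")

// Collect every unsupported column in one pass, instead of
// failing on the first one found.
val incorrectColumns = inputColNames
  .map(name => (name, schema(name)))
  .filterNot { case (_, t) => t == IntT || t == BoolT }
```

The error message can then list `incorrectColumns` in full, so a user fixes everything in one round trip.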
[GitHub] spark pull request #18539: [SPARK-21313][SS] ConsoleSink's string representa...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/18539 [SPARK-21313][SS] ConsoleSink's string representation ## What changes were proposed in this pull request? Add `toString` with options for `ConsoleSink` so it shows nicely in query progress. **BEFORE** ``` "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@4b340441" } ``` **AFTER** ``` "sink" : { "description" : "ConsoleSink[numRows=10, truncate=false]" } ``` ## How was this patch tested? Local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-21313-ConsoleSink-toString Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18539 commit 39a132f58092b81675bf271977cb13ba6b5dc428 Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-07-05T08:15:37Z [SPARK-21313][SS] ConsoleSink's string representation
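[Editorial note] The BEFORE/AFTER above comes down to overriding `toString`; a minimal stand-alone sketch (the class and its defaults here are stand-ins, not Spark's actual `ConsoleSink`):

```scala
// Stand-in sink: toString interpolates the two options so query-progress
// JSON shows them instead of a default Object hash like "ConsoleSink@4b340441".
case class ConsoleSinkSketch(numRows: Int = 20, truncate: Boolean = true) {
  override def toString: String =
    s"ConsoleSink[numRows=$numRows, truncate=$truncate]"
}
```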
[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18509#discussion_r125571708 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala --- @@ -81,7 +81,7 @@ class EventTimeStatsAccum(protected var currentStats: EventTimeStats = EventTime case class EventTimeWatermarkExec( eventTime: Attribute, delay: CalendarInterval, -child: SparkPlan) extends SparkPlan { +child: SparkPlan) extends UnaryExecNode { --- End diff -- @zsxwing friendly ping...
[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125397518 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,12 @@ class VectorAssembler @Since("1.4.0") (@Since("1.4.0") override val uid: String) override def transformSchema(schema: StructType): StructType = { val inputColNames = $(inputCols) val outputColName = $(outputCol) -val inputDataTypes = inputColNames.map(name => schema(name).dataType) +val inputDataTypes = inputColNames.map(name => (name, schema(name).dataType)) inputDataTypes.foreach { - case _: NumericType | BooleanType => - case t if t.isInstanceOf[VectorUDT] => - case other => -throw new IllegalArgumentException(s"Data type $other is not supported.") + case (_, _: NumericType | BooleanType) => + case (_, t) if t.isInstanceOf[VectorUDT] => --- End diff -- I wonder why the line does not do `(_, t: VectorUDT)`?
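[Editorial note] The alternative the comment asks about, in a runnable sketch with a stand-in class: a typed pattern binds and type-tests in one step, so no `isInstanceOf` guard is needed.

```scala
class VectorLike // stand-in for Spark's VectorUDT

def describe(p: (String, Any)): String = p match {
  case (_, _: Int | _: Boolean) => "supported"
  case (_, _: VectorLike)       => "supported" // typed pattern, no guard
  case (name, other)            => s"$name: $other is not supported"
}
```

`case (_, t) if t.isInstanceOf[VectorLike]` would match the same values; the typed pattern just says it more directly.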
[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18509#discussion_r125353689 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala --- @@ -81,7 +81,7 @@ class EventTimeStatsAccum(protected var currentStats: EventTimeStats = EventTime case class EventTimeWatermarkExec( eventTime: Attribute, delay: CalendarInterval, -child: SparkPlan) extends SparkPlan { +child: SparkPlan) extends UnaryExecNode { --- End diff -- I don't see it used in the other physical operators like [HashAggregateExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L45) or [CoalesceExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L574) or [BroadcastExchangeExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L41) etc. Why would that matter here?
[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/18509 [SS][MINOR] Make EventTimeWatermarkExec explicitly UnaryExecNode ## What changes were proposed in this pull request? Making EventTimeWatermarkExec explicitly UnaryExecNode /cc @tdas @zsxwing ## How was this patch tested? Local build. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark EventTimeWatermarkExec-UnaryExecNode Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18509.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18509 commit 2cc9fdca381f50cb405bf5fbcaa7229652749a83 Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-07-03T07:07:12Z [SS][MINOR] Make EventTimeWatermarkExec explicitly UnaryExecNode
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18347#discussion_r122617147 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -465,6 +465,8 @@ case class DataSource( providingClass.newInstance() match { case dataSource: CreatableRelationProvider => SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode) + case dataSource: ConsoleSinkProvider => +data.show(data.count().toInt, false) --- End diff -- `ConsoleSink` [has two options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L27-L30) that could be used here -- `numRows` and `truncate`.
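[Editorial note] A hedged sketch of reading the two options from a string-keyed options map, the way a sink typically parses them. The helper name and the defaults are illustrative assumptions, not the linked `ConsoleSink` code:

```scala
// Illustrative parsing of the two ConsoleSink-style options,
// with assumed defaults of 20 rows and truncation enabled.
def parseOptions(parameters: Map[String, String]): (Int, Boolean) = {
  val numRows  = parameters.get("numRows").map(_.toInt).getOrElse(20)
  val truncate = parameters.get("truncate").map(_.toBoolean).getOrElse(true)
  (numRows, truncate)
}
```

With this, the `data.show(...)` call in the diff could honor the user-supplied options instead of hard-coding `data.count().toInt` and `false`.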
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18347#discussion_r122616876 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -465,6 +465,8 @@ case class DataSource( providingClass.newInstance() match { case dataSource: CreatableRelationProvider => SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode) + case dataSource: ConsoleSinkProvider => --- End diff -- Underscore `dataSource` since it's not used.
[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18144 @cloud-fan If consistency means removing (not adding), I'm fine with that. Either way, consistency is the ultimate goal (as I myself run into this discrepancy far too often).
[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18074#discussion_r119168394 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala --- @@ -153,12 +153,13 @@ case class WindowExec( } /** - * Collection containing an entry for each window frame to process. Each entry contains a frames' - * WindowExpressions and factory function for the WindowFrameFunction. + * Collection containing an entry for each window frame to process. Each entry contains a frame's + * [[WindowExpression]]s and factory function for the WindowFrameFunction. */ private[this] lazy val windowFrameExpressionFactoryPairs = { type FrameKey = (String, FrameType, Option[Int], Option[Int]) type ExpressionBuffer = mutable.Buffer[Expression] +type WindowExpressionBuffer = mutable.Buffer[WindowExpression] --- End diff -- Sorry. I reverted the change manually and it looks like this slipped through the cracks. Apologies.
[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18144 @cloud-fan I don't understand why that would be an issue... ever. The API is not consistent and I often run into it.
[GitHub] spark issue #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/18074 Hey @srowen, could you review the changes again and possibly accept them? Thanks!
[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18074#discussion_r118788857 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala --- @@ -153,19 +153,24 @@ case class WindowExec( } /** - * Collection containing an entry for each window frame to process. Each entry contains a frames' - * WindowExpressions and factory function for the WindowFrameFunction. + * Collection containing an entry for each window frame to process. Each entry contains a frame's + * [[WindowExpression]]s and factory function for the WindowFrameFunction. */ - private[this] lazy val windowFrameExpressionFactoryPairs = { + private[this] lazy val windowFrameExpressionFactoryPairs: +Seq[(mutable.Buffer[WindowExpression], InternalRow => WindowFunctionFrame)] = + { type FrameKey = (String, FrameType, Option[Int], Option[Int]) type ExpressionBuffer = mutable.Buffer[Expression] +type WindowExpressionBuffer = mutable.Buffer[WindowExpression] +val framedFunctions = mutable.Map.empty[FrameKey, (WindowExpressionBuffer, ExpressionBuffer)] // Add a function and its function to the map for a given frame. -def collect(tpe: String, fr: SpecifiedWindowFrame, e: Expression, fn: Expression): Unit = { +def collect( + tpe: String, fr: SpecifiedWindowFrame, e: WindowExpression, fn: Expression): Unit = --- End diff -- I'll revert it to not "pollute" the PR. It was much easier to read the code when the internals were less generic (that does nothing at all).
[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15575 I'm late with this, but just leaving it for future code reviewers... I think the change took the most extreme path where even such simple `outputPartitioning` as the one in `WindowExec` (see below) is currently part of the extension of `UnaryExecNode` while I think it should stay with `UnaryExecNode` and be overridden when the output partitioning is different. It's a kind of code duplication. ``` override def outputPartitioning: Partitioning = child.outputPartitioning ``` The above serves nothing but says _"I simply don't care what happens upstream"_.
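[Editorial note] The refactoring argued for above can be sketched with stand-in types: put the child-forwarding default on the unary base class once, and override it only in operators whose partitioning actually differs.

```scala
// Stand-ins for Spark's plan and partitioning types.
trait Partitioning
case object SomePartitioning extends Partitioning
case object OtherPartitioning extends Partitioning

abstract class PlanSketch { def outputPartitioning: Partitioning }

// The suggested default lives on the unary base class once...
abstract class UnarySketch extends PlanSketch {
  def child: PlanSketch
  override def outputPartitioning: Partitioning = child.outputPartitioning
}

case class Leaf(p: Partitioning) extends PlanSketch {
  override def outputPartitioning: Partitioning = p
}

// ...so a WindowExec-like operator needs no boilerplate override...
case class WindowLike(child: PlanSketch) extends UnarySketch

// ...and only operators that really change partitioning override it.
case class RepartitionLike(child: PlanSketch) extends UnarySketch {
  override def outputPartitioning: Partitioning = OtherPartitioning
}
```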
[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18074#discussion_r118436041 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -35,12 +35,13 @@ import org.apache.spark.sql.types.NumericType import org.apache.spark.sql.types.StructType /** - * A set of methods for aggregations on a `DataFrame`, created by `Dataset.groupBy`. + * A set of methods for aggregations on a `DataFrame`, created by [[Dataset#groupBy groupBy]], + * [[Dataset#cube cube]] or [[Dataset#rollup rollup]] (and also [[pivot]]). * - * The main method is the agg function, which has multiple variants. This class also contains - * convenience some first order statistics such as mean, sum for convenience. + * The main method is the [[agg]] function, which has multiple variants. This class also contains --- End diff -- No scaladoc. I was focused on javadoc and it seems I missed it. Fixed.
[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/18074

[DOCS][MINOR] Scaladoc fixes (aka typo hunting)

## What changes were proposed in this pull request?

Minor changes to scaladoc

## How was this patch tested?

Local build

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/jaceklaskowski/spark scaladoc-fixes
```

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18074.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18074

commit 5a7eac8dbda2fe3f2f9fb03be0548835e329014d
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-05-23T14:58:52Z

    [DOCS][MINOR] Scaladoc fixes (aka typo hunting)
[GitHub] spark pull request #18026: [SPARK-16202][SQL][DOC] Follow-up to Correct The ...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/18026

[SPARK-16202][SQL][DOC] Follow-up to Correct The Description of CreatableRelationProvider's createRelation

## What changes were proposed in this pull request?

Follow-up to SPARK-16202:

1. Remove the duplicated description of the `SaveMode` values (one mode was in fact missing, which shows the duplication can drift out of date again in the future)
2. Use standard scaladoc tags

/cc @gatorsmile @rxin @yhuai (as they were involved previously)

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/jaceklaskowski/spark CreatableRelationProvider-SPARK-16202
```

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18026.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18026

commit 7bf5d1962aa750079da313cd6d106bf8c616d448
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-05-18T07:56:28Z

    [SPARK-16202][SQL][DOC] Follow-up to Correct The Description of CreatableRelationProvider's createRelation
[GitHub] spark issue #17917: [SPARK-20600][SS] KafkaRelation should be pretty printed...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17917

Screenshot: https://cloud.githubusercontent.com/assets/62313/25960541/879096ce-3677-11e7-900f-09bd5f200a00.png
[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16960

I'll have a look at this this week and send a PR unless you beat me to it :) Thanks @ala!
[GitHub] spark pull request #17917: [SPARK-20600][SS] KafkaRelation should be pretty ...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17917#discussion_r115711771

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala ---
```
@@ -143,4 +143,6 @@ private[kafka010] class KafkaRelation(
     validateTopicPartitions(partitions, partitionOffsets)
   }
 }
+
+  override def toString: String = "kafka"
 }
```
--- End diff --

Hehe... been thinking about it too, but thought it'd be too much for such a simple change. It merits a JIRA issue though, where one would have to work out what to display and how. After all, you could read from multiple topics with different Kafka brokers and offsets. I was worried it'd be too much to handle as part of this issue. I'm up for working on the other issue, but don't think we should hold this one up. WDYT?
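The richer display discussed above (topics, brokers, offsets instead of a bare `"kafka"`) could look roughly like this. This is a hypothetical sketch in plain Scala with a simplified stand-in for `KafkaRelation` — not Spark's actual class or fields:

```scala
// Hypothetical, simplified model of a Kafka relation (not Spark's KafkaRelation).
case class SimpleKafkaRelation(
    brokers: Seq[String],
    topics: Seq[String]) {

  // Include topics and brokers in the display name, truncating long lists
  // so the rendered query plan stays readable.
  override def toString: String = {
    def short(xs: Seq[String]): String =
      if (xs.size <= 2) xs.mkString(",")
      else xs.take(2).mkString(",") + s",... (${xs.size} total)"
    s"kafka[topics=${short(topics)}, brokers=${short(brokers)}]"
  }
}
```

For example, a relation over three topics on one broker would render as `kafka[topics=t1,t2,... (3 total), brokers=b1:9092]`.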
[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16960

I think that the commit has left the [numGeneratedRows](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L344) metric off, hasn't it? (it was added in #16829)
[GitHub] spark issue #17904: [SPARK-20630] [Web UI] Fixed column visibility in Execut...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17904

WFM. Thanks @ajbozarth!

```
$ git fetch origin pull/17904/head:17904
$ gco 17904
$ ./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean install
```

The following worked fine (http://localhost:4040/executors/):

```
$ ./bin/spark-shell
$ ./bin/spark-shell --conf spark.ui.threadDumpsEnabled=true
$ ./bin/spark-shell --conf spark.ui.threadDumpsEnabled=false
```
[GitHub] spark pull request #17917: [SPARK-20600][SS] KafkaRelation should be pretty ...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17917

[SPARK-20600][SS] KafkaRelation should be pretty printed in web UI

## What changes were proposed in this pull request?

User-friendly name of `KafkaRelation` in web UI (under Details for Query).

### Before

https://cloud.githubusercontent.com/assets/62313/25841955/74479ac6-34a2-11e7-87fb-d9f62a1356a7.png

### After

https://cloud.githubusercontent.com/assets/62313/25841829/f5335630-34a1-11e7-85a4-afe9b66d73c8.png

## How was this patch tested?

Local build

```
./bin/spark-shell --jars ~/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.0-SNAPSHOT/spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT.jar --packages org.apache.kafka:kafka-clients:0.10.0.1
```

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/jaceklaskowski/spark SPARK-20600-KafkaRelation-webUI
```

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17917.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17917

commit 2ffe4476553cfe50eb6392d8e573545a92fef737
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-05-09T08:20:49Z

    [SPARK-20600][SQL] KafkaRelation should be pretty printed in web UI (Details for Query)
[GitHub] spark pull request #17727: [SQL][MINOR] Remove misleading comment (and tags ...
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/17727
[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17801

Are the errors (that led to `fails to generate documentation`) caused by my change? They look very weird to me.

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
[error] public BlacklistTracker (org.apache.spark.scheduler.LiveListenerBus listenerBus, org.apache.spark.SparkConf conf, scala.Option allocationClient, org.apache.spark.util.Clock clock) { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
[error] public BlacklistTracker (org.apache.spark.SparkContext sc, scala.Option allocationClient) { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:133: error: ConfigReader is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error] private org.apache.spark.internal.config.ConfigReader reader () { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:138: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error] org.apache.spark.SparkConf set (org.apache.spark.internal.config.ConfigEntry entry, T value) { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:139: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error] org.apache.spark.SparkConf set (org.apache.spark.internal.config.OptionalConfigEntry entry, T value) { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:187: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error] org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.ConfigEntry entry, T value) { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:188: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error] org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry entry, T value) { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:208: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error] org.apache.spark.SparkConf remove (org.apache.spark.internal.config.ConfigEntry entry) { throw new RuntimeException(); }
```
[GitHub] spark pull request #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scala...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17801

[MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (and typo hunting)

## What changes were proposed in this pull request?

* Docs are consistent (across different `unix_timestamp` variants and their internal expressions)
* typo hunting

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/jaceklaskowski/spark unix_timestamp
```

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17801.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17801

commit 20edec80f943b0fb76d8bee26f47f8c19ca299be
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-04-28T08:17:53Z

    [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (and typo hunting)
[GitHub] spark issue #17727: [SQL][MINOR] Remove misleading comment (and tags do bett...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17727

Is the comment correct then? I don't think so. What about improving it? I don't mind if we stop discussing it either. It's a tiny change after all (and I don't want to drag it along and waste your time).
[GitHub] spark issue #17727: [SQL][MINOR] Remove misleading comment (and tags do bett...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17727

Fair enough. Let's do it here. Quoting directly from the code:

> Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting with the query planner and is not designed to be stable across spark releases. Developers writing libraries should instead consider using the stable APIs provided in [[org.apache.spark.sql.sources]]

> Converts a logical plan into zero or more SparkPlans.

It does not. It's a type alias for something that may do so. Moreover, it's copied from `SparkStrategy`, so there's no need to repeat it here.

> This API is exposed for experimenting with the query planner and is not designed to be stable across spark releases.

That's the purpose of the `@DeveloperApi` and `@InterfaceStability.Unstable` annotations, so yes, you're right, it's not misleading, but a repetition (one that cannot be checked by the Scala compiler).

> Developers writing libraries should instead consider using the stable APIs provided in [[org.apache.spark.sql.sources]]

How does `org.apache.spark.sql.sources` help here?! `SparkStrategy` is about query planning, not DataSource IO. They're at different levels and may interest different developers. Certainly misleading.

WDYT?
[GitHub] spark pull request #17727: [SQL][MINOR] Remove misleading comment (and tags ...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17727

[SQL][MINOR] Remove misleading comment (and tags do better)

## What changes were proposed in this pull request?

Misleading comment removed (and tags do a better job of expressing instability)

## How was this patch tested?

Local build

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/jaceklaskowski/spark sql-package-object
```

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17727.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17727

commit 69b68925fd7f0465b060c5279159615c39a35176
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-04-22T09:03:11Z

    [SQL][MINOR] Remove misleading comment (and tags do better)
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112692885

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
```
@@ -47,20 +47,31 @@ case class UserDefinedFunction protected[sql] (
     dataType: DataType,
     inputTypes: Option[Seq[DataType]]) {

-  // Optionally used for printing UDF names in EXPLAIN
-  private var nameOption: Option[String] = None
+  protected def name: Option[String] = None

-  def withName(name: String): this.type = {
-    this.nameOption = Option(name)
-    this
-  }
+  // Optionally used for printing UDF names in EXPLAIN
```
--- End diff --

"Optionally"? I'd assume the old "UDF" is the default unless the name is given.
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112692504

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
```
@@ -47,20 +47,31 @@ case class UserDefinedFunction protected[sql] (
     dataType: DataType,
     inputTypes: Option[Seq[DataType]]) {

-  // Optionally used for printing UDF names in EXPLAIN
-  private var nameOption: Option[String] = None
+  protected def name: Option[String] = None

-  def withName(name: String): this.type = {
-    this.nameOption = Option(name)
-    this
-  }
+  // Optionally used for printing UDF names in EXPLAIN
+  def withName(name: String): this.type =
+    UserDefinedFunctionWithName(Option(name), f, dataType, inputTypes)

   /**
    * Returns an expression that invokes the UDF, using the given arguments.
    *
    * @since 1.3.0
    */
   def apply(exprs: Column*): Column = {
-    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), nameOption))
+    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name))
   }
 }
+
+/**
+ * A user-defined function with a function name.
+ *
+ * @since 2.3.0
+ */
+@InterfaceStability.Stable
+case class UserDefinedFunctionWithName protected[sql] (
+    override val name: Option[String],
```
--- End diff --

Can a `UserDefinedFunctionWithName` have **no** name? When?
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112634952

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
```
@@ -256,10 +256,12 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       val sparkPlan = spark.sessionState.executePlan(explain).executedPlan
       sparkPlan.executeCollect().map(_.getString(0).trim).headOption.getOrElse("")
     }
-    val udf1 = "myUdf1"
-    val udf2 = "myUdf2"
-    spark.udf.register(udf1, (n: Int) => { n + 1 })
-    spark.udf.register(udf2, (n: Int) => { n * 1 })
-    assert(explainStr(sql("SELECT myUdf1(myUdf2(1))")).contains(s"UDF:$udf1(UDF:$udf2(1))"))
+    val udf1Name = "myUdf1"
+    val udf2Name = "myUdf2"
+    val udf1 = spark.udf.register(udf1Name, (n: Int) => { n + 1 })
+    val udf2 = spark.udf.register(udf2Name, (n: Int) => { n * 1 })
+    assert(explainStr(sql("SELECT myUdf1(myUdf2(1))")).contains(s"UDF:$udf1Name(UDF:$udf2Name(1))"))
+    assert(explainStr(spark.range(1).select(udf1(udf2(functions.lit(1)))))
+      .contains(s"UDF:$udf1Name(UDF:$udf2Name(1))"))
```
--- End diff --

The goal of the change was to make sure that the names are the same in the SQL and Dataset "modes". The test should check that. Even though it does so using the two assertions above, the last one should rather check the equality of the SQL and Dataset outputs.
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112634044

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
```
@@ -47,12 +47,20 @@ case class UserDefinedFunction protected[sql] (
     dataType: DataType,
     inputTypes: Option[Seq[DataType]]) {

+  // Optionally used for printing UDF names in EXPLAIN
+  private var nameOption: Option[String] = None
```
--- End diff --

Besides @rxin's comment, I'd also add `name` as an input parameter with default value `None`.
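The suggestion above — carrying the name as a constructor parameter defaulting to `None` instead of a mutable `private var` — can be sketched in plain Scala. `SimpleUDF` below is a simplified hypothetical stand-in for `UserDefinedFunction`, not Spark's actual class:

```scala
// Hypothetical, simplified model of a UDF wrapper (not Spark's UserDefinedFunction).
case class SimpleUDF(
    f: AnyRef,
    name: Option[String] = None) {

  // Returns a new immutable instance instead of mutating a private var.
  def withName(n: String): SimpleUDF = copy(name = Option(n))

  // What an EXPLAIN-style printout would show: the given name, or the old "UDF" default.
  def displayName: String = name.getOrElse("UDF")
}
```

With this shape, `SimpleUDF(f).displayName` yields the old default `"UDF"`, and `withName` produces a renamed copy while the original stays unnamed — the immutability that a `var` field forfeits.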
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112634273

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
```
@@ -256,10 +256,12 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       val sparkPlan = spark.sessionState.executePlan(explain).executedPlan
       sparkPlan.executeCollect().map(_.getString(0).trim).headOption.getOrElse("")
     }
-    val udf1 = "myUdf1"
-    val udf2 = "myUdf2"
-    spark.udf.register(udf1, (n: Int) => { n + 1 })
-    spark.udf.register(udf2, (n: Int) => { n * 1 })
-    assert(explainStr(sql("SELECT myUdf1(myUdf2(1))")).contains(s"UDF:$udf1(UDF:$udf2(1))"))
+    val udf1Name = "myUdf1"
+    val udf2Name = "myUdf2"
+    val udf1 = spark.udf.register(udf1Name, (n: Int) => { n + 1 })
```
--- End diff --

Remove `{` and `}` (as they're not needed)
[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17670

I think the change should rather be [here](ResolveTableValuedFunctions) where the built-in table-valued function `range` is resolved.
[GitHub] spark pull request #17657: [TEST][MINOR] Replace repartitionBy with distribu...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17657

[TEST][MINOR] Replace repartitionBy with distribute in CollapseRepartitionSuite

## What changes were proposed in this pull request?

Replace non-existent `repartitionBy` with `distribute` in `CollapseRepartitionSuite`.

## How was this patch tested?

local build and `catalyst/testOnly *CollapseRepartitionSuite`

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/jaceklaskowski/spark CollapseRepartitionSuite
```

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17657.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17657

commit 427924ab99332f475ed0b46b261eb55eee560c4a
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-04-17T08:24:38Z

    [TEST][MINOR] Replace repartitionBy with distribute in CollapseRepartitionSuite
[GitHub] spark pull request #17417: [DOCS] Docs-only improvements
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r10842

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
```
@@ -60,7 +60,7 @@ import org.apache.spark.util.Utils
  * The builder can also be used to create a new session:
  *
  * {{{
- *   SparkSession.builder()
+ *   SparkSession.builder
```
--- End diff --

Consistency (and one of my recommended coding styles).
[GitHub] spark pull request #17417: [DOCS] Docs-only improvements
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108777513

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala ---
```
@@ -26,7 +26,8 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval

 /**
- * Test basic expression parsing. If a type of expression is supported it should be tested here.
+ * Test basic expression parsing.
+ * If the type of an expression is supported it should be tested here.
```
--- End diff --

Almost. I replaced `a` with `the` and added `an` before `expression`.
[GitHub] spark pull request #17417: [DOCS] Docs-only improvements
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108777037

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
```
@@ -75,7 +75,6 @@ case class WindowSpecDefinition(
     frameSpecification.isInstanceOf[SpecifiedWindowFrame]

   override def nullable: Boolean = true
-  override def foldable: Boolean = false
```
--- End diff --

Correct. Reverting...
[GitHub] spark pull request #17417: [DOCS] Docs-only improvements
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108776915 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -52,16 +52,15 @@ * This class implements sort-based shuffle's hash-style shuffle fallback path. This write path * writes incoming records to separate files, one file per reduce partition, then concatenates these * per-partition files to form a single output file, regions of which are served to reducers. - * Records are not buffered in memory. This is essentially identical to - * {@link org.apache.spark.shuffle.hash.HashShuffleWriter}, except that it writes output in a format + * Records are not buffered in memory. It writes output in a format --- End diff -- `HashShuffleWriter` is long gone.
[GitHub] spark issue #17417: [DOCS] Docs-only improvements
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17417 Executed `cd docs && SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve` to check the changes and they seemed fine. I had to fix some extra javadoc-related places to please jekyll. @srowen Ready to review the changes once more? Thanks.
[GitHub] spark pull request #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17434#discussion_r108665073 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -492,7 +492,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** - * Add an [[Aggregate]] to a logical plan. + * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan. --- End diff -- The package `org.apache.spark.sql.catalyst.parser` (and hence `AstBuilder`) doesn't show up in the javadoc/scaladoc.
[GitHub] spark pull request #17434: [MINOR] Typo fixes
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/17434
[GitHub] spark issue #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17434 Closing as it was merged into https://github.com/apache/spark/pull/17417.
[GitHub] spark pull request #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17434#discussion_r108634808 --- Diff: core/src/main/java/org/apache/spark/memory/MemoryConsumer.java --- @@ -60,8 +60,6 @@ protected long getUsed() { /** * Force spill during building. - --- End diff -- It's not "for testing only". I'd even say that it's more often used in non-test code than in test code. That made the comment no longer correct. See `ShuffleExternalSorter` and `UnsafeShuffleWriter`, which are both `ShuffleWriter` instances.
[GitHub] spark pull request #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17434#discussion_r108633245 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -323,7 +323,7 @@ class SparkSession private( * // |-- age: integer (nullable = true) * * dataFrame.createOrReplaceTempView("people") - * sparkSession.sql("select name from people").collect.foreach(println) + * sparkSession.sql("select name from people").show --- End diff -- I agree. Just for the record, `collect.foreach(println)` should not be "endorsed" in Spark 2's official docs as too RDD-ish.
[GitHub] spark pull request #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17434#discussion_r108632961 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -492,7 +492,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** - * Add an [[Aggregate]] to a logical plan. + * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan. --- End diff -- Didn't check the javadoc, and the change follows `Aggregate`. I'll see how to generate the javadoc and check this out (as I've been meaning to do it anyway for some time). Thanks for the inspiration!
[GitHub] spark pull request #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17434#discussion_r108632590 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -82,13 +82,13 @@ private[spark] class SortShuffleManager(conf: SparkConf) extends ShuffleManager override val shuffleBlockResolver = new IndexShuffleBlockResolver(conf) /** - * Register a shuffle with the manager and obtain a handle for it to pass to tasks. + * Obtains a [[ShuffleHandle]] to pass to tasks. --- End diff -- It's a copy from the main method this one overrides. Since the method does **not** do what the scaladoc said I thought I'd make it current. It might've "Register(ed) a shuffle with the manager" in the past but not today. It was misleading.
[GitHub] spark issue #17417: [SQL][DOC] Use recommended values for row boundaries in ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17417 I'm going to merge the two PRs with your comments applied (i.e. excluding changes that are not necessarily doc-only). Thanks a lot for your time, Sean. Appreciate a lot.
[GitHub] spark issue #17434: [MINOR] Typo fixes
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17434 You asked, I delivered @srowen
[GitHub] spark issue #17417: [SQL][DOC] Use recommended values for row boundaries in ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17417 Hey @srowen Would appreciate you taking another look at the changes and commenting (or merging). Thanks!
[GitHub] spark pull request #17434: [SQL][DOC][MINOR] Squashing a typo in from_json f...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17434 [SQL][DOC][MINOR] Squashing a typo in from_json function ## What changes were proposed in this pull request? Just squashing a typo in `from_json` function ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark from_json-typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17434.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17434 commit 3b385039bf88ec32321c4e888cc17180a37b279e Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-03-26T10:13:09Z [SQL][DOC][MINOR] Squashing a typo
[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108035549 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala --- @@ -113,12 +113,12 @@ object Window { * Creates a [[WindowSpec]] with the frame boundaries defined, * from `start` (inclusive) to `end` (inclusive). * - * Both `start` and `end` are relative positions from the current row. For example, "0" means + * Both `start` and `end` are relative positions to the current row. For example, "0" means * "current row", while "-1" means the row before the current row, and "5" means the fifth row * after the current row. * - * We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`, - * and `Window.currentRow` to specify special boundary values, rather than using integral + * We recommend users to use [[Window.unboundedPreceding]], [[Window.unboundedFollowing]], --- End diff -- Leaving 'that' aside is incorrect -- see http://dictionary.cambridge.org/dictionary/english/recommend where *to* is even highlighted to make the point.
[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108035498 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala --- @@ -131,9 +131,9 @@ object Window { * import org.apache.spark.sql.expressions.Window * val df = Seq((1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")) * .toDF("id", "category") - * df.withColumn("sum", - * sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1)) - * .show() + * val byCategoryOrderedById = + * Window.partitionBy('category).orderBy('id).rowsBetween(Window.currentRow, 1) --- End diff -- See the above change where Spark devs "recommend that users use" the values by their aliases not their numeric values.
[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108035475 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala --- @@ -113,12 +113,12 @@ object Window { * Creates a [[WindowSpec]] with the frame boundaries defined, * from `start` (inclusive) to `end` (inclusive). * - * Both `start` and `end` are relative positions from the current row. For example, "0" means + * Both `start` and `end` are relative positions to the current row. For example, "0" means --- End diff -- I'll fix it to be more accurate (that's the purpose of this particular change so the more accurate the merrier). Thanks!
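The frame semantics under discussion can be illustrated outside Spark. This is a plain-Java sketch (the `RowsBetween` class and `rowsBetween` helper are hypothetical, not Spark code): both `start` and `end` are offsets relative to the current row's position, so a `(0, 1)` frame covers the current row and the row after it, clamped at the partition edges.

```java
// Hypothetical illustration of rows-between frame semantics: for each row i,
// the frame spans positions [i + start, i + end], clamped to the list bounds.
import java.util.ArrayList;
import java.util.List;

public class RowsBetween {
    static List<List<Integer>> rowsBetween(List<Integer> values, int start, int end) {
        List<List<Integer>> frames = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int lo = Math.max(i + start, 0);                // clamp at the first row
            int hi = Math.min(i + end, values.size() - 1);  // clamp at the last row
            frames.add(new ArrayList<>(values.subList(lo, hi + 1)));
        }
        return frames;
    }

    public static void main(String[] args) {
        // "0" is the current row, "1" the row after it:
        System.out.println(rowsBetween(List.of(10, 20, 30), 0, 1));
        // [[10, 20], [20, 30], [30]]
    }
}
```

The last frame contains only `[30]` because there is no following row to include, which is exactly the clamping a window frame performs at a partition boundary.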
[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/17417#discussion_r108035464 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala --- @@ -22,7 +22,7 @@ import org.apache.spark.sql.Column import org.apache.spark.sql.catalyst.expressions._ /** - * Utility functions for defining window in DataFrames. + * Utility functions for defining window in Datasets. --- End diff -- Sure. I'm going to revert the changes. There's little value in them. I'd rather see the changes approved in general than fight for DataFrame vs Dataset. p.s. There's no `DataFrame` in Spark SQL, which is just a type alias of `Dataset[Row]` -- see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L46.
[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17417 [SQL][DOC] Use recommended values for row boundaries in Window's scaladoc ## What changes were proposed in this pull request? Use recommended values for row boundaries in Window's scaladoc, i.e. `Window.unboundedPreceding`, `Window.unboundedFollowing`, and `Window.currentRow` (that were introduced in 2.1.0). ## How was this patch tested? Local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark window-expression-scaladoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17417 commit efc420bfc0cb19bed5adec919587cc20182293fe Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-03-24T21:43:23Z [SQL][DOC] Use recommended values for row boundaries in Window's scaladoc
[GitHub] spark pull request #17409: [SQL][MINOR] Fix for typo in Analyzer
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17409 [SQL][MINOR] Fix for typo in Analyzer ## What changes were proposed in this pull request? Fix for typo in Analyzer ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark analyzer-typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17409.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17409 commit cb28134c911dcaebeb8e2b168fa133f6bbb8ae3f Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-03-24T13:57:08Z [SQL][MINOR] Fix for typo in Analyzer
[GitHub] spark pull request #17337: [SQL][MINOR] Fix scaladoc for UDFRegistration
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17337 [SQL][MINOR] Fix scaladoc for UDFRegistration ## What changes were proposed in this pull request? Fix scaladoc for UDFRegistration ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark udfregistration-scaladoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17337.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17337 commit 4480e72cb789212c2cc86cea50c61c20d3faa2d1 Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-03-18T01:15:50Z [SQL][MINOR] Fix scaladoc for UDFRegistration
[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r104206250
--- Diff: kubernetes/pom.xml ---
@@ -0,0 +1,54 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+  <artifactId>spark-kubernetes_2.11</artifactId>
--- End diff --
`${scala.binary.version}`
[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r104206163
--- Diff: kubernetes/pom.xml ---
@@ -0,0 +1,54 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
--- End diff --
`2.1.0-SNAPSHOT`???
[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r104205883 --- Diff: kubernetes/README.md --- @@ -0,0 +1,21 @@ +# Pre-requisites +* maven, JDK and all other pre-requisites for building Spark. + +# Steps to compile + +* Clone the fork of spark: https://github.com/foxish/spark/ and switch to the k8s-support branch. +* Build the project +* ./build/mvn -Pkubernetes -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests package --- End diff -- I think `hadoop-2.4` is gone in master.
[GitHub] spark pull request #17042: [CORE][MINOR] Fix scaladoc
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/17042
[GitHub] spark issue #17042: [CORE][MINOR] Fix scaladoc
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/17042 Makes sense. Thanks @srowen!
[GitHub] spark pull request #17042: [CORE][MINOR] Fix scaladoc
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/17042 [CORE][MINOR] Fix scaladoc ## What changes were proposed in this pull request? Minor change to scaladoc of `HeartbeatReceiver` (the method is certainly not for tests only) ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark heartbeatreceiver-remove-only-for-test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17042.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17042 commit b34a8295c1559f77237af66069135b90139c460e Author: Jacek Laskowski <ja...@japila.pl> Date: 2017-02-23T21:09:01Z [CORE][MINOR] Fix scaladoc
[GitHub] spark pull request #16812: [SPARK-19465][SQL] Added options for custom boole...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16812#discussion_r99935530 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -139,6 +140,20 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { assert(result.schema === expectedSchema) } + test("test inferring booleans with custom values") { +val result = spark.read + .format("csv") + .option("header", "true") --- End diff -- Could you use `true` (boolean literal) instead to promote the more type-safe approach?
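The type-safety point can be sketched outside Spark. The `OptionBuilder` below is hypothetical (it is not Spark's `DataFrameReader`, though `DataFrameReader` offers similar overloads): with a `boolean` overload, a typo such as `option("header", "ture")` becomes impossible because the compiler only accepts `true` or `false`.

```java
// Hypothetical builder mirroring the overloads discussed above: the boolean
// overload is checked at compile time, while the String overload accepts any
// text (typos included) silently.
import java.util.HashMap;
import java.util.Map;

public class OptionBuilder {
    private final Map<String, String> opts = new HashMap<>();

    OptionBuilder option(String key, String value) { opts.put(key, value); return this; }
    // Boolean overload: stores the canonical "true"/"false" string.
    OptionBuilder option(String key, boolean value) { return option(key, String.valueOf(value)); }
    String get(String key) { return opts.get(key); }

    public static void main(String[] args) {
        OptionBuilder r = new OptionBuilder().option("header", true);
        System.out.println(r.get("header"));  // true
    }
}
```

Both overloads end up storing the same string, so switching a call site from `"true"` to `true` changes nothing at runtime; the benefit is purely compile-time safety.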
[GitHub] spark pull request #16812: [SPARK-19465][SQL] Added options for custom boole...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16812#discussion_r99935349 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala --- @@ -73,6 +73,12 @@ class CSVInferSchemaSuite extends SparkFunSuite { assert(CSVInferSchema.inferField(LongType, "2015-08 14:49:00", options) == StringType) } + test("Boolean field types are inferred correctly from custom value") { +val options = new CSVOptions(Map("trueValue" -> "yes")) +assert(CSVInferSchema.inferField(BooleanType, "yES", options) == BooleanType) --- End diff -- What's the reason for `==` not `===` as it's in the other asserts?
[GitHub] spark pull request #16812: [SPARK-19465][SQL] Added options for custom boole...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16812#discussion_r99934599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -110,7 +110,11 @@ private[csv] class UnivocityParser( } case _: BooleanType => (d: String) => - nullSafeDatum(d, name, nullable, options)(_.toBoolean) + nullSafeDatum(d, name, nullable, options) { datum => +if (datum.equalsIgnoreCase(options.trueValue)) true --- End diff -- Is there any reason why you compare `datum` to `options.trueValue` and not vice versa as it's in `CSVInferSchema.scala` above?
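One practical reason the comparison order matters is null safety: calling `equalsIgnoreCase` on the known-non-null constant cannot throw when the parsed datum is null, since `String.equalsIgnoreCase(null)` simply returns `false`. A minimal plain-Java sketch (the `matchesTrueValue` helper is hypothetical, not the actual `UnivocityParser` code):

```java
// Constant-first comparison is the null-safe order: "yes".equalsIgnoreCase(null)
// returns false, while datum.equalsIgnoreCase("yes") would throw a
// NullPointerException when datum is null.
public class NullSafeCompare {
    static boolean matchesTrueValue(String trueValue, String datum) {
        return trueValue.equalsIgnoreCase(datum);  // safe: trueValue is known non-null
    }

    public static void main(String[] args) {
        System.out.println(matchesTrueValue("yes", "yES"));  // true
        System.out.println(matchesTrueValue("yes", null));   // false, no exception
    }
}
```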
[GitHub] spark pull request #16550: [SPARK-19178][SQL] convert string of large number...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16550#discussion_r95871812 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -835,6 +835,187 @@ public UTF8String translate(Map<Character, Character> dict) { return fromString(sb.toString()); } + private int getDigit(byte b) { +if (b >= '0' && b <= '9') { + return b - '0'; +} +throw new NumberFormatException(toString()); + } + + /** + * Parses this UTF8String to long. + * + * Note that, in this method we accumulate the result in negative format, and convert it to + * positive format at the end, if this string is not started with '-'. This is because min value + * is bigger than max value in digits, e.g. Integer.MAX_VALUE is '2147483647' and + * Integer.MIN_VALUE is '-2147483648'. + * + * These codes are mostly copied from LazyLong.parseLong in Hive. + */ + public long toLong() { +if (numBytes == 0) { + throw new NumberFormatException("Empty string!"); --- End diff -- Could you remove `!` from the exception?
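The negative-accumulation trick described in the quoted javadoc can be shown in a simplified sketch (this is not the actual `UTF8String.toLong` code; overflow checks are omitted): because `Long.MIN_VALUE` has a larger magnitude than `Long.MAX_VALUE`, accumulating the digits as a negative number lets the minimum value parse without overflowing, and positive inputs are negated at the end.

```java
// Simplified sketch of accumulate-in-negative long parsing: the running result
// stays in [Long.MIN_VALUE, 0], so "-9223372036854775808" never overflows the
// way a positive accumulator would.
public class NegativeAccumulationParse {
    static long parseLong(String s) {
        if (s.isEmpty()) throw new NumberFormatException("Empty string");
        boolean negative = s.charAt(0) == '-';
        int i = (negative || s.charAt(0) == '+') ? 1 : 0;
        if (i == s.length()) throw new NumberFormatException(s);
        long result = 0L;
        for (; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) throw new NumberFormatException(s);
            result = result * 10 - digit;  // accumulate negatively
        }
        // Negate at the end only for positive inputs; Long.MIN_VALUE stays as-is.
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parseLong("-9223372036854775808") == Long.MIN_VALUE);  // true
    }
}
```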
[GitHub] spark pull request #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16481#discussion_r95065596

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -494,8 +500,13 @@ case class DataSource(
       catalogTable = catalogTable,
       fileIndex = fileIndex)
     sparkSession.sessionState.executePlan(plan).toRdd
-    // Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring it.
-    copy(userSpecifiedSchema = Some(data.schema.asNullable)).resolveRelation()
+    if (isForWriteOnly) {
+      // Exit earlier and return null
--- End diff --

I'd remove "and return null"
[GitHub] spark pull request #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHand...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16492#discussion_r95065464

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -235,7 +235,10 @@ trait StreamTest extends QueryTest with SharedSQLContext with Timeouts {
    */
  def testStream(
      _stream: Dataset[_],
-     outputMode: OutputMode = OutputMode.Append)(actions: StreamAction*): Unit = {
+     outputMode: OutputMode = OutputMode.Append)(actions: StreamAction*): Unit = synchronized {
+   // `synchronized` is added to prevent the user from calling `testStream` concurrently because
--- End diff --

Can you elaborate on why `StreamingQueryListener` cannot be used concurrently? I'm reading the comment as: "it does not work because something else does not work". Why? What's wrong with `StreamingQueryListener`?
[GitHub] spark pull request #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHand...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16492#discussion_r95065415

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -238,7 +238,7 @@ class StreamSuite extends StreamTest {
     }
   }

-  testQuietly("fatal errors from a source should be sent to the user") {
+  testQuietly("handle fatal errors thrown from the stream thread correctly") {
--- End diff --

`correctly` is superfluous and could be removed.
[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16475

Closed as per @rxin's request.
[GitHub] spark pull request #16475: [MINOR][CORE] Remove code duplication (so the int...
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/16475
[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16475

Partially agree, @srowen. The reason for the change was the `blockId.isShuffle` condition that both methods use for their shuffle-specific handling. The change might not be the most correct one, but I'm sure something's wrong here; I can't nail it down, hence this change to prompt a review of this part of the shuffle system.

> "is a distant problem waiting to happen"

I was concerned about that too, but that's what the tests are supposed to cover. If a few are missing, they should be added, shouldn't they?
[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16475

I proposed the changes since they made it easier to understand the role of `getBlockData` vs `getLocalBytes` and, in the end, `ShuffleBlockResolver`. I'm not saying it should be accepted, but I'd like to hear your take on why it should not. Thanks!
[GitHub] spark pull request #16475: [MINOR][CORE] Remove code duplication (so the int...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16475

[MINOR][CORE] Remove code duplication (so the interface is used instead)

## What changes were proposed in this pull request?

Removed code duplication and used the interface instead.

/cc @JoshRosen @rxin

## How was this patch tested?

Local build. Waiting for OK from Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaceklaskowski/spark blockmanager-code-deduplication

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16475.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16475

commit 0c181ef686243c8c4c8138160fd8bad6fbc37953
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2017-01-05T09:26:51Z

[MINOR][CORE] Remove code duplication (so the interface is used)
[GitHub] spark issue #16309: [WIP][SPARK-18896][TESTS] Update to ScalaTest 3.0.1
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

For reference: [scala-xml releases](https://github.com/scala/scala-xml/releases)
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

The reason for the update to `scala-xml_2.11-1.0.5.jar` was that once I updated ScalaTest, Jenkins reported that the dependency list had changed. That's when I was told about `./dev/test-dependencies.sh --replace-manifest` (see https://github.com/apache/spark/pull/16309#issuecomment-267585832 and https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70249/console).

You may be right that updating the jar could have impacted the XML issue Jenkins is reporting. Let me see if I can _somehow_ relate them by:

1. Reviewing the changes between `scala-xml_2.11-1.0.2.jar` and `scala-xml_2.11-1.0.5.jar`
2. Looking into the XML-not-generated issue in the recent builds.

Thanks for the hint!
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

The tests run locally on my laptop finished after... `7431 s`, which is over 2 hours (!)

```
[error] (sql/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (streaming-kafka-0-10/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 7431 s, completed Dec 17, 2016 2:56:59 PM
[error] Not a valid command: Pkinesis-asl (similar: itunes-pause)
[error] Not a valid key: Pkinesis-asl
[error] Pkinesis-asl
[error] ^
```

Not sure why I'm getting `Not a valid key: Pkinesis-asl` -- it looks like the profile might not have been picked up correctly (?)
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

@srowen Please help as I'm stuck with the `OutOfMemoryError: GC overhead limit exceeded` error. Should Jenkins run the tests with 6g? What's even more interesting is that the tests seem to have all passed according to the report in [Test build #70306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70306/testReport/).

BTW, I'm running the tests on my laptop with the following settings `export SBT_OPTS="-Xmx6g -XX:ReservedCodeCacheSize=512m"` (as recommended).

```
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at scala.collection.TraversableLike$class.withFilter(TraversableLike.scala:655)
	at scala.collection.AbstractTraversable.withFilter(Traversable.scala:104)
	at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder$scoping$2$.getMerged(GenASM.scala:2067)
	at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genLocalVariableTable$1(GenASM.scala:2112)
	at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genCode(GenASM.scala:2753)
	at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genMethod(GenASM.scala:1471)
```
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

Rebasing with master to trigger tests on Jenkins... (hoping this time they pass)
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

I'm testing the changes locally with the following:

```
export SCALACTIC_FILL_FILE_PATHNAMES=yes
export SBT_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
sbt -Phadoop-2.3 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly
sbt -Phadoop-2.3 -Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test
```

It's been running for the past hour and looks fine so far. I faced some SQL UDF-related errors that didn't look like mine, so I had to re-test everything again. I have no changes to push, however, so... how can I re-trigger the tests on Jenkins to see if it's something temporary? Is there some magic Jenkins trigger?
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

Hey @srowen, any idea about the following error? I'd appreciate any hints to help me fix it.

```
[info] - can use a custom recovery mode factory (57 milliseconds)
Exception in thread "Thread-89"
[info] - master/worker web ui available (225 milliseconds)
[info] - master/worker web ui available with reverseProxy (336 milliseconds)
java.lang.OutOfMemoryError: GC overhead limit exceeded
[info] - basic scheduling - spread out (36 milliseconds)
[info] - basic scheduling - no spread out (34 milliseconds)
	at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3054)
[info] - basic scheduling with more memory - spread out (33 milliseconds)
	at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2874)
	at java.io.ObjectInputStream.readString(ObjectInputStream.java:1639)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:809)
	at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:798)
	at java.lang.Thread.run(Thread.java:745)
```
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

I think the latest test failures are somehow related to:

```
Please set the environment variable SCALACTIC_FILL_FILE_PATHNAMES to yes at compile time to enable this feature.
```

Exploring... BTW, I'm using `sbt -Phadoop-2.3 -Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl test` locally and the tests are still being executed (for the past 40 mins or so) and are green.
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

Learnt about `./dev/test-dependencies.sh --replace-manifest` just now. (Where's this all described?)
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

Just learnt about `export SPARK_TESTING=1` to avoid some test failures.
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309

I had to introduce changes to the tests given the "Expired deprecations" in [ScalaTest 3.0.0](http://www.scalatest.org/release_notes/3.0.0).