[GitHub] spark issue #16352: [SPARK-18947][SQL] SQLContext.tableNames should not call...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16352 LGTM again pending test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16352: [SPARK-18947][SQL] SQLContext.tableNames should n...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16352#discussion_r93390578

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---

```diff
@@ -276,11 +276,12 @@ private[sql] object SQLUtils extends Logging {
   }

   def getTableNames(sparkSession: SparkSession, databaseName: String): Array[String] = {
-    databaseName match {
-      case n: String if n != null && n.trim.nonEmpty =>
-        sparkSession.catalog.listTables(n).collect().map(_.name)
+    val db = databaseName match {
+      case _ if databaseName != null && databaseName.trim.nonEmpty =>
+        databaseName.trim
```

--- End diff --

: ) Yeah
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70460/
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16369 **[Test build #70460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70460/testReport)** for PR 16369 at commit [`0e96618`](https://github.com/apache/spark/commit/0e96618a9e6530cf6e43204dc7f80965bc759cae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16356: [SPARK-18949] [SQL] Add recoverPartitions API to Catalog
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16356 Sure, let me do it now.
[GitHub] spark issue #16352: [SPARK-18947][SQL] SQLContext.tableNames should not call...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16352 **[Test build #70466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70466/testReport)** for PR 16352 at commit [`95d6f89`](https://github.com/apache/spark/commit/95d6f89623fd29458b6363a40d2bfcdf7af6902d).
[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15996 **[Test build #70465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70465/testReport)** for PR 15996 at commit [`59f06ce`](https://github.com/apache/spark/commit/59f06ce86338b11b74164de632cee518bf513697).
[GitHub] spark pull request #16352: [SPARK-18947][SQL] SQLContext.tableNames should n...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16352#discussion_r93388755

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---

```diff
@@ -276,11 +276,12 @@ private[sql] object SQLUtils extends Logging {
   }

   def getTableNames(sparkSession: SparkSession, databaseName: String): Array[String] = {
-    databaseName match {
-      case n: String if n != null && n.trim.nonEmpty =>
-        sparkSession.catalog.listTables(n).collect().map(_.name)
+    val db = databaseName match {
+      case _ if databaseName != null && databaseName.trim.nonEmpty =>
+        databaseName.trim
```

--- End diff --

ok, let me keep the previous behavior, although it's weird (we check `...trim.nonEmpty` but don't use the trimmed database name)
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r93388333

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---

```diff
@@ -33,10 +33,9 @@ import org.apache.spark.util.collection.OpenHashMap
  * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
  * the given percentage(s) with value range in [0.0, 1.0].
  *
- * The operator is bound to the slower sort based aggregation path because the number of elements
```

--- End diff --

`TypedImperativeAggregate` isn't bound to sort-based aggregation. `ObjectHashAggregateExec` now supports `TypedImperativeAggregate` in hash-based aggregation.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70464/testReport)** for PR 16371 at commit [`68d1e98`](https://github.com/apache/spark/commit/68d1e98a049da996feb202660e6c6b15f94183b7).
[GitHub] spark pull request #16356: [SPARK-18949] [SQL] Add recoverPartitions API to ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16356
[GitHub] spark issue #16356: [SPARK-18949] [SQL] Add recoverPartitions API to Catalog
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16356 Can you send a pr for branch-2.1?
[GitHub] spark issue #16356: [SPARK-18949] [SQL] Add recoverPartitions API to Catalog
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16356 Merging in master/branch-2.1.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93387942

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---

```diff
@@ -56,33 +58,89 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
-    val values = ctx.freshName("values")
-    ctx.addMutableState("Object[]", values, s"this.$values = null;")
+    val array = ctx.freshName("array")

-    ev.copy(code = s"""
-      this.$values = new Object[${children.size}];""" +
+    val et = dataType.elementType
+    val evals = children.map(e => e.genCode(ctx))
+    val isPrimitiveArray = ctx.isPrimitiveType(et)
+    val primitiveTypeName = if (isPrimitiveArray) ctx.primitiveTypeName(et) else ""
+    val (preprocess, arrayData, arrayWriter) =
+      GenArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array)
+
+    ev.copy(code =
```

--- End diff --

can you refactor it a little bit? The logic has gotten more complicated, and it's hard to read when it's all inside `ev.copy(...)`
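The kind of refactor being requested can be sketched abstractly. The following standalone Scala snippet is hypothetical (stub names, not the PR's actual code): assemble the generated-code fragments from named pieces first, so the final call receives one clean value instead of a long inline interpolation.

```scala
// Hypothetical sketch of the requested refactor: build the generated code
// from well-named locals, then make a single tidy call at the end
// (ev.copy in the PR; a stub case class here).
case class ExprCode(code: String)

def assembleCode(preprocess: String, assigns: Seq[String], postprocess: String): ExprCode = {
  // Each fragment is constructed and named separately, which keeps the
  // final assembly trivially readable.
  val body = (preprocess +: assigns :+ postprocess).mkString("\n")
  ExprCode(code = body)
}
```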
[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r93387828

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---

```diff
@@ -364,48 +366,157 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       throw new AnalysisException("Cannot create hive serde table with saveAsTable API")
     }

-    val tableExists = df.sparkSession.sessionState.catalog.tableExists(tableIdent)
-
-    (tableExists, mode) match {
-      case (true, SaveMode.Ignore) =>
-        // Do nothing
-
-      case (true, SaveMode.ErrorIfExists) =>
-        throw new AnalysisException(s"Table $tableIdent already exists.")
-
-      case _ =>
-        val existingTable = if (tableExists) {
-          Some(df.sparkSession.sessionState.catalog.getTableMetadata(tableIdent))
-        } else {
-          None
-        }
-        val storage = if (tableExists) {
-          existingTable.get.storage
-        } else {
-          DataSource.buildStorageFormatFromOptions(extraOptions.toMap)
-        }
-        val tableType = if (tableExists) {
-          existingTable.get.tableType
-        } else if (storage.locationUri.isDefined) {
-          CatalogTableType.EXTERNAL
-        } else {
-          CatalogTableType.MANAGED
+    val catalog = df.sparkSession.sessionState.catalog
+    val db = tableIdent.database.getOrElse(catalog.getCurrentDatabase)
+    val tableIdentWithDB = tableIdent.copy(database = Some(db))
+    val tableName = tableIdentWithDB.unquotedString
+
+    catalog.getTableMetadataOption(tableIdentWithDB) match {
+      // If the table already exists...
+      case Some(existingTable) =>
+        mode match {
+          case SaveMode.Ignore => // Do nothing
+
+          case SaveMode.ErrorIfExists =>
+            throw new AnalysisException(s"Table $tableName already exists. You can set SaveMode " +
+              "to SaveMode.Append to insert data into the table or set SaveMode to " +
+              "SaveMode.Overwrite to overwrite the existing data.")
+
+          case SaveMode.Append =>
+            if (existingTable.tableType == CatalogTableType.VIEW) {
+              throw new AnalysisException("Saving data into a view is not allowed.")
+            }
+
+            if (existingTable.provider.get == DDLUtils.HIVE_PROVIDER) {
+              throw new AnalysisException(s"Saving data in the Hive serde table $tableName is " +
+                "not supported yet. Please use the insertInto() API as an alternative.")
+            }
+
+            // Check if the specified data source match the data source of the existing table.
+            val existingProvider = DataSource.lookupDataSource(existingTable.provider.get)
+            val specifiedProvider = DataSource.lookupDataSource(source)
+            // TODO: Check that options from the resolved relation match the relation that we are
+            // inserting into (i.e. using the same compression).
+            if (existingProvider != specifiedProvider) {
+              throw new AnalysisException(s"The format of the existing table $tableName is " +
+                s"`${existingProvider.getSimpleName}`. It doesn't match the specified format " +
+                s"`${specifiedProvider.getSimpleName}`.")
+            }
+
+            if (df.schema.length != existingTable.schema.length) {
+              throw new AnalysisException(
+                s"The column number of the existing table $tableName" +
+                  s"(${existingTable.schema.catalogString}) doesn't match the data schema" +
+                  s"(${df.schema.catalogString})")
+            }
+
+            val resolver = df.sparkSession.sessionState.conf.resolver
+            val tableCols = existingTable.schema.map(_.name)
+
+            // As we are inserting into an existing table, we should respect the existing schema
+            // and adjust the column order of the given dataframe according to it, or throw
+            // exception if the column names do not match.
+            val adjustedColumns = tableCols.map { col =>
+              df.queryExecution.analyzed.resolve(Seq(col), resolver).getOrElse {
+                val inputColumns = df.schema.map(_.name).mkString(", ")
+                throw new AnalysisException(
+                  s"cannot resolve '$col' given input columns: [$inputColumns]")
+              }
+            }
+
+            // Check if the specified partition columns match the existing table.
+            val specifiedPartCols = CatalogUtils.normalizePartCols(
+              tableName, tableCols,
```
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13909 sorry for the delay. Yeah, it looks like we can't reuse the byte array of unsafe data in expressions; it may get cached unexpectedly and lead to wrong results. I'm a little concerned about the hacks in `BufferHolder` and the array writer. The code is tightly coupled with the unsafe row writer, and we have to hack it so that we can write an unsafe array directly. What if we actually wrote an unsafe row with a single array field and returned the array column? Then we wouldn't need the hacks, but we'd waste some bits on the row-format overhead, which seems acceptable.
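For scale, the row-format overhead mentioned above can be estimated with a back-of-envelope sketch. This standalone Scala snippet is an illustration under the assumption of Spark's 8-byte-word unsafe row layout (one null-bitset word per 64 fields plus one fixed-length offset-and-size word per field), not Spark's actual code:

```scala
// Back-of-envelope estimate of the per-row overhead of wrapping an unsafe
// array in a single-field row, assuming an 8-byte-word layout: one
// null-bitset word per 64 fields plus one offset-and-size word per field.
def rowWrapperOverheadBytes(numFields: Int): Int = {
  val nullBitsetWords = (numFields + 63) / 64 // one bit per field, word-aligned
  val fixedLengthWords = numFields            // 8 bytes of offset & size per field
  (nullBitsetWords + fixedLengthWords) * 8
}
```

With a single array field this works out to 16 bytes per row, which is the kind of waste the comment judges acceptable.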
[GitHub] spark issue #16351: [SPARK-18943][SQL] Avoid per-record type dispatch in CSV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16351 **[Test build #70463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70463/testReport)** for PR 16351 at commit [`192bc6e`](https://github.com/apache/spark/commit/192bc6e59c3b9e7f5c782d3c9059e67d0e4550ec).
[GitHub] spark issue #16351: [SPARK-18943][SQL] Avoid per-record type dispatch in CSV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16351 **[Test build #70462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70462/testReport)** for PR 16351 at commit [`c75bd05`](https://github.com/apache/spark/commit/c75bd050925dac6efc3276f4aafef00135778f88).
[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/16337 Nat will run them against DB2 and provide the results.
[GitHub] spark issue #16351: [SPARK-18943][SQL] Avoid per-record type dispatch in CSV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16351 **[Test build #70461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70461/testReport)** for PR 16351 at commit [`3289726`](https://github.com/apache/spark/commit/3289726ffbf4ffbbda36d935f37a0dfcc946b20e).
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16369 **[Test build #70460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70460/testReport)** for PR 16369 at commit [`0e96618`](https://github.com/apache/spark/commit/0e96618a9e6530cf6e43204dc7f80965bc759cae).
[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16337 What are the reference query results we can compare against? For example, from DB2 or Hive?
[GitHub] spark issue #16356: [SPARK-18949] [SQL] Add recoverPartitions API to Catalog
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16356 **[Test build #3511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3511/testReport)** for PR 16356 at commit [`451ab05`](https://github.com/apache/spark/commit/451ab0598d59bb5df9a222df931b1be127c3082a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70459/testReport)** for PR 16371 at commit [`6e8aa82`](https://github.com/apache/spark/commit/6e8aa82d95e52c6c469aaa4a8e1cfc0105576e69).
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16371

[SPARK-18932][SQL] Support partial aggregation for collect_set/collect_list

## What changes were proposed in this pull request?

Currently the collect_set/collect_list aggregation expressions don't support partial aggregation. This patch enables partial aggregation for them.

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 collect-partial-support

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16371.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16371

commit 6e8aa82d95e52c6c469aaa4a8e1cfc0105576e69
Author: Liang-Chi Hsieh
Date: 2016-12-21T06:53:39Z

    Support partial mode for collect.
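What "partial aggregation" buys here can be shown with a minimal standalone Scala sketch (illustrative only, not Spark's `TypedImperativeAggregate` API): each partition folds its rows into a partial buffer, and only the partial buffers are combined afterwards, instead of shipping every raw row to a single final aggregation.

```scala
// Standalone sketch (not Spark's implementation) of partial aggregation for
// collect_list: per-partition update phase builds a partial buffer, then a
// merge phase concatenates the partial buffers in partition order.
def partialCollectList[T](partitions: Seq[Seq[T]]): List[T] = {
  // Update phase: fold each partition's rows into its own buffer.
  val partials = partitions.map(_.foldLeft(Vector.empty[T])((buf, v) => buf :+ v))
  // Merge phase: combine the partial buffers.
  partials.foldLeft(Vector.empty[T])(_ ++ _).toList
}
```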
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12775 **[Test build #70458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70458/testReport)** for PR 12775 at commit [`9778cef`](https://github.com/apache/spark/commit/9778cefce3e152d559e53cd4e2f5a113e561f0ff).
[GitHub] spark issue #16351: [SPARK-18943][SQL] Avoid per-record type dispatch in CSV...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16351 Thank you @cloud-fan, will add some in the PR description soon after cleaning up.
[GitHub] spark pull request #16352: [SPARK-18947][SQL] SQLContext.tableNames should n...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16352#discussion_r93385438

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---

```diff
@@ -276,11 +276,12 @@ private[sql] object SQLUtils extends Logging {
   }

   def getTableNames(sparkSession: SparkSession, databaseName: String): Array[String] = {
-    databaseName match {
-      case n: String if n != null && n.trim.nonEmpty =>
-        sparkSession.catalog.listTables(n).collect().map(_.name)
+    val db = databaseName match {
+      case _ if databaseName != null && databaseName.trim.nonEmpty =>
+        databaseName.trim
```

--- End diff --

uh... not sure whether we should support trimming. So far, when we do something like

```Scala
session.tableNames("default ")
```

it reports the error:

```
Database 'default ' not found;
```
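The behavior the reviewers settled on can be captured in a standalone Scala sketch (illustrative only, not Spark's `SQLUtils` code): the trimmed name is used only for the emptiness check, while the caller's original string is what reaches the catalog, so `"default "` is still looked up verbatim and fails as before.

```scala
// Standalone sketch of the databaseName guard discussed above: trim only to
// decide between the given name and the current database, but return the
// untrimmed string, preserving the pre-existing lookup behavior.
def chooseDatabase(databaseName: String, currentDb: => String): String =
  databaseName match {
    case _ if databaseName != null && databaseName.trim.nonEmpty => databaseName
    case _ => currentDb
  }
```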
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12775 Jenkins, retest this please
[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/16337 Hello all: I have divided the test cases into small groups based on the discussion. This PR will be the first one for the IN subquery; it covers the simple and group-by cases. These are the run times from my local MacBook.

```
$ build/sbt "~sql/test-only *SQLQueryTestSuite -- -z in-group-by.sql"
[info] Run completed in 23 seconds, 876 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

$ build/sbt "~sql/test-only *SQLQueryTestSuite -- -z simple-in.sql"
[info] Run completed in 9 seconds, 986 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
[GitHub] spark pull request #16367: [SPARK-18903][SPARKR] Add API to get SparkUI URL
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16367#discussion_r93384251

--- Diff: R/pkg/R/sparkR.R ---

```diff
@@ -410,6 +410,30 @@ sparkR.session <- function(
   sparkSession
 }

+#' Get the URL of the SparkUI instance for the current active SparkSession
+#'
+#' Get the URL of the SparkUI instance for the current active SparkSession.
```

--- End diff --

actually, no, the first is the title, then the 2nd (after the empty line) is the description. we have that in some of our docs where there isn't really any more to say ;)
[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16337 **[Test build #70457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70457/testReport)** for PR 16337 at commit [`9c584fb`](https://github.com/apache/spark/commit/9c584fb2c1bd99cdf4c0f5a222bc7aec4b003227).
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16355 @imatiach-msft Can you add a test case?
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16355 Jenkins, test this please.
[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16370 **[Test build #70456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70456/testReport)** for PR 16370 at commit [`1d248c3`](https://github.com/apache/spark/commit/1d248c30bb6872494b82fe16a584b9b801058c58).
[GitHub] spark pull request #16367: [SPARK-18903][SPARKR] Add API to get SparkUI URL
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16367#discussion_r93381426

--- Diff: R/pkg/R/sparkR.R ---
@@ -410,6 +410,30 @@ sparkR.session <- function(
   sparkSession
 }

+#' Get the URL of the SparkUI instance for the current active SparkSession
+#'
+#' Get the URL of the SparkUI instance for the current active SparkSession.

--- End diff --

Duplicate line?
[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16370 cc @zsxwing
[GitHub] spark pull request #16370: [SPARK-18960][SQL][SS] Avoid double reading file ...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16370 [SPARK-18960][SQL][SS] Avoid double reading file which is being copied.

## What changes were proposed in this pull request?

In HDFS, when we copy a file into a target directory, there is a temporary `._COPY_` file for a period of time; the duration depends on the file size. If we do not skip this file, we may read the same data twice.

## How was this patch tested?

Updated unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-18960

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16370.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16370

commit 1d248c30bb6872494b82fe16a584b9b801058c58
Author: uncleGen
Date: 2016-12-21T03:36:04Z

cp
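The skip logic the PR describes could look roughly like this. This is a hedged, self-contained sketch, not the actual patch: `InFlightCopyFilter` and its members are illustrative names, and the `._COPYING_` suffix is the one HDFS's `fs -put` conventionally uses for in-flight copies (the PR text abbreviates it as `._COPY_`).

```scala
// Hypothetical sketch: drop in-flight copy temporaries when listing input
// files, so a file that is still being copied is not read (and then re-read
// under its final name once the copy completes).
object InFlightCopyFilter {
  // HDFS writes to "<name>._COPYING_" and renames on completion (assumption
  // based on the Hadoop FileSystem shell's behavior, not taken from the PR).
  val copyingSuffix = "._COPYING_"

  def isInFlightCopy(path: String): Boolean = path.endsWith(copyingSuffix)

  // Keep only files that are safe to read.
  def selectReadable(paths: Seq[String]): Seq[String] =
    paths.filterNot(isInFlightCopy)
}
```

A listing pass would call `selectReadable` on the candidate file names before handing them to the source; once the rename lands, the file shows up under its final name and is picked up normally.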
[GitHub] spark issue #16351: [SPARK-18943][SQL] Avoid per-record type dispatch in CSV...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16351 mostly LGTM, do you have some performance numbers about this optimization?
[GitHub] spark pull request #16351: [SPARK-18943][SQL] Avoid per-record type dispatch...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93381128

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -215,84 +215,133 @@ private[csv] object CSVInferSchema {
 }

 private[csv] object CSVTypeCast {
+  // A `ValueConverter` is responsible for converting the given value to a desired type.
+  private type ValueConverter = String => Any

   /**
-   * Casts given string datum to specified type.
-   * Currently we do not support complex types (ArrayType, MapType, StructType).
+   * Create converters which cast each given string datum to each specified type in given schema.
+   * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`).
    *
-   * For string types, this is simply the datum. For other types.
+   * For string types, this is simply the datum.
+   * For other types, this is converted into the value according to the type.
    * For other nullable types, returns null if it is null or equals to the value specified
    * in `nullValue` option.
    *
-   * @param datum string value
-   * @param name field name in schema.
-   * @param castType data type to cast `datum` into.
-   * @param nullable nullability for the field.
+   * @param schema schema that contains data types to cast the given value into.
    * @param options CSV options.
    */
-  def castTo(
+  def makeConverters(
+      schema: StructType,
+      options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
+    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+  }
+
+  /**
+   * Create a converter which converts the string value to a value according to a desired type.
+   */
+  def makeConverter(
+      name: String,
+      dataType: DataType,
+      nullable: Boolean = true,
+      options: CSVOptions = CSVOptions()): ValueConverter = dataType match {
+    case _: ByteType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>

--- End diff --

nit: `nullSafeDatum(d, name, nullable, options)(_.toByte)`
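The motivation behind `makeConverter` can be seen in a self-contained sketch (simplified type names, not Spark's actual classes): the type `match` runs once per column when the converters are built, so the per-record loop only applies precomputed closures instead of dispatching on the data type for every field of every record.

```scala
// Toy stand-ins for Spark's DataType hierarchy -- illustrative only.
sealed trait ColType
case object IntCol extends ColType
case object DoubleCol extends ColType
case object StringCol extends ColType

type ValueConverter = String => Any

// Type dispatch happens here, once per column.
def makeConverter(t: ColType): ValueConverter = t match {
  case IntCol    => (s: String) => s.toInt
  case DoubleCol => (s: String) => s.toDouble
  case StringCol => (s: String) => s
}

def makeConverters(schema: Seq[ColType]): Array[ValueConverter] =
  schema.map(makeConverter).toArray

// Per record: just apply the precomputed converter for each column.
def convertRow(tokens: Array[String], convs: Array[ValueConverter]): Array[Any] =
  tokens.zip(convs).map { case (tok, conv) => conv(tok) }
```

Hoisting the `match` out of the hot loop is exactly what "avoid per-record type dispatch" means in the PR title; the closures capture the column's type decision up front.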
[GitHub] spark pull request #16351: [SPARK-18943][SQL] Avoid per-record type dispatch...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93381049

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -215,84 +215,133 @@ private[csv] object CSVInferSchema {
 }

 private[csv] object CSVTypeCast {
+  // A `ValueConverter` is responsible for converting the given value to a desired type.
+  private type ValueConverter = String => Any

   /**
-   * Casts given string datum to specified type.
-   * Currently we do not support complex types (ArrayType, MapType, StructType).
+   * Create converters which cast each given string datum to each specified type in given schema.
+   * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`).
    *
-   * For string types, this is simply the datum. For other types.
+   * For string types, this is simply the datum.
+   * For other types, this is converted into the value according to the type.
    * For other nullable types, returns null if it is null or equals to the value specified
    * in `nullValue` option.
    *
-   * @param datum string value
-   * @param name field name in schema.
-   * @param castType data type to cast `datum` into.
-   * @param nullable nullability for the field.
+   * @param schema schema that contains data types to cast the given value into.
    * @param options CSV options.
    */
-  def castTo(
+  def makeConverters(
+      schema: StructType,
+      options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
+    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+  }
+
+  /**
+   * Create a converter which converts the string value to a value according to a desired type.
+   */
+  def makeConverter(
+      name: String,
+      dataType: DataType,
+      nullable: Boolean = true,
+      options: CSVOptions = CSVOptions()): ValueConverter = dataType match {
+    case _: ByteType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        datum.toByte
+      }
+
+    case _: ShortType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        datum.toShort
+      }
+
+    case _: IntegerType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        datum.toInt
+      }
+
+    case _: LongType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        datum.toLong
+      }
+
+    case _: FloatType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) {
+        case options.nanValue => Float.NaN
+        case options.negativeInf => Float.NegativeInfinity
+        case options.positiveInf => Float.PositiveInfinity
+        case datum =>
+          Try(datum.toFloat)
+            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).floatValue())
+      }
+
+    case _: DoubleType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) {
+        case options.nanValue => Double.NaN
+        case options.negativeInf => Double.NegativeInfinity
+        case options.positiveInf => Double.PositiveInfinity
+        case datum =>
+          Try(datum.toDouble)
+            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).doubleValue())
+      }
+
+    case _: BooleanType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        datum.toBoolean
+      }
+
+    case dt: DecimalType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        val value = new BigDecimal(datum.replaceAll(",", ""))
+        Decimal(value, dt.precision, dt.scale)
+      }
+
+    case _: TimestampType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        // This one will lose microseconds parts.
+        // See https://issues.apache.org/jira/browse/SPARK-10681.
+        Try(options.timestampFormat.parse(datum).getTime * 1000L)
+          .getOrElse {
+            // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
+            // compatibility.
+            DateTimeUtils.stringToTime(datum).getTime * 1000L
+          }
+      }
+
+    case _: DateType => (d: String) =>
+      nullSafeDatum(d, name, nullable, options) { case datum =>
+        // This one will lose microseconds parts.
+        // See https://issues.apache.org/jira/browse/SPARK-10681.
+        Try(DateTimeUtils.millisToDays(options.dateFormat.parse(datum).getTime))
+
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 How about we assign priority to implicit rules like http://stackoverflow.com/questions/1886953/is-there-a-way-to-control-which-implicit-conversion-will-be-the-default-used ? I think we should prefer the `Seq` encoder over the `Product` encoder for a `Seq with Product`.
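The implicit-priority trick referenced above can be sketched in a few lines (`Rule`, `anyRule`, and `seqRule` are illustrative names, not Spark's encoder code): implicits inherited from a base trait rank lower than those defined directly on the deriving object, so when both rules apply to a type, the one on the object wins.

```scala
// A toy type class standing in for an encoder.
case class Rule[T](name: String)

trait LowPriorityRules {
  // Fallback: applies to every type, but at lower implicit priority
  // because it lives on the inherited trait.
  implicit def anyRule[T]: Rule[T] = Rule("fallback")
}

object Rules extends LowPriorityRules {
  // Preferred: chosen over the fallback whenever T is a Seq.
  implicit def seqRule[T <: Seq[_]]: Rule[T] = Rule("seq")
}

import Rules._

def pick[T](implicit r: Rule[T]): String = r.name
```

With this layering, `pick[List[Int]]` resolves to the `Seq` rule even though the fallback also applies, which is the same mechanism the comment proposes for preferring the `Seq` encoder over the `Product` encoder for `Seq with Product` types.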
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 I don't think the failure is related, and it can't be reproduced locally.
[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15212 Merged build finished. Test PASSed.
[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15212 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70454/ Test PASSed.
[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15212 **[Test build #70454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70454/testReport)** for PR 15212 at commit [`83a429e`](https://github.com/apache/spark/commit/83a429e9907aac389d45aa1b6a23f432216e0382).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16323 SGTM
[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15996 **[Test build #70455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70455/testReport)** for PR 15996 at commit [`7481150`](https://github.com/apache/spark/commit/748115047175420c842b6743ab33489882f18104).
[GitHub] spark issue #16282: [SPARK-18588][SS][Kafka]Create a new KafkaConsumer when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Merged build finished. Test FAILed.
[GitHub] spark issue #16282: [SPARK-18588][SS][Kafka]Create a new KafkaConsumer when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70444/ Test FAILed.
[GitHub] spark issue #16282: [SPARK-18588][SS][Kafka]Create a new KafkaConsumer when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282 **[Test build #70444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70444/testReport)** for PR 16282 at commit [`9080acd`](https://github.com/apache/spark/commit/9080acd43f7568ba1b084ee892144a92f4cfa376).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16350: [SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for eac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16350 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70452/ Test PASSed.
[GitHub] spark issue #16350: [SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for eac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16350 Merged build finished. Test PASSed.
[GitHub] spark issue #16350: [SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for eac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16350 **[Test build #70452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70452/consoleFull)** for PR 16350 at commit [`8dd0169`](https://github.com/apache/spark/commit/8dd01693c5fca8a724fe0e9f1ada0f7bdaf1f5f6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16360: [SPARK-18234][SS] Made update mode public
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16360 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70450/ Test PASSed.
[GitHub] spark issue #16360: [SPARK-18234][SS] Made update mode public
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16360 Merged build finished. Test PASSed.
[GitHub] spark issue #16360: [SPARK-18234][SS] Made update mode public
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16360 **[Test build #70450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70450/testReport)** for PR 16360 at commit [`628c6c2`](https://github.com/apache/spark/commit/628c6c2e801b8cce6a47608c41068a0a085698ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16369 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70453/ Test FAILed.
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16369 Merged build finished. Test FAILed.
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16369 **[Test build #70453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70453/testReport)** for PR 16369 at commit [`651ce53`](https://github.com/apache/spark/commit/651ce532423a728ec2a995c9f4149b71d2c0203c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16323 Since adding a switch for CBO is not trivial, I want to do it in a separate PR and let this one only deal with decoupling Statistics from CatalogTable. Do you agree? @cloud-fan
[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16323#discussion_r93376249

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,6 +200,10 @@ case class CatalogTable(
       locationUri, inputFormat, outputFormat, serde, compressed, properties))
   }

+  def withStats(cboStatsEnabled: Boolean): CatalogTable = {

--- End diff --

Thanks. I think the first one is better; the second one will lead to many if-else on caller sides.
[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15212 **[Test build #70454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70454/testReport)** for PR 15212 at commit [`83a429e`](https://github.com/apache/spark/commit/83a429e9907aac389d45aa1b6a23f432216e0382).
[GitHub] spark issue #16304: [SPARK-18894][SS] Fix event time watermark delay thresho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16304 Merged build finished. Test PASSed.
[GitHub] spark issue #16304: [SPARK-18894][SS] Fix event time watermark delay thresho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16304 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70449/
[GitHub] spark issue #16304: [SPARK-18894][SS] Fix event time watermark delay thresho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16304 **[Test build #70449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70449/testReport)** for PR 16304 at commit [`29f0037`](https://github.com/apache/spark/commit/29f0037631399bf2226ff3c2e630e4927e177eb4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16356: [SPARK-18949] [SQL] Add recoverPartitions API to Catalog
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16356 **[Test build #3511 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3511/testReport)** for PR 16356 at commit [`451ab05`](https://github.com/apache/spark/commit/451ab0598d59bb5df9a222df931b1be127c3082a).
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16296 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70447/
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16296 Merged build finished. Test PASSed.
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16296 **[Test build #70447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70447/testReport)** for PR 16296 at commit [`7b5f226`](https://github.com/apache/spark/commit/7b5f226b94f3e6b830f3d92b33599dd3f57f8dbd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DetermineHiveSerde(conf: SQLConf) extends Rule[LogicalPlan]`
[GitHub] spark issue #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkSession ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16369 **[Test build #70453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70453/testReport)** for PR 16369 at commit [`651ce53`](https://github.com/apache/spark/commit/651ce532423a728ec2a995c9f4149b71d2c0203c).
[GitHub] spark pull request #16369: [SPARK-18956][SQL][PySpark] Reuse existing SparkS...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16369

[SPARK-18956][SQL][PySpark] Reuse existing SparkSession while creating new SQLContext instances

## What changes were proposed in this pull request?
To reuse the existing SparkSession while creating new SQLContext instances in PySpark.

## How was this patch tested?
N/A

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 reuse-sparksession-pyspark

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16369.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16369

commit 651ce532423a728ec2a995c9f4149b71d2c0203c
Author: Liang-Chi Hsieh
Date: 2016-12-21T04:40:53Z

    Reuse existing SparkSession while creating new SQLContext instances.
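The reuse pattern this PR describes can be sketched roughly as follows. This is a minimal illustration, not the actual PySpark implementation; the class, the `_shared_session` attribute, and the stand-in session object are all illustrative:

```python
class SQLContext:
    """Minimal sketch of the reuse pattern: new SQLContext instances
    share one underlying session object instead of each creating a new
    one. Names are stand-ins, not the real PySpark API."""
    _shared_session = None  # hypothetical module-level cache of the session

    def __init__(self, session=None):
        if session is None:
            # Reuse the existing session if one was already created.
            if SQLContext._shared_session is None:
                SQLContext._shared_session = object()  # stand-in for a SparkSession
            session = SQLContext._shared_session
        self.sparkSession = session

a = SQLContext()
b = SQLContext()
print(a.sparkSession is b.sparkSession)  # → True: both contexts share one session
```

The point of the change is exactly this identity: constructing a second context no longer spins up a second session.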
[GitHub] spark issue #16359: [SPARK-18951] Upgrade com.thoughtworks.paranamer/paranam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16359 Merged build finished. Test PASSed.
[GitHub] spark issue #16359: [SPARK-18951] Upgrade com.thoughtworks.paranamer/paranam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70445/
[GitHub] spark issue #16359: [SPARK-18951] Upgrade com.thoughtworks.paranamer/paranam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16359 **[Test build #70445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70445/testReport)** for PR 16359 at commit [`c502aeb`](https://github.com/apache/spark/commit/c502aeb123641f634c107c0ad8c0f1986fea8ee1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16352: [SPARK-18947][SQL] SQLContext.tableNames should not call...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16352 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70446/
[GitHub] spark issue #16352: [SPARK-18947][SQL] SQLContext.tableNames should not call...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16352 Merged build finished. Test PASSed.
[GitHub] spark issue #16352: [SPARK-18947][SQL] SQLContext.tableNames should not call...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16352 **[Test build #70446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70446/testReport)** for PR 16352 at commit [`1f69b38`](https://github.com/apache/spark/commit/1f69b381f0a916e98200cad596dfea534ec08ffc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16366: [SPARK-18953][CORE][WEB UI] Do not show the link to a de...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16366 We also show worker info in `driverRow`. Although it doesn't show the worker state, I am wondering if we could also check the worker state there, disable the link, and add a suffix like `(DEAD)` when a worker is dead? https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala#L249
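The rendering behavior being suggested can be sketched as a small helper; this is a hypothetical illustration, not Spark's actual `MasterPage` code, and the function name and state strings are assumptions:

```python
def worker_cell(worker_id: str, state: str) -> str:
    """Hypothetical sketch of the suggestion: render a hyperlink for live
    workers, plain text with a (DEAD) suffix otherwise. The same branch
    could cover other non-alive states such as DECOMMISSIONED or UNKNOWN."""
    if state == "DEAD":
        return f"{worker_id} (DEAD)"          # no link for a dead worker
    return f'<a href="/worker/{worker_id}">{worker_id}</a>'

print(worker_cell("worker-1", "DEAD"))   # → worker-1 (DEAD)
print(worker_cell("worker-1", "ALIVE"))  # → <a href="/worker/worker-1">worker-1</a>
```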
[GitHub] spark issue #16350: [SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for eac...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/16350 Deleted the UT and metrics, done. :)
[GitHub] spark issue #16350: [SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for eac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16350 **[Test build #70452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70452/consoleFull)** for PR 16350 at commit [`8dd0169`](https://github.com/apache/spark/commit/8dd01693c5fca8a724fe0e9f1ada0f7bdaf1f5f6).
[GitHub] spark issue #16366: [SPARK-18953][CORE][WEB UI] Do not show the link to a de...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16366 One question: should we do the same thing for `WorkerState.DECOMMISSIONED` and `WorkerState.UNKNOWN`?
[GitHub] spark issue #16282: [SPARK-18588][SS][Kafka]Create a new KafkaConsumer when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70439/
[GitHub] spark issue #16282: [SPARK-18588][SS][Kafka]Create a new KafkaConsumer when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Merged build finished. Test FAILed.
[GitHub] spark issue #16282: [SPARK-18588][SS][Kafka]Create a new KafkaConsumer when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282 **[Test build #70439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70439/testReport)** for PR 16282 at commit [`7c789e8`](https://github.com/apache/spark/commit/7c789e80255fbcb400eb2f62c959fed7ccb93455).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15018: [SPARK-17455][MLlib] Improve PAVA implementation in Isot...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15018 @neggert I am fine with throwing an error.
[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16368 Merged build finished. Test PASSed.
[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70451/
[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16368 **[Test build #70451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70451/testReport)** for PR 16368 at commit [`e2031ea`](https://github.com/apache/spark/commit/e2031ea36cc46c46fd3c0a20d8708eb313c78a28).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16323#discussion_r93370436 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -198,6 +200,10 @@ case class CatalogTable( locationUri, inputFormat, outputFormat, serde, compressed, properties)) } + def withStats(cboStatsEnabled: Boolean): CatalogTable = { --- End diff -- I can think of two approaches:
1. We can keep the current naive version of `statistics` and add a new `statistics` function which takes a conf. A default implementation of the new `statistics` function simply returns the naive version of `statistics`. In `Join` or `Aggregate`, we can include more complex logic in the new `statistics` to return either the naive calculation or an estimation. The caller always calls the new `statistics` function and passes in the current conf.
2. Add a new `statisticsCBO` which doesn't take a conf, because it is called only when CBO is enabled. So the caller decides whether to call the non-CBO version `statistics` or the CBO version `statisticsCBO`.
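The trade-off between the two approaches can be illustrated with a small sketch. The names (`Conf`, `Join`, `statistics`, `statistics_cbo`) and the row-count numbers are hypothetical stand-ins, not Spark's actual Catalyst API; the point is where the CBO branch lives:

```python
class Conf:
    def __init__(self, cbo_enabled: bool):
        self.cbo_enabled = cbo_enabled

class Join:
    """Stand-in for a plan node; the estimates are made-up numbers."""

    def naive_statistics(self):
        return {"rowCount": 1000}  # simple size-based estimate

    # Approach 1: one entry point takes the conf and branches internally.
    def statistics(self, conf: Conf):
        if conf.cbo_enabled:
            return self.statistics_cbo()
        return self.naive_statistics()

    # Approach 2: a separate CBO-only method; every caller must branch.
    def statistics_cbo(self):
        return {"rowCount": 800}  # pretend cost-based estimate

plan, conf = Join(), Conf(cbo_enabled=True)
s1 = plan.statistics(conf)  # approach 1: a single call, branch hidden inside
s2 = plan.statistics_cbo() if conf.cbo_enabled else plan.naive_statistics()  # approach 2
print(s1 == s2)  # → True: same result, but approach 2 repeats the if-else at each call site
```

This matches the comment above: with approach 2, the `if conf.cbo_enabled` check would be duplicated at every caller, which is why the first approach was preferred.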
[GitHub] spark issue #16366: [SPARK-18953][CORE][WEB UI] Do not show the link to a de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16366 Merged build finished. Test PASSed.
[GitHub] spark issue #16366: [SPARK-18953][CORE][WEB UI] Do not show the link to a de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70442/
[GitHub] spark issue #16366: [SPARK-18953][CORE][WEB UI] Do not show the link to a de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16366 **[Test build #70442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70442/testReport)** for PR 16366 at commit [`4e5d5f2`](https://github.com/apache/spark/commit/4e5d5f2ae4b13ec172c0ab81f15d86af8149596d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16350: [SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for eac...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16350 Yeah, I don't think we need the unit test for 2.0.
[GitHub] spark issue #16362: [SPARK-18954][Tests]Fix flaky test: o.a.s.streaming.Basi...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16362 LGTM. Did you run it many times?
[GitHub] spark issue #16343: [FLAKY-TEST] InputStreamsSuite.socket input stream
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16343 Why didn't `eventually` and an assert on the size of the collected data work?
[GitHub] spark pull request #16314: [SPARK-18900][FLAKY-TEST] StateStoreSuite.mainten...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16314
[GitHub] spark issue #16314: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16314 Merging this to master and 2.1.
[GitHub] spark issue #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16368 **[Test build #70451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70451/testReport)** for PR 16368 at commit [`e2031ea`](https://github.com/apache/spark/commit/e2031ea36cc46c46fd3c0a20d8708eb313c78a28).
[GitHub] spark pull request #16368: [SPARK-18958][SPARKR] R API toJSON on DataFrame
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/16368

[SPARK-18958][SPARKR] R API toJSON on DataFrame

## What changes were proposed in this pull request?
It would make it easier to integrate with other components expecting JSON format.

## How was this patch tested?
manual, unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rJSON

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16368.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16368

commit 886efe9b962bb22eed469bb0f853ee280eb06a45
Author: Felix Cheung
Date: 2016-12-21T01:31:48Z

    add toJSON DataFrame API

commit e2031ea36cc46c46fd3c0a20d8708eb313c78a28
Author: Felix Cheung
Date: 2016-12-21T03:08:46Z

    fix test