[GitHub] spark pull request #11796: [SPARK-13579][build][test-maven] Stop building th...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/11796#discussion_r73781398 --- Diff: assembly/pom.xml --- @@ -69,6 +68,17 @@ spark-repl_${scala.binary.version} ${project.version} + + + --- End diff -- This is a problem for the Spark Cassandra Connector. The Cassandra Java Driver requires a 16.0 or greater version of guava. This necessarily means we need to shade now. This was on our roadmap anyway just wanted you to be aware. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
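The shading Russell mentions is typically done with the maven-shade-plugin's relocation support, which rewrites the driver's Guava references to a private package so they cannot clash with the Guava version Spark ships. This is a hedged sketch of that kind of relocation; the shaded prefix and plugin placement are illustrative assumptions, not the connector's actual build configuration:

```xml
<!-- Hypothetical sketch: relocate Guava inside an assembly so the Cassandra
     Java Driver's Guava 16+ cannot conflict with Spark's older Guava.
     The shaded package prefix here is an illustrative choice. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```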
[GitHub] spark pull request #14502: [SPARK-16909][Spark Core] - Streaming for postgre...
Github user princejwesley commented on a diff in the pull request: https://github.com/apache/spark/pull/14502#discussion_r73781306 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -79,12 +79,17 @@ class JdbcRDD[T: ClassTag]( val conn = getConnection() val stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY) -// setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force streaming results, -// rather than pulling entire resultset into memory. -// see http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html -if (conn.getMetaData.getURL.matches("jdbc:mysql:.*")) { +val url = conn.getMetaData.getURL +if (url.startsWith("jdbc:mysql:")) { + // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force streaming results, + // rather than pulling entire resultset into memory. + // see https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html + stmt.setFetchSize(Integer.MIN_VALUE) - logInfo("statement fetch size set to: " + stmt.getFetchSize + " to force MySQL streaming ") + logInfo("statement fetch size set to: " + stmt.getFetchSize + " to force MySQL streaming") +} else { + stmt.setFetchSize(100) + logInfo("statement fetch size set to: " + stmt.getFetchSize + " to force streaming") --- End diff -- @srowen Addressed
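The driver-specific branching in this diff can be illustrated in isolation. The following is a sketch, not `JdbcRDD` code; `fetchSizeFor` is a hypothetical helper that mirrors the patch's logic:

```scala
// Hypothetical helper mirroring the patch: pick a JDBC fetch size from the URL.
// Not part of JdbcRDD; the value 100 is the patch's chosen default.
def fetchSizeFor(url: String): Int =
  if (url.startsWith("jdbc:mysql:")) {
    // MySQL Connector/J only streams rows when the fetch size is
    // Integer.MIN_VALUE; any positive value buffers the whole result set.
    Integer.MIN_VALUE
  } else {
    // Most other drivers stream (or at least batch) with a positive fetch size.
    100
  }
```

Note that for PostgreSQL a positive fetch size alone is not sufficient to get cursor-based streaming; the connection must also have autocommit disabled.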
[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/12913 @holdenk Updated the PR, ready for review again.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63308/ Test PASSed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Merged build finished. Test PASSed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63308/consoleFull)** for PR 14461 at commit [`76e68eb`](https://github.com/apache/spark/commit/76e68eb70187f977aa569fde50422018061c8bcf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14504 **[Test build #63310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63310/consoleFull)** for PR 14504 at commit [`545c8de`](https://github.com/apache/spark/commit/545c8dec58a4273ab34d300c35302a3f3bd97c76).
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Merged build finished. Test PASSed.
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11746 Merged build finished. Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63307/ Test PASSed.
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63305/ Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #63307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63307/consoleFull)** for PR 13701 at commit [`2d34803`](https://github.com/apache/spark/commit/2d3480381317bba06274e4ea899bc8d98d5cb82c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11746 **[Test build #63305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63305/consoleFull)** for PR 11746 at commit [`6d8f4f6`](https://github.com/apache/spark/commit/6d8f4f6ef7e73fab0a6955a25eee30b0df49d5a6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14496: [SPARK-16772] [Python] [Docs] Fix API doc references to ...
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14496 Thanks @srowen. 👍
[GitHub] spark pull request #14477: [SPARK-16870][docs]Summary:add "spark.sql.broadca...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14477#discussion_r73780468 --- Diff: docs/sql-programming-guide.md --- @@ -790,6 +790,15 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession + --- End diff -- PS this does not look like the right place for this option. This section covers Parquet. There's a later "other options" section covering broadcast join. Move it there.
[GitHub] spark pull request #14496: [SPARK-16772] [Python] [Docs] Fix API doc referen...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14496
[GitHub] spark issue #14496: [SPARK-16772] [Python] [Docs] Fix API doc references to ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14496 Merged to master/2.0
[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14491 @ganeshchand see for example structured-streaming-programming-guide.md or structured_network_wordcount.py. These have apparently similar comments -- same story right? Also see all the potential occurrences above. Let's address all of these as applicable.
[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73780390 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand( } +/** + * Recover Partitions in ALTER TABLE: recover all the partition in the directory of a table and + * update the catalog. + * + * The syntax of this command is: + * {{{ + * ALTER TABLE table RECOVER PARTITIONS; + * MSCK REPAIR TABLE table; + * }}} + */ +case class AlterTableRecoverPartitionsCommand( +tableName: TableIdentifier, +cmd: String = "ALTER TABLE RECOVER PARTITIONS") extends RunnableCommand { + override def run(spark: SparkSession): Seq[Row] = { +val catalog = spark.sessionState.catalog +if (!catalog.tableExists(tableName)) { + throw new AnalysisException(s"Table $tableName in $cmd does not exist.") +} +val table = catalog.getTableMetadata(tableName) +if (catalog.isTemporaryTable(tableName)) { + throw new AnalysisException( +s"Operation not allowed: $cmd on temporary tables: $tableName") +} +if (DDLUtils.isDatasourceTable(table)) { + throw new AnalysisException( +s"Operation not allowed: $cmd on datasource tables: $tableName") +} +if (table.tableType != CatalogTableType.EXTERNAL) { + throw new AnalysisException( +s"Operation not allowed: $cmd only works on external tables: $tableName") +} +if (!DDLUtils.isTablePartitioned(table)) { + throw new AnalysisException( +s"Operation not allowed: $cmd only works on partitioned tables: $tableName") +} +if (table.storage.locationUri.isEmpty) { + throw new AnalysisException( +s"Operation not allowed: $cmd only works on table with location provided: $tableName") +} + +val root = new Path(table.storage.locationUri.get) +val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration) +// Dummy jobconf to get to the pathFilter defined in configuration +// It's very expensive to create a JobConf(ClassUtil.findContainingJar() is slow) +val jobConf = new JobConf(spark.sparkContext.hadoopConfiguration, this.getClass) +val pathFilter = FileInputFormat.getInputPathFilter(jobConf) +val partitionSpecsAndLocs = scanPartitions( + spark, fs, pathFilter, root, Map(), table.partitionColumnNames.map(_.toLowerCase)) +val parts = partitionSpecsAndLocs.map { case (spec, location) => + // inherit table storage format (possibly except for location) + CatalogTablePartition(spec, table.storage.copy(locationUri = Some(location.toUri.toString))) +} +spark.sessionState.catalog.createPartitions(tableName, + parts.toArray[CatalogTablePartition], ignoreIfExists = true) +Seq.empty[Row] + } + + @transient private lazy val evalTaskSupport = new ForkJoinTaskSupport(new ForkJoinPool(8)) + + private def scanPartitions( + spark: SparkSession, + fs: FileSystem, + filter: PathFilter, + path: Path, + spec: TablePartitionSpec, + partitionNames: Seq[String]): GenSeq[(TablePartitionSpec, Path)] = { +if (partitionNames.length == 0) { + return Seq(spec -> path) +} + +val statuses = fs.listStatus(path) +val threshold = spark.conf.get("spark.rdd.parallelListingThreshold", "10").toInt +val statusPar: GenSeq[FileStatus] = + if (partitionNames.length > 1 && statuses.length > threshold || partitionNames.length > 2) { +val parArray = statuses.par +parArray.tasksupport = evalTaskSupport +parArray + } else { +statuses + } +statusPar.flatMap { st => + val name = st.getPath.getName + if (st.isDirectory && name.contains("=")) { +val ps = name.split("=", 2) +val columnName = PartitioningUtils.unescapePathName(ps(0)).toLowerCase +val value = PartitioningUtils.unescapePathName(ps(1)) +// comparing with case-insensitive, but preserve the case +if (columnName == partitionNames(0)) { --- End diff -- A directory name like "a=" will pass this condition and get an empty partition value.
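The edge case viirya points out is easy to reproduce: `split` with a limit of 2 keeps the trailing empty string, so a directory named "a=" satisfies the `contains("=")` check yet yields an empty value. A minimal sketch of the parsing logic from the diff (`parsePartition` is a hypothetical helper, not the Spark implementation, and it omits the path-unescaping step):

```scala
// Mirrors the diff's name.split("=", 2) parsing of a partition directory name.
// Returns (lower-cased column name, raw value) when the name contains "=".
def parsePartition(dirName: String): Option[(String, String)] =
  if (dirName.contains("=")) {
    val ps = dirName.split("=", 2) // limit 2 keeps a trailing empty value
    Some(ps(0).toLowerCase -> ps(1))
  } else {
    None
  }
```

Here "a=" parses to the pair ("a", ""), which is why an extra emptiness check is needed before accepting the directory as a partition.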
[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73780357 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand( } +/** + * Recover Partitions in ALTER TABLE: recover all the partition in the directory of a table and + * update the catalog. + * + * The syntax of this command is: + * {{{ + * ALTER TABLE table RECOVER PARTITIONS; + * MSCK REPAIR TABLE table; + * }}} + */ +case class AlterTableRecoverPartitionsCommand( +tableName: TableIdentifier, +cmd: String = "ALTER TABLE RECOVER PARTITIONS") extends RunnableCommand { + override def run(spark: SparkSession): Seq[Row] = { +val catalog = spark.sessionState.catalog +if (!catalog.tableExists(tableName)) { + throw new AnalysisException(s"Table $tableName in $cmd does not exist.") +} +val table = catalog.getTableMetadata(tableName) +if (catalog.isTemporaryTable(tableName)) { + throw new AnalysisException( +s"Operation not allowed: $cmd on temporary tables: $tableName") +} +if (DDLUtils.isDatasourceTable(table)) { + throw new AnalysisException( +s"Operation not allowed: $cmd on datasource tables: $tableName") +} +if (table.tableType != CatalogTableType.EXTERNAL) { + throw new AnalysisException( +s"Operation not allowed: $cmd only works on external tables: $tableName") +} +if (!DDLUtils.isTablePartitioned(table)) { + throw new AnalysisException( +s"Operation not allowed: $cmd only works on partitioned tables: $tableName") +} +if (table.storage.locationUri.isEmpty) { + throw new AnalysisException( +s"Operation not allowed: $cmd only works on table with location provided: $tableName") +} + +val root = new Path(table.storage.locationUri.get) +val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration) +// Dummy jobconf to get to the pathFilter defined in configuration +// It's very expensive to create a JobConf(ClassUtil.findContainingJar() is slow) +val jobConf = new JobConf(spark.sparkContext.hadoopConfiguration, this.getClass) +val pathFilter = FileInputFormat.getInputPathFilter(jobConf) +val partitionSpecsAndLocs = scanPartitions( + spark, fs, pathFilter, root, Map(), table.partitionColumnNames.map(_.toLowerCase)) +val parts = partitionSpecsAndLocs.map { case (spec, location) => + // inherit table storage format (possibly except for location) + CatalogTablePartition(spec, table.storage.copy(locationUri = Some(location.toUri.toString))) +} +spark.sessionState.catalog.createPartitions(tableName, + parts.toArray[CatalogTablePartition], ignoreIfExists = true) +Seq.empty[Row] + } + + @transient private lazy val evalTaskSupport = new ForkJoinTaskSupport(new ForkJoinPool(8)) + + private def scanPartitions( + spark: SparkSession, + fs: FileSystem, + filter: PathFilter, + path: Path, + spec: TablePartitionSpec, + partitionNames: Seq[String]): GenSeq[(TablePartitionSpec, Path)] = { +if (partitionNames.length == 0) { + return Seq(spec -> path) +} + +val statuses = fs.listStatus(path) +val threshold = spark.conf.get("spark.rdd.parallelListingThreshold", "10").toInt +val statusPar: GenSeq[FileStatus] = + if (partitionNames.length > 1 && statuses.length > threshold || partitionNames.length > 2) { +val parArray = statuses.par +parArray.tasksupport = evalTaskSupport +parArray + } else { +statuses + } +statusPar.flatMap { st => + val name = st.getPath.getName + if (st.isDirectory && name.contains("=")) { +val ps = name.split("=", 2) +val columnName = PartitioningUtils.unescapePathName(ps(0)).toLowerCase +val value = PartitioningUtils.unescapePathName(ps(1)) --- End diff -- Do we need to check if the value is valid? E.g., for a partition column "a" of IntegerType, "a=abc" is invalid.
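One way to perform the check viirya asks about is to attempt a conversion of the raw directory value to the partition column's type before accepting the partition. A hedged sketch for the IntegerType case only; this is a hypothetical helper, not Spark's actual validation:

```scala
// Hypothetical validity check for a raw partition value against an integer
// partition column: "a=abc" should be rejected, "a=42" accepted.
def isValidIntPartitionValue(raw: String): Boolean =
  scala.util.Try(raw.toInt).isSuccess // toInt throws on non-numeric input
```

A fuller implementation would dispatch on the column's Catalyst `DataType` rather than hard-coding the integer case.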
[GitHub] spark issue #9524: [SPARK-10387][ML] Add code gen for gbt
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9524 Merged build finished. Test FAILed.
[GitHub] spark issue #9524: [SPARK-10387][ML] Add code gen for gbt
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9524 **[Test build #63309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63309/consoleFull)** for PR 9524 at commit [`0a29f6a`](https://github.com/apache/spark/commit/0a29f6a2e2bb1bb70b3da926ecf310e7a07dd3c8). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9524: [SPARK-10387][ML] Add code gen for gbt
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9524 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63309/ Test FAILed.
[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73780281 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -864,6 +864,55 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { testAddPartitions(isDatasourceTable = true) } + test("alter table: recover partitions (sequential)") { +withSQLConf("spark.rdd.parallelListingThreshold" -> "1") { + testRecoverPartitions() +} + } + + test("after table: recover partition (parallel)") { --- End diff -- after -> alter
[GitHub] spark issue #9524: [SPARK-10387][ML] Add code gen for gbt
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9524 **[Test build #63309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63309/consoleFull)** for PR 9524 at commit [`0a29f6a`](https://github.com/apache/spark/commit/0a29f6a2e2bb1bb70b3da926ecf310e7a07dd3c8).
[GitHub] spark issue #14514: document that Mesos cluster mode supports python
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14514 OK
[GitHub] spark issue #13868: [SPARK-15899] [SQL] Fix the construction of the file pat...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13868 Any further comments, watchers? Maybe worth implementing Marcelo's last comments, and then let's merge.
[GitHub] spark pull request #14484: [SPARK-16796][Web UI] Mask spark.authenticate.sec...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14484
[GitHub] spark pull request #14450: [SPARK-16847][SQL] Prevent to potentially read co...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14450
[GitHub] spark issue #14518: [SPARK-16610][SQL] Do not ignore `orc.compress` when `co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14518 Merged build finished. Test PASSed.
[GitHub] spark issue #14484: [SPARK-16796][Web UI] Mask spark.authenticate.secret on ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14484 Merged to master
[GitHub] spark issue #14518: [SPARK-16610][SQL] Do not ignore `orc.compress` when `co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63304/ Test PASSed.
[GitHub] spark issue #14450: [SPARK-16847][SQL] Prevent to potentially read corrupt s...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14450 Merged to master
[GitHub] spark issue #14518: [SPARK-16610][SQL] Do not ignore `orc.compress` when `co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14518 **[Test build #63304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63304/consoleFull)** for PR 14518 at commit [`af1a3b8`](https://github.com/apache/spark/commit/af1a3b837a3d384ba2387e2db0b5ae975870b21a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63308/consoleFull)** for PR 14461 at commit [`76e68eb`](https://github.com/apache/spark/commit/76e68eb70187f977aa569fde50422018061c8bcf).
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14461 Oh yes. Thanks :)
[GitHub] spark issue #14450: [SPARK-16847][SQL] Prevent to potentially read corrupt s...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14450 Oh, I think this should not be backported to 2.0 and 1.x. In current releases, this is all manually prevented in Spark itself (in [here](https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L171-L180)). Also, those releases use Parquet 1.7.0/1.6.0rc3, but this was fixed in Parquet 1.8.0 (judging from [PARQUET-251](https://issues.apache.org/jira/browse/PARQUET-251)). In the master branch, this safeguard was removed when Parquet was upgraded to 1.8.1 in [SPARK-9876](https://issues.apache.org/jira/browse/SPARK-9876). So, I believe this PR is only relevant to the current master.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/14461 The excludes are a list; you forgot to add commas between them.
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73779646 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala --- @@ -93,6 +102,38 @@ class UnsafeArraySuite extends SparkFunSuite { assert(unsafeString.getUTF8String(i).toString().equals(e)) } +val unsafeDate = ExpressionEncoder[Array[Int]].resolveAndBind(). + toRow(dateArray).getArray(0) +assert(unsafeDate.isInstanceOf[UnsafeArrayData]) +assert(unsafeDate.numElements == dateArray.length) +dateArray.zipWithIndex.map { case (e, i) => + assert(unsafeDate.get(i, DateType) == e) +} + +val unsafeTimestamp = ExpressionEncoder[Array[Long]].resolveAndBind(). + toRow(timestampArray).getArray(0) +assert(unsafeTimestamp.isInstanceOf[UnsafeArrayData]) +assert(unsafeTimestamp.numElements == timestampArray.length) +timestampArray.zipWithIndex.map { case (e, i) => + assert(unsafeTimestamp.get(i, TimestampType) == e) +} + +val unsafeDecimal = ExpressionEncoder[Array[Decimal]].resolveAndBind(). --- End diff -- the external type for decimal is `java.math.BigDecimal` or Scala's `BigDecimal`
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73779549 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -341,16 +343,20 @@ public UnsafeArrayData copy() { int size = numElements(); byte[] values = new byte[size]; Platform.copyMemory( - baseObject, baseOffset + headerInBytes, values, Platform.BYTE_ARRAY_OFFSET, size); + baseObject, elementOffset, values, Platform.BYTE_ARRAY_OFFSET, size); return values; } @Override public short[] toShortArray() { -int size = numElements(); +if (numElements > Integer.MAX_VALUE) { --- End diff -- `numElements` is an int, right?
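The point behind this review question is a Java semantics detail worth spelling out (a hypothetical illustration, not code from the PR): a variable declared `int` can never hold a value greater than `Integer.MAX_VALUE`, so a guard like `numElements > Integer.MAX_VALUE` is always false and therefore dead code. The overflow check only works if the element count is carried as a `long` before the comparison.

```java
public class IntRangeCheck {
    public static void main(String[] args) {
        int numElements = Integer.MAX_VALUE; // the largest value an int can hold

        // An int compared against Integer.MAX_VALUE with > is always false:
        // this branch can never be taken, so the guard is dead code.
        System.out.println(numElements > Integer.MAX_VALUE); // false

        // To detect a size that would overflow int, the count must be
        // computed (or widened) as a long before comparing.
        long elements = (long) Integer.MAX_VALUE + 1;
        System.out.println(elements > Integer.MAX_VALUE); // true
    }
}
```

This is why the reviewer asks about the declared type: the check is only meaningful if `numElements` is a `long`.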
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13680 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63301/ Test FAILed.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13680 Merged build finished. Test FAILed.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13680 **[Test build #63301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63301/consoleFull)** for PR 13680 at commit [`7b4e819`](https://github.com/apache/spark/commit/7b4e819431de327482fdc6c3722f16c2858955c5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14474 It has been a bug since the very beginning; should we merge it to 1.6 too?
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63306/ Test FAILed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63306/consoleFull)** for PR 14461 at commit [`e3707db`](https://github.com/apache/spark/commit/e3707db1649bf084d75a86bbf2fa755b7dc526d1). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Merged build finished. Test FAILed.
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #63307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63307/consoleFull)** for PR 13701 at commit [`2d34803`](https://github.com/apache/spark/commit/2d3480381317bba06274e4ea899bc8d98d5cb82c).
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63306/consoleFull)** for PR 14461 at commit [`e3707db`](https://github.com/apache/spark/commit/e3707db1649bf084d75a86bbf2fa755b7dc526d1).
[GitHub] spark issue #14518: [SPARK-16610][SQL] Do not ignore `orc.compress` when `co...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14518 Hi @yhuai, I opened the PR as suggested. Just to triple-check, I want to be sure about the behaviour written in the PR description: 1. Check `compression` and use it if it is set. 2. If `compression` is not set, check `orc.compress` and use it. 3. If neither `compression` nor `orc.compress` is set, use the default, snappy. I apologise for asking similar things again and again, but please bear with me; I just want to avoid making multiple PRs that change things back and forth (this is almost identical to the initial version of the previous PR).
[GitHub] spark issue #14518: [SPARK-16610][SQL] Do not ignore `orc.compress` when `co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14518 **[Test build #63304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63304/consoleFull)** for PR 14518 at commit [`af1a3b8`](https://github.com/apache/spark/commit/af1a3b837a3d384ba2387e2db0b5ae975870b21a).
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11746 **[Test build #63305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63305/consoleFull)** for PR 11746 at commit [`6d8f4f6`](https://github.com/apache/spark/commit/6d8f4f6ef7e73fab0a6955a25eee30b0df49d5a6).
[GitHub] spark pull request #14518: [SPARK-16610][SQL] Do not ignore `orc.compress` w...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14518 [SPARK-16610][SQL] Do not ignore `orc.compress` when `compression` option is unset ## What changes were proposed in this pull request? For the ORC source, Spark SQL has a writer option `compression`, which is used to set the codec; its value is also set to `orc.compress` (the ORC conf used for the codec). However, if a user only sets `orc.compress` in the writer options, we should not use the default value of `compression` (snappy) as the codec. Instead, we should respect the value of `orc.compress`. This PR makes the ORC data source respect `orc.compress` when `compression` is unset. So, here is the behaviour: 1. Check `compression` and use it if it is set. 2. If `compression` is not set, check `orc.compress` and use it. 3. If neither `compression` nor `orc.compress` is set, use the default, snappy. ## How was this patch tested? Unit test in `OrcQuerySuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-16610 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14518.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14518 commit 2d55a61c7b5e59f2442fa20d8dc5bec8eceda650 Author: hyukjinkwon Date: 2016-08-06T02:05:25Z [SPARK-16610][SQL] Do not ignore `orc.compress` when `compression` option is unset commit 4f2731370621e1fd9b25105a8f0184c98a7465f7 Author: hyukjinkwon Date: 2016-08-06T02:08:28Z Use SNAPPY as default commit 1ad44eca2d796202c894c262efad666249a7b942 Author: hyukjinkwon Date: 2016-08-06T02:09:59Z Fix indentation commit af1a3b837a3d384ba2387e2db0b5ae975870b21a Author: hyukjinkwon Date: 2016-08-06T02:11:38Z Add a comment for default value
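The three-step precedence described in the PR can be sketched as a small standalone helper. This is a hedged illustration of the documented behaviour, not the actual Spark implementation; `resolveCodec` is a hypothetical name, and only the option keys `compression` and `orc.compress` come from the PR description.

```java
import java.util.Map;

public class OrcCodecResolution {
    // Hypothetical helper mirroring the precedence described in the PR:
    // 1) the `compression` writer option wins if set,
    // 2) otherwise fall back to `orc.compress`,
    // 3) otherwise default to SNAPPY.
    static String resolveCodec(Map<String, String> options) {
        String compression = options.get("compression");
        if (compression != null) {
            return compression.toUpperCase();
        }
        String orcCompress = options.get("orc.compress");
        if (orcCompress != null) {
            return orcCompress.toUpperCase();
        }
        return "SNAPPY"; // default when neither option is set
    }

    public static void main(String[] args) {
        // `compression` wins even when `orc.compress` is also set.
        System.out.println(resolveCodec(Map.of("compression", "zlib", "orc.compress", "none")));
        // `orc.compress` is respected when `compression` is unset.
        System.out.println(resolveCodec(Map.of("orc.compress", "none")));
        // Neither set: fall back to the default.
        System.out.println(resolveCodec(Map.of()));
    }
}
```

The key design point is ordering: checking `compression` first preserves the existing writer-option semantics, while the new fallback stops `orc.compress` from being silently overridden by the snappy default.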
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/11746 jenkins retest this please
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63299/ Test PASSed.
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14266 Merged build finished. Test PASSed.
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14266 **[Test build #63299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63299/consoleFull)** for PR 14266 at commit [`e990794`](https://github.com/apache/spark/commit/e990794139a3d3d4c66689d4b979553dd04f449f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63303/consoleFull)** for PR 14461 at commit [`4564bc5`](https://github.com/apache/spark/commit/4564bc5742ed44559f22646dea8a343102f3677c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class ShuffleIndexInformation ` * `public class ShuffleIndexRecord ` * `case class Least(children: Seq[Expression]) extends Expression ` * `case class Greatest(children: Seq[Expression]) extends Expression ` * `case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])` * `case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan] `
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63303/ Test FAILed.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14472 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63297/ Test PASSed.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14472 Merged build finished. Test PASSed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Merged build finished. Test FAILed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63303/consoleFull)** for PR 14461 at commit [`4564bc5`](https://github.com/apache/spark/commit/4564bc5742ed44559f22646dea8a343102f3677c).
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14461 retest it please
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #63297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63297/consoleFull)** for PR 14472 at commit [`2352d6f`](https://github.com/apache/spark/commit/2352d6f2005bce8c241bc55a8668e1968f65d450). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14461 @ajbozarth Thanks for pointing that out. I have removed the link on the application summary page and reverted the related changes.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Build finished. Test FAILed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63302/consoleFull)** for PR 14461 at commit [`0192b37`](https://github.com/apache/spark/commit/0192b374695af4e28cbbe264c75406966ad36f82).
* This patch **fails Scala style tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63302/ Test FAILed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #63302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63302/consoleFull)** for PR 14461 at commit [`0192b37`](https://github.com/apache/spark/commit/0192b374695af4e28cbbe264c75406966ad36f82).
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Merged build finished. Test PASSed.
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63295/ Test PASSed.
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #63295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63295/consoleFull)** for PR 9207 at commit [`00173aa`](https://github.com/apache/spark/commit/00173aad74775e1d416bb2d311b54530274d1050).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class ShuffleIndexInformation`
  * `public class ShuffleIndexRecord`
  * `case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])`
  * `case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan]`
[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 BTW, this problem exists in the external CSV data source as well. The root cause of https://github.com/databricks/spark-csv/issues/370 is this issue, and, if my understanding is correct, the external CSV data source would not work in Spark 2.0.
[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63298/ Test FAILed.
[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14513 Merged build finished. Test FAILed.
[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14513 **[Test build #63298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63298/consoleFull)** for PR 14513 at commit [`b8f2549`](https://github.com/apache/spark/commit/b8f254987b4e0f09d7b3e0f080a340238ac2e088).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11746 Merged build finished. Test FAILed.
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11746 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63294/ Test FAILed.
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11746 **[Test build #63294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63294/consoleFull)** for PR 11746 at commit [`6d8f4f6`](https://github.com/apache/spark/commit/6d8f4f6ef7e73fab0a6955a25eee30b0df49d5a6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13680 **[Test build #63301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63301/consoleFull)** for PR 13680 at commit [`7b4e819`](https://github.com/apache/spark/commit/7b4e819431de327482fdc6c3722f16c2858955c5).
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution in CTE by ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452 ping @cloud-fan @hvanhovell Can you take a look and see whether this makes sense to you? Thanks.
[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14517 Can one of the admins verify this patch?
[GitHub] spark pull request #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy a...
GitHub user GregBowyer opened a pull request: https://github.com/apache/spark/pull/14517

[SPARK-16931][PYTHON] PySpark APIS for bucketBy and sortBy

## What changes were proposed in this pull request?

API access to allow pyspark to use bucketBy and sortBy on dataframes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GregBowyer/spark pyspark-bucketing

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14517.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14517

commit 47d9ef797e229b9e3239c5dcb7ea72bef1c54683
Author: Greg Bowyer
Date: 2016-08-06T00:53:30Z

    [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sortBy
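For readers unfamiliar with bucketing, the idea behind `bucketBy` can be sketched outside Spark: each row is assigned to one of N buckets by hashing the bucket column, so equal keys always land in the same bucket file (and `sortBy` then orders rows within a bucket). This is an illustrative sketch only — Spark uses Murmur3 hashing internally, and `bucket_id`/`bucket_rows` are hypothetical helper names, not part of the PR:

```python
# Illustrative sketch of hash bucketing (not the PR's code). Spark hashes
# with Murmur3; Python's built-in hash() is used here only to show the
# partitioning scheme.

def bucket_id(key, num_buckets):
    """Map a bucket-column value to a bucket index in [0, num_buckets)."""
    return hash(key) % num_buckets  # Python's % keeps the result non-negative

def bucket_rows(rows, key_fn, num_buckets):
    """Group rows into buckets; sortBy would then order rows inside each bucket."""
    buckets = {i: [] for i in range(num_buckets)}
    for row in rows:
        buckets[bucket_id(key_fn(row), num_buckets)].append(row)
    return buckets

rows = [{"user": u, "score": s} for u, s in [("a", 3), ("b", 1), ("a", 2)]]
buckets = bucket_rows(rows, key_fn=lambda r: r["user"], num_buckets=4)

# All rows with the same key share one bucket, so a later join or
# aggregation on "user" can avoid a shuffle.
ids = {bucket_id(r["user"], 4) for r in rows if r["user"] == "a"}
assert len(ids) == 1
```

In PySpark terms the PR exposes this on the writer, along the lines of `df.write.bucketBy(n, col).sortBy(col).saveAsTable(...)` (shape assumed from the Scala API it mirrors).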
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14311 **[Test build #3204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3204/consoleFull)** for PR 14311 at commit [`7543c4a`](https://github.com/apache/spark/commit/7543c4abc67de3559da92f2c290c792cb4ca78bc).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73776684

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/ColumnTypeSuite.scala ---

```diff
@@ -73,8 +73,8 @@ class ColumnTypeSuite extends SparkFunSuite with Logging {
     checkActualSize(BINARY, Array.fill[Byte](4)(0.toByte), 4 + 4)
     checkActualSize(COMPACT_DECIMAL(15, 10), Decimal(0, 15, 10), 8)
     checkActualSize(LARGE_DECIMAL(20, 10), Decimal(0, 20, 10), 5)
-    checkActualSize(ARRAY_TYPE, Array[Any](1), 16)
```

It seems we have to keep `Any`. When I changed it to `Int`, I got the following cast error:

```java
[I cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
java.lang.ClassCastException: [I cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
	at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getArray(rows.scala:236)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(generated.java:34)
	at org.apache.spark.sql.execution.columnar.ColumnTypeSuite$$anonfun$2.checkActualSize$1(ColumnTypeSuite.scala:60)
	...
```

--- End diff --
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14447 **[Test build #63300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63300/consoleFull)** for PR 14447 at commit [`eb5f5af`](https://github.com/apache/spark/commit/eb5f5afea9512015900dcc690edb292622b02379).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14447 Merged build finished. Test FAILed.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63300/ Test FAILed.
[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...
Github user djk121 commented on the issue: https://github.com/apache/spark/pull/14118 I'm doing this:

```scala
val dataframe = sparkSession.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("nullValue", "null")
  .schema(schema)
  .load(csvPath)
```

I then take that dataframe and attempt to write it out to parquet like so:

```scala
dataframe.write
  .mode(SaveMode.Overwrite)
  .option("compression", "snappy")
  .parquet(outputPath)
```

When the parquet writes go, I get the same traceback as above. I can see from that traceback that it's org.apache.spark.sql.execution.datasources.csv, so for whatever reason, com.databricks.spark.csv isn't being used. Do I need to do something different to force it to be used?
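The `nullValue` option in the snippet above has simple semantics: any cell that exactly matches the configured token is read as null instead of as a string. A minimal stdlib sketch of that behavior, to make the option concrete (illustrative only — this is not Spark's CSV parser, and `read_csv_with_null` is a hypothetical helper):

```python
import csv
import io

def read_csv_with_null(text, null_value="null"):
    """Parse CSV text into dicts, mapping cells equal to null_value to None.
    A sketch of what the nullValue option does; not Spark's implementation."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [
        {col: (None if cell == null_value else cell)
         for col, cell in zip(header, row)}
        for row in reader
    ]

rows = read_csv_with_null("id,name\n1,null\n2,bob\n")
assert rows[0]["name"] is None   # the "null" token became a real null
assert rows[1]["name"] == "bob"
```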
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14447 **[Test build #63300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63300/consoleFull)** for PR 14447 at commit [`eb5f5af`](https://github.com/apache/spark/commit/eb5f5afea9512015900dcc690edb292622b02379).
[GitHub] spark pull request #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73776214

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---

```diff
@@ -199,6 +209,19 @@ private[sql] case class FileSourceScanExec(
       options = relation.options,
       hadoopConf = relation.sparkSession.sessionState.newHadoopConfWithOptions(relation.options))

+      (file: PartitionedFile) => {
+        val iter = func(file)
+        // Only for test purpose.
+        // Once the vectorized Parquet reader is initialized in the above method, we can read its
+        // variable numRowGroups.
+        if (fileFormat != null) {
```

--- End diff --

I think we can directly update the accumulator in ParquetFileFormat when the accumulator is present. I will update the code later; please check whether it looks good to you.
[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/13300 @rxin Do you think we can revisit this feature and have it in 2.1? Thanks!
[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73775993

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand(
 }

+/**
+ * Recover Partitions in ALTER TABLE: recover all the partition in the directory of a table and
+ * update the catalog.
+ *
+ * The syntax of this command is:
+ * {{{
+ *   ALTER TABLE table RECOVER PARTITIONS;
+ *   MSCK REPAIR TABLE table;
+ * }}}
+ */
+case class AlterTableRecoverPartitionsCommand(
+    tableName: TableIdentifier,
+    cmd: String = "ALTER TABLE RECOVER PARTITIONS") extends RunnableCommand {
+  override def run(spark: SparkSession): Seq[Row] = {
+    val catalog = spark.sessionState.catalog
+    if (!catalog.tableExists(tableName)) {
+      throw new AnalysisException(s"Table $tableName in $cmd does not exist.")
+    }
+    val table = catalog.getTableMetadata(tableName)
+    if (catalog.isTemporaryTable(tableName)) {
+      throw new AnalysisException(
+        s"Operation not allowed: $cmd on temporary tables: $tableName")
+    }
+    if (DDLUtils.isDatasourceTable(table)) {
+      throw new AnalysisException(
+        s"Operation not allowed: $cmd on datasource tables: $tableName")
+    }
+    if (table.tableType != CatalogTableType.EXTERNAL) {
+      throw new AnalysisException(
+        s"Operation not allowed: $cmd only works on external tables: $tableName")
+    }
+    if (!DDLUtils.isTablePartitioned(table)) {
+      throw new AnalysisException(
+        s"Operation not allowed: $cmd only works on partitioned tables: $tableName")
+    }
+    if (table.storage.locationUri.isEmpty) {
+      throw new AnalysisException(
+        s"Operation not allowed: $cmd only works on table with location provided: $tableName")
+    }
+
+    val root = new Path(table.storage.locationUri.get)
+    val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration)
+    // Dummy jobconf to get to the pathFilter defined in configuration
+    // It's very expensive to create a JobConf(ClassUtil.findContainingJar() is slow)
+    val jobConf = new JobConf(spark.sparkContext.hadoopConfiguration, this.getClass)
+    val pathFilter = FileInputFormat.getInputPathFilter(jobConf)
+    val partitionSpecsAndLocs = scanPartitions(
+      spark, fs, pathFilter, root, Map(), table.partitionColumnNames.map(_.toLowerCase))
+    val parts = partitionSpecsAndLocs.map { case (spec, location) =>
+      // inherit table storage format (possibly except for location)
+      CatalogTablePartition(spec, table.storage.copy(locationUri = Some(location.toUri.toString)))
+    }
+    spark.sessionState.catalog.createPartitions(tableName,
+      parts.toArray[CatalogTablePartition], ignoreIfExists = true)
+    Seq.empty[Row]
+  }
+
+  @transient private lazy val evalTaskSupport = new ForkJoinTaskSupport(new ForkJoinPool(8))
+
+  private def scanPartitions(
+      spark: SparkSession,
+      fs: FileSystem,
+      filter: PathFilter,
+      path: Path,
+      spec: TablePartitionSpec,
+      partitionNames: Seq[String]): GenSeq[(TablePartitionSpec, Path)] = {
+    if (partitionNames.length == 0) {
+      return Seq(spec -> path)
+    }
+
+    val statuses = fs.listStatus(path)
+    val threshold = spark.conf.get("spark.rdd.parallelListingThreshold", "10").toInt
+    val statusPar: GenSeq[FileStatus] =
+      if (partitionNames.length > 1 && statuses.length > threshold || partitionNames.length > 2) {
+        val parArray = statuses.par
```

--- End diff --

I didn't look carefully - but if you are using the default exec context, please create a new one; otherwise it'd block.
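The core of the recovery walk above is deriving a partition spec from each `col=value` path segment under the table root. A rough stdlib sketch of that sequential walk (illustrative only — `scan_partitions` here is a hypothetical helper; Spark's real code works against the Hadoop FileSystem API, applies a path filter, and switches to a parallel collection once the listing exceeds `spark.rdd.parallelListingThreshold`):

```python
import os
import tempfile

def scan_partitions(root, partition_names, spec=None):
    """Recursively collect (partition_spec, path) pairs from a directory
    laid out as col1=v1/col2=v2/... - a sketch of the ALTER TABLE RECOVER
    PARTITIONS walk, without the path filtering and parallel listing."""
    spec = dict(spec or {})
    if not partition_names:
        return [(spec, root)]
    col, rest = partition_names[0], partition_names[1:]
    found = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        # Only descend into directories named for the expected column.
        if os.path.isdir(path) and entry.startswith(col + "="):
            value = entry.split("=", 1)[1]
            found.extend(scan_partitions(path, rest, {**spec, col: value}))
    return found

# Build a toy partitioned layout: root/a=1/b=x and root/a=2/b=y
root = tempfile.mkdtemp()
for a, b in [("1", "x"), ("2", "y")]:
    os.makedirs(os.path.join(root, f"a={a}", f"b={b}"))

parts = scan_partitions(root, ["a", "b"])
assert len(parts) == 2
assert {"a": "1", "b": "x"} in [s for s, _ in parts]
```

Each recovered spec would then be registered in the catalog, inheriting the table's storage format with only the location swapped, as the diff does with `CatalogTablePartition`.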
[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63291/ Test PASSed.
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14266 **[Test build #63299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63299/consoleFull)** for PR 14266 at commit [`e990794`](https://github.com/apache/spark/commit/e990794139a3d3d4c66689d4b979553dd04f449f).
[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14513 Merged build finished. Test PASSed.
[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14513 **[Test build #63291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63291/consoleFull)** for PR 14513 at commit [`70e32c6`](https://github.com/apache/spark/commit/70e32c661cb9f0aea9c77ac26e8326268599317c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test PASSed.
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63289/ Test PASSed.