[GitHub] spark issue #15229: [SPARK-17654] [SQL] Propagate bucketing information for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15229 **[Test build #65862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65862/consoleFull)** for PR 15229 at commit [`8726cc6`](https://github.com/apache/spark/commit/8726cc6430cbeaf8c2eebd7cef40199a7c563218). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15229: [SPARK-17654] [SQL] Propagate bucketing informati...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15229 [SPARK-17654] [SQL] Propagate bucketing information for Hive tables to / from Catalog ## What changes were proposed in this pull request? Currently Spark does not respect bucketing for Hive tables. This PR includes the following changes: - `HiveClientImpl` now extracts the table's bucketing information - while writing table info to the metastore, `MetastoreRelation` now populates the bucketing information in the Hive `Table` object - `HiveTableScanExec` now exposes `outputPartitioning` and `outputOrdering` as per the bucketing spec. - `InsertIntoHiveTable` now exposes `requiredChildDistribution` and `requiredChildOrdering` based on the target table's bucketing spec. TODOs (which will be done in linked PRs and not this one): - [ ] `ClusteredDistribution` does not guarantee the number of partitions generated (which corresponds to the number of output bucket files created). This will require adding strict guarantees to `ClusteredDistribution`. I think it needs more thought and is better done incrementally rather than packed into this PR. - [ ] While writing to bucketed files, Hive's hashing function should be used. I have a PR open to implement Hive hashing natively in Spark: https://github.com/apache/spark/pull/15047 - [ ] Allow creating Hive bucketed tables ## How was this patch tested? Tested with Hive tables created locally. Adding a new test case would require implementing bucketed table creation, which is not supported :( Suggestions welcome.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/tejasapatil/spark SPARK-17654_hive_extract_bucketing Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15229.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15229 commit caef89a198dac2fee4afaad622e2ecc11f200836 Author: Tejas Patil Date: 2016-08-23T20:45:00Z Support bucketing for Hive tables commit ee79dd2ae1e174ab38fc5f6b10f5a9a2e2721533 Author: Tejas Patil Date: 2016-08-23T20:45:00Z Support bucketing for Hive tables commit 8726cc6430cbeaf8c2eebd7cef40199a7c563218 Author: Tejas Patil Date: 2016-09-24T03:22:07Z Merge remote-tracking branch 'origin/SPARK-17654_hive_extract_bucketing' into SPARK-17654_hive_extract_bucketing_2 # Conflicts: # sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
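The bucketing changes in PR 15229, and both of its first two TODOs, rely on one fact: a row's bucket is a deterministic function of its key's hash and the bucket count. The following Java sketch illustrates that idea. `BucketSketch` is a hypothetical name. The sketch assumes the Hive-style rule `(hash & Integer.MAX_VALUE) % numBuckets`, with `Integer.hashCode` standing in for Hive's hash function. It is not the code in this PR or in Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Spark or Hive API) that assigns keys to buckets
// with the Hive-style rule (hash & Integer.MAX_VALUE) % numBuckets.
final class BucketSketch {
    private BucketSketch() {}

    // Assumption: the key's hash is Integer.hashCode, which returns the int value itself.
    static int bucketId(int key, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        int hash = Integer.hashCode(key);
        // Clearing the sign bit keeps the bucket ID non-negative for negative hashes.
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Groups keys by bucket ID: element i holds the keys destined for bucket file i.
    static List<List<Integer>> assign(int[] keys, int numBuckets) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(bucketId(key, numBuckets)).add(key);
        }
        return buckets;
    }
}
```

With 4 buckets, keys 1 and 5 share bucket 1, and keys 2 and 6 share bucket 2. Rows could end up in files that do not match the table's bucket layout in either of two cases:

- Spark hashes keys differently from Hive.
- The write produces a number of partitions other than `numBuckets`.

The TODOs above aim to prevent both.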
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12601 Merged build finished. Test PASSed.
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65858/ Test PASSed.
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15168 **[Test build #65861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65861/consoleFull)** for PR 15168 at commit [`ba22975`](https://github.com/apache/spark/commit/ba22975232bd64263ef0b513f11887378e0de43f).
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15168 retest this please
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/12601 Mostly LGTM, except for three minor comments. Thank you for your hard work, @JustinPihony!
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80353253 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -420,62 +420,11 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { def jdbc(url: String, table: String, connectionProperties: Properties): Unit = { assertNotPartitioned("jdbc") assertNotBucketed("jdbc") - -// to add required options like URL and dbtable -val params = extraOptions.toMap ++ Map("url" -> url, "dbtable" -> table) -val jdbcOptions = new JDBCOptions(params) -val jdbcUrl = jdbcOptions.url -val jdbcTable = jdbcOptions.table - -val props = new Properties() -extraOptions.foreach { case (key, value) => - props.put(key, value) -} // connectionProperties should override settings in extraOptions -props.putAll(connectionProperties) -val conn = JdbcUtils.createConnectionFactory(jdbcUrl, props)() - -try { - var tableExists = JdbcUtils.tableExists(conn, jdbcUrl, jdbcTable) - - if (mode == SaveMode.Ignore && tableExists) { -return - } - - if (mode == SaveMode.ErrorIfExists && tableExists) { -sys.error(s"Table $jdbcTable already exists.") - } - - if (mode == SaveMode.Overwrite && tableExists) { -if (jdbcOptions.isTruncate && -JdbcUtils.isCascadingTruncateTable(jdbcUrl) == Some(false)) { - JdbcUtils.truncateTable(conn, jdbcTable) -} else { - JdbcUtils.dropTable(conn, jdbcTable) - tableExists = false -} - } - - // Create the table if the table didn't exist. - if (!tableExists) { -val schema = JdbcUtils.schemaString(df, jdbcUrl) -// To allow certain options to append when create a new table, which can be -// table_options or partition_options. 
-// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8" -val createtblOptions = jdbcOptions.createTableOptions -val sql = s"CREATE TABLE $jdbcTable ($schema) $createtblOptions" -val statement = conn.createStatement -try { - statement.executeUpdate(sql) -} finally { - statement.close() -} - } -} finally { - conn.close() -} - -JdbcUtils.saveTable(df, jdbcUrl, jdbcTable, props) +this.extraOptions = this.extraOptions ++ (connectionProperties.asScala) +// explicit url and dbtable should override all +this.extraOptions += ("url" -> url, "dbtable" -> table) +format("jdbc").save --- End diff -- The omission of parentheses on methods should only be used when the method has no side-effects. Thus, please change it to `save()`
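The refactored `jdbc()` relies on two rules that the removed code spelled out explicitly:

- **Option precedence.** Later entries overwrite earlier ones. `connectionProperties` override `extraOptions`, and the explicit `url` and `dbtable` override both.
- **`SaveMode` handling.** This presumably now happens behind `format("jdbc").save()`.

The plain-Java sketch below restates both rules without Spark. `JdbcSaveSketch`, `Mode`, and `Action` are hypothetical names. `decide` paraphrases the removed Scala branches rather than the JDBC data source's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical plain-Java restatement of the jdbc() refactoring; no Spark classes are used.
final class JdbcSaveSketch {
    private JdbcSaveSketch() {}

    // Local stand-in for org.apache.spark.sql.SaveMode.
    enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

    // What the removed code did before (or instead of) writing the rows.
    enum Action { SKIP, FAIL, TRUNCATE_THEN_INSERT, DROP_CREATE_THEN_INSERT, CREATE_THEN_INSERT, INSERT }

    // Later writes win: extraOptions < connectionProperties < explicit url/dbtable.
    static Map<String, String> mergeOptions(Map<String, String> extraOptions,
                                            Properties connectionProperties,
                                            String url, String table) {
        Map<String, String> merged = new LinkedHashMap<>(extraOptions);
        for (String name : connectionProperties.stringPropertyNames()) {
            merged.put(name, connectionProperties.getProperty(name));
        }
        merged.put("url", url);
        merged.put("dbtable", table);
        return merged;
    }

    // Paraphrase of the removed SaveMode branches; nonCascadingTruncate stands for
    // JdbcUtils.isCascadingTruncateTable(jdbcUrl) == Some(false).
    static Action decide(Mode mode, boolean tableExists,
                         boolean truncateRequested, boolean nonCascadingTruncate) {
        if (!tableExists) {
            return Action.CREATE_THEN_INSERT; // every mode creates a missing table
        }
        switch (mode) {
            case IGNORE:
                return Action.SKIP;
            case ERROR_IF_EXISTS:
                return Action.FAIL;
            case OVERWRITE:
                return (truncateRequested && nonCascadingTruncate)
                        ? Action.TRUNCATE_THEN_INSERT
                        : Action.DROP_CREATE_THEN_INSERT;
            default:
                return Action.INSERT; // APPEND to an existing table
        }
    }
}
```

Keeping the decision as a pure function of `(mode, tableExists, ...)` makes it easy to compare the old `jdbc()` behaviour with whatever the `format("jdbc")` path does.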
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80353203 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala --- @@ -208,4 +210,84 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter { assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count()) assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length) } + + test("save works for format(\"jdbc\") if url and dbtable are set") { +val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2) + +df.write.format("jdbc") +.options(Map("url" -> url, "dbtable" -> "TEST.SAVETEST")) +.save --- End diff -- Nit: `save` -> `save()`
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15168 The failure seems to be unrelated. Retest this please. ``` [info] - Naive Bayes Multinomial *** FAILED *** (137 milliseconds) [info] Expected 0.7 and 0.6494565217391305 to be within 0.05 using absolute tolerance. [info] validateModelFit: ```
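The failing assertion is an absolute-tolerance comparison, and the arithmetic shows how narrow the miss is: |0.7 - 0.6494565217391305| ≈ 0.0505, just above the 0.05 tolerance. The Java sketch below checks this. `ToleranceSketch` is a hypothetical helper that assumes a comparison of the form |expected - actual| < eps. It does not reproduce Spark's test utilities, though whether those use `<` or `<=` makes no difference for these numbers.

```java
// Hypothetical helper; assumes the check has the form |expected - actual| < eps.
final class ToleranceSketch {
    private ToleranceSketch() {}

    // Absolute difference between the expected and observed values.
    static double absDiff(double expected, double actual) {
        return Math.abs(expected - actual);
    }

    // True when the observed value lies strictly within eps of the expected one.
    static boolean withinAbsTol(double expected, double actual, double eps) {
        return absDiff(expected, actual) < eps;
    }
}
```

Here `absDiff(0.7, 0.6494565217391305)` is about 0.05054, so `withinAbsTol(0.7, 0.6494565217391305, 0.05)` is false. A miss of roughly 0.0005 on a statistical check is consistent with the view that the failure is unrelated to the PR.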
[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15228 Build finished. Test FAILed.
[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15228 **[Test build #65857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65857/consoleFull)** for PR 15228 at commit [`caef89a`](https://github.com/apache/spark/commit/caef89a198dac2fee4afaad622e2ecc11f200836). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15228 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65857/ Test FAILed.
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80353010 --- Diff: docs/sql-programming-guide.md --- @@ -1096,13 +1096,17 @@ the Data Sources API. The following options are supported: {% highlight sql %} -CREATE TEMPORARY VIEW jdbcTable +CREATE TEMPORARY TABLE jdbcTable --- End diff -- Please change it back. `CREATE TEMPORARY TABLE` is deprecated; you will get a parser error: ``` CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW as an alternative.(line 1, pos 0) ```
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15168 Merged build finished. Test FAILed.
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15168 **[Test build #65859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65859/consoleFull)** for PR 15168 at commit [`ba22975`](https://github.com/apache/spark/commit/ba22975232bd64263ef0b513f11887378e0de43f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15168 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65859/ Test FAILed.
[GitHub] spark issue #15217: [SPARK-17577][Core] Update SparkContext.addFile to make ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15217 Close this PR. Thanks!
[GitHub] spark pull request #15217: [SPARK-17577][Core] Update SparkContext.addFile t...
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/15217
[GitHub] spark pull request #15228: [SPARK-17654] [SQL] Propagate bucketing informati...
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/15228
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user JustinPihony commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80352586 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -21,6 +21,7 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.List; +import java.util.Properties; // $example off:schema_merging$ --- End diff -- @HyukjinKwon Yes, that is what I was talking about...just fixed it back
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12601 **[Test build #65860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65860/consoleFull)** for PR 12601 at commit [`8fb86b4`](https://github.com/apache/spark/commit/8fb86b482929e321f4ec8865124b8661f1a29bbf).
[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15168 **[Test build #65859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65859/consoleFull)** for PR 15168 at commit [`ba22975`](https://github.com/apache/spark/commit/ba22975232bd64263ef0b513f11887378e0de43f).
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12601 Thanks for mentioning me. Personally, it looks good to me.
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80352317 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -21,6 +21,7 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.List; +import java.util.Properties; // $example off:schema_merging$ --- End diff -- Oh, maybe my previous comment was not clear. I meant ```java import java.util.List; // $example off:schema_merging$ import java.util.Properties; ``` I haven't tried building the docs against the current state, but I guess we don't need this import for Parquet's schema merging example.
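HyukjinKwon's point depends on how the programming guide pulls code from the example sources. Lines between a `$example on:<label>$` comment and the matching `$example off:<label>$` comment are rendered in the docs under that label. An import placed before the `off` marker would therefore show up in the Parquet schema merging snippet. The Java sketch below is a hypothetical, simplified version of that extraction; `ExampleMarkerSketch` is not the actual docs plugin. It shows why moving `import java.util.Properties;` after the marker keeps it out of that snippet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of a docs include step: returns the lines between
// "$example on:<label>$" and "$example off:<label>$" markers for one label.
final class ExampleMarkerSketch {
    private ExampleMarkerSketch() {}

    static List<String> extract(List<String> sourceLines, String label) {
        // Including the trailing '$' prevents "schema" from matching "schema_merging".
        String on = "$example on:" + label + "$";
        String off = "$example off:" + label + "$";
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : sourceLines) {
            if (line.contains(on)) {
                inside = true;          // marker lines themselves are not rendered
            } else if (line.contains(off)) {
                inside = false;
            } else if (inside) {
                out.add(line);
            }
        }
        return out;
    }
}
```

Fed the suggested layout, `extract(lines, "schema_merging")` returns only `import java.util.List;`. The `Properties` import sits after the `off` marker, so it no longer appears in the schema merging example.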
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12601 **[Test build #65858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65858/consoleFull)** for PR 12601 at commit [`06c1cba`](https://github.com/apache/spark/commit/06c1cba1da5ab140d71c29f41afd608e863bfe1b).
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601 @gatorsmile I added the R and SQL documentation. I took the SQL portion from https://github.com/apache/spark/pull/6121/files
[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15228 **[Test build #65857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65857/consoleFull)** for PR 15228 at commit [`caef89a`](https://github.com/apache/spark/commit/caef89a198dac2fee4afaad622e2ecc11f200836).
[GitHub] spark issue #15071: [SPARK-17517][SQL]Improve generated Code for BroadcastHa...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/15071 cc @davies
[GitHub] spark pull request #15228: [SPARK-17654] [SQL] Propagate bucketing informati...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15228 [SPARK-17654] [SQL] Propagate bucketing information for Hive tables to / from Catalog ## What changes were proposed in this pull request? Currently Spark does not respect bucketing for Hive tables. This PR includes the following changes: - `HiveClientImpl` now extracts the table's bucketing information - while writing table info to the metastore, `MetastoreRelation` now populates the bucketing information in the Hive `Table` object - `HiveTableScanExec` now exposes `outputPartitioning` and `outputOrdering` as per the bucketing spec. - `InsertIntoHiveTable` now exposes `requiredChildDistribution` and `requiredChildOrdering` based on the target table's bucketing spec. TODOs (which will be done in linked PRs and not this one): - [ ] `ClusteredDistribution` does not guarantee the number of partitions generated (which corresponds to the number of output bucket files created). This will require adding strict guarantees to `ClusteredDistribution`. I think it needs more thought and is better done incrementally rather than packed into this PR. - [ ] While writing to bucketed files, Hive's hashing function should be used. I have a PR open to implement Hive hashing natively in Spark: https://github.com/apache/spark/pull/15047 - [ ] Allow creating Hive bucketed tables ## How was this patch tested? Tested with Hive tables created locally. Adding a new test case would require implementing bucketed table creation, which is not supported :( Suggestions welcome.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/tejasapatil/spark SPARK-17654_hive_extract_bucketing Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15228
[GitHub] spark issue #15227: [SPARK-17655][SQL]Remove unused variables declarations a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15227 Can one of the admins verify this patch?
[GitHub] spark pull request #15227: [SPARK-17655][SQL]Remove unused variables declara...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/15227 [SPARK-17655][SQL]Remove unused variables declarations and definations in a WholeStageCodeGened stage ## What changes were proposed in this pull request? A whole-stage-codegen stage with multiple `CodegenSupport` operators generates unused result rows, along with their associated buffer holders and row writers, which can be removed. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yaooqinn/spark rm-unused-object Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15227.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15227 commit eabd4a55cbe8fd57c722396c95087a2b6c695587 Author: Kent Yao Date: 2016-09-24T01:58:42Z remove redundant variables declarations and definations
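To make the PR 15227 description more concrete, the Java sketch below prunes generated declarations whose variable is never referenced in the generated body. This is a hypothetical, simplified illustration rather than the PR's implementation. `UnusedDeclSketch`, the name-to-declaration map, and variable names such as `agg_result` are assumptions. A whole-word regex match stands in for whatever analysis the real code generator would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration, not the PR's implementation: keeps a generated
// declaration only if its variable name appears as a whole word in the body.
final class UnusedDeclSketch {
    private UnusedDeclSketch() {}

    // declarations maps variable name -> declaration source line, in emission order.
    static List<String> pruneDeclarations(Map<String, String> declarations, String body) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> entry : declarations.entrySet()) {
            // \b...\b stops "agg_result" from matching inside "project_agg_result".
            Pattern reference = Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b");
            if (reference.matcher(body).find()) {
                kept.add(entry.getValue());
            }
        }
        return kept;
    }
}
```

This single pass only inspects the body. A fuller version would also have to handle declarations that reference each other, such as a buffer holder built from a result row.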
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15218 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65856/ Test PASSed.
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15218 Merged build finished. Test PASSed.
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15218 **[Test build #65856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65856/consoleFull)** for PR 15218 at commit [`f71f1c0`](https://github.com/apache/spark/commit/f71f1c0f245aa9534330c9b4913ce40a1cfa250e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80350919 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -23,6 +23,8 @@ import java.util.List; // $example off:schema_merging$ +import java.util.Properties; + --- End diff -- No reason not to follow the guideline?
[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15226 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65855/ Test PASSed.
[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15226 Merged build finished. Test PASSed.
[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15226 **[Test build #65855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65855/consoleFull)** for PR 15226 at commit [`0e014b0`](https://github.com/apache/spark/commit/0e014b02d03eeda8373cd8892662ed6ce9de664c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user JustinPihony commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80350755 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -23,6 +23,8 @@ import java.util.List; // $example off:schema_merging$ +import java.util.Properties; + --- End diff -- Should this really be added to the example, though?
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r80350458 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -23,6 +23,8 @@ import java.util.List; // $example off:schema_merging$ +import java.util.Properties; + --- End diff -- I think we should put the `java.util` imports together without an additional newline.
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65854/ Test PASSed.
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15224 Merged build finished. Test PASSed.
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15224 **[Test build #65854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65854/consoleFull)** for PR 15224 at commit [`49afc56`](https://github.com/apache/spark/commit/49afc5686d7ccf9a7864fc9b9c9eb5217a281086). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15226#discussion_r80350179 --- Diff: core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala --- @@ -117,6 +124,24 @@ private[spark] abstract class AsynchronousListenerBus[L <: AnyRef, E](name: Stri eventLock.release() } else { onDropEvent(event) + droppedEventsCounter.incrementAndGet() +} + +val droppedEvents = droppedEventsCounter.get +if (droppedEvents > 0) { + // Don't log too frequently + if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) { --- End diff -- Won't nanoTime be overkill? Even if there is only a single dropped event, this check will be executed on every post(), so currentTimeMillis (which is less costly) is preferable.
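The rate-limited report being debated here can be sketched in Java as follows. This is a hypothetical helper, not the code in `AsynchronousListenerBus`: the class name `DroppedEventReporter`, its methods, and the injectable clock are assumptions made for illustration. It measures the interval with a monotonic nanosecond clock (`System::nanoTime` in production), which is what `System.nanoTime()` offers over `System.currentTimeMillis()`: the latter tracks the wall clock and can jump if the system time is adjusted. It skips the clock read entirely while nothing has been dropped; once a drop is pending, every call pays for one clock read, which is the per-`post()` cost under discussion.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper (not Spark's API): counts dropped events and reports them
// at most once per interval, measured with an injectable monotonic clock.
class DroppedEventReporter {
    private final AtomicLong droppedEvents = new AtomicLong();
    private final LongSupplier nanoClock;   // e.g. System::nanoTime in production
    private final long intervalNanos;
    private volatile long lastReportNanos;

    DroppedEventReporter(LongSupplier nanoClock, long interval, TimeUnit unit) {
        this.nanoClock = nanoClock;
        this.intervalNanos = unit.toNanos(interval);
        this.lastReportNanos = nanoClock.getAsLong();
    }

    // Called on the posting thread whenever the queue rejects an event.
    void onDrop() {
        droppedEvents.incrementAndGet();
    }

    // Called after every post; returns how many drops were reported, or 0 if none.
    long maybeReport() {
        long pending = droppedEvents.get();
        if (pending == 0) {
            return 0;  // fast path: no clock read while nothing has been dropped
        }
        long now = nanoClock.getAsLong();
        // nanoTime values are only meaningful as differences, so compare elapsed time.
        if (now - lastReportNanos < intervalNanos) {
            return 0;
        }
        // If another thread changed the counter meanwhile, skip this round;
        // the remaining drops are reported by a later call.
        if (!droppedEvents.compareAndSet(pending, 0)) {
            return 0;
        }
        lastReportNanos = now;
        System.err.println("Dropped " + pending + " events since the last report");
        return pending;
    }
}
```

In production the reporter would be built as `new DroppedEventReporter(System::nanoTime, 60, TimeUnit.SECONDS)`; injecting the clock is only there so the interval logic can be exercised without waiting a minute.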
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15213 Merged build finished. Test PASSed.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65853/ Test PASSed.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15213 **[Test build #65853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65853/consoleFull)** for PR 15213 at commit [`1127ca1`](https://github.com/apache/spark/commit/1127ca1538e9a9ded9e91ead65af8c710e99003d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15220 Merged build finished. Test PASSed.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65851/ Test PASSed.
[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15226#discussion_r80347195 --- Diff: core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala --- @@ -117,6 +124,24 @@ private[spark] abstract class AsynchronousListenerBus[L <: AnyRef, E](name: Stri eventLock.release() } else { onDropEvent(event) + droppedEventsCounter.incrementAndGet() +} + +val droppedEvents = droppedEventsCounter.get +if (droppedEvents > 0) { + // Don't log too frequently + if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) { --- End diff -- use nanotime
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15220 **[Test build #65851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65851/consoleFull)** for PR 15220 at commit [`77d7ba0`](https://github.com/apache/spark/commit/77d7ba0ad3f2382c52a15a24cabcb02c3c0009f1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15226 **[Test build #65855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65855/consoleFull)** for PR 15226 at commit [`0e014b0`](https://github.com/apache/spark/commit/0e014b02d03eeda8373cd8892662ed6ce9de664c).
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15218 **[Test build #65856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65856/consoleFull)** for PR 15218 at commit [`f71f1c0`](https://github.com/apache/spark/commit/f71f1c0f245aa9534330c9b4913ce40a1cfa250e).
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 > actual problem is not in abortStage but rather in improper additions to failedStages Correct. I think a more accurate description for this issue is "do not add `failedStages` when abortStage for fetch failure"
[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/15226 [SPARK-17649][CORE] Log how many Spark events got dropped in AsynchronousListenerBus ## What changes were proposed in this pull request? Backport #15220 to 1.6. ## How was this patch tested? Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-17649-branch-1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15226.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15226 commit 0e014b02d03eeda8373cd8892662ed6ce9de664c Author: Shixiong Zhu Date: 2016-09-23T23:57:28Z [SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus Log how many Spark events got dropped in LiveListenerBus so that the user can get insights on how to set a correct event queue size. Jenkins Author: Shixiong Zhu Closes #15220 from zsxwing/SPARK-17649.
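Events are dropped in the first place because the listener bus buffers them in a bounded queue of fixed size, which is why the drop count helps in choosing that size. A minimal Java sketch of the post-or-drop pattern, assuming a `LinkedBlockingQueue` with a fixed capacity; the class `BoundedEventQueue` and its methods are hypothetical and not Spark's API:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not Spark's API): a bounded event queue whose producers
// never block; when the queue is full the event is dropped and counted instead.
class BoundedEventQueue<E> {
    private final LinkedBlockingQueue<E> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedEventQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    // offer() returns false immediately instead of waiting when the queue is full.
    boolean post(E event) {
        if (queue.offer(event)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    // Consumer side; a real dispatch thread would more likely block on take().
    E poll() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

The drop count is the number the backported patch surfaces in its log message; a count that keeps growing suggests the queue capacity is too small for the event rate or the consumer is too slow.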
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @gatorsmile Thanks. #65832 is the latest one which does not have the same failure.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15220 Thanks! Merging to master / 2.0. I will submit a patch for 1.6.
[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15223 Merged build finished. Test PASSed.
[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65849/ Test PASSed.
[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15223 **[Test build #65849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65849/consoleFull)** for PR 15223 at commit [`a0122f0`](https://github.com/apache/spark/commit/a0122f0569b9caa8995c65eb27314edb0234a5ff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 Right, but `abortStage` occurs elsewhere. "When abort stage" seems to imply that this fix is necessary for all usages of `abortStage` when the actual problem is not in `abortStage` but rather in improper additions to `failedStages`. I've got to go now, but I'll come back to this soon(ish).
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 Actually, `failedStages` is only added to here in Spark.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 @scwf That description would actually be at least as bad since there are multiple routes to `abortStage` and this issue of adding to `failedStages` only applies to these two. I'll take another look soon and see if I can come up with a clean refactoring and a better description for the commit message.
[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15089 Merged build finished. Test PASSed.
[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65850/ Test PASSed.
[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15089 **[Test build #65850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65850/consoleFull)** for PR 15089 at commit [`5239042`](https://github.com/apache/spark/commit/52390429fb1f7b20705ddad5621e8267c2aff12b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 Ok, that makes better sense. The `disallowStageRetryForTest` case doesn't worry me too much since it is only used in tests. If we can fix this case, great; else if it remains possible to create failing tests that can never happen outside of the tests, then that is not all that important (but should at least be noted in comments in the test suite.) Yes, not adding to `failedStages` after going down either of those two paths to `abortStage` is a correct fix even if the description of the problem wasn't really accurate. I'll take another look over the weekend to see if the logic can be expressed a bit more clearly.
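The control flow agreed on here can be modeled in a few lines. This is a hypothetical, heavily simplified Java model, not `DAGScheduler` code: stages are plain strings, `tooManyFetchFailures` stands in for whatever retry limit triggers the abort, and `abortedStages` exists only to make the outcome observable. It illustrates the shape of the fix: each fetch failure takes exactly one of two exits, either abort the stage or queue the failed stage and its map stage for resubmission, never both.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of fetch-failure handling; not DAGScheduler's real code.
class FetchFailureHandler {
    final Set<String> failedStages = new LinkedHashSet<>();
    final Set<String> abortedStages = new LinkedHashSet<>();
    private final boolean disallowStageRetryForTest;

    FetchFailureHandler(boolean disallowStageRetryForTest) {
        this.disallowStageRetryForTest = disallowStageRetryForTest;
    }

    void abortStage(String stage) {
        abortedStages.add(stage);
    }

    // Either aborts the stage or queues it (and its parent map stage) for resubmission.
    void onFetchFailed(String failedStage, String mapStage, boolean tooManyFetchFailures) {
        if (disallowStageRetryForTest || tooManyFetchFailures) {
            abortStage(failedStage);
            return;  // an aborted stage must not also be added to failedStages
        }
        failedStages.add(failedStage);
        failedStages.add(mapStage);
    }
}
```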
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 Thanks @zsxwing for explaining this. @markhamstra, the issue happens in the case described in my PR description. It usually depends on jobs being submitted from multiple threads and on the order of the fetch failures, which is why I called it a race condition. If you think that is confusing, how about changing the title to "do not add failedStages when abort stage"?
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65846/ Test PASSed.
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15224 **[Test build #65846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65846/consoleFull)** for PR 15224 at commit [`c65f94f`](https://github.com/apache/spark/commit/c65f94f440fd67c1d3b555e647dede95ac71fa25). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65847/ Test PASSed.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15220 Merged build finished. Test PASSed.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15220 **[Test build #65847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65847/consoleFull)** for PR 15220 at commit [`2f47c30`](https://github.com/apache/spark/commit/2f47c30bf9b3ad1e929fe9bf0da4b835e7ea13cd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65848/ Test PASSed.
[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15089 Merged build finished. Test PASSed.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15213 @markhamstra I agree this is not a race condition, since there is only a single thread. The issue is that the code doesn't handle the following two corner cases: - `failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) && failedStages.isEmpty` is true - `disallowStageRetryForTest && failedStages.isEmpty` is true In the above cases, `ResubmitFailedStages` won't be posted.
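To make the two corner cases concrete, here is a minimal Java sketch of the fetch-failure branch as a single-threaded model. Everything in it is hypothetical: `FetchFailureModel`, `onFetchFailure`, `runQueuedEvents` and the `shouldAbort` flag (standing in for either `disallowStageRetryForTest` or `failedOnFetchAndShouldAbort(...)` being true) are simplifications for illustration, not Spark's `DAGScheduler` code. It assumes, as the thread describes, that the abort paths still add the stages to `failedStages` without posting `ResubmitFailedStages`, and it models the fix as skipping that bookkeeping when a stage is aborted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of the fetch-failure branch discussed above.
// It is NOT Spark's DAGScheduler; names only mirror the ones used in the thread.
class FetchFailureModel {
  final boolean applyFix;                               // true = aborted stages are not recorded
  final Set<String> failedStages = new LinkedHashSet<>();
  final Deque<String> eventQueue = new ArrayDeque<>();  // stands in for the event loop's queue
  final Set<String> resubmitted = new LinkedHashSet<>();

  FetchFailureModel(boolean applyFix) {
    this.applyFix = applyFix;
  }

  // shouldAbort stands in for disallowStageRetryForTest or
  // failedStage.failedOnFetchAndShouldAbort(...) being true.
  void onFetchFailure(String failedStage, String mapStage, boolean shouldAbort) {
    if (shouldAbort) {
      // abortStage(...) would be called here; no ResubmitFailedStages is posted.
      if (applyFix) {
        return; // the fix: an aborted stage is never recorded in failedStages
      }
    } else if (failedStages.isEmpty()) {
      // Only the first failure posts a resubmit; later ones rely on the pending one.
      eventQueue.add("ResubmitFailedStages");
    }
    failedStages.add(failedStage);
    failedStages.add(mapStage);
  }

  // Drains queued events one at a time, like the single event-processing thread.
  void runQueuedEvents() {
    while (!eventQueue.isEmpty()) {
      eventQueue.poll();
      resubmitted.addAll(failedStages);
      failedStages.clear();
    }
  }

  public static void main(String[] args) {
    for (boolean fix : new boolean[] {false, true}) {
      FetchFailureModel m = new FetchFailureModel(fix);
      m.onFetchFailure("stageA", "mapA", true);   // job 1: stage must be aborted
      m.onFetchFailure("stageB", "mapB", false);  // job 2: ordinary, retryable fetch failure
      m.runQueuedEvents();
      System.out.println("fix=" + fix + " resubmitted=" + m.resubmitted
          + " stuck=" + m.failedStages);
    }
  }
}
```

Without the fix, the leaked aborted stage keeps `failedStages` non-empty, so `stageB` never gets a `ResubmitFailedStages` event and stays stuck; with the fix, `stageB` is resubmitted.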
[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15089 **[Test build #65848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65848/consoleFull)** for PR 15089 at commit [`87ecc0d`](https://github.com/apache/spark/commit/87ecc0db2c5c980273e06d37ecb764fd03ad2b65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15224 **[Test build #65854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65854/consoleFull)** for PR 15224 at commit [`49afc56`](https://github.com/apache/spark/commit/49afc5686d7ccf9a7864fc9b9c9eb5217a281086).
[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/15224 @zsxwing Thanks for the review. Addressed the nit.
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 This doesn't make sense to me. The DAGSchedulerEventProcessLoop runs on a single thread and processes one event from its queue at a time. When the first CompletionEvent for a fetch failure is handled, stages are added to failedStages and a ResubmitFailedStages event is queued. After handleTaskCompletion is done, the next event in the queue is processed. Since events are dequeued and handled sequentially, either the ResubmitFailedStages event is handled before the CompletionEvent for the second fetch failure, or that CompletionEvent is handled before the ResubmitFailedStages event. If ResubmitFailedStages is handled first, failedStages is cleared in resubmitFailedStages, and nothing prevents the subsequent CompletionEvent from queueing another ResubmitFailedStages event to handle the additional fetch failures. In the other case, where the second CompletionEvent is handled before the ResubmitFailedStages event, the additional stages are added to the non-empty failedStages, but there is no need to schedule another ResubmitFailedStages event: the one queued by the first CompletionEvent is still pending, and handling it will also handle the stages newly added to failedStages by the second CompletionEvent. In either ordering, all of the failedStages are handled and there is no race condition.
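The ordering argument above can be checked with a small single-threaded simulation. This is a hypothetical Java sketch, not Spark code: events are plain strings, `EventLoopOrdering.simulate` and its `secondFailureFirst` flag are invented for illustration, and the handler only mirrors the two rules described in the comment (post `ResubmitFailedStages` only when `failedStages` is empty; clear `failedStages` when the resubmit event is handled).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model: one thread handles one queued event at a time.
// Events are "FetchFailed:<stage>" or "ResubmitFailedStages"; these are not Spark's real types.
class EventLoopOrdering {
  final Deque<String> queue = new ArrayDeque<>();
  final Set<String> failedStages = new LinkedHashSet<>();
  final Set<String> resubmitted = new LinkedHashSet<>();
  int resubmitEventsHandled = 0;

  void handle(String event) {
    if (event.equals("ResubmitFailedStages")) {
      resubmitEventsHandled++;
      resubmitted.addAll(failedStages);
      failedStages.clear();                  // mirrors clearing in resubmitFailedStages
    } else {
      if (failedStages.isEmpty()) {
        queue.add("ResubmitFailedStages");   // nothing pending yet, so schedule one
      }
      failedStages.add(event.substring("FetchFailed:".length()));
    }
  }

  void drain() {
    while (!queue.isEmpty()) {
      handle(queue.poll());
    }
  }

  // secondFailureFirst = true: the second fetch failure is handled before the resubmit event.
  static EventLoopOrdering simulate(boolean secondFailureFirst) {
    EventLoopOrdering loop = new EventLoopOrdering();
    loop.queue.add("FetchFailed:s1");
    if (secondFailureFirst) {
      loop.queue.add("FetchFailed:s2");      // already queued ahead of the resubmit
    } else {
      loop.handle(loop.queue.poll());        // s1 queues ResubmitFailedStages
      loop.queue.add("FetchFailed:s2");      // s2 arrives behind the pending resubmit
    }
    loop.drain();
    return loop;
  }

  public static void main(String[] args) {
    for (boolean order : new boolean[] {false, true}) {
      EventLoopOrdering loop = simulate(order);
      System.out.println("secondFailureFirst=" + order + " resubmitted=" + loop.resubmitted
          + " resubmitEvents=" + loop.resubmitEventsHandled);
    }
  }
}
```

In both orderings every failed stage ends up resubmitted; the only difference is whether one or two `ResubmitFailedStages` events are processed.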
[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15223 Merged build finished. Test PASSed.
[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65844/ Test PASSed.
[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15223 **[Test build #65844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65844/consoleFull)** for PR 15223 at commit [`742a787`](https://github.com/apache/spark/commit/742a7879865a4b85883337798c36af99c867ccae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14808 I think we should close this. The new example and the user guide should be updated against [SPARK-17239](https://issues.apache.org/jira/browse/SPARK-17239). @jaceklaskowski, if you'd still like to do it, please let me know; otherwise I am happy to do it. We should try to get this in soon. Thanks!
[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15213 **[Test build #65853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65853/consoleFull)** for PR 15213 at commit [`1127ca1`](https://github.com/apache/spark/commit/1127ca1538e9a9ded9e91ead65af8c710e99003d).
[GitHub] spark issue #15200: Skip building R vignettes if Spark is not built
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15200 If it's part of the `-Psparkr` profile of the build, it will be regenerated by default. If it's changed and not in `.gitignore`, it should be flagged for commit.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65843/ Test PASSed.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15220 Merged build finished. Test PASSed.
[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15220 **[Test build #65843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65843/consoleFull)** for PR 15220 at commit [`b4f56a0`](https://github.com/apache/spark/commit/b4f56a073ac8f5b76db929a456f18b77b8e8910f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15225: [SPARK-17652] Fix confusing exception message whi...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15225#discussion_r80340526 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -285,19 +285,19 @@ public void reserve(int requiredCapacity) { try { reserveInternal(newCapacity); } catch (OutOfMemoryError outOfMemoryError) { - throwUnsupportedException(newCapacity, requiredCapacity, outOfMemoryError); + throwUnsupportedException(requiredCapacity, outOfMemoryError); } } else { -throwUnsupportedException(newCapacity, requiredCapacity, null); +throwUnsupportedException(requiredCapacity, null); } } } - private void throwUnsupportedException(int newCapacity, int requiredCapacity, Throwable cause) { -String message = "Cannot reserve more than " + newCapacity + -" bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a" + -" workaround, you can disable the vectorized reader by setting " -+ SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false."; + private void throwUnsupportedException(int requiredCapacity, Throwable cause) { +String message = "Cannot reserve additional contiguous bytes in the vectorized reader " + +"(requested = " + requiredCapacity + " bytes). As a workaround, you can disable the " + +"vectorized reader by setting " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + +" to false."; --- End diff -- Oh, I was wondering whether we can explain why the allocation failed, instead of just saying we cannot allocate memory.
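One way to address the question above is to let the message say which branch failed. The following is a minimal, hypothetical Java sketch rather than the change in the PR: `ReserveErrorMessage` is invented, the config key is hardcoded instead of read from `SQLConf`, and it assumes that the `null`-cause call comes from the branch where the requested capacity exceeds the vector's maximum capacity, since that branch condition is not visible in the diff.

```java
// Hypothetical sketch, not Spark's ColumnVector: it only shows how the message could
// name the reason. Assumes a null cause means the request exceeded the vector's
// maximum capacity (that branch condition is not visible in the diff).
class ReserveErrorMessage {
  // Hardcoded for self-containment; Spark reads it from SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key().
  static final String VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader";

  static String message(int requiredCapacity, Throwable cause) {
    String reason = (cause instanceof OutOfMemoryError)
        ? "the JVM failed to allocate the memory (" + cause + ")"
        : "the requested capacity exceeds the maximum capacity of a column vector";
    return "Cannot reserve additional contiguous bytes in the vectorized reader (requested = "
        + requiredCapacity + " bytes) because " + reason + ". As a workaround, you can disable"
        + " the vectorized reader by setting " + VECTORIZED_READER_KEY + " to false.";
  }

  public static void main(String[] args) {
    System.out.println(message(1 << 20, new OutOfMemoryError("Java heap space")));
    System.out.println(message(Integer.MAX_VALUE, null));
  }
}
```

Passing the caught `OutOfMemoryError` through keeps the JVM's own description (for example, which memory pool ran out) in the user-facing message.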
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/12601 Not sure whether you already know this; I just want to share the commands for building the docs:
```bash
SKIP_API=1 jekyll build
SKIP_API=1 jekyll serve
```
After the second command, you can view the generated documentation at:
```
Server address: http://127.0.0.1:4000/
```
[GitHub] spark issue #15200: Skip building R vignettes if Spark is not built
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15200 Yeah, so I'm thinking we should just auto-generate this and check the HTML file into git. It's not that big. When somebody updates the vignette, though, we need to remind them to regenerate it as part of the PR?
[GitHub] spark pull request #15225: [SPARK-17652] Fix confusing exception message whi...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/15225#discussion_r80339515 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -285,19 +285,19 @@ public void reserve(int requiredCapacity) { try { reserveInternal(newCapacity); } catch (OutOfMemoryError outOfMemoryError) { - throwUnsupportedException(newCapacity, requiredCapacity, outOfMemoryError); + throwUnsupportedException(requiredCapacity, outOfMemoryError); } } else { -throwUnsupportedException(newCapacity, requiredCapacity, null); +throwUnsupportedException(requiredCapacity, null); } } } - private void throwUnsupportedException(int newCapacity, int requiredCapacity, Throwable cause) { -String message = "Cannot reserve more than " + newCapacity + -" bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a" + -" workaround, you can disable the vectorized reader by setting " -+ SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false."; + private void throwUnsupportedException(int requiredCapacity, Throwable cause) { +String message = "Cannot reserve additional contiguous bytes in the vectorized reader " + +"(requested = " + requiredCapacity + " bytes). As a workaround, you can disable the " + +"vectorized reader by setting " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + +" to false."; --- End diff -- Doesn't the first line already cover that: `Cannot reserve additional contiguous bytes in the vectorized reader`? Do you have something else in mind?
[GitHub] spark issue #15225: [SPARK-17652] Fix confusing exception message while rese...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15225 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65852/ Test FAILed.
[GitHub] spark pull request #15224: [SPARK-17650] malformed url's throw exceptions be...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15224#discussion_r80337786 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -698,6 +698,28 @@ private[spark] object Utils extends Logging { } /** + * Validate that a given URI is actually a valid URL as well. + * @param uri The URI to validate + */ + @throws[MalformedURLException]("when the URI is an invalid URL") + def validateURL(uri: URI): Unit = { +Option(uri.getScheme).getOrElse("file") match { + case "http" | "https" | "ftp" => +try { + uri.toURL +} catch { + case e: MalformedURLException => +val msg = s"URI (${uri.toString}) is not a valid URL." +logError(msg) --- End diff -- nit: no need to log it since it's already thrown.
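For illustration, here is a hypothetical Java rendering of the `validateURL` helper under review that follows the nit: it wraps and rethrows the `MalformedURLException` without logging, leaving logging to whoever catches it. The `UrlValidation` class and its `needsUrlCheck` helper are invented for this sketch, and because the Scala original is truncated in the diff, what it does after `logError(msg)` is not reproduced here.

```java
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical Java rendering of the Scala Utils.validateURL under review; it follows
// the nit and rethrows with context instead of logging and then throwing.
class UrlValidation {
  // Mirrors Option(uri.getScheme).getOrElse("file"): a URI without a scheme is a local file.
  static boolean needsUrlCheck(URI uri) {
    String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
    switch (scheme) {
      case "http":
      case "https":
      case "ftp":
        return true;
      default:
        return false;
    }
  }

  static void validateURL(URI uri) throws MalformedURLException {
    if (!needsUrlCheck(uri)) {
      return;
    }
    try {
      uri.toURL();
    } catch (MalformedURLException e) {
      // No logging here: the caller that handles the exception decides whether to log.
      MalformedURLException wrapped =
          new MalformedURLException("URI (" + uri + ") is not a valid URL.");
      wrapped.initCause(e);
      throw wrapped;
    }
  }

  public static void main(String[] args) throws MalformedURLException {
    validateURL(URI.create("http://example.com/app.jar")); // valid http URL: no exception
    validateURL(URI.create("/tmp/app.jar"));               // no scheme: treated as "file", skipped
    System.out.println("ok");
  }
}
```

Keeping the scheme rule in its own method makes the `Option(uri.getScheme).getOrElse("file")` behavior testable without needing a malformed URL.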
[GitHub] spark issue #15225: [SPARK-17652] Fix confusing exception message while rese...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15225 Merged build finished. Test FAILed.
[GitHub] spark issue #15225: [SPARK-17652] Fix confusing exception message while rese...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15225 **[Test build #65852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65852/consoleFull)** for PR 15225 at commit [`ed87537`](https://github.com/apache/spark/commit/ed8753766e7d3e18603b2408553b624e17edec0b). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15218 See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65832/testReport/org.apache.spark.streaming.kafka010/DirectKafkaStreamSuite/pattern_based_subscription/history/
[GitHub] spark pull request #15225: [SPARK-17652] Fix confusing exception message whi...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15225#discussion_r80337683 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -285,19 +285,19 @@ public void reserve(int requiredCapacity) { try { reserveInternal(newCapacity); } catch (OutOfMemoryError outOfMemoryError) { - throwUnsupportedException(newCapacity, requiredCapacity, outOfMemoryError); + throwUnsupportedException(requiredCapacity, outOfMemoryError); } } else { -throwUnsupportedException(newCapacity, requiredCapacity, null); +throwUnsupportedException(requiredCapacity, null); } } } - private void throwUnsupportedException(int newCapacity, int requiredCapacity, Throwable cause) { -String message = "Cannot reserve more than " + newCapacity + -" bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a" + -" workaround, you can disable the vectorized reader by setting " -+ SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false."; + private void throwUnsupportedException(int requiredCapacity, Throwable cause) { +String message = "Cannot reserve additional contiguous bytes in the vectorized reader " + +"(requested = " + requiredCapacity + " bytes). As a workaround, you can disable the " + +"vectorized reader by setting " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + +" to false."; --- End diff -- Is it possible to also explain the cause of this error in the error message?