[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16467 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16467 LGTM - merging to master/2.1. Thanks! Sorry for the long wait.
[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16467#discussion_r97667507

--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+-- !query 2
+select a1,b1
+from t1
+where (a1,b1) not in (select a2,b2
+                      from t2)
+-- !query 2 schema
+struct
+-- !query 2 output
+2 1

--- End diff --

Ok yeah, you are right. I was confusing this with the OR rules.
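The NULL semantics being reviewed above can be sketched in plain Python, modelling SQL's three-valued logic for a multi-column NOT IN. This is an illustration only, not Spark's actual implementation (Spark rewrites NOT IN as a null-aware anti join):

```python
# Sketch of SQL three-valued logic for a multi-column NOT IN.
# TRUE/FALSE are Python True/False; UNKNOWN is modelled as None.

def eq3(x, y):
    """SQL equality: comparing with NULL (None) yields UNKNOWN."""
    if x is None or y is None:
        return None          # UNKNOWN
    return x == y

def and3(a, b):
    """Three-valued AND: FALSE dominates, then UNKNOWN."""
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return None              # UNKNOWN

def not_in(row, subquery_rows):
    """(a1, b1) NOT IN (...) is TRUE only if the tuple is definitely
    unequal (FALSE, not UNKNOWN) to every subquery row."""
    return all(
        and3(eq3(row[0], s[0]), eq3(row[1], s[1])) is False
        for s in subquery_rows
    )

# The data from the test file in the diff above (None stands for null).
t1 = [(1, 1), (2, 1), (None, 1), (1, 3), (None, 3), (1, None), (None, 2)]
t2 = [(1, 1), (None, 3), (1, None)]

# Only (2, 1) survives, matching the expected query output above.
print([r for r in t1 if not_in(r, t2)])
```

Every other t1 row compares UNKNOWN (not FALSE) to at least one t2 row, so it is filtered out, which is why the expected output contains only `2 1`.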
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #71947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71947/testReport)** for PR 16650 at commit [`b2bf1f7`](https://github.com/apache/spark/commit/b2bf1f78a3287e79b86074653b45710ddb127d98). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test FAILed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71947/ Test FAILed.
[GitHub] spark issue #16465: [SPARK-19064][PySpark]Fix pip installing of sub componen...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16465 Small follow up ping for @joshrosen to @davies maybe?
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16694 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71948/ Test PASSed.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16694 Merged build finished. Test PASSed.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16694 **[Test build #71948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71948/testReport)** for PR 16694 at commit [`2980e67`](https://github.com/apache/spark/commit/2980e67d3415df2b810a9df9b96f2a5402c5c490). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16688 ok to test
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16660 LGTM except one comment. Thanks for working on this!
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16660 The remaining comment is: https://github.com/apache/spark/pull/16660#discussion_r97427591
[GitHub] spark issue #15768: [SPARK-18080][ML][PySpark] Locality Sensitive Hashing (L...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15768 **[Test build #3550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3550/consoleFull)** for PR 15768 at commit [`cdeca1c`](https://github.com/apache/spark/commit/cdeca1cdd8ed61274137c3012ba49ff57d459190). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test FAILed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71944/ Test FAILed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #71944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71944/testReport)** for PR 16650 at commit [`bc3d969`](https://github.com/apache/spark/commit/bc3d969a7fe72b6ea54fd187b996f11965048367). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16329 Sorry for the delay. This LGTM, but I'm currently away from my Apache SSH keys. Other committers should feel free to merge if you get there before I do.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16694 **[Test build #71948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71948/testReport)** for PR 16694 at commit [`2980e67`](https://github.com/apache/spark/commit/2980e67d3415df2b810a9df9b96f2a5402c5c490).
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/16685 A few comments:

1. The major concern is that this solution needs to pull in the whole target table and join it against the source DataFrame to determine which rows are potential updates and which are inserts. I am worried that this join adds significant performance overhead to the upsert operation, and that while this decision is being made the target table may have advanced considerably, making the insert/update decisions stale.
2. The provided primary-key set may not exactly match the unique constraints on the target table, which can cause inserts or updates to fail, because columns that are part of a unique constraint may lie outside the provided primary-key set.
3. The insert is a batch execution of as many statements as there are insert rows, and the same holds for updates, so many statements must be sent to the target database via JDBC. Would it perform better to bind column values as host variables in a prepared statement for a batch's worth of rows and execute once per batch?
4. Most database systems provide UPSERT capability, such as `INSERT ... ON DUPLICATE KEY UPDATE` in MySQL, `INSERT ... ON CONFLICT ... DO UPDATE SET` in PostgreSQL, and the MERGE statement in DB2, Oracle, etc., where the database decides whether to insert or update. Maybe we can take advantage of this by extending the different JDBCDialects? PR https://github.com/apache/spark/pull/16692 actually minimizes the issues above. Please take a look to compare.
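Points 3 and 4 above can be combined: generate one parameterized, dialect-specific upsert statement, then bind each row to it as a prepared-statement batch instead of sending one literal SQL string per row. A minimal sketch, with the caveat that the dialect names and SQL templates here are illustrative assumptions, not any actual Spark JDBCDialect API:

```python
# Hypothetical helper: build a single parameterized upsert statement for a
# given dialect. Rows would then be bound to the "?" placeholders and sent
# via addBatch()/executeBatch() on a JDBC PreparedStatement.

def upsert_sql(dialect, table, columns, keys):
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    non_keys = [c for c in columns if c not in keys]
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    if dialect == "mysql":
        # MySQL: the database decides insert vs. update per row.
        updates = ", ".join(f"{c} = VALUES({c})" for c in non_keys)
        return f"{insert} ON DUPLICATE KEY UPDATE {updates}"
    if dialect == "postgresql":
        # PostgreSQL: EXCLUDED refers to the row proposed for insertion.
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in non_keys)
        return (f"{insert} ON CONFLICT ({', '.join(keys)}) "
                f"DO UPDATE SET {updates}")
    raise ValueError(f"no upsert template for dialect {dialect!r}")

print(upsert_sql("postgresql", "t", ["a", "b"], ["a"]))
```

Because the statement text is fixed per dialect, it avoids both per-row string construction and the fetch/join against the target table; conflict detection is delegated entirely to the database.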
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user ilganeli commented on the issue: https://github.com/apache/spark/pull/16685 It sounds like you consider there to be too many caveats and assumptions in this patch for it to be a worthwhile code contribution. Given the numerous assumptions made in this PR, how would you feel about converting it into a documentation patch and providing this as example code for users instead? I'm not sure there is currently any official documentation on doing UPDATE in Spark, so maybe this becomes a source of helpful information for others.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #71947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71947/testReport)** for PR 16650 at commit [`b2bf1f7`](https://github.com/apache/spark/commit/b2bf1f78a3287e79b86074653b45710ddb127d98).
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user ilganeli commented on the issue: https://github.com/apache/spark/pull/16685 @gatorsmile That makes a lot of sense. Here is a code snippet that relies on the database to do the UPSERT:

```scala
/**
 * Generate the SQL statement to perform an upsert (UPDATE OR INSERT)
 * of a given row into a specific table.
 *
 * @param row The row to insert into the table
 * @param schema The table schema
 * @param tableName The table name in the database
 * @param primaryKeys The unique constraint imposed on the database
 * @return The generated upsert statement
 */
@transient
def genUpsertScript(
    row: Row,
    schema: StructType,
    tableName: String,
    primaryKeys: Set[String]): String = {
  val primaryKeyString: String = getKeyString(primaryKeys)
  val schemaString = schema.map(s => s.name).reduce(_ + ", " + _)
  val valString =
    row.toSeq.map(v => "'" + v.toString.replaceAll("'", "''") + "'").reduce(_ + "," + _)
  val withExcluded = {
    schema.map(_.name)
      .filterNot(primaryKeys.contains)
      .map(s => s + " = EXCLUDED." + s) // EXCLUDED is a magic internal Postgres table
      .reduce(_ + ",\n" + _)
  }
  val upsert = {
    s"INSERT INTO $tableName ($schemaString)\n VALUES ($valString)\n" +
      s"ON CONFLICT ($primaryKeyString) DO UPDATE\n" +
      s"SET\n" + withExcluded + ";"
  }
  logS("Generated SQL: " + upsert, Level.DEBUG)
  upsert
}
```
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16694 Merged build finished. Test FAILed.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16694 **[Test build #71946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71946/testReport)** for PR 16694 at commit [`98bd7e7`](https://github.com/apache/spark/commit/98bd7e77161e249a028e18ebfe19898a9b8952ac). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16694 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71946/ Test FAILed.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16694 **[Test build #71946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71946/testReport)** for PR 16694 at commit [`98bd7e7`](https://github.com/apache/spark/commit/98bd7e77161e249a028e18ebfe19898a9b8952ac).
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 jenkins retest please. The test failure from the HiveSparkSubmitSuite build (`set hive.metastore.warehouse.dir`) is unrelated to the change.
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16685 Currently, I do not have a solution for supporting parallel mass UPDATE, because the rows in the DataFrame might be out of order and a global transaction is missing. The solution posted in this PR makes many assumptions. If users need to do this, the currently suggested workaround is to let Spark SQL insert the results into a table and use a separate RDBMS application to do the update (outside Spark SQL). I fully understand the challenges. I can point to a solution I did in the database replication area: https://www.google.com/patents/US20050193041 Although this patent still has a hole, it generally explains how to do it. In that use case, we could do parallel update/insert/delete by using maintained transaction dependencies and retry logic with spill queues. Unfortunately, it is not applicable to Spark SQL. `UPSERT` is pretty useful to Spark SQL users. I prefer using the capability provided by the RDBMS directly, instead of implementing it in Spark SQL. Then we can avoid fetching/joining the data from the JDBC tables. More importantly, we can ensure each individual UPSERT works correctly even if the target tables are being inserted into/updated by other applications at the same time.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16694 **[Test build #71945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71945/testReport)** for PR 16694 at commit [`abafaeb`](https://github.com/apache/spark/commit/abafaebdaace472ea643d3d7f1457e58d5b37831). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16694 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71945/ Test FAILed.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16694 Merged build finished. Test FAILed.
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16694 **[Test build #71945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71945/testReport)** for PR 16694 at commit [`abafaeb`](https://github.com/apache/spark/commit/abafaebdaace472ea643d3d7f1457e58d5b37831).
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16694 cc @hhbyyh Thanks!
[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16694 [SPARK-19336][ML][Pyspark]: LinearSVC Python API

## What changes were proposed in this pull request?

Add Python API for the newly added LinearSVC algorithm.

## How was this patch tested?

Add new doc string test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark ser

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16694.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16694

commit 020f6fcd821a15a98201aef5541c0e040a1e1e79
Author: wm...@hotmail.com
Date: 2017-01-24T05:51:20Z
    linearsvm python initial checkin

commit f5c9856b3bce096be6c7f39e9869d662c8d5bed2
Author: wm...@hotmail.com
Date: 2017-01-24T07:33:16Z
    check in doc test

commit 605c102349ce81fbda229cfdef86dea791024edf
Author: wm...@hotmail.com
Date: 2017-01-24T07:36:12Z
    add shared param

commit abafaebdaace472ea643d3d7f1457e58d5b37831
Author: wm...@hotmail.com
Date: 2017-01-24T19:41:47Z
    add a negative test
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #71944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71944/testReport)** for PR 16650 at commit [`bc3d969`](https://github.com/apache/spark/commit/bc3d969a7fe72b6ea54fd187b996f11965048367).
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user jsoltren commented on the issue: https://github.com/apache/spark/pull/16650 Working through some test failures in org.apache.spark.deploy.StandaloneDynamicAllocationSuite and org.apache.spark.HeartbeatReceiverSuite...
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71940/ Test FAILed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test FAILed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71940/testReport)** for PR 16605 at commit [`94902ce`](https://github.com/apache/spark/commit/94902cebbabcec5464f5b1a9bbfba64cb6bba0b9).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16686 Merged build finished. Test PASSed.
[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71942/ Test PASSed.
[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16686 **[Test build #71942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71942/testReport)** for PR 16686 at commit [`74d96fc`](https://github.com/apache/spark/commit/74d96fc9049a0a0fb6de6d011eb896b7d7c32b30).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user ilganeli commented on the issue: https://github.com/apache/spark/pull/16685 I recognize that this is not an optimal solution, but Spark has historically contained multiple sub-optimal operations that are nonetheless useful in certain contexts, and it is left to the user to understand them and use them correctly. A few examples off the top of my head are collectPartitions, zipWithIndex, and repartition, all of which may be expensive operations but are nonetheless useful when applied appropriately. I believe there is value in introducing this as a starting point that works in most scenarios and is more efficient than relying on the database to enforce the uniqueness constraint and be responsible for a mass update, with the expectation of future improvement.
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16685 To support UPSERT, this PR basically implements it using SELECT, UPDATE, and INSERT. It has to read the whole table from the JDBC-connected database and process it in Spark, so it does not perform well when the target table is huge. We are still facing the same issue caused by the UPDATE.
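The read-everything-then-write strategy described above can be sketched, independently of Spark, with an in-memory map standing in for the JDBC-connected table. All names below are illustrative stand-ins, not the PR's actual API:

```scala
// Minimal in-memory sketch of a SELECT/UPDATE/INSERT upsert. The mutable
// Map stands in for the remote table; reading `table.keySet` mirrors the
// full-table scan that makes this strategy slow on huge targets.
object UpsertSketch {
  type Row = (Int, String) // (primary key, payload) -- illustrative schema

  def upsert(table: scala.collection.mutable.Map[Int, String], incoming: Seq[Row]): Unit = {
    // "SELECT": read all existing keys up front (the scalability concern above).
    val existing = table.keySet.toSet
    incoming.foreach { case (k, v) =>
      if (existing.contains(k)) table(k) = v // "UPDATE" path for a known key
      else table += (k -> v)                 // "INSERT" path for a new key
    }
  }

  def main(args: Array[String]): Unit = {
    val table = scala.collection.mutable.Map(1 -> "a", 2 -> "b")
    upsert(table, Seq(2 -> "b2", 3 -> "c"))
    println(table.toSeq.sortBy(_._1).mkString(",")) // prints (1,a),(2,b2),(3,c)
  }
}
```

The sketch makes the cost visible: the "SELECT" step touches every existing key, regardless of how few rows are being upserted.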
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16685 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71943/ Test FAILed.
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16685 Merged build finished. Test FAILed.
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16685 **[Test build #71943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71943/testReport)** for PR 16685 at commit [`c6af861`](https://github.com/apache/spark/commit/c6af861b8d1f9a9c72cc6803e417df30148d93ac).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r97625393

--- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -168,6 +169,21 @@ private[scheduler] class BlacklistTracker (
       if (newTotal >= MAX_FAILURES_PER_EXEC && !executorIdToBlacklistStatus.contains(exec)) {
         logInfo(s"Blacklisting executor id: $exec because it has $newTotal" +
           s" task failures in successful task sets")
+        conf.get(config.BLACKLIST_ENABLED) match {
--- End diff --

Yes indeed. The only non-test usage of BlacklistTracker.isBlacklistEnabled is in TaskSchedulerImpl's maybeCreateBlacklistTracker, which uses it as a condition for creating the BlacklistTracker at all. So I agree, we don't need a check here to see whether the blacklist is enabled, and if we did, isBlacklistEnabled would be a better choice. However, it is still meaningful to check whether spark.blacklist.kill is set. I believe this was just a typo: s/BLACKLIST_ENABLED/BLACKLIST_KILL_ENABLED/. This is fixed in my latest commit.
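The gating described above (the tracker only exists when blacklisting is enabled, so inside it only the kill flag needs checking) can be illustrated with a minimal Spark-free sketch. The Map-backed conf and the helper are assumptions for illustration, not Spark's real config machinery:

```scala
// Sketch of gating executor-kill behavior on its own flag. The key string
// and Map-based "conf" are illustrative stand-ins for Spark's config entries.
object BlacklistKillGate {
  val BLACKLIST_KILL_ENABLED = "spark.blacklist.killBlacklistedExecutors"

  // Only consulted from inside the tracker, which already implies the
  // blacklist itself is enabled -- so no second blacklist-enabled check here.
  def shouldKill(conf: Map[String, String]): Boolean =
    conf.get(BLACKLIST_KILL_ENABLED).exists(_.toBoolean)

  def main(args: Array[String]): Unit = {
    println(shouldKill(Map(BLACKLIST_KILL_ENABLED -> "true"))) // prints true
    println(shouldKill(Map.empty))                             // prints false: off by default
  }
}
```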
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user ilganeli commented on the issue: https://github.com/apache/spark/pull/16685 @gatorsmile What is a "key" update, and in what context would that sort of operation be needed? I don't think a secondary index on the table prevents this method from working; the primary issue is that it makes the operation more expensive. The database still enforces any existing constraints. If the ask is to support a "uniqueness" constraint on multiple columns, that is already supported via the ```primaryKeys``` passed to the upsert() function. The update uses the "id" column not as a uniqueness constraint, but as a simple and efficient way to identify a given row to update. A future improvement would be to support using multiple columns to identify the row to update.
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16685 **[Test build #71943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71943/testReport)** for PR 16685 at commit [`c6af861`](https://github.com/apache/spark/commit/c6af861b8d1f9a9c72cc6803e417df30148d93ac).
[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16685 I still have a concern about UPDATE, even if we do the update based on key values. Implementing update statements this way, we still face non-deterministic results. For example, you are unable to do a key update. For non-key updates, we also face another issue when the target table has a secondary unique index.
[GitHub] spark pull request #16685: [SPARK-19335] Introduce insert, update, and upser...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/16685#discussion_r97623026

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -722,14 +724,246 @@ object JdbcUtils extends Logging {
   }

   /**
    * Check whether a table exists in a given database
    *
    * @return True if the table exists.
    */
   @transient
   def checkTableExists(targetDb: String, tableName: String): Boolean = {
     val dbc: Connection = DriverManager.getConnection(targetDb)
     val dbm = dbc.getMetaData()
     // Check if the table exists. If it exists, perform an upsert.
     // Otherwise, do a simple dataframe write to the DB
     val tables = dbm.getTables(null, null, tableName, null)
     val exists = tables.next() // Returns false if next does not exist
     dbc.close()
     exists
   }

   // Provide a reasonable starting batch size for database operations.
   private val DEFAULT_BATCH_SIZE: Int = 200

   // Limit the number of database connections. Some DBs suffer when there are many open
   // connections.
   private val DEFAULT_MAX_CONNECTIONS: Int = 50
--- End diff --

Got it, thanks!
[GitHub] spark issue #15768: [SPARK-18080][ML][PySpark] Locality Sensitive Hashing (L...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15768 **[Test build #3550 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3550/consoleFull)** for PR 15768 at commit [`cdeca1c`](https://github.com/apache/spark/commit/cdeca1cdd8ed61274137c3012ba49ff57d459190).
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r97622417

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -719,7 +719,11 @@ private[spark] object TaskSchedulerImpl {
   private def maybeCreateBlacklistTracker(sc: SparkContext): Option[BlacklistTracker] = {
     if (BlacklistTracker.isBlacklistEnabled(sc.conf)) {
-      Some(new BlacklistTracker(sc))
+      val executorAllocClient: Option[ExecutorAllocationClient] = sc.schedulerBackend match {
+        case b: ExecutorAllocationClient => Some(b.asInstanceOf[ExecutorAllocationClient])
--- End diff --

...I didn't realize we got a free type cast with the case match. Neat. Fixed.
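The "free type cast" noted above comes from Scala's typed patterns: a `case c: T` pattern both tests the runtime type and binds the value at type `T`, so the explicit `asInstanceOf` in the diff is redundant. A minimal standalone illustration, with stand-in traits rather than Spark's actual classes:

```scala
// Illustrative stand-ins for Spark's SchedulerBackend / ExecutorAllocationClient.
trait SchedulerBackend
trait ExecutorAllocationClient

object MatchCast {
  // The typed pattern binds `c` already at type ExecutorAllocationClient;
  // no asInstanceOf is needed in the Some(...).
  def asClient(b: SchedulerBackend): Option[ExecutorAllocationClient] = b match {
    case c: ExecutorAllocationClient => Some(c)
    case _                           => None
  }
}
```

This is the shape the reviewer suggested: `case b: ExecutorAllocationClient => Some(b)` with the cast dropped.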
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r97622210

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -429,7 +429,7 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     // We don't directly use the application blacklist, but its presence triggers blacklisting
     // within the taskset.
     val mockListenerBus = mock(classOf[LiveListenerBus])
-    val blacklistTrackerOpt = Some(new BlacklistTracker(mockListenerBus, conf, clock))
+    val blacklistTrackerOpt = Some(new BlacklistTracker(null, conf, None, clock))
--- End diff --

Yes, indeed.
[GitHub] spark pull request #16685: [SPARK-19335] Introduce insert, update, and upser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16685#discussion_r97622295

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -722,14 +724,246 @@ object JdbcUtils extends Logging {
   }

   /**
    * Check whether a table exists in a given database
    *
    * @return True if the table exists.
    */
   @transient
   def checkTableExists(targetDb: String, tableName: String): Boolean = {
     val dbc: Connection = DriverManager.getConnection(targetDb)
     val dbm = dbc.getMetaData()
     // Check if the table exists. If it exists, perform an upsert.
     // Otherwise, do a simple dataframe write to the DB
     val tables = dbm.getTables(null, null, tableName, null)
     val exists = tables.next() // Returns false if next does not exist
     dbc.close()
     exists
   }

   // Provide a reasonable starting batch size for database operations.
   private val DEFAULT_BATCH_SIZE: Int = 200

   // Limit the number of database connections. Some DBs suffer when there are many open
   // connections.
   private val DEFAULT_MAX_CONNECTIONS: Int = 50
--- End diff --

Please see the logic [here](https://github.com/ilganeli/spark/blob/56545ed88f665ed57a50a8c5d114c6ae8130eab3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L714-L720). We already can do it for Insert by using `coalesce`.
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r97622188

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala ---
@@ -43,7 +43,7 @@ class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with M
     clock.setTime(0)
     listenerBusMock = mock[LiveListenerBus]
-    blacklist = new BlacklistTracker(listenerBusMock, conf, clock)
+    blacklist = new BlacklistTracker(null, conf, None, clock)
--- End diff --

Yes.
[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka
Github user tcondie commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97620911

--- Diff: external/kafka-0-10-sql/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister ---
@@ -1 +1 @@
-org.apache.spark.sql.kafka010.KafkaSourceProvider
+org.apache.spark.sql.kafka010.KafkaProvider
--- End diff --

That's true, but the revised provider not only provides a Source but also a Relation, hence the decision to rename it to something more general. It's not clear whether this outweighs the risks you've pointed out. @tdas @zsxwing
[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16686 **[Test build #71942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71942/testReport)** for PR 16686 at commit [`74d96fc`](https://github.com/apache/spark/commit/74d96fc9049a0a0fb6de6d011eb896b7d7c32b30).
[GitHub] spark pull request #16521: [SPARK-19139][core] New auth mechanism for transp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16521
[GitHub] spark pull request #16685: [SPARK-19335] Introduce insert, update, and upser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16685#discussion_r97619926

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -722,14 +724,246 @@ object JdbcUtils extends Logging {
   }

   /**
    * Check whether a table exists in a given database
    *
    * @return True if the table exists.
    */
   @transient
   def checkTableExists(targetDb: String, tableName: String): Boolean = {
     val dbc: Connection = DriverManager.getConnection(targetDb)
     val dbm = dbc.getMetaData()
     // Check if the table exists. If it exists, perform an upsert.
     // Otherwise, do a simple dataframe write to the DB
     val tables = dbm.getTables(null, null, tableName, null)
     val exists = tables.next() // Returns false if next does not exist
     dbc.close()
     exists
   }

   // Provide a reasonable starting batch size for database operations.
   private val DEFAULT_BATCH_SIZE: Int = 200

   // Limit the number of database connections. Some DBs suffer when there are many open
   // connections.
   private val DEFAULT_MAX_CONNECTIONS: Int = 50
--- End diff --

Well, since Spark 2.1, we already provide a parameter for limiting the maximum number of concurrent JDBC connections when inserting data into JDBC tables. The parameter is `numPartitions`.
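The effect of `numPartitions`/`coalesce` described above can be sketched without Spark: if the rows are regrouped into at most N partitions and each partition writes over one connection, open connections are bounded by N. The helper below is an illustrative analogy, not Spark's implementation:

```scala
// Illustrative analogy for bounding JDBC connections via partition count.
object ConnectionBound {
  // Regroup `rows` into at most `maxConnections` chunks; one connection per
  // chunk means open connections never exceed the cap.
  def coalesceInto[T](rows: Seq[T], maxConnections: Int): Seq[Seq[T]] = {
    if (rows.isEmpty) Seq.empty
    else {
      val n = math.min(maxConnections, rows.size)
      rows.grouped(math.ceil(rows.size.toDouble / n).toInt).toSeq
    }
  }
}
```

On the Spark side, the corresponding move is roughly `df.coalesce(numPartitions).write.jdbc(...)`: reduce the partition count before the write so each JDBC writer task holds one connection.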
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71941/ Test FAILed.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Merged build finished. Test FAILed.
[GitHub] spark issue #16521: [SPARK-19139][core] New auth mechanism for transport lib...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16521 Thanks. Merging to master.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14702 **[Test build #71941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71941/testReport)** for PR 14702 at commit [`d9047f0`](https://github.com/apache/spark/commit/d9047f0d5728075d8b50c64afbef65a5279a1847).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16377 Thanks @jkbradley!
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16690 **[Test build #71938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71938/testReport)** for PR 16690 at commit [`ce5216e`](https://github.com/apache/spark/commit/ce5216eb09b58f95c2f9c045c47b33ef5138e963).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16377
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16690 Merged build finished. Test FAILed.
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71938/ Test FAILed.
[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka
Github user tcondie commented on the issue: https://github.com/apache/spark/pull/16686 retest this please
[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16377 LGTM. Thanks! Merging with master
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15880 Thanks! Merging to master.
[GitHub] spark pull request #15880: [SPARK-17913][SQL] compare atomic and string type...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15880
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test FAILed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71939/ Test FAILed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71939/testReport)** for PR 16605 at commit [`a738158`](https://github.com/apache/spark/commit/a7381587ae4eb22b2a63f1518cd62f82355a8018). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 Oh OK! Thanks @srowen
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71936/ Test PASSed.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71936/testReport)** for PR 16677 at commit [`9d4cadb`](https://github.com/apache/spark/commit/9d4cadb782afcba52b8081402f5dd89cb0a27ae5). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class FakePartitioning(orgPartition: Partitioning, numPartitions: Int) extends Partitioning ` * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode with CodegenSupport ` * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode `
[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16582 > Why not simply remove old redirect handler like collection.removeHandler ? I find it cleaner to just not do something than to do it and then have to undo things when it fails.
[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16582#discussion_r97610826
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -274,25 +277,28 @@ private[spark] object JettyUtils extends Logging {
       conf: SparkConf,
       serverName: String = ""): ServerInfo = {
-    val collection = new ContextHandlerCollection
     addFilters(handlers, conf)
     val gzipHandlers = handlers.map { h =>
+      h.setVirtualHosts(Array("@" + SPARK_CONNECTOR_NAME))
--- End diff --
`ContextHandlerCollection.addHandler` does not call `setVirtualHosts`.
[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16582#discussion_r97610759
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -337,17 +350,20 @@ private[spark] object JettyUtils extends Logging {
       // The number of selectors always equals to the number of acceptors
       minThreads += connector.getAcceptors * 2
     }
-    server.setConnectors(connectors.toArray)
     pool.setMaxThreads(math.max(pool.getMaxThreads, minThreads))
     val errorHandler = new ErrorHandler()
     errorHandler.setShowStacks(true)
     errorHandler.setServer(server)
     server.addBean(errorHandler)
+
+    gzipHandlers.foreach(collection.addHandler)
     server.setHandler(collection)
+
+    server.setConnectors(connectors.toArray)
--- End diff --
Mostly for grouping. "This is where all handlers are added to the server." In any case I have another change (#16625) that kinda moves all this stuff around anyway...
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user paragpc commented on the issue: https://github.com/apache/spark/pull/11867 thanks @squito :)
[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16689 yes. but please see my other comment
[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16661 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71937/ Test PASSed.
[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16661 Merged build finished. Test PASSed.
[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16661 **[Test build #71937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71937/testReport)** for PR 16661 at commit [`5672d13`](https://github.com/apache/spark/commit/5672d1345f661665f521fd1dd4410313ef3ab554). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14702 **[Test build #71941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71941/testReport)** for PR 14702 at commit [`d9047f0`](https://github.com/apache/spark/commit/d9047f0d5728075d8b50c64afbef65a5279a1847).
[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16688 cc @cloud-fan @gatorsmile Could you please trigger a test for this. Not sure why the last run didn't succeed. Thanks a lot.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71940/testReport)** for PR 16605 at commit [`94902ce`](https://github.com/apache/spark/commit/94902cebbabcec5464f5b1a9bbfba64cb6bba0b9).
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16387 No, the question is whether you tested without @viirya's commit `b1ef9ec` (the last one that forces spills of in-memory maps), or just the very last version of the patch.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71939/testReport)** for PR 16605 at commit [`a738158`](https://github.com/apache/spark/commit/a7381587ae4eb22b2a63f1518cd62f82355a8018).
[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r97590536
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql;
+
+// $example on:typed_custom_aggregation$
+import java.io.Serializable;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.TypedColumn;
+import org.apache.spark.sql.expressions.Aggregator;
+// $example off:typed_custom_aggregation$
+
+public class JavaUserDefinedTypedAggregation {
+
+  // $example on:typed_custom_aggregation$
+  public static class Employee implements Serializable {
+    private String name;
+    private long salary;
+
+    // Constructors, getters, setters...
+    // $example off:typed_custom_aggregation$
+    public String getName() {
+      return name;
+    }
+
+    public void setName(String name) {
+      this.name = name;
+    }
+
+    public long getSalary() {
+      return salary;
+    }
+
+    public void setSalary(long salary) {
+      this.salary = salary;
+    }
+    // $example on:typed_custom_aggregation$
+  }
+
+  public static class Average implements Serializable {
+    private long sum;
+    private long count;
+
+    // Constructors, getters, setters...
+    // $example off:typed_custom_aggregation$
+    public Average() {
+    }
+
+    public Average(long sum, long count) {
+      this.sum = sum;
+      this.count = count;
+    }
+
+    public long getSum() {
+      return sum;
+    }
+
+    public void setSum(long sum) {
+      this.sum = sum;
+    }
+
+    public long getCount() {
+      return count;
+    }
+
+    public void setCount(long count) {
+      this.count = count;
+    }
+    // $example on:typed_custom_aggregation$
+  }
+
+  public static class MyAverage extends Aggregator<Employee, Average, Double> {
+    // A zero value for this aggregation. Should satisfy the property that any b + zero = b
+    public Average zero() {
--- End diff --
My bad, I read this incorrectly while skimming.
[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r97589440
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java ---
+  public static class MyAverage extends Aggregator<Employee, Average, Double> {
+    // A zero value for this aggregation. Should satisfy the property that any b + zero = b
+    public Average zero() {
--- End diff --
@srowen `Average` is a Java bean that holds the current sum and count. It is defined earlier. Here it represents a zero value. `MyAverage`, in turn, is the actual aggregator that accepts instances of the `Employee` class, stores intermediate results using an instance of `Average`, and produces `Double` as a result. I can rename `MyAverage` to `MyAverageAggregator` if this makes things clearer.
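[Editor's note] The zero/buffer/result contract described in the comment above can be sketched without Spark. This is a hypothetical plain-Java analogue of the typed aggregator pattern (the simplified class shapes and method set are assumptions for illustration, not Spark's actual `Aggregator` API, which additionally requires buffer and output encoders):

```java
import java.io.Serializable;

// Hypothetical stand-ins for the typed aggregation contract: IN = Employee,
// BUF = Average, OUT = Double. Names mirror the quoted example, not Spark itself.
public class TypedAggregationSketch {
  public static class Employee implements Serializable {
    final String name;
    final long salary;
    Employee(String name, long salary) { this.name = name; this.salary = salary; }
  }

  // Mutable buffer holding the running sum and count, like the Average bean above.
  public static class Average implements Serializable {
    long sum;
    long count;
    Average(long sum, long count) { this.sum = sum; this.count = count; }
  }

  public static class MyAverage {
    // Zero value: merging any buffer b with zero() must leave b unchanged.
    public Average zero() { return new Average(0L, 0L); }

    // Fold one input record into the buffer.
    public Average reduce(Average buf, Employee e) {
      buf.sum += e.salary;
      buf.count += 1;
      return buf;
    }

    // Combine two partial buffers (e.g. computed on different partitions).
    public Average merge(Average b1, Average b2) {
      return new Average(b1.sum + b2.sum, b1.count + b2.count);
    }

    // Produce the final result from the buffer.
    public Double finish(Average buf) {
      return ((double) buf.sum) / buf.count;
    }
  }
}
```

The sketch only illustrates the zero/reduce/merge/finish algebra that the review comment walks through; reducing two salaries of 3000 and 4500 yields a final average of 3750.0.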
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16690 cc @vanzin @zsxwing (I was going to make a longer comment about removing `askWithRetry` and whether we need another method, but then saw the comments on the referenced PR -- I'll defer to the experts here)
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71935/ Test FAILed.
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16690 **[Test build #71938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71938/testReport)** for PR 16690 at commit [`ce5216e`](https://github.com/apache/spark/commit/ce5216eb09b58f95c2f9c045c47b33ef5138e963).