[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/2561 SPARK-3711: Optimize where in clause filter queries The In case class is replaced by an InSet class when all the filter values are literals. InSet uses a HashSet instead of a Sequence, giving a significant performance improvement (previously the Seq performed a worst-case linear match via the exists method, since the filter list was assumed to contain arbitrary expressions). The improvement should be most visible when only a small percentage of a large dataset matches the filter list. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark branch-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2561.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2561 commit bee98aadcea7cb8fa6402d72af45aef2a4de8c19 Author: Yash Datta yash.da...@guavus.com Date: 2014-09-28T05:54:49Z SPARK-3711: Optimize where in clause filter queries --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
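The core of the optimization can be sketched outside Spark as follows. This is a minimal illustration with hypothetical names (`InSetSketch`, `inSeq`, `inSet`), not the actual Catalyst `In`/`InSet` expression classes: a membership test against a `Seq` is linear per row, while a `HashSet`-backed `Set` gives an expected constant-time lookup.

```scala
// Minimal sketch (hypothetical names): why a hash-backed Set beats a Seq
// for IN-list membership tests when all the list elements are literals.
object InSetSketch {
  // Linear scan per row, analogous to the old behaviour using exists: O(n).
  def inSeq[A](value: A, list: Seq[A]): Boolean = list.exists(_ == value)

  // Hash lookup per row, analogous to the InSet approach: O(1) expected.
  def inSet[A](value: A, set: Set[A]): Boolean = set.contains(value)
}
```

With a 10,000-element filter list, `inSeq` compares up to 10,000 values for every row, while `inSet` does a single hash lookup; the gap widens with the list size and the number of non-matching rows, which matches the PR's note that the gain is largest when few rows of a large dataset match.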
[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2561#issuecomment-57076163 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57076212 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20929/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57076438 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull) for PR 1290 at commit [`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076888 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20928/
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076884 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20928/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user epahomov commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57077001 Sorry for such a messy pull request; I didn't review my student's code closely enough. I'll do better next time. We'll fix everything by the middle of the week.
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57077090 I tested the timeout issue; https://github.com/apache/spark/pull/1689 led to it, but I have not found the root cause yet.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57077291 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57077406 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20931/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2538#issuecomment-57077576 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20932/consoleFull) for PR 2538 at commit [`847f9b9`](https://github.com/apache/spark/commit/847f9b9faba9f9e6af20c9f5e72e68bc9eb52f4d). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57077703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20930/
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57077701 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull) for PR 1290 at commit [`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas ` * `class OutputFrame2D( title: String ) extends Frame( title ) ` * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas ` * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) ` * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57078093 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20927/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57078096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20927/
[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2547#issuecomment-57078294 Thanks. I've merged this.
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57078276 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/173/consoleFull) for PR 2560 at commit [`9eff95a`](https://github.com/apache/spark/commit/9eff95afe6051b264854b415b5d305dc9e4bf3ef). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2547
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57078355 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20929/
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57078351 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20929/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57078720 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20931/
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57078715 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20931/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [WIP][SPARK-3517]mapPartitions is not correct ...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/2376
[GitHub] spark pull request: [WIP][SPARK-3517]mapPartitions is not correct ...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2376#issuecomment-57078975 I temporarily cannot reproduce it, so I'm closing this PR.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57079061 Hive handles dates in a really strange way; there is still some inconsistency between the two, but I think it is better to address that in follow-up PRs.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57079131 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20933/consoleFull) for PR 2344 at commit [`f4058ab`](https://github.com/apache/spark/commit/f4058ab1a185b3dc3fb5fe1522ef5b481601d873). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57079256 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/174/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3412][SQL]add missing row api
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2529#issuecomment-57079526 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20934/consoleFull) for PR 2529 at commit [`4c18c29`](https://github.com/apache/spark/commit/4c18c29faedd52ca6a3d925ea039841b860862f7). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2538#issuecomment-57079953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20932/
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2538#issuecomment-57079950 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20932/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2516#discussion_r18128789

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
   }
 }

-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-    require(file.exists(), s"Properties file $file does not exist")
-    require(file.isFile(), s"Properties file $file is not a normal file")
-    val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves configuration sources in order of highest to lowest:
+   * 1. Each map passed in as additionalConfig, from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (e.g. by using -Dspark.var.name)
+   * 4. SPARK_DEFAULT_CONF/spark-defaults.conf or SPARK_HOME/conf/spark-defaults.conf
+   * 5. Hard-coded defaults on the classpath at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read in, and its properties
+   * are considered to be at the priority of the method that specified the file.
+   * A property specified in a property file will not override an existing
+   * config value at that same level.
+   *
+   * @param additionalConfigs Seq of additional Map[ConfigName -> ConfigValue], in order of
+   *                          highest priority to lowest; these have priority over internal sources
+   * @return Map[propName -> propFile] containing values merged from all sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq[Map[String, String]]) = {
+    // Configuration read in from the spark-submit-defaults.prop file found on the classpath
+    var hardCodedDefaultConfig: Option[Map[String, String]] = None
+    var is: InputStream = null
+    var isr: Option[InputStreamReader] = None
     try {
-      val properties = new Properties()
-      properties.load(inputStream)
-      properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-    } catch {
-      case e: IOException =>
-        val message = s"Failed when loading Spark properties file $file"
-        throw new SparkException(message, e)
+      is = Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+      // only open InputStreamReader if InputStream was successfully opened
+      isr = Option(is).map { is: InputStream =>
+        new InputStreamReader(is, CharEncoding.UTF_8)
+      }
+
+      hardCodedDefaultConfig = isr.map(defaultValueStream =>
+        SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
     } finally {
-      inputStream.close()
+      Option(is).foreach(_.close)
+      isr.foreach(_.close)
     }
+
+    if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+      throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+    }
+
+    // Configuration read in from the defaults file, if it exists
+    var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+    if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile)) {
+      SparkSubmitArguments.getPropertyValuesFromFile(
+        sparkDefaultConfig.get(SparkPropertiesFile).get)
+    } else {
+      Map.empty
+    }
+
+    // Configuration from Java system properties
+    val systemPropertyConfig = SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+    // Configuration variables from the environment
+    // (supports legacy variables)
+    val environmentConfig = System.getenv().asScala
+
+    val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+      "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+    // legacy variables act at the priority of a system property
+    val propsWithEnvVars: mutable.Map[String, String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+      .map { case (varName, propName) => (environmentConfig.get(varName), propName) }
+      .filter { case (varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty }
+      .map { case (varVariable, propName) => (propName, varVariable.get) }
+
+    val ConfigSources = additionalConfigs ++ Seq(
+      environmentConfig,
+      propsWithEnvVars,
+      sparkDefaultConfig,
+      hardCodedDefaultConfig.get
+    )
+
+    // Load properties file at
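The precedence scheme described in the diff's comment above (earlier sources win; later sources only supply missing keys) can be sketched independently of Spark. `ConfigMergeSketch.mergeConfigs` below is a hypothetical stand-in, not the PR's actual `mergeSparkProperties`:

```scala
// Hedged sketch: merge config maps so that earlier (higher-priority)
// sources win and later sources only fill in keys not yet present.
object ConfigMergeSketch {
  def mergeConfigs(sources: Seq[Map[String, String]]): Map[String, String] =
    sources.foldLeft(Map.empty[String, String]) { (acc, src) =>
      // Keep existing (higher-priority) entries; add only unseen keys.
      src.foldLeft(acc) { case (m, (k, v)) =>
        if (m.contains(k)) m else m.updated(k, v)
      }
    }
}
```

An equivalent one-liner is `sources.reverse.foldLeft(Map.empty[String, String])(_ ++ _)`, since `++` lets the right-hand operand win; the explicit fold above makes the "first occurrence survives" rule visible.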
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/2524#discussion_r18128796

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -112,6 +112,10 @@ class DAGScheduler(
   // stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]

+  // stageId => (SplitId -> (accumulatorId, accumulatorValue))
+  private[scheduler] val stageIdToAccumulators = new HashMap[Int,
--- End diff --

This may cause a memory leak?
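The concern raised above is the usual per-stage bookkeeping hazard: a map keyed by stageId lives as long as the scheduler and grows without bound unless entries are removed when stages finish. A hypothetical sketch (illustrative class, not the actual DAGScheduler code):

```scala
import scala.collection.mutable

// Hypothetical sketch of per-stage accumulator bookkeeping and the
// cleanup hook needed to avoid unbounded growth.
class StageAccumulatorTracker {
  // stageId -> (splitId -> (accumulatorId, accumulatorValue))
  private val stageIdToAccumulators =
    mutable.HashMap.empty[Int, mutable.HashMap[Int, (Long, Any)]]

  def record(stageId: Int, splitId: Int, accId: Long, value: Any): Unit =
    stageIdToAccumulators
      .getOrElseUpdate(stageId, mutable.HashMap.empty)
      .update(splitId, (accId, value))

  // Without a call like this on stage completion/abort, entries for
  // finished stages are never reclaimed -- the suspected leak.
  def stageCompleted(stageId: Int): Unit = stageIdToAccumulators.remove(stageId)

  def trackedStages: Int = stageIdToAccumulators.size
}
```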
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57080788 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/173/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2516#discussion_r18128856

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
   }
 }

-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-    require(file.exists(), s"Properties file $file does not exist")
-    require(file.isFile(), s"Properties file $file is not a normal file")
-    val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves configuration sources in order of highest to lowest priority:
+   * 1. Each map passed in as additionalConfig, from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (e.g. by using -Dspark.var.name)
+   * 4. SPARK_DEFAULT_CONF/spark-defaults.conf or SPARK_HOME/conf/spark-defaults.conf
+   * 5. Hard-coded defaults on the classpath at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read in and the properties
+   * are considered to be at the priority of the method that specified the file.
+   * A property specified in a property file will not override an existing
+   * config value at that same level.
+   *
+   * @param additionalConfigs Seq of additional Map[ConfigName -> ConfigValue], in order of
+   *                          highest priority to lowest; these have priority over internal sources
+   * @return Map[propName -> propValue] containing values merged from all sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq[Map[String, String]]) = {
+    // Configuration read in from the spark-submit-defaults.prop file found on the classpath
+    var hardCodedDefaultConfig: Option[Map[String, String]] = None
+    var is: InputStream = null
+    var isr: Option[InputStreamReader] = None
     try {
-      val properties = new Properties()
-      properties.load(inputStream)
-      properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-    } catch {
-      case e: IOException =>
-        val message = s"Failed when loading Spark properties file $file"
-        throw new SparkException(message, e)
+      is = Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+      // Only open an InputStreamReader if the InputStream was successfully opened
+      isr = Option(is).map { is: InputStream =>
+        new InputStreamReader(is, CharEncoding.UTF_8)
+      }
+
+      hardCodedDefaultConfig = isr.map(defaultValueStream =>
+        SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
     } finally {
-      inputStream.close()
+      Option(is).foreach(_.close)
+      isr.foreach(_.close)
     }
+
+    if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+      throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+    }
+
+    // Configuration read in from the defaults file if it exists
+    var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+    if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile)) {
+      SparkSubmitArguments.getPropertyValuesFromFile(
+        sparkDefaultConfig.get(SparkPropertiesFile).get)
+    } else {
+      Map.empty
+    }
+
+    // Configuration from Java system properties
+    val systemPropertyConfig = SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+    // Configuration variables from the environment
+    // (support legacy variables)
+    val environmentConfig = System.getenv().asScala
+
+    val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+      "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+    // Legacy variables act at the priority of a system property
+    val propsWithEnvVars: mutable.Map[String, String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+      .map { case (varName, propName) => (environmentConfig.get(varName), propName) }
+      .filter { case (varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty }
+      .map { case (varVariable, propName) => (propName, varVariable.get) }
+
+    val ConfigSources = additionalConfigs ++ Seq(
+      environmentConfig,
+      propsWithEnvVars,
+      sparkDefaultConfig,
+      hardCodedDefaultConfig.get
+    )
+
+    // Load properties file at
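The priority scheme described in the quoted scaladoc (sources listed highest-priority first, higher levels winning on key collisions) can be sketched independently of the patch. This is illustrative only; the names are not from the patch:

```scala
// Merge config sources given highest-priority first: fold from lowest to
// highest so that later (higher-priority) maps overwrite earlier keys.
def mergeByPriority(sources: Seq[Map[String, String]]): Map[String, String] =
  sources.reverse.foldLeft(Map.empty[String, String])((merged, src) => merged ++ src)

// Hypothetical sources, highest priority first:
val cmdLine      = Map("spark.master" -> "yarn")
val defaultsFile = Map("spark.executor.memory" -> "1g")
val hardCoded    = Map("spark.master" -> "local[*]", "spark.executor.memory" -> "512m")

val merged = mergeByPriority(Seq(cmdLine, defaultsFile, hardCoded))
// cmdLine wins for spark.master; defaultsFile wins for spark.executor.memory
```

Folding in reverse keeps the call site readable (highest priority first, matching the scaladoc) while relying on `Map.++` semantics, where the right-hand operand's entries replace the left's.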
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2516#discussion_r18128941

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
[same hunk as quoted in the previous comment, ending at:]
+    // Legacy variables act at the priority of a system property
--- End diff --

OK, my way was even getting me confused. Let's use your suggested code and treat legacy env variables at the same priority as normal environment variables.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57081729 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20933/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57081730 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20933/
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57081837 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/174/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2562 [SPARK-3712][STREAMING]: add a new UpdateDStream to update a rdd dynamically

We could achieve this with the `foreachRDD` function, but that approach is awkward because it requires passing a closure, like this:

    val baseRDD = ...
    var updatedRDD = ...
    val inputStream = ...
    val func = (rdd: RDD[T], t: Time) => { updatedRDD = baseRDD.op(rdd) }
    inputStream.foreachRDD(func _)

In my PR, we can update an RDD like:

    val updateStream = inputStream.updateRDD(baseRDD, func).asInstanceOf[U, V, T]

and obtain the updated RDD like this:

    val updatedRDD = updateStream.getUpdatedRDD

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master-clean-14928

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2562.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2562

commit 265c941fe1b7cd164eef11c58f622a0c434a229b
Author: uncleGen husty...@gmail.com
Date: 2014-09-28T07:48:20Z
[STREAMING]: add a new UpdateDStream to update a rdd dynamically

commit b5cdb62410c3461115e76a9549f160460b63b8fb
Author: uncleGen husty...@gmail.com
Date: 2014-09-28T10:37:40Z
fix test

commit 41d9a952d39f8bc64a38312856ab57e304a59382
Author: uncleGen husty...@gmail.com
Date: 2014-09-28T10:40:37Z
clerical error
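The `foreachRDD` alternative the author describes boils down to folding each incoming batch into a mutable reference. Stripped of the Spark Streaming types, the pattern looks like this (plain Scala, illustrative only; `onBatch` stands in for the foreachRDD closure):

```scala
// Simulate a stream of batches being folded into one evolving dataset,
// the way `inputStream.foreachRDD(rdd => updatedRDD = baseRDD.op(rdd))` would.
var updated: Set[Int] = Set(1, 2, 3) // state seeded from the "base RDD"

def onBatch(batch: Seq[Int]): Unit = {
  // The update function: here, union the batch into the accumulated state.
  updated = updated ++ batch
}

// Two micro-batches arriving over time:
Seq(Seq(4), Seq(5, 6)).foreach(onBatch)
```

The PR's argument is essentially ergonomic: a dedicated `updateRDD` operator makes this state-threading explicit in the DStream graph instead of hiding it in a side-effecting closure.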
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57082094 retest this please.
[GitHub] spark pull request: [SPARK-3412][SQL]add missing row api
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2529#issuecomment-57082151 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20934/
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57082146 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20935/consoleFull) for PR 2562 at commit [`41d9a95`](https://github.com/apache/spark/commit/41d9a952d39f8bc64a38312856ab57e304a59382).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3412][SQL]add missing row api
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2529#issuecomment-57082150 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20934/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57082251 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20936/consoleFull) for PR 2559 at commit [`199a85d`](https://github.com/apache/spark/commit/199a85d2e7ef482f3c0ac9cacc4dbeb2a21d5901).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57082932 @liancheng I tried `export` and it worked. Thanks for the suggestion. Also modified permission of `stop-thriftserver.sh`.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57083074 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20937/
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57083228 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20935/
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57083227 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20935/consoleFull) for PR 2562 at commit [`41d9a95`](https://github.com/apache/spark/commit/41d9a952d39f8bc64a38312856ab57e304a59382).

* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class UpdateDStream[U: ClassTag, T: ClassTag, V: ClassTag](`
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57083376 Thanks for your suggestions, @vanzin @andrewor14. I have changed the code accordingly.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57083416 Test failure appears to be unrelated to my patch.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57083428 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/175/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2563 [SPARK-3713][SQL] Uses JSON to serialize DataType objects

This PR uses JSON instead of `toString` to serialize `DataType`s. The latter is not only hard to parse but also flaky in many cases. Since we already write schema information to Parquet metadata in the old style, we have to retain the old `DataType` parser and ensure backward compatibility. The old parser is now renamed to `CaseClassStringParser` and moved into `object DataType`.

@JoshRosen @davis Please help review the PySpark related changes, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark datatype-to-json

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2563.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2563

commit dca9153d213a9a9603d7b327d78750af66021ed2
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-09-25T09:28:06Z
De/serializes DataType objects from/to JSON

commit 5f792df158128f6bf41a49e816a915150698a9d2
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-09-28T11:19:34Z
Adds PySpark support

commit 26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-09-28T11:54:26Z
Adds compatibility test case for Parquet type conversion
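The motivation for the PR above is that a recursive type tree serializes to something machine-parseable when rendered as JSON, whereas case-class `toString` output is ambiguous and brittle. A toy sketch of the idea (the type names here are illustrative and are not Spark's actual `DataType` API):

```scala
// A tiny stand-in for a recursive schema type tree.
sealed trait DType
case object IntType extends DType
case object StringType extends DType
case class StructField(name: String, dtype: DType)
case class StructType(fields: Seq[StructField]) extends DType

// Render the tree as JSON by hand: every node becomes a tagged object,
// so a reader can dispatch on the "type" field instead of parsing
// free-form toString output.
def toJson(dt: DType): String = dt match {
  case IntType    => "{\"type\":\"integer\"}"
  case StringType => "{\"type\":\"string\"}"
  case StructType(fields) =>
    fields.map(f => "\"" + f.name + "\":" + toJson(f.dtype))
      .mkString("{\"type\":\"struct\",\"fields\":{", ",", "}}")
}

val schema = StructType(Seq(StructField("id", IntType), StructField("name", StringType)))
```

A real implementation would use a JSON library for quoting and round-tripping, but the structural point is the same: the serialized form mirrors the type tree one-to-one.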
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57084958 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20936/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57084960 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20936/
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57084987 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20938/consoleFull) for PR 2563 at commit [`26c6563`](https://github.com/apache/spark/commit/26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1).

* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57084988 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20938/
[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2291#issuecomment-57085068 PR #2563 supersedes this one. Closing.
[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/2291
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2562#discussion_r18129623

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -602,6 +602,18 @@ abstract class DStream[T: ClassTag] (
   }

   /**
+   * Return a new UpdateDStream in which each RDD is used to update the original RDD by
+   * applying a function on each RDD of 'this' DStream.
+   */
+  def updateRDD[U: ClassTag, V: ClassTag](
+      rdd: RDD[V],
+      updateFunc: (Option[RDD[T]], RDD[V]) => RDD[U]
+  ): DStream[T] = {
+    val cleanF = ssc.sparkContext.clean(updateFunc)
+    new UpdateDStream[U, T, V](this, cleanF, rdd).register()
--- End diff --

Hi @uncleGen, I'm not sure why you need to register this DStream. Your `updateRDD` operator looks like a transformation, not an action, so I don't think you need to call register; that is only for output DStreams like `ForEachDStream`.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57085435 I think this can be done with `foreachRDD` or `transform`, as you said; I'm not sure what the purpose of a new operator is.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57085586 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/176/consoleFull) for PR 2509 at commit [`5dcaab2`](https://github.com/apache/spark/commit/5dcaab2d4ef6c279872aa65e62c1be5456858c6c).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57085803 After updating my local repo, I found that `stop-thriftserver.sh` is still not executable. Make sure to `git add` this file after `chmod +x`. This is the only pending issue from my perspective.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57086391 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/175/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57087294 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20939/consoleFull) for PR 2563 at commit [`03da3ec`](https://github.com/apache/spark/commit/03da3ec870940bd6ff56e03450993da6125b40a4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57087576 @jerryshao Thanks for your comments! I wanted to abstract an independent DStream to achieve this, since it feels odd to update an RDD by passing a closure. Maybe this patch is not the right approach, so I will close it for now.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/2562
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57089022 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/176/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57089524 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20940/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/2564 SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatting issues along the way I believe. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-2548 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2564.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2564 commit 179b3c2ca892a8db237ff714147aabf54d7d2b3a Author: Sean Owen so...@cloudera.com Date: 2014-09-28T16:16:03Z Re-port RecoverableNetworkWordCount to Java example, and touch up doc / formatting in related examples
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57090626 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20941/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57090930 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20939/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57090932 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20939/
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57091751 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20940/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class KillExecutor(` * ` case class MasterChangeAcknowledged(appId: ApplicationId)` * ` case class RegisteredApplication(appId: ApplicationId, masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57091756 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20940/
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57092226 retest this please.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57092314 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20942/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57092386 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20941/
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57092384 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20941/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class JavaRecoverableNetworkWordCount `
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-57094501 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20943/consoleFull) for PR 2520 at commit [`b9e0bfb`](https://github.com/apache/spark/commit/b9e0bfb693fed6b5befbc40cadb883617670e389). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57094648 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20942/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class KillExecutor(` * ` case class MasterChangeAcknowledged(appId: ApplicationId)` * ` case class RegisteredApplication(appId: ApplicationId, masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57094652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20942/
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57095584 Jenkins, test this please.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57095758 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20944/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57095860 @manishamde Fixed the typo. I believe I have addressed everything, so please let me know if it looks good. Thank you for the review!
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57095923 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20945/consoleFull) for PR 2435 at commit [`c694174`](https://github.com/apache/spark/commit/c6941748b58f5b77a480cfbc85cdece9ce8dec5a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-57096850 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20943/consoleFull) for PR 2520 at commit [`b9e0bfb`](https://github.com/apache/spark/commit/b9e0bfb693fed6b5befbc40cadb883617670e389). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-57096853 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20943/
[GitHub] spark pull request: [WIP][SQL] Diagnose test timeouts
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/2565 [WIP][SQL] Diagnose test timeouts You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark testTimeOut Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2565.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2565 commit a72b75161dd4ea5ee71ce59c09cf79ac717816a9 Author: Michael Armbrust mich...@databricks.com Date: 2014-09-28T19:33:46Z Force test run
[GitHub] spark pull request: [WIP][SQL] Diagnose test timeouts
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2565#issuecomment-57097221 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20946/consoleFull) for PR 2565 at commit [`a72b751`](https://github.com/apache/spark/commit/a72b75161dd4ea5ee71ce59c09cf79ac717816a9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57097696 eh...I cloned the repository on another laptop and found it's executable, as shown in top-left corner of https://github.com/WangTaoTheTonic/spark/blob/thriftserver/sbin/stop-thriftserver.sh. Could you verify this again?
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57097830 It looks like this most recent test failure is due to MiMa:

```
[info] spark-core: found 1 potential binary incompatibilities (filtered 223)
[error] * synthetic method org$apache$spark$SparkContext$$createTaskScheduler(org.apache.spark.SparkContext,java.lang.String)org.apache.spark.scheduler.TaskScheduler in object org.apache.spark.SparkContext does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.org$apache$spark$SparkContext$$createTaskScheduler")
```

Here's the v1.1.0 definition of `SparkContext.createTaskScheduler` that MiMa is comparing against: https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/SparkContext.scala#L1478 This MiMa error is confusing because this is a `private` method that's only called from SparkContext. It's also called from `SparkContextSchedulerCreationSuite` using reflection. Unless anyone has an objection, I think we should add the

```scala
ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.org$apache$spark$SparkContext$$createTaskScheduler")
```

exclusion, since this method is `private`.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2432#discussion_r18132263 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -334,6 +352,8 @@ class SparkContext(config: SparkConf) extends Logging { localProperties.set(props) } + def getApplicationId = appId --- End diff -- I guess the return type of this method is `ApplicationId`, but that class is `private[spark]`, so this method is leaking a private type to users.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57098005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20944/
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57098002 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20944/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class JavaRecoverableNetworkWordCount `
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57098176 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20945/consoleFull) for PR 2435 at commit [`c694174`](https://github.com/apache/spark/commit/c6941748b58f5b77a480cfbc85cdece9ce8dec5a). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class RandomForestModel(val trees: Array[DecisionTreeModel], val algo: Algo) extends Serializable `
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57098177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20945/
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user giwa commented on a diff in the pull request: https://github.com/apache/spark/pull/2538#discussion_r18132327

--- Diff: python/pyspark/streaming/dstream.py --- @@ -0,0 +1,632 @@

```python
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from itertools import chain, ifilter, imap
import operator
import time
from datetime import datetime

from pyspark import RDD
from pyspark.storagelevel import StorageLevel
from pyspark.streaming.util import rddToFileName, RDDFunction
from pyspark.rdd import portable_hash
from pyspark.resultiterable import ResultIterable

__all__ = ["DStream"]


class DStream(object):
    def __init__(self, jdstream, ssc, jrdd_deserializer):
        self._jdstream = jdstream
        self._ssc = ssc
        self.ctx = ssc._sc
        self._jrdd_deserializer = jrdd_deserializer
        self.is_cached = False
        self.is_checkpointed = False

    def context(self):
        """
        Return the StreamingContext associated with this DStream
        """
        return self._ssc

    def count(self):
        """
        Return a new DStream which contains the number of elements in this DStream.
        """
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()

    def sum(self):
        """
        Add up the elements in this DStream.
        """
        return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)

    def filter(self, f):
        """
        Return a new DStream containing only the elements that satisfy predicate.
        """
        def func(iterator):
            return ifilter(f, iterator)
        return self.mapPartitions(func, True)

    def flatMap(self, f, preservesPartitioning=False):
        """
        Pass each value in the key-value pair DStream through flatMap function
        without changing the keys: this also retains the original RDD's partition.
        """
        def func(s, iterator):
            return chain.from_iterable(imap(f, iterator))
        return self.mapPartitionsWithIndex(func, preservesPartitioning)

    def map(self, f, preservesPartitioning=False):
        """
        Return a new DStream by applying a function to each element of DStream.
        """
        def func(iterator):
            return imap(f, iterator)
        return self.mapPartitions(func, preservesPartitioning)

    def mapPartitions(self, f, preservesPartitioning=False):
        """
        Return a new DStream by applying a function to each partition of this DStream.
        """
        def func(s, iterator):
            return f(iterator)
        return self.mapPartitionsWithIndex(func, preservesPartitioning)

    def mapPartitionsWithIndex(self, f, preservesPartitioning=False):
        """
        Return a new DStream by applying a function to each partition of this DStream,
        while tracking the index of the original partition.
        """
        return self.transform(lambda rdd: rdd.mapPartitionsWithIndex(f, preservesPartitioning))

    def reduce(self, func):
        """
        Return a new DStream by reducing the elements of this RDD using the specified
        commutative and associative binary operator.
        """
        return self.map(lambda x: (None, x)).reduceByKey(func, 1).map(lambda x: x[1])

    def reduceByKey(self, func, numPartitions=None):
        """
        Merge the values for each key using an associative reduce function.

        This will also perform the merging locally on each mapper before
        sending results to a reducer, similarly to a "combiner" in MapReduce.

        Output will be hash-partitioned with C{numPartitions} partitions, or
        the default parallelism level if C{numPartitions} is not specified.
        """
        return
```

(diff truncated in the archived message)
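The diff above builds `count`, `sum`, and `reduce` out of `mapPartitions`, and `reduce` itself is expressed as a map to `(None, x)` pairs followed by `reduceByKey`. The composition can be sketched over plain Python lists standing in for an RDD's partitions; this is an illustrative sketch, not PySpark, and the helper names (`map_partitions`, `dstream_count`, `dstream_reduce`) are invented for the example.

```python
# Illustrative sketch (not PySpark): mimic how the diff composes count/sum/reduce
# out of mapPartitions, using a list of lists to stand in for an RDD's partitions.
from functools import reduce as freduce


def map_partitions(partitions, f):
    # Apply f to each partition's iterator, like DStream.mapPartitions.
    return [list(f(iter(p))) for p in partitions]


def dstream_count(partitions):
    # count = mapPartitions(lambda i: [sum(1 for _ in i)]) followed by a global sum.
    per_part = map_partitions(partitions, lambda i: [sum(1 for _ in i)])
    return sum(x for p in per_part for x in p)


def dstream_reduce(partitions, func):
    # reduce = map each element to (None, x), merge all values under the single
    # key with func (the reduceByKey step), then drop the key.
    keyed = [(None, x) for p in partitions for x in p]
    return freduce(func, (v for _, v in keyed))


parts = [[1, 2, 3], [4, 5]]
print(dstream_count(parts))                       # 5
print(dstream_reduce(parts, lambda a, b: a + b))  # 15
```

Keying everything to `None` looks odd in isolation, but it lets `reduce` reuse the shuffle-and-combine machinery that `reduceByKey` already provides, which is exactly the trick the diff uses.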
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user giwa commented on a diff in the pull request: https://github.com/apache/spark/pull/2538#discussion_r18132355

--- Diff: examples/src/main/python/streaming/wordcount.py --- @@ -0,0 +1,21 @@

```python
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print >> sys.stderr, "Usage: wordcount <directory>"
        exit(-1)

    sc = SparkContext(appName="PythonStreamingWordCount")
    ssc = StreamingContext(sc, 1)

    lines = ssc.textFileStream(sys.argv[1])
    counts = lines.flatMap(lambda line: line.split(" "))\
                  .map(lambda x: (x, 1))\
                  .reduceByKey(lambda a, b: a + b)
    counts.pyprint()
```

--- End diff --

`counts.pyprint()` should be `counts.pprint()`:

```python
def pprint(self):
    """
    Print the first ten elements of each RDD generated in this DStream.

    This is an output operator, so this DStream will be registered as an
    output stream and there materialized.
    """
```
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user giwa commented on a diff in the pull request: https://github.com/apache/spark/pull/2538#discussion_r18132354

--- Diff: examples/src/main/python/streaming/network_wordcount.py --- @@ -0,0 +1,20 @@

```python
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print >> sys.stderr, "Usage: wordcount <hostname> <port>"
        exit(-1)

    sc = SparkContext(appName="PythonStreamingNetworkWordCount")
    ssc = StreamingContext(sc, 1)

    lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
    counts = lines.flatMap(lambda line: line.split(" "))\
                  .map(lambda word: (word, 1))\
                  .reduceByKey(lambda a, b: a + b)
    counts.pyprint()
```

--- End diff --

`counts.pyprint()` should be `counts.pprint()`:

```python
def pprint(self):
    """
    Print the first ten elements of each RDD generated in this DStream.

    This is an output operator, so this DStream will be registered as an
    output stream and there materialized.
    """
```
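The `flatMap`/`map`/`reduceByKey` chain in both example scripts computes a per-batch word count. Its semantics for a single batch can be sketched in plain Python; this is an illustrative sketch, not PySpark, and the `word_count` helper is invented for the example.

```python
# Illustrative sketch (not PySpark): what the flatMap/map/reduceByKey chain in
# the streaming word-count examples computes for one batch of lines.
from collections import defaultdict


def word_count(lines):
    # flatMap: split each line into individual words.
    words = [w for line in lines for w in line.split(" ")]
    # map: pair each word with an initial count of 1.
    pairs = [(w, 1) for w in words]
    # reduceByKey(lambda a, b: a + b): merge the counts for each word.
    counts = defaultdict(int)
    for w, n in pairs:
        counts[w] += n
    return dict(counts)


print(word_count(["to be or", "not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In the real streaming examples this chain runs once per batch interval, so `pprint()` shows a fresh count for each micro-batch rather than a running total.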
[GitHub] spark pull request: [SQL]fix spark sql hive tests time out issue
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2566 [SQL] fix spark sql hive tests time out issue

When `TestHive.cacheTables = true` is set, the `correlationoptimizer14` test hangs, which leads to the timeout; there may be a deeper issue behind this. This PR is a quick fix for the Hive test issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark patch-2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2566.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2566

commit f4897581dcd2878f7af71110305d7d0ef8e3d7e3 Author: wangfei wangf...@huawei.com Date: 2014-09-28T20:13:50Z fix spark sql hive test time out issue