[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91026585

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```scala
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      withTempView("inputTable") {
+        val testData = spark.sparkContext.parallelize(
+          (0 until 10) map (x => IntegerCaseClass(1)), 2).toDF()
+        testData.createOrReplaceTempView("inputTable")
+        // Distribute all rows to one partition (all rows have the same content),
```

--- End diff --

@cloud-fan Thanks for the review. Because all rows contain only IntegerCaseClass(1), RepartitionByExpression will assign all rows to one partition, which then holds 10 records.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
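The claim above, that `DISTRIBUTE BY` on a constant-valued column sends every row to one partition, follows from hash partitioning: each row lands in bucket `hash(key) mod numPartitions`, so identical keys always share a bucket. A minimal plain-Scala sketch of that behavior (this is not Spark's actual `HashPartitioner`, which additionally normalizes negative hash codes; the `bucket` helper is made up for illustration):

```scala
// Hash partitioning sketch: rows with identical content always map to the
// same bucket, so DISTRIBUTE BY a constant column fills one partition.
def bucket(key: Any, numPartitions: Int): Int =
  math.floorMod(key.hashCode, numPartitions)

// Ten identical rows, like IntegerCaseClass(1) in the test above,
// "repartitioned" into 2 buckets.
val rows = Seq.fill(10)(1)
val assignment = rows.groupBy(bucket(_, 2))
// Only one bucket is non-empty, and it holds all 10 rows.
```

The other partition still exists; it is simply empty, which is why the stateful UDF's per-partition counter reaches 10 in exactly one task.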
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91026433

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```scala
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      withTempView("inputTable") {
+        val testData = spark.sparkContext.parallelize(
+          (0 until 10) map (x => IntegerCaseClass(1)), 2).toDF()
+        testData.createOrReplaceTempView("inputTable")
+        // Distribute all rows to one partition (all rows have the same content),
+        // and expected Max(s) is 10 as statefulUDF returns the sequence number starting from 1.
+        checkAnswer(
+          sql(
+            """
+              |SELECT MAX(s) FROM
+              |  (SELECT statefulUDF() as s FROM
+              |    (SELECT i from inputTable DISTRIBUTE by i) a
+              |  ) b
+            """.stripMargin),
+          Row(10))
+
+        // Expected Max(s) is 5, as there are 2 partitions with 5 rows each, and statefulUDF
+        // returns the sequence number of the rows in the partition starting from 1.
+        checkAnswer(
+          sql(
+            """
+              |SELECT MAX(s) FROM
+              |  (SELECT statefulUDF() as s FROM
+              |    (SELECT i from inputTable) a
+              |  ) b
+            """.stripMargin),
+          Row(5))
+
+        // Expected Max(s) is 1, as stateless UDF is deterministic and replaced by constant 1.
```

--- End diff --

StatelessUDF is foldable:

```scala
override def foldable: Boolean = isUDFDeterministic && children.forall(_.foldable)
```

The ConstantFolding optimizer rule will replace it with a constant:

```scala
case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
```

Here is the explain(true):

```
== Parsed Logical Plan ==
'Project [unresolvedalias('MAX('s), None)]
+- 'SubqueryAlias b
   +- 'Project ['statelessUDF() AS s#39]
      +- 'SubqueryAlias a
         +- 'RepartitionByExpression ['i]
            +- 'Project ['i]
               +- 'UnresolvedRelation `inputTable`

== Analyzed Logical Plan ==
max(s): bigint
Aggregate [max(s#39L) AS max(s)#46L]
+- SubqueryAlias b
   +- Project [HiveSimpleUDF#org.apache.spark.sql.hive.execution.StatelessUDF() AS s#39L]
      +- SubqueryAlias a
         +- RepartitionByExpression [i#4]
            +- Project [i#4]
               +- SubqueryAlias inputtable
                  +- SerializeFromObject [assertnotnull(assertnotnull(input[0, org.apache.spark.sql.hive.execution.IntegerCaseClass, true], top level Product input object), - root class: "org.apache.spark.sql.hive.execution.IntegerCaseClass").i AS i#4]
                     +- ExternalRDD [obj#3]

== Optimized Logical Plan ==
Aggregate [max(s#39L) AS max(s)#46L]
+- Project [1 AS s#39L]
   +- RepartitionByExpression [i#4]
      +- SerializeFromObject [assertnotnull(assertnotnull(input[0, org.apache.spark.sql.hive.execution.IntegerCaseClass, true], top level Product input object), - root class: "org.apache.spark.sql.hive.execution.IntegerCaseClass").i AS i#4]
         +- ExternalRDD [obj#3]

== Physical Plan ==
*HashAggregate(keys=[], functions=[max(s#39L)], output=[max(s)#46L])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_max(s#39L)], output=[max#48L])
      +- *Project [1 AS s#39L]
         +- Exchange hashpartitioning(i#4, 5)
            +- *SerializeFromObject [assertnotnull(assertnotnull(input[0, org.apache.spark.sql.hive.execution.IntegerCaseClass, true], top level Product input object), - root class: "org.apache.spark.sql.hive.execution.IntegerCaseClass").i AS i#4]
               +- Scan ExternalRDDScan[obj#3]
```
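The `ConstantFolding` rule quoted above evaluates any foldable subtree once and substitutes a literal, which is exactly why `statelessUDF()` collapses to `1 AS s#39L` in the optimized plan. A toy sketch of the idea on a hypothetical mini expression AST (these are not Spark's `Expression`/`Literal` classes, just an illustration of the same pattern):

```scala
// Minimal constant-folding sketch: a subtree is "foldable" when all of its
// inputs are literals, and a foldable subtree is evaluated once and replaced
// by a Literal, mirroring Spark's `case e if e.foldable => Literal.create(...)`.
sealed trait Expr { def foldable: Boolean; def eval(): Long }
case class Literal(v: Long) extends Expr {
  val foldable = true
  def eval(): Long = v
}
case class Add(l: Expr, r: Expr) extends Expr {
  def foldable: Boolean = l.foldable && r.foldable
  def eval(): Long = l.eval() + r.eval()
}
case class Attribute(name: String) extends Expr {
  val foldable = false                       // depends on input rows
  def eval(): Long = sys.error(s"unresolved: $name")
}

def constantFold(e: Expr): Expr = e match {
  case e if e.foldable => Literal(e.eval())  // fold the whole subtree
  case Add(l, r)       => Add(constantFold(l), constantFold(r))
  case other           => other
}
```

A stateful (non-deterministic) UDF must report `foldable = false`, or this rule would silently freeze its per-row state into a single constant, which is the bug the PR's test guards against.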
[GitHub] spark issue #16161: [SPARK-18717][SQL] Make code generation for Scala Map wo...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16161

cc @cloud-fan
[GitHub] spark issue #16103: [SPARK-18374][ML]Incorrect words in StopWords/english.tx...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16103

Thanks for the review.
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16167

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69711/ Test PASSed.
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16167

Merged build finished. Test PASSed.
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16167

**[Test build #69711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69711/consoleFull)** for PR 16167 at commit [`41066dd`](https://github.com/apache/spark/commit/41066ddcf2863872af06320bd4d871b90a4fc3ad).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15994

OK, thanks!
[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91021502

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---

```scala
@@ -479,7 +479,12 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
       numInstances: Double,
       weightSum: Double): Double = {
     -2.0 * predictions.map { case (y: Double, mu: Double, weight: Double) =>
-      weight * dist.Binomial(1, mu).logProbabilityOf(math.round(y).toInt)
+      val wt = math.round(weight).toInt
+      if (wt == 0) {
+        0.0
+      } else {
+        dist.Binomial(wt, mu).logProbabilityOf(math.round(y * weight).toInt)
```

--- End diff --

So I think the real issue here is that we don't currently allow users to specify a binomial GLM using success/outcome pairs. One way to mash that kind of grouped data into the format Spark requires is using the process described above by @actuaryzhang, but then we need to adjust the log-likelihood computation as was also noted. So @srowen is correct in saying that this is inaccurate for non-integer weights. I checked with R's glmnet, and it seems that they obey the semantics of data weights for a binomial GLM corresponding to the number of successes. So they log a warning when you input data weights of non-integer values, then proceed with the method proposed in this patch. So, this actually _does_ match R's behavior and I am in favor of the change. But we need to log appropriate warnings and write good unit tests. What are others' thoughts?
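The distinction being debated can be made concrete. The old code treats a weight `w` as replicating a Bernoulli observation (`w * log P(Bernoulli(mu) = y)`), while the patch treats `w` as a number of trials with `y * w` successes (`log P(Binomial(round(w), mu) = round(y * w))`); the two agree for 0/1 labels with unit weights but differ for grouped data. A plain-Scala sketch of both conventions (this is not Breeze's `dist.Binomial`; the helper names are made up for illustration):

```scala
// log(n!) via a simple sum; fine for small n in this sketch.
def logFactorial(n: Int): Double = (2 to n).map(i => math.log(i.toDouble)).sum

// log P(Binomial(n, mu) = k) = log C(n, k) + k*log(mu) + (n-k)*log(1-mu)
def binomialLogPmf(n: Int, k: Int, mu: Double): Double =
  logFactorial(n) - logFactorial(k) - logFactorial(n - k) +
    k * math.log(mu) + (n - k) * math.log1p(-mu)

// Old convention: weight replicates a single Bernoulli observation.
def weightedBernoulli(y: Double, mu: Double, weight: Double): Double =
  weight * binomialLogPmf(1, math.round(y).toInt, mu)

// Patch's convention: y is the observed proportion, weight the trial count;
// a weight that rounds to 0 contributes nothing, as in the diff above.
def binomialWithTrials(y: Double, mu: Double, weight: Double): Double = {
  val n = math.round(weight).toInt
  if (n == 0) 0.0
  else binomialLogPmf(n, math.round(y * weight).toInt, mu)
}
```

For example, one success in two trials (`y = 0.5`, `weight = 2`) picks up the `log C(2, 1)` combinatorial term under the Binomial convention that the replicated-Bernoulli convention lacks.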
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91020060

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```scala
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      withTempView("inputTable") {
+        val testData = spark.sparkContext.parallelize(
+          (0 until 10) map (x => IntegerCaseClass(1)), 2).toDF()
+        testData.createOrReplaceTempView("inputTable")
+        // Distribute all rows to one partition (all rows have the same content),
```

--- End diff --

Why can `DISTRIBUTE BY` distribute all rows to one partition? It's implemented by `RepartitionByExpression`, which doesn't always use one partition.
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16166

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69710/ Test FAILed.
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16166

Merged build finished. Test FAILed.
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16166

**[Test build #69710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69710/consoleFull)** for PR 16166 at commit [`095184d`](https://github.com/apache/spark/commit/095184da2f6d65ecde9970a4296db2d08dd9f797).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16068

**[Test build #69716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69716/consoleFull)** for PR 16068 at commit [`87f134c`](https://github.com/apache/spark/commit/87f134c5b5885c18513d38c30ab0cf553226d822).
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16068

**[Test build #69715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69715/consoleFull)** for PR 16068 at commit [`78e9b38`](https://github.com/apache/spark/commit/78e9b38454cea5059306e2e26ef3c7d77b19c81e).
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128

**[Test build #69714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69714/consoleFull)** for PR 16128 at commit [`26a86d6`](https://github.com/apache/spark/commit/26a86d64f2f492094960b19332cabd7457f95e61).
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16129

Merged build finished. Test PASSed.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16129

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69709/ Test PASSed.
[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131

@srowen Done. Thanks for the suggestion.
[GitHub] spark issue #16137: [SPARK-18708][CORE] Improvement/improve docs in spark co...
Github user Mironor commented on the issue: https://github.com/apache/spark/pull/16137

@srowen I reverted the obvious comments as well as some minor changes (such as capitalizing). I only left javadoc for some of the non-trivial public API. I can also revert changes for comments where the only diff is the wrapping of references in backquotes. I'd like to know whether you have a view on using continuation indentation (as in [javadoc](http://www.oracle.com/technetwork/articles/java/index-137868.html)/[scaladoc](http://docs.scala-lang.org/style/scaladoc.html)), and on what character to use when linking a reference (backquotes or brackets?); I could update the Spark code style documentation if it's different from javadoc.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129

**[Test build #69709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69709/consoleFull)** for PR 16129 at commit [`b4a197a`](https://github.com/apache/spark/commit/b4a197ac09e19693f6dc0ce9d50c32ce5064786f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128

**[Test build #3468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3468/consoleFull)** for PR 16128 at commit [`8d4ca5e`](https://github.com/apache/spark/commit/8d4ca5e5d58c01050ac3ca13e4e9b004f67c3009).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91015596

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```scala
@@ -1144,13 +1218,19 @@ class SparkContext(config: SparkConf) extends Logging {
   }

   /**
-   * Get an RDD for a Hadoop SequenceFile with given key and value types.
+   * Get an RDD for a Hadoop `SequenceFile` with given key and value types.
    *
-   * @note Because Hadoop's RecordReader class re-uses the same Writable object for each
-   * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle
-   * operation will create many references to the same object.
-   * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first
-   * copy them using a `map` function.
+   * @note because Hadoop's `RecordReader` class re-uses the same `Writable` object for each
```

--- End diff --

Correct, but [they](http://www.oracle.com/technetwork/articles/java/index-137868.html) also contain continuation indentation (they even align parameter descriptions).
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91015460

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```scala
@@ -1144,13 +1218,19 @@ class SparkContext(config: SparkConf) extends Logging {
   }

   /**
-   * Get an RDD for a Hadoop SequenceFile with given key and value types.
+   * Get an RDD for a Hadoop `SequenceFile` with given key and value types.
    *
-   * @note Because Hadoop's RecordReader class re-uses the same Writable object for each
-   * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle
-   * operation will create many references to the same object.
-   * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first
-   * copy them using a `map` function.
+   * @note because Hadoop's `RecordReader` class re-uses the same `Writable` object for each
```

--- End diff --

I understand they are used in a mixed way, and I see the example of multiple lines with `@return` in the scaladoc. I am fine with this but I just wanted to note my worry here.
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69713/ Test FAILed.
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138

**[Test build #69713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69713/consoleFull)** for PR 16138 at commit [`8837bdb`](https://github.com/apache/spark/commit/8837bdb176963be6da02c2b0e91c5673cd3fa1b2).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ToTimestamp(left: Expression, right: Expression, child: Expression)`
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138

Merged build finished. Test FAILed.
[GitHub] spark pull request #16000: [SPARK-18537][Web UI]Add a REST api to spark stre...
Github user ChorPangChan commented on a diff in the pull request: https://github.com/apache/spark/pull/16000#discussion_r91014959

--- Diff: streaming/src/main/java/org/apache/spark/streaming/status/api/v1/BatchStatus.java ---

```java
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.status.api.v1;
```

--- End diff --

All right, I understand the problem now. In order to merge with another plan (SPARK-18085) in the future, the streaming API may need to support history, and thus needs to use /api/v1/applications/:id/:attempt/streaming as its endpoint. To do that, someone will need to implement a hooking mechanism to "mount" the streaming API onto the applications resource. Am I correct?
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16163 @srowen Thanks. I sent this one because the `consoleFull` page frequently freezes my browser these days, not to mention viewing Jenkins build results on a mobile phone...
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138 **[Test build #69713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69713/consoleFull)** for PR 16138 at commit [`8837bdb`](https://github.com/apache/spark/commit/8837bdb176963be6da02c2b0e91c5673cd3fa1b2).
[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16014 **[Test build #69712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69712/consoleFull)** for PR 16014 at commit [`6ef26fe`](https://github.com/apache/spark/commit/6ef26fe3134880924fad03f39b4d6faa84aa05e0).
[GitHub] spark issue #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config option t...
Github user seyfe commented on the issue: https://github.com/apache/spark/pull/16165 Hi @srowen, thanks for the quick feedback. Let me get rid of the on/off knob for in-progress files. Would you like me to remove the maxAge setting for in-progress files as well? I initially worried about long-running jobs (streaming?), but I think even in that case the files will get updated.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91013560 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1144,13 +1218,19 @@ class SparkContext(config: SparkConf) extends Logging { } /** - * Get an RDD for a Hadoop SequenceFile with given key and value types. + * Get an RDD for a Hadoop `SequenceFile` with given key and value types. * - * @note Because Hadoop's RecordReader class re-uses the same Writable object for each - * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle - * operation will create many references to the same object. - * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first - * copy them using a `map` function. + * @note because Hadoop's `RecordReader` class re-uses the same `Writable` object for each --- End diff -- The [Spark style guide](http://spark.apache.org/contributing.html) doesn't contain anything about continuation indentation and refers to [Scala's own style guide](http://docs.scala-lang.org/style/scaladoc.html), which shows that indentation should be used.
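For reference, the continuation indentation the Scala style guide describes can be sketched like this (an illustrative doc comment only, not the actual SparkContext source):

```scala
/**
 * Get an RDD for a Hadoop `SequenceFile` with given key and value types.
 *
 * @note Because Hadoop's `RecordReader` re-uses the same `Writable` object for
 *       each record, directly caching the returned RDD can create many
 *       references to the same object. Continuation lines of a tag are
 *       indented to align with the start of the tag's text.
 */
```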
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16167 **[Test build #69711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69711/consoleFull)** for PR 16167 at commit [`41066dd`](https://github.com/apache/spark/commit/41066ddcf2863872af06320bd4d871b90a4fc3ad).
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16128 LGTM pending tests.
[GitHub] spark pull request #16167: [DO NOT MERGE]Remove workaround for Netty memory ...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16167 [DO NOT MERGE]Remove workaround for Netty memory leak ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark remove-netty-workaround Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16167.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16167 commit 41066ddcf2863872af06320bd4d871b90a4fc3ad Author: Shixiong Zhu Date: 2016-12-06T05:04:07Z Remove workaround for Netty memory leak
[GitHub] spark issue #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config option t...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16165 I don't think it makes sense to expose yet another set of settings for this. I think the risk of course is that this accidentally cleans up another instance's work in progress. However if it's quite old, I'd think it's as safe to clean up an in-progress file as any other?
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16166 **[Test build #69710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69710/consoleFull)** for PR 16166 at commit [`095184d`](https://github.com/apache/spark/commit/095184da2f6d65ecde9970a4296db2d08dd9f797).
[GitHub] spark pull request #16166: [SPARK-18734][SS] Represent timestamp in Streamin...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/16166 [SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as formatted string instead of millis ## What changes were proposed in this pull request? A formatted string (in ISO8601 format) is easier to read while debugging than millis. ## How was this patch tested? Updated unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-18734 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16166.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16166 commit 095184da2f6d65ecde9970a4296db2d08dd9f797 Author: Tathagata Das Date: 2016-12-06T04:59:37Z Changed to string
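The millis-to-ISO8601 change described above can be sketched with `java.time` (a hedged illustration only; `formatTimestamp` is a hypothetical helper name, not the method used in the PR):

```java
import java.time.Instant;

public class TimestampFormat {
    // Render epoch millis as an ISO8601 UTC string, the representation
    // SPARK-18734 proposes for StreamingQueryProgress timestamps.
    // Instant.toString() emits ISO-8601 (e.g. 1970-01-01T00:00:00Z).
    static String formatTimestamp(long millis) {
        return Instant.ofEpochMilli(millis).toString();
    }

    public static void main(String[] args) {
        System.out.println(formatTimestamp(0L)); // 1970-01-01T00:00:00Z
    }
}
```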
[GitHub] spark pull request #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config o...
GitHub user seyfe opened a pull request: https://github.com/apache/spark/pull/16165 [SPARK-18733] [WEBUI] HistoryServer: Add config option to cleanup in-progress files ## What changes were proposed in this pull request? Add 2 new config parameters: 1) spark.history.fs.cleaner.inProgress.files: Default value will be false, so no behavior change for anyone. 2) spark.history.fs.cleaner.inProgress.maxAge: A way to specify the age of in-progress files. Default value is 28 days. ## How was this patch tested? Added new unit tests and verified via existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/seyfe/spark clear_old_inprogress_files Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16165 commit 90b790bffbf3b90e6cf8abcddecb323e906f1c18 Author: Ergin Seyfe Date: 2016-12-06T01:10:47Z History Server clean old inprogress files
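If the options land as described in the PR text, enabling them in `spark-defaults.conf` might look like this (key names copied verbatim from the PR description; the `28d` duration syntax is an assumption for illustration, and the proposal was still under review):

```
# Hypothetical spark-defaults.conf fragment for SPARK-18733 (proposed, not merged)
spark.history.fs.cleaner.inProgress.files   true
spark.history.fs.cleaner.inProgress.maxAge  28d
```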
[GitHub] spark issue #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config option t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16165 Can one of the admins verify this patch?
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011837 --- Diff: examples/src/main/r/ml/lda.R --- @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/lda.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-lda-example") + +# $example on$ +# Load training data +df <- read.df("data/mllib/sample_lda_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a latent dirichlet allocation model with spark.lda +model <- spark.lda(training, k=10, maxIter=10) --- End diff -- nit: please put a space, i.e. `k = 10, maxIter = 10`
[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16148 This is great, thanks! By the way, how are these examples getting run? Is there a way to know if the examples are broken because of API changes?
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011772 --- Diff: examples/src/main/r/ml/randomForest.R --- @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/randomForest.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-randomForest-example") + +# Random forest classification model + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a random forest classification model with spark.randomForest +model <- spark.randomForest(training, label ~ features, "classification", numTrees=10) --- End diff -- ditto below
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011734 --- Diff: examples/src/main/r/ml/randomForest.R --- @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/randomForest.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-randomForest-example") + +# Random forest classification model + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a random forest classification model with spark.randomForest +model <- spark.randomForest(training, label ~ features, "classification", numTrees=10) --- End diff -- nit: I would put spaces around it, i.e. `numTrees = 10` instead
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011578 --- Diff: docs/sparkr.md --- @@ -512,39 +512,33 @@ head(teenagers) # Machine Learning -SparkR supports the following machine learning algorithms currently: `Generalized Linear Model`, `Accelerated Failure Time (AFT) Survival Regression Model`, `Naive Bayes Model` and `KMeans Model`. -Under the hood, SparkR uses MLlib to train the model. -Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models. -SparkR supports a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'. - ## Algorithms -### Generalized Linear Model - -[spark.glm()](api/R/spark.glm.html) or [glm()](api/R/glm.html) fits generalized linear model against a Spark DataFrame. -Currently "gaussian", "binomial", "poisson" and "gamma" families are supported. -{% include_example glm r/ml.R %} - -### Accelerated Failure Time (AFT) Survival Regression Model - -[spark.survreg()](api/R/spark.survreg.html) fits an accelerated failure time (AFT) survival regression model on a SparkDataFrame. -Note that the formula of [spark.survreg()](api/R/spark.survreg.html) does not support operator '.' currently. --- End diff -- Another piece of R-specific info that would be deleted?
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011559 --- Diff: docs/sparkr.md --- @@ -512,39 +512,33 @@ head(teenagers) # Machine Learning -SparkR supports the following machine learning algorithms currently: `Generalized Linear Model`, `Accelerated Failure Time (AFT) Survival Regression Model`, `Naive Bayes Model` and `KMeans Model`. -Under the hood, SparkR uses MLlib to train the model. -Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models. -SparkR supports a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'. - ## Algorithms -### Generalized Linear Model - -[spark.glm()](api/R/spark.glm.html) or [glm()](api/R/glm.html) fits generalized linear model against a Spark DataFrame. -Currently "gaussian", "binomial", "poisson" and "gamma" families are supported. --- End diff -- Looks like we would be missing out on some R-specific things with this delete?
[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16150 There is also this form, `\code{apriori} (the label distribution)`, and this form, `\item{\code{docConcentration}}{concentration parameter commonly named \code{alpha}`.
[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16150 Thanks. There is also the issue with `\code{numOfInputs}` vs `number of iterations IRLS takes` - should it be a "variable" (and thus wrapped with `\code{something}`), or should it be a description?
[GitHub] spark pull request #16160: [SPARK-18721][SS]Fix ForeachSink with watermark +...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16160
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16164 Merged build finished. Test PASSed.
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69708/ Test PASSed.
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16164 **[Test build #69708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69708/consoleFull)** for PR 16164 at commit [`4d71250`](https://github.com/apache/spark/commit/4d712503cd413a94827bf41942d3b90dc52e4905). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16159 Merged build finished. Test PASSed.
[GitHub] spark issue #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69704/ Test PASSed.
[GitHub] spark issue #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16159 **[Test build #69704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69704/consoleFull)** for PR 16159 at commit [`ce2aa99`](https://github.com/apache/spark/commit/ce2aa99194e0f25843e74697429674807670). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91010366 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. */ def addFile(path: String): Unit = { addFile(path, false) } /** - * Returns a list of file paths that are added to resources. + * A list of file paths that are added to resources. --- End diff -- We shouldn't duplicate the documentation. Converting this to a `@return` is fine.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91010077 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. */ def addFile(path: String): Unit = { addFile(path, false) } /** - * Returns a list of file paths that are added to resources. + * A list of file paths that are added to resources. --- End diff -- For me, I personally think we had better just leave them or duplicate the description into `@return`. I think I am not supposed to decide this. cc @srowen.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91010001 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1401,8 +1532,11 @@ class SparkContext(config: SparkConf) extends Logging { /** * Broadcast a read-only variable to the cluster, returning a - * [[org.apache.spark.broadcast.Broadcast]] object for reading it in distributed functions. + * `org.apache.spark.broadcast.Broadcast` object for reading it in distributed functions. --- End diff -- Actually brackets are better because they make links. Some were backquoted because javadoc8 complains about this.
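To illustrate the trade-off HyukjinKwon describes (an illustrative doc comment, not the PR's exact text): the `[[...]]` form renders as a clickable scaladoc link to the referenced symbol, while backquotes render only as monospace text — which is why backquotes were substituted wherever Javadoc 8's stricter doclint rejected the generated output.

```scala
object DocLinkStyles {
  /**
   * Link form (clickable in scaladoc, but may trip Javadoc 8 doclint):
   * [[scala.collection.immutable.List]]
   *
   * Monospace form (no hyperlink, but Javadoc 8 safe):
   * `scala.collection.immutable.List`
   */
  def example(): List[Int] = List(1, 2, 3)
}
```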
[GitHub] spark pull request #16098: [SPARK-18672][CORE] Close recordwriter in SparkHa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16098
[GitHub] spark issue #16160: [SPARK-18721][SS]Fix ForeachSink with watermark + append
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16160 Merged build finished. Test PASSed.
[GitHub] spark issue #16160: [SPARK-18721][SS]Fix ForeachSink with watermark + append
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69706/ Test PASSed.
[GitHub] spark issue #16160: [SPARK-18721][SS]Fix ForeachSink with watermark + append
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16160 **[Test build #69706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69706/consoleFull)** for PR 16160 at commit [`3a7afe7`](https://github.com/apache/spark/commit/3a7afe7f428b996fb5367f1f213a8d0072912ec0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16163 Merged build finished. Test PASSed.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16098 Merged to master
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69703/ Test PASSed.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91009699 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1620,8 +1766,9 @@ class SparkContext(config: SparkConf) extends Logging { /** * :: DeveloperApi :: - * Return information about what RDDs are cached, if they are in mem or on disk, how much space - * they take, etc. --- End diff -- If you want to remove `Return` or make the description into `@return`, I guess it should be at least consistent. It seems https://github.com/apache/spark/pull/16137/files#r91009309 is a bit different.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91009631 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. */ def addFile(path: String): Unit = { addFile(path, false) } /** - * Returns a list of file paths that are added to resources. + * A list of file paths that are added to resources. --- End diff -- The second line is redundant here; my question is whether it's worth replacing `Return` with `@return` or just leaving it as it is.
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16163 **[Test build #69703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69703/testReport)** for PR 16163 at commit [`6aa9f34`](https://github.com/apache/spark/commit/6aa9f34fa2abd02ae07dea5c0a404d67f7ae5998). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91009155 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1401,8 +1532,11 @@ class SparkContext(config: SparkConf) extends Logging { /** * Broadcast a read-only variable to the cluster, returning a - * [[org.apache.spark.broadcast.Broadcast]] object for reading it in distributed functions. + * `org.apache.spark.broadcast.Broadcast` object for reading it in distributed functions. --- End diff -- Yes, the initial intent was to make it the same everywhere (either backquotes or brackets)
[GitHub] spark issue #16139: [SPARK-18705][ML][DOC] Update user guide to reflect one ...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16139 ping @yanboliang
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008727 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -119,22 +119,22 @@ class SparkContext(config: SparkConf) extends Logging { /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI - * @param conf a [[org.apache.spark.SparkConf]] object specifying other Spark parameters + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI + * @param conf a `org.apache.spark.SparkConf` object specifying other Spark parameters */ def this(master: String, appName: String, conf: SparkConf) = this(SparkContext.updatedConf(conf, master, appName)) /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI. - * @param sparkHome Location where Spark is installed on cluster nodes. - * @param jars Collection of JARs to send to the cluster. These can be paths on the local file + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI. + * @param sparkHome location where Spark is installed on cluster nodes. + * @param jars collection of JARs to send to the cluster. These can be paths on the local file * system or HDFS, HTTP, HTTPS, or FTP URLs. - * @param environment Environment variables to set on worker nodes. + * @param environment environment variables to set on worker nodes. 
--- End diff -- Oh, I am sorry, it was mentioned in https://github.com/apache/spark/pull/16137/files#r90948641.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008621 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -119,22 +119,22 @@ class SparkContext(config: SparkConf) extends Logging { /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI - * @param conf a [[org.apache.spark.SparkConf]] object specifying other Spark parameters + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI + * @param conf a `org.apache.spark.SparkConf` object specifying other Spark parameters */ def this(master: String, appName: String, conf: SparkConf) = this(SparkContext.updatedConf(conf, master, appName)) /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI. - * @param sparkHome Location where Spark is installed on cluster nodes. - * @param jars Collection of JARs to send to the cluster. These can be paths on the local file + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI. + * @param sparkHome location where Spark is installed on cluster nodes. + * @param jars collection of JARs to send to the cluster. These can be paths on the local file * system or HDFS, HTTP, HTTPS, or FTP URLs. - * @param environment Environment variables to set on worker nodes. + * @param environment environment variables to set on worker nodes. 
--- End diff -- Do we have a rule (or other references) to make them lower-cased? I am worried about similar changes in the future, and it might be great if we have a concrete reason here to change so.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008657 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -119,22 +119,22 @@ class SparkContext(config: SparkConf) extends Logging { /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI - * @param conf a [[org.apache.spark.SparkConf]] object specifying other Spark parameters + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI + * @param conf a `org.apache.spark.SparkConf` object specifying other Spark parameters */ def this(master: String, appName: String, conf: SparkConf) = this(SparkContext.updatedConf(conf, master, appName)) /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI. - * @param sparkHome Location where Spark is installed on cluster nodes. - * @param jars Collection of JARs to send to the cluster. These can be paths on the local file + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI. + * @param sparkHome location where Spark is installed on cluster nodes. + * @param jars collection of JARs to send to the cluster. These can be paths on the local file * system or HDFS, HTTP, HTTPS, or FTP URLs. - * @param environment Environment variables to set on worker nodes. + * @param environment environment variables to set on worker nodes. 
--- End diff -- I am fine if there is not one, too, as long as it can be kept coherent and this can be decided here too.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008467 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -923,15 +971,13 @@ class SparkContext(config: SparkConf) extends Logging { /** * Load data from a flat binary file, assuming the length of each record is constant. * - * @note We ensure that the byte array for each record in the resulting RDD + * @note we ensure that the byte array for each record in the resulting RDD --- End diff -- I am not sure of making this lower-cased because we will see this as the start of a sentence - scaladoc ![2016-12-06 1 02 12](https://cloud.githubusercontent.com/assets/6477701/20912647/38ccf9e4-bbb4-11e6-8346-42cd6297d075.png) - javadoc ![2016-12-06 1 02 05](https://cloud.githubusercontent.com/assets/6477701/20912648/38d2160e-bbb4-11e6-8d72-1c4480ca2276.png)
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008183 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, --- End diff -- nit: we could make this single-spaced here (`URI. To`), and likewise in the same instances, at least for the lines this PR changes.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128 **[Test build #3468 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3468/consoleFull)** for PR 16128 at commit [`8d4ca5e`](https://github.com/apache/spark/commit/8d4ca5e5d58c01050ac3ca13e4e9b004f67c3009).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16142 @srowen If I have understood what you mean correctly, the **"log rotation"** is different from the **"job event log clean up"**. The "job event log" is replayed to build the Spark history UI. Right???
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16128 Merged build finished. Test FAILed.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16128 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69705/ Test FAILed.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128 **[Test build #69705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69705/consoleFull)** for PR 16128 at commit [`8d4ca5e`](https://github.com/apache/spark/commit/8d4ca5e5d58c01050ac3ca13e4e9b004f67c3009). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/16159#discussion_r91006592 --- Diff: project/SparkBuild.scala --- @@ -596,19 +596,17 @@ object Hive { } object Assembly { - import sbtassembly.AssemblyUtils._ - import sbtassembly.Plugin._ - import AssemblyKeys._ + import sbtassembly.AssemblyPlugin.autoImport._ val hadoopVersion = taskKey[String]("The version of hadoop that spark is compiled against.") - lazy val settings = assemblySettings ++ Seq( + lazy val settings = Seq( --- End diff -- Hi, @srowen Thanks for reviewing this PR. Yes, removing `assemblySettings ++` is on purpose. [Quote from [sbt-assembly/Migration.md](https://github.com/sbt/sbt-assembly/blob/master/Migration.md): "Remove `assemblySettings.` The settings are now auto injected to all projects with JvmPlugin" ]
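A minimal before/after sketch of the migration weiqingy quotes (hypothetical jar name; only the `autoImport` line and the dropped `assemblySettings ++` come from the Migration.md guidance):

```scala
// Old sbt-assembly (0.11.x): settings had to be imported and mixed in by hand.
//   import sbtassembly.Plugin._
//   import AssemblyKeys._
//   lazy val settings = assemblySettings ++ Seq(...)

// New sbt-assembly (0.14.x): AssemblyPlugin is an sbt AutoPlugin, so its
// default settings are injected automatically into every project that has
// the JvmPlugin; only the auto-imported keys need to be referenced.
import sbtassembly.AssemblyPlugin.autoImport._

lazy val settings = Seq(
  assemblyJarName in assembly := "example-assembly.jar"  // hypothetical jar name
)
```

This build-definition fragment is meant to be read in the context of `project/SparkBuild.scala`, not run standalone.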
[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15998
[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15998 thanks, merging to master/2.1!
[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r91006319 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -346,6 +346,31 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(new Path(partitionLocation) == defaultPartitionLocation) } + test("list partition names") { + val catalog = newBasicCatalog() + val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) + catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + + val partitionNames = catalog.listPartitionNames("db2", "tbl2") + assert(partitionNames == Seq("a=1/b=%25%3D", "a=1/b=2", "a=3/b=4")) + } + + test("list partition names with partial partition spec") { + val catalog = newBasicCatalog() + val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) + catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + + val partitionNames1 = catalog.listPartitionNames("db2", "tbl2", Some(Map("a" -> "1"))) + assert(partitionNames1 == Seq("a=1/b=%25%3D", "a=1/b=2")) --- End diff -- ok, maybe we should consider diverging from Hive here...
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16164 @srowen Indeed, it is not a normal case, and I found this problem when the streaming job went wrong. As you said, > one can compare the graphs visually. It may still mislead users in some cases, e.g. when 'scheduling delay' is shown in 'ms' and 'processing time' in 's' or 'min'.
[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r91006034 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -346,6 +346,31 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(new Path(partitionLocation) == defaultPartitionLocation) } + test("list partition names") { +val catalog = newBasicCatalog() +val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) +catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + +val partitionNames = catalog.listPartitionNames("db2", "tbl2") +assert(partitionNames == Seq("a=1/b=%25%3D", "a=1/b=2", "a=3/b=4")) + } + + test("list partition names with partial partition spec") { +val catalog = newBasicCatalog() +val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) +catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + +val partitionNames1 = catalog.listPartitionNames("db2", "tbl2", Some(Map("a" -> "1"))) +assert(partitionNames1 == Seq("a=1/b=%25%3D", "a=1/b=2")) --- End diff -- Yeah, I tried Hive 1.2. It actually returns the weird value.
```
hive> create table partTab (col1 int, col2 int) partitioned by (pcol1 String, pcol2 String);
OK
hive> insert into table partTab partition(pcol1='1', pcol2='2') select 3, 4 from dummy;
OK
hive> insert into table partTab partition(pcol1='1', pcol2='%=') select 3, 4 from dummy;
OK
hive> show partitions partTab;
OK
pcol1=1/pcol2=%25%3D
pcol1=1/pcol2=2
hive> show partitions partTab PARTITION(pcol1=1);
OK
pcol1=1/pcol2=2
pcol1=1/pcol2=%25%3D
```
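The `%25%3D` names shown above come from Hive-style percent-escaping of special characters in partition values. As an illustration only, here is a minimal Python sketch of that naming scheme; it assumes URL-style percent-encoding, whereas Hive's actual escape character set differs slightly, and `partition_name` is a hypothetical helper, not Spark or Hive code:

```python
from urllib.parse import quote

def partition_name(spec):
    # Build a Hive-style partition name: percent-escape each value and
    # join the columns as key=value pairs separated by '/'.
    return "/".join(f"{k}={quote(str(v), safe='')}" for k, v in spec.items())

print(partition_name({"a": "1", "b": "%="}))  # a=1/b=%25%3D
```

This reproduces the escaping seen in the test expectations: the value `%=` becomes `%25%3D`.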
[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r91005791 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,27 @@ object LBFGS extends Logging { val bcW = data.context.broadcast(w) val localGradient = gradient - val (gradientSum, lossSum) = data.treeAggregate((Vectors.zeros(n), 0.0))( - seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) => -val l = localGradient.compute( - features, label, bcW.value, grad) -(grad, loss + l) - }, - combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) => -axpy(1.0, grad2, grad1) -(grad1, loss1 + loss2) - }) + // Given (current accumulated gradient, current loss) and (label, features) + // tuples, updates the current gradient and current loss + val seqOp = (c: (Vector, Double), v: (Double, Vector)) => +(c, v) match { + case ((grad, loss), (label, features)) => +val denseGrad = grad.toDense +val l = localGradient.compute(features, label, bcW.value, denseGrad) +(denseGrad, loss + l) +} + + // Adds two (gradient, loss) tuples + val combOp = (c1: (Vector, Double), c2: (Vector, Double)) => +(c1, c2) match { case ((grad1, loss1), (grad2, loss2)) => + val denseGrad1 = grad1.toDense --- End diff -- Meaning, when would the args ever not be dense? I agree, shouldn't be sparse at this stage, but doing this defensively seems fine since it's a no-op for dense.
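The seqOp/combOp pattern in the diff can be illustrated outside Spark. The following is a hypothetical Python sketch of the same (gradient, loss) aggregation shape, with plain lists standing in for MLlib vectors and a toy squared-error gradient standing in for MLlib's `Gradient` (neither is the actual LBFGS code):

```python
from functools import reduce

weights = [0.0, 0.0]  # current model weights (plays the role of bcW.value)

def seq_op(acc, point):
    # Fold one (label, features) point into the (gradient, loss) accumulator,
    # updating the gradient in place, like localGradient.compute does.
    grad, loss = acc
    label, features = point
    residual = sum(f * w for f, w in zip(features, weights)) - label
    for i, f in enumerate(features):
        grad[i] += residual * f
    return grad, loss + 0.5 * residual ** 2

def comb_op(a, b):
    # Merge two partition-local (gradient, loss) accumulators,
    # i.e. axpy(1.0, grad2, grad1) plus summing the losses.
    grad1, loss1 = a
    grad2, loss2 = b
    for i, g in enumerate(grad2):
        grad1[i] += g
    return grad1, loss1 + loss2

data = [(1.0, [1.0, 0.0]), (0.0, [0.0, 1.0])]
# Simulate two partitions, then combine -- analogous to treeAggregate.
part1 = reduce(seq_op, data[:1], ([0.0, 0.0], 0.0))
part2 = reduce(seq_op, data[1:], ([0.0, 0.0], 0.0))
grad, loss = part1 if not data[1:] else comb_op(part1, part2)
```

The defensive `.toDense` in the PR has no analogue here since Python lists are always "dense"; the point of the sketch is only the accumulator shape and the two-level fold.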
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16142 Yes, but the alternative is reimplementing an ad-hoc log rotation system here, which isn't great either. Are you saying the history server already manages logs? Pardon, I don't know it at all.
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16142 @srowen The Spark History Server may do the clean-up work, but the precondition is that we start it and keep it running. Besides, if applications arrive constantly, the event logs may still take up a large amount of storage space. This PR gives the system another chance to clean up before each application begins saving its event log. What's more, if you are more concerned with storage cost, this PR provides a 'space' mode to restrict the disk usage of the Spark event log. > It's something you often leave to a cron job or something to archive and clean up. IMHO, I do not think that is a reliable way to do this work; we would have to make sure it works and keeps working, just like the Spark History Server.
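A size-capped cleanup like the 'space' mode described above amounts to: delete the oldest event-log files until total usage falls under a limit. The following is a hypothetical Python sketch of that policy only; the PR's actual config names, file layout, and behavior may differ:

```python
import os
import tempfile

def enforce_space_limit(log_dir, max_bytes):
    # Collect (mtime, size, path) for every file in the log dir, oldest first.
    entries = sorted(
        (os.path.getmtime(p), os.path.getsize(p), p)
        for p in (os.path.join(log_dir, f) for f in os.listdir(log_dir))
        if os.path.isfile(p)
    )
    total = sum(size for _, size, _ in entries)
    # Drop the oldest logs until usage fits under the cap.
    for _, size, path in entries:
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
    return total

# Demo: two 10-byte "event logs"; a 15-byte cap forces the older one out.
log_dir = tempfile.mkdtemp()
for mtime, name in [(0, "app-old.log"), (100, "app-new.log")]:
    path = os.path.join(log_dir, name)
    with open(path, "wb") as f:
        f.write(b"x" * 10)
    os.utime(path, (mtime, mtime))
remaining = enforce_space_limit(log_dir, max_bytes=15)
```

Running such a check before each application starts writing its log is the "another chance to clean up" the comment describes; the trade-off versus an external cron job is exactly the point under discussion.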
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16146 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69707/ Test FAILed.
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16146 Merged build finished. Test FAILed.
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16146 **[Test build #69707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69707/consoleFull)** for PR 16146 at commit [`8672343`](https://github.com/apache/spark/commit/86723436ba2b711d0eb6f2de92f3651006e3bff4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #69709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69709/consoleFull)** for PR 16129 at commit [`b4a197a`](https://github.com/apache/spark/commit/b4a197ac09e19693f6dc0ce9d50c32ce5064786f).
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16164 CC @zsxwing because it works this way on purpose, so that one can compare the graphs visually. Usually these values aren't too different in scale; it's a problem here because scheduling delay is unusually large.
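The shared-scale behavior under discussion amounts to computing one Y-axis range across all the metric series so the graphs line up visually. A minimal sketch of that idea (a hypothetical helper, not the actual streaming UI code, which is JavaScript):

```python
def common_y_range(*series):
    # One (min, max) range spanning every series, so all graphs share a scale.
    lo = min(min(s) for s in series)
    hi = max(max(s) for s in series)
    return lo, hi

scheduling_delay_ms = [120000, 90000, 150000]  # unusually large delays
processing_time_ms = [400, 350, 500]
shared = common_y_range(scheduling_delay_ms, processing_time_ms)
```

With these numbers the shared range spans 350 to 150000 ms, so the processing-time curve flattens near zero; that readability problem when one metric dwarfs the others is exactly what SPARK-18732 raises.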
[GitHub] spark issue #16147: [SPARK-18718][TESTS] Skip some test failures due to path...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16147 Just FYI, I ran some more tests for each package myself and grepped `local-cluster` before submitting this PR, and it seems there are not many similar instances. If I face the same problems frequently, I will definitely try to work around them within the test code in the future.
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16164 **[Test build #69708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69708/consoleFull)** for PR 16164 at commit [`4d71250`](https://github.com/apache/spark/commit/4d712503cd413a94827bf41942d3b90dc52e4905).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16142 Hm, does Spark generally manage log rotation? I confess ignorance. It's something you often leave to a cron job or something to archive and clean up.
[GitHub] spark pull request #16128: [SPARK-18671][SS][TEST] Added tests to ensure sta...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16128#discussion_r91003428 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -1022,6 +1021,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest { val options = new FileStreamOptions(Map("maxfilespertrigger" -> "1")) assert(options.maxFilesPerTrigger == Some(1)) } + + test("FileStreamSource offset - read Spark 2.1.0 log format") { +val offset = readOffsetFromResource("file-source-offset-version-2.1.0.txt") --- End diff -- same comment as above.
[GitHub] spark pull request #16128: [SPARK-18671][SS][TEST] Added tests to ensure sta...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16128#discussion_r91003243 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -81,7 +81,14 @@ private object JsonUtils { */ def partitionOffsets(partitionOffsets: Map[TopicPartition, Long]): String = { val result = new HashMap[String, HashMap[Int, Long]]() -partitionOffsets.foreach { case (tp, off) => +implicit val ordering = new Ordering[TopicPartition] { + override def compare(x: TopicPartition, y: TopicPartition): Int = { +Ordering.Tuple2[String, Int].compare((x.topic, x.partition), (y.topic, y.partition)) + } +} +val partitions = partitionOffsets.keySet.toSeq.sorted // sort for more determinism +partitions.foreach { tp => --- End diff -- I want to sort by topic and partition together, so that partitions are ordered when the JSON is generated (currently they are not), which makes it hard to read.
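The sorting change above can be sketched outside Scala: order the (topic, partition) keys before building the nested map, so the generated JSON is deterministic and readable. A hypothetical Python sketch, with plain tuples standing in for Kafka's `TopicPartition` (not the actual `JsonUtils` code):

```python
import json

def partition_offsets_json(offsets):
    # offsets: {(topic, partition): offset}
    result = {}
    # Sort by (topic, partition) -- the analogue of the implicit Ordering
    # in the diff -- so the emitted JSON has a stable, readable key order.
    for topic, partition in sorted(offsets):
        result.setdefault(topic, {})[partition] = offsets[(topic, partition)]
    return json.dumps(result)

print(partition_offsets_json({("t", 1): 20, ("t", 0): 10}))
# {"t": {"0": 10, "1": 20}}
```

Without the sort, iteration order of the input map would leak into the output, which is the nondeterminism the review comment is addressing.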