[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91026585

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```scala
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      withTempView("inputTable") {
+        val testData = spark.sparkContext.parallelize(
+          (0 until 10) map (x => IntegerCaseClass(1)), 2).toDF()
+        testData.createOrReplaceTempView("inputTable")
+        // Distribute all rows to one partition (all rows have the same content),
```

--- End diff --

@cloud-fan Thanks for the review. Because all rows contain only IntegerCaseClass(1), RepartitionByExpression will assign all rows to one partition, which then holds 10 records.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
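The claim above, that `DISTRIBUTE BY` on a constant-valued column sends every row to one partition, follows from hash partitioning: each row lands in bucket `hash(key) mod numPartitions`, so identical keys always share a bucket. A minimal plain-Scala sketch of that behavior (this is not Spark's actual `HashPartitioner`, which additionally normalizes negative hash codes; the `bucket` helper is made up for illustration):

```scala
// Hash partitioning sketch: rows with identical content always map to the
// same bucket, so DISTRIBUTE BY a constant column fills one partition.
def bucket(key: Any, numPartitions: Int): Int =
  math.floorMod(key.hashCode, numPartitions)

// Ten identical rows, like IntegerCaseClass(1) in the test above,
// "repartitioned" into 2 buckets.
val rows = Seq.fill(10)(1)
val assignment = rows.groupBy(bucket(_, 2))
// Only one bucket is non-empty, and it holds all 10 rows.
```

The other partition still exists; it is simply empty, which is why the stateful UDF's per-partition counter reaches 10 in exactly one task.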
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91026433

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```scala
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      withTempView("inputTable") {
+        val testData = spark.sparkContext.parallelize(
+          (0 until 10) map (x => IntegerCaseClass(1)), 2).toDF()
+        testData.createOrReplaceTempView("inputTable")
+        // Distribute all rows to one partition (all rows have the same content),
+        // and expected Max(s) is 10 as statefulUDF returns the sequence number starting from 1.
+        checkAnswer(
+          sql(
+            """
+              |SELECT MAX(s) FROM
+              |  (SELECT statefulUDF() as s FROM
+              |    (SELECT i from inputTable DISTRIBUTE by i) a
+              |  ) b
+            """.stripMargin),
+          Row(10))
+
+        // Expected Max(s) is 5, as there are 2 partitions with 5 rows each, and statefulUDF
+        // returns the sequence number of the rows in the partition starting from 1.
+        checkAnswer(
+          sql(
+            """
+              |SELECT MAX(s) FROM
+              |  (SELECT statefulUDF() as s FROM
+              |    (SELECT i from inputTable) a
+              |  ) b
+            """.stripMargin),
+          Row(5))
+
+        // Expected Max(s) is 1, as stateless UDF is deterministic and replaced by constant 1.
```

--- End diff --

StatelessUDF is foldable:

```scala
override def foldable: Boolean = isUDFDeterministic && children.forall(_.foldable)
```

The ConstantFolding optimizer rule will replace it with a constant:

```scala
case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
```

Here is the explain(true):

```
== Parsed Logical Plan ==
'Project [unresolvedalias('MAX('s), None)]
+- 'SubqueryAlias b
   +- 'Project ['statelessUDF() AS s#39]
      +- 'SubqueryAlias a
         +- 'RepartitionByExpression ['i]
            +- 'Project ['i]
               +- 'UnresolvedRelation `inputTable`

== Analyzed Logical Plan ==
max(s): bigint
Aggregate [max(s#39L) AS max(s)#46L]
+- SubqueryAlias b
   +- Project [HiveSimpleUDF#org.apache.spark.sql.hive.execution.StatelessUDF() AS s#39L]
      +- SubqueryAlias a
         +- RepartitionByExpression [i#4]
            +- Project [i#4]
               +- SubqueryAlias inputtable
                  +- SerializeFromObject [assertnotnull(assertnotnull(input[0, org.apache.spark.sql.hive.execution.IntegerCaseClass, true], top level Product input object), - root class: "org.apache.spark.sql.hive.execution.IntegerCaseClass").i AS i#4]
                     +- ExternalRDD [obj#3]

== Optimized Logical Plan ==
Aggregate [max(s#39L) AS max(s)#46L]
+- Project [1 AS s#39L]
   +- RepartitionByExpression [i#4]
      +- SerializeFromObject [assertnotnull(assertnotnull(input[0, org.apache.spark.sql.hive.execution.IntegerCaseClass, true], top level Product input object), - root class: "org.apache.spark.sql.hive.execution.IntegerCaseClass").i AS i#4]
         +- ExternalRDD [obj#3]

== Physical Plan ==
*HashAggregate(keys=[], functions=[max(s#39L)], output=[max(s)#46L])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_max(s#39L)], output=[max#48L])
      +- *Project [1 AS s#39L]
         +- Exchange hashpartitioning(i#4, 5)
            +- *SerializeFromObject [assertnotnull(assertnotnull(input[0, org.apache.spark.sql.hive.execution.IntegerCaseClass, true], top level Product input object), - root class: "org.apache.spark.sql.hive.execution.IntegerCaseClass").i AS i#4]
               +- Scan ExternalRDDScan[obj#3]
```
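The `ConstantFolding` rule quoted above evaluates any foldable subtree once and substitutes a literal, which is exactly why `statelessUDF()` collapses to `1 AS s#39L` in the optimized plan. A toy sketch of the idea on a hypothetical mini expression AST (these are not Spark's `Expression`/`Literal` classes, just an illustration of the same pattern):

```scala
// Minimal constant-folding sketch: a subtree is "foldable" when all of its
// inputs are literals, and a foldable subtree is evaluated once and replaced
// by a Literal, mirroring Spark's `case e if e.foldable => Literal.create(...)`.
sealed trait Expr { def foldable: Boolean; def eval(): Long }
case class Literal(v: Long) extends Expr {
  val foldable = true
  def eval(): Long = v
}
case class Add(l: Expr, r: Expr) extends Expr {
  def foldable: Boolean = l.foldable && r.foldable
  def eval(): Long = l.eval() + r.eval()
}
case class Attribute(name: String) extends Expr {
  val foldable = false                       // depends on input rows
  def eval(): Long = sys.error(s"unresolved: $name")
}

def constantFold(e: Expr): Expr = e match {
  case e if e.foldable => Literal(e.eval())  // fold the whole subtree
  case Add(l, r)       => Add(constantFold(l), constantFold(r))
  case other           => other
}
```

A stateful (non-deterministic) UDF must report `foldable = false`, or this rule would silently freeze its per-row state into a single constant, which is the bug the PR's test guards against.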
[GitHub] spark issue #16161: [SPARK-18717][SQL] Make code generation for Scala Map wo...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16161

cc @cloud-fan
[GitHub] spark issue #16103: [SPARK-18374][ML]Incorrect words in StopWords/english.tx...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16103

Thanks for the review.
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16167

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69711/ Test PASSed.
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16167

Merged build finished. Test PASSed.
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16167

**[Test build #69711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69711/consoleFull)** for PR 16167 at commit [`41066dd`](https://github.com/apache/spark/commit/41066ddcf2863872af06320bd4d871b90a4fc3ad).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15994

OK, thanks!
[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91021502

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---

```scala
@@ -479,7 +479,12 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
       numInstances: Double,
       weightSum: Double): Double = {
     -2.0 * predictions.map { case (y: Double, mu: Double, weight: Double) =>
-      weight * dist.Binomial(1, mu).logProbabilityOf(math.round(y).toInt)
+      val wt = math.round(weight).toInt
+      if (wt == 0) {
+        0.0
+      } else {
+        dist.Binomial(wt, mu).logProbabilityOf(math.round(y * weight).toInt)
```

--- End diff --

So I think the real issue here is that we don't currently allow users to specify a binomial GLM using success/outcome pairs. One way to mash that kind of grouped data into the format Spark requires is using the process described above by @actuaryzhang, but then we need to adjust the log-likelihood computation as was also noted. So @srowen is correct in saying that this is inaccurate for non-integer weights. I checked with R's glmnet, and it seems that they obey the semantics of data weights for a binomial GLM corresponding to the number of successes. So they log a warning when you input data weights of non-integer values, then proceed with the method proposed in this patch. So, this actually _does_ match R's behavior and I am in favor of the change. But we need to log appropriate warnings and write good unit tests. What are others' thoughts?
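The distinction being debated can be made concrete. The old code treats a weight `w` as replicating a Bernoulli observation (`w * log P(Bernoulli(mu) = y)`), while the patch treats `w` as a number of trials with `y * w` successes (`log P(Binomial(round(w), mu) = round(y * w))`); the two agree for 0/1 labels with unit weights but differ for grouped data. A plain-Scala sketch of both conventions (this is not Breeze's `dist.Binomial`; the helper names are made up for illustration):

```scala
// log(n!) via a simple sum; fine for small n in this sketch.
def logFactorial(n: Int): Double = (2 to n).map(i => math.log(i.toDouble)).sum

// log P(Binomial(n, mu) = k) = log C(n, k) + k*log(mu) + (n-k)*log(1-mu)
def binomialLogPmf(n: Int, k: Int, mu: Double): Double =
  logFactorial(n) - logFactorial(k) - logFactorial(n - k) +
    k * math.log(mu) + (n - k) * math.log1p(-mu)

// Old convention: weight replicates a single Bernoulli observation.
def weightedBernoulli(y: Double, mu: Double, weight: Double): Double =
  weight * binomialLogPmf(1, math.round(y).toInt, mu)

// Patch's convention: y is the observed proportion, weight the trial count;
// a weight that rounds to 0 contributes nothing, as in the diff above.
def binomialWithTrials(y: Double, mu: Double, weight: Double): Double = {
  val n = math.round(weight).toInt
  if (n == 0) 0.0
  else binomialLogPmf(n, math.round(y * weight).toInt, mu)
}
```

For example, one success in two trials (`y = 0.5`, `weight = 2`) picks up the `log C(2, 1)` combinatorial term under the Binomial convention that the replicated-Bernoulli convention lacks.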
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91020060

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```scala
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      withTempView("inputTable") {
+        val testData = spark.sparkContext.parallelize(
+          (0 until 10) map (x => IntegerCaseClass(1)), 2).toDF()
+        testData.createOrReplaceTempView("inputTable")
+        // Distribute all rows to one partition (all rows have the same content),
```

--- End diff --

Why can `DISTRIBUTE BY` distribute all rows to one partition? It's implemented by `RepartitionByExpression`, which doesn't always use one partition.
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16166

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69710/ Test FAILed.
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16166

Merged build finished. Test FAILed.
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16166

**[Test build #69710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69710/consoleFull)** for PR 16166 at commit [`095184d`](https://github.com/apache/spark/commit/095184da2f6d65ecde9970a4296db2d08dd9f797).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16068

**[Test build #69716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69716/consoleFull)** for PR 16068 at commit [`87f134c`](https://github.com/apache/spark/commit/87f134c5b5885c18513d38c30ab0cf553226d822).
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16068

**[Test build #69715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69715/consoleFull)** for PR 16068 at commit [`78e9b38`](https://github.com/apache/spark/commit/78e9b38454cea5059306e2e26ef3c7d77b19c81e).
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128

**[Test build #69714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69714/consoleFull)** for PR 16128 at commit [`26a86d6`](https://github.com/apache/spark/commit/26a86d64f2f492094960b19332cabd7457f95e61).
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16129

Merged build finished. Test PASSed.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16129

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69709/ Test PASSed.
[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131

@srowen Done. Thanks for the suggestion.
[GitHub] spark issue #16137: [SPARK-18708][CORE] Improvement/improve docs in spark co...
Github user Mironor commented on the issue: https://github.com/apache/spark/pull/16137

@srowen I reverted the obvious comments as well as some minor changes (such as capitalizing). I only left javadoc for some of the non-trivial public API. I can also revert changes for comments where the only diff is the wrapping of references in backquotes. I'd like to know whether you have a view on using continuation indentation (as in [javadoc](http://www.oracle.com/technetwork/articles/java/index-137868.html)/[scaladoc](http://docs.scala-lang.org/style/scaladoc.html)), and on what character to use when linking a reference (backquotes or brackets?); I could update the Spark code style documentation if it's different from javadoc.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129

**[Test build #69709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69709/consoleFull)** for PR 16129 at commit [`b4a197a`](https://github.com/apache/spark/commit/b4a197ac09e19693f6dc0ce9d50c32ce5064786f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128

**[Test build #3468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3468/consoleFull)** for PR 16128 at commit [`8d4ca5e`](https://github.com/apache/spark/commit/8d4ca5e5d58c01050ac3ca13e4e9b004f67c3009).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91015596

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```scala
@@ -1144,13 +1218,19 @@ class SparkContext(config: SparkConf) extends Logging {
   }

   /**
-   * Get an RDD for a Hadoop SequenceFile with given key and value types.
+   * Get an RDD for a Hadoop `SequenceFile` with given key and value types.
    *
-   * @note Because Hadoop's RecordReader class re-uses the same Writable object for each
-   * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle
-   * operation will create many references to the same object.
-   * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first
-   * copy them using a `map` function.
+   * @note because Hadoop's `RecordReader` class re-uses the same `Writable` object for each
```

--- End diff --

Correct, but [they](http://www.oracle.com/technetwork/articles/java/index-137868.html) also contain continuation indentation (they even align parameter descriptions).
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91015460

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```scala
@@ -1144,13 +1218,19 @@ class SparkContext(config: SparkConf) extends Logging {
   }

   /**
-   * Get an RDD for a Hadoop SequenceFile with given key and value types.
+   * Get an RDD for a Hadoop `SequenceFile` with given key and value types.
    *
-   * @note Because Hadoop's RecordReader class re-uses the same Writable object for each
-   * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle
-   * operation will create many references to the same object.
-   * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first
-   * copy them using a `map` function.
+   * @note because Hadoop's `RecordReader` class re-uses the same `Writable` object for each
```

--- End diff --

I understand they are used in a mixed way, and I see the example of multiple lines with `@return` in the scaladoc. I am fine with this but I just wanted to note my worry here.
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69713/ Test FAILed.
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138

**[Test build #69713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69713/consoleFull)** for PR 16138 at commit [`8837bdb`](https://github.com/apache/spark/commit/8837bdb176963be6da02c2b0e91c5673cd3fa1b2).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ToTimestamp(left: Expression, right: Expression, child: Expression)`
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138

Merged build finished. Test FAILed.
[GitHub] spark pull request #16000: [SPARK-18537][Web UI]Add a REST api to spark stre...
Github user ChorPangChan commented on a diff in the pull request: https://github.com/apache/spark/pull/16000#discussion_r91014959

--- Diff: streaming/src/main/java/org/apache/spark/streaming/status/api/v1/BatchStatus.java ---

```java
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.status.api.v1;
```

--- End diff --

All right, I understand the problem now. In order to merge with another plan (SPARK-18085) in the future, the streaming API may need to support history, and thus needs to use /api/v1/applications/:id/:attempt/streaming as its endpoint. To do that, someone will need to implement a hooking mechanism to "mount" the streaming API onto the applications resource. Am I correct?
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16163 @srowen Thanks. I sent this one because the `consoleFull` page frequently freezes my browser these days, not to mention viewing Jenkins build results on a mobile phone...
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138 **[Test build #69713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69713/consoleFull)** for PR 16138 at commit [`8837bdb`](https://github.com/apache/spark/commit/8837bdb176963be6da02c2b0e91c5673cd3fa1b2).
[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16014 **[Test build #69712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69712/consoleFull)** for PR 16014 at commit [`6ef26fe`](https://github.com/apache/spark/commit/6ef26fe3134880924fad03f39b4d6faa84aa05e0).
[GitHub] spark issue #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config option t...
Github user seyfe commented on the issue: https://github.com/apache/spark/pull/16165 Hi @srowen, thanks for the quick feedback. Let me get rid of the on/off knob for in-progress files. Would you like me to remove the maxAge setting for in-progress files as well? I initially worried about long-running jobs (streaming?), but I think even in that case the files will get updated.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91013560 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1144,13 +1218,19 @@ class SparkContext(config: SparkConf) extends Logging { } /** - * Get an RDD for a Hadoop SequenceFile with given key and value types. + * Get an RDD for a Hadoop `SequenceFile` with given key and value types. * - * @note Because Hadoop's RecordReader class re-uses the same Writable object for each - * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle - * operation will create many references to the same object. - * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first - * copy them using a `map` function. + * @note because Hadoop's `RecordReader` class re-uses the same `Writable` object for each --- End diff -- The [Spark style guide](http://spark.apache.org/contributing.html) doesn't contain anything about continuation indentation and refers to [Scala's own style guide](http://docs.scala-lang.org/style/scaladoc.html), which shows that indentation should be used.
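For reference, the continuation indentation the Scala style guide describes can be sketched like this (an illustrative doc comment only, not the actual SparkContext source):

```scala
/**
 * Get an RDD for a Hadoop `SequenceFile` with given key and value types.
 *
 * @note Because Hadoop's `RecordReader` re-uses the same `Writable` object for
 *       each record, directly caching the returned RDD can create many
 *       references to the same object. Continuation lines of a tag are
 *       indented to align with the start of the tag's text.
 */
```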
[GitHub] spark issue #16167: [DO NOT MERGE]Remove workaround for Netty memory leak
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16167 **[Test build #69711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69711/consoleFull)** for PR 16167 at commit [`41066dd`](https://github.com/apache/spark/commit/41066ddcf2863872af06320bd4d871b90a4fc3ad).
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16128 LGTM pending tests.
[GitHub] spark pull request #16167: [DO NOT MERGE]Remove workaround for Netty memory ...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16167 [DO NOT MERGE]Remove workaround for Netty memory leak ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark remove-netty-workaround Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16167.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16167 commit 41066ddcf2863872af06320bd4d871b90a4fc3ad Author: Shixiong Zhu Date: 2016-12-06T05:04:07Z Remove workaround for Netty memory leak
[GitHub] spark issue #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config option t...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16165 I don't think it makes sense to expose yet another set of settings for this. I think the risk of course is that this accidentally cleans up another instance's work in progress. However if it's quite old, I'd think it's as safe to clean up an in-progress file as any other?
[GitHub] spark issue #16166: [SPARK-18734][SS] Represent timestamp in StreamingQueryP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16166 **[Test build #69710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69710/consoleFull)** for PR 16166 at commit [`095184d`](https://github.com/apache/spark/commit/095184da2f6d65ecde9970a4296db2d08dd9f797).
[GitHub] spark pull request #16166: [SPARK-18734][SS] Represent timestamp in Streamin...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/16166 [SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as formatted string instead of millis ## What changes were proposed in this pull request? A formatted string (in ISO8601 format) is easier to read while debugging than millis. ## How was this patch tested? Updated unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-18734 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16166.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16166 commit 095184da2f6d65ecde9970a4296db2d08dd9f797 Author: Tathagata Das Date: 2016-12-06T04:59:37Z Changed to string
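The millis-to-ISO8601 change described above can be sketched with `java.time` (a hedged illustration only; `formatTimestamp` is a hypothetical helper name, not the method used in the PR):

```java
import java.time.Instant;

public class TimestampFormat {
    // Render epoch millis as an ISO8601 UTC string, the representation
    // SPARK-18734 proposes for StreamingQueryProgress timestamps.
    // Instant.toString() emits ISO-8601 (e.g. 1970-01-01T00:00:00Z).
    static String formatTimestamp(long millis) {
        return Instant.ofEpochMilli(millis).toString();
    }

    public static void main(String[] args) {
        System.out.println(formatTimestamp(0L)); // 1970-01-01T00:00:00Z
    }
}
```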
[GitHub] spark pull request #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config o...
GitHub user seyfe opened a pull request: https://github.com/apache/spark/pull/16165 [SPARK-18733] [WEBUI] HistoryServer: Add config option to cleanup in-progress files ## What changes were proposed in this pull request? Add 2 new config parameters: 1) spark.history.fs.cleaner.inProgress.files: Default value will be false, so no behavior change for anyone. 2) spark.history.fs.cleaner.inProgress.maxAge: A way to specify the age of in-progress files. Default value is 28 days. ## How was this patch tested? Added new unit tests and verified via existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/seyfe/spark clear_old_inprogress_files Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16165 commit 90b790bffbf3b90e6cf8abcddecb323e906f1c18 Author: Ergin Seyfe Date: 2016-12-06T01:10:47Z History Server clean old inprogress files
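If the options land as described in the PR text, enabling them in `spark-defaults.conf` might look like this (key names copied verbatim from the PR description; the `28d` duration syntax is an assumption for illustration, and the proposal was still under review):

```
# Hypothetical spark-defaults.conf fragment for SPARK-18733 (proposed, not merged)
spark.history.fs.cleaner.inProgress.files   true
spark.history.fs.cleaner.inProgress.maxAge  28d
```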
[GitHub] spark issue #16165: [SPARK-18733] [WEBUI] HistoryServer: Add config option t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16165 Can one of the admins verify this patch?
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011837 --- Diff: examples/src/main/r/ml/lda.R --- @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/lda.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-lda-example") + +# $example on$ +# Load training data +df <- read.df("data/mllib/sample_lda_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a latent dirichlet allocation model with spark.lda +model <- spark.lda(training, k=10, maxIter=10) --- End diff -- nit: please put a space, i.e. `k = 10, maxIter = 10`
[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16148 This is great, thanks! By the way, how are these examples getting run? Is there a way to know if the examples are broken because of API changes?
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011772 --- Diff: examples/src/main/r/ml/randomForest.R --- @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/randomForest.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-randomForest-example") + +# Random forest classification model + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a random forest classification model with spark.randomForest +model <- spark.randomForest(training, label ~ features, "classification", numTrees=10) --- End diff -- ditto below
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011734 --- Diff: examples/src/main/r/ml/randomForest.R --- @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/randomForest.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-randomForest-example") + +# Random forest classification model + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a random forest classification model with spark.randomForest +model <- spark.randomForest(training, label ~ features, "classification", numTrees=10) --- End diff -- nit: I would put spaces around it, i.e. `numTrees = 10` instead
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011578 --- Diff: docs/sparkr.md --- @@ -512,39 +512,33 @@ head(teenagers) # Machine Learning -SparkR supports the following machine learning algorithms currently: `Generalized Linear Model`, `Accelerated Failure Time (AFT) Survival Regression Model`, `Naive Bayes Model` and `KMeans Model`. -Under the hood, SparkR uses MLlib to train the model. -Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models. -SparkR supports a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'. - ## Algorithms -### Generalized Linear Model - -[spark.glm()](api/R/spark.glm.html) or [glm()](api/R/glm.html) fits generalized linear model against a Spark DataFrame. -Currently "gaussian", "binomial", "poisson" and "gamma" families are supported. -{% include_example glm r/ml.R %} - -### Accelerated Failure Time (AFT) Survival Regression Model - -[spark.survreg()](api/R/spark.survreg.html) fits an accelerated failure time (AFT) survival regression model on a SparkDataFrame. -Note that the formula of [spark.survreg()](api/R/spark.survreg.html) does not support operator '.' currently. --- End diff -- Another piece of R-specific info that would be deleted?
[GitHub] spark pull request #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers exam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16148#discussion_r91011559 --- Diff: docs/sparkr.md --- @@ -512,39 +512,33 @@ head(teenagers) # Machine Learning -SparkR supports the following machine learning algorithms currently: `Generalized Linear Model`, `Accelerated Failure Time (AFT) Survival Regression Model`, `Naive Bayes Model` and `KMeans Model`. -Under the hood, SparkR uses MLlib to train the model. -Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models. -SparkR supports a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'. - ## Algorithms -### Generalized Linear Model - -[spark.glm()](api/R/spark.glm.html) or [glm()](api/R/glm.html) fits generalized linear model against a Spark DataFrame. -Currently "gaussian", "binomial", "poisson" and "gamma" families are supported. --- End diff -- Looks like we would be missing out on some R-specific things with this delete?
[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16150 There is also this form, `\code{apriori} (the label distribution)`, and this form, `\item{\code{docConcentration}}{concentration parameter commonly named \code{alpha}`.
[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16150 Thanks. There is also the issue with `\code{numOfInputs}` vs `number of iterations IRLS takes` - should it be a "variable" (and thus wrapped with `\code{something}`), or should it be a description?
[GitHub] spark pull request #16160: [SPARK-18721][SS]Fix ForeachSink with watermark +...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16160
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16164 Merged build finished. Test PASSed.
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69708/ Test PASSed.
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16164 **[Test build #69708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69708/consoleFull)** for PR 16164 at commit [`4d71250`](https://github.com/apache/spark/commit/4d712503cd413a94827bf41942d3b90dc52e4905). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16159 Merged build finished. Test PASSed.
[GitHub] spark issue #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69704/ Test PASSed.
[GitHub] spark issue #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16159 **[Test build #69704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69704/consoleFull)** for PR 16159 at commit [`ce2aa99`](https://github.com/apache/spark/commit/ce2aa99194e0f25843e74697429674807670). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91010366 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. */ def addFile(path: String): Unit = { addFile(path, false) } /** - * Returns a list of file paths that are added to resources. + * A list of file paths that are added to resources. --- End diff -- We shouldn't duplicate the documentation. Converting this to a `@return` is fine.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91010077 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. */ def addFile(path: String): Unit = { addFile(path, false) } /** - * Returns a list of file paths that are added to resources. + * A list of file paths that are added to resources. --- End diff -- For me, I personally think we had better just leave them or duplicate the description into `@return`. I think I am not supposed to decide this. cc @srowen.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91010001 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1401,8 +1532,11 @@ class SparkContext(config: SparkConf) extends Logging { /** * Broadcast a read-only variable to the cluster, returning a - * [[org.apache.spark.broadcast.Broadcast]] object for reading it in distributed functions. + * `org.apache.spark.broadcast.Broadcast` object for reading it in distributed functions. --- End diff -- Actually brackets are better because they make links. Some were backquoted because javadoc8 complains about this.
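To illustrate the trade-off HyukjinKwon describes (an illustrative doc comment, not the PR's exact text): the `[[...]]` form renders as a clickable scaladoc link to the referenced symbol, while backquotes render only as monospace text — which is why backquotes were substituted wherever Javadoc 8's stricter doclint rejected the generated output.

```scala
object DocLinkStyles {
  /**
   * Link form (clickable in scaladoc, but may trip Javadoc 8 doclint):
   * [[scala.collection.immutable.List]]
   *
   * Monospace form (no hyperlink, but Javadoc 8 safe):
   * `scala.collection.immutable.List`
   */
  def example(): List[Int] = List(1, 2, 3)
}
```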
[GitHub] spark pull request #16098: [SPARK-18672][CORE] Close recordwriter in SparkHa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16098
[GitHub] spark issue #16160: [SPARK-18721][SS]Fix ForeachSink with watermark + append
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16160 Merged build finished. Test PASSed.
[GitHub] spark issue #16160: [SPARK-18721][SS]Fix ForeachSink with watermark + append
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69706/ Test PASSed.
[GitHub] spark issue #16160: [SPARK-18721][SS]Fix ForeachSink with watermark + append
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16160 **[Test build #69706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69706/consoleFull)** for PR 16160 at commit [`3a7afe7`](https://github.com/apache/spark/commit/3a7afe7f428b996fb5367f1f213a8d0072912ec0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16163 Merged build finished. Test PASSed.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16098 Merged to master
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69703/ Test PASSed.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91009699 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1620,8 +1766,9 @@ class SparkContext(config: SparkConf) extends Logging { /** * :: DeveloperApi :: - * Return information about what RDDs are cached, if they are in mem or on disk, how much space - * they take, etc. --- End diff -- If you want to remove `Return` or make the description into `@return`, I guess it should be at least consistent. It seems https://github.com/apache/spark/pull/16137/files#r91009309 is a bit different.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91009631 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. */ def addFile(path: String): Unit = { addFile(path, false) } /** - * Returns a list of file paths that are added to resources. + * A list of file paths that are added to resources. --- End diff -- The second line is redundant here; my question is whether it's worth replacing `Return` with `@return` or just leaving it as it is.
[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16163 **[Test build #69703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69703/testReport)** for PR 16163 at commit [`6aa9f34`](https://github.com/apache/spark/commit/6aa9f34fa2abd02ae07dea5c0a404d67f7ae5998). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user Mironor commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91009155 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1401,8 +1532,11 @@ class SparkContext(config: SparkConf) extends Logging { /** * Broadcast a read-only variable to the cluster, returning a - * [[org.apache.spark.broadcast.Broadcast]] object for reading it in distributed functions. + * `org.apache.spark.broadcast.Broadcast` object for reading it in distributed functions. --- End diff -- Yes, the initial intent was to make it the same everywhere (either backquotes or brackets)
[GitHub] spark issue #16139: [SPARK-18705][ML][DOC] Update user guide to reflect one ...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16139 ping @yanboliang
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008727 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -119,22 +119,22 @@ class SparkContext(config: SparkConf) extends Logging { /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI - * @param conf a [[org.apache.spark.SparkConf]] object specifying other Spark parameters + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI + * @param conf a `org.apache.spark.SparkConf` object specifying other Spark parameters */ def this(master: String, appName: String, conf: SparkConf) = this(SparkContext.updatedConf(conf, master, appName)) /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI. - * @param sparkHome Location where Spark is installed on cluster nodes. - * @param jars Collection of JARs to send to the cluster. These can be paths on the local file + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI. + * @param sparkHome location where Spark is installed on cluster nodes. + * @param jars collection of JARs to send to the cluster. These can be paths on the local file * system or HDFS, HTTP, HTTPS, or FTP URLs. - * @param environment Environment variables to set on worker nodes. + * @param environment environment variables to set on worker nodes. 
--- End diff -- Oh, I am sorry, it was mentioned in https://github.com/apache/spark/pull/16137/files#r90948641.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008621 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -119,22 +119,22 @@ class SparkContext(config: SparkConf) extends Logging { /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI - * @param conf a [[org.apache.spark.SparkConf]] object specifying other Spark parameters + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI + * @param conf a `org.apache.spark.SparkConf` object specifying other Spark parameters */ def this(master: String, appName: String, conf: SparkConf) = this(SparkContext.updatedConf(conf, master, appName)) /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI. - * @param sparkHome Location where Spark is installed on cluster nodes. - * @param jars Collection of JARs to send to the cluster. These can be paths on the local file + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI. + * @param sparkHome location where Spark is installed on cluster nodes. + * @param jars collection of JARs to send to the cluster. These can be paths on the local file * system or HDFS, HTTP, HTTPS, or FTP URLs. - * @param environment Environment variables to set on worker nodes. + * @param environment environment variables to set on worker nodes. 
--- End diff -- Do we have a rule (or other references) to make them lower-cased? I am worried about similar changes in the future, and it might be great if we have a concrete reason here to change so.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008657 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -119,22 +119,22 @@ class SparkContext(config: SparkConf) extends Logging { /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI - * @param conf a [[org.apache.spark.SparkConf]] object specifying other Spark parameters + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI + * @param conf a `org.apache.spark.SparkConf` object specifying other Spark parameters */ def this(master: String, appName: String, conf: SparkConf) = this(SparkContext.updatedConf(conf, master, appName)) /** * Alternative constructor that allows setting common Spark properties directly * - * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). - * @param appName A name for your application, to display on the cluster web UI. - * @param sparkHome Location where Spark is installed on cluster nodes. - * @param jars Collection of JARs to send to the cluster. These can be paths on the local file + * @param master cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]). + * @param appName a name for your application, to display on the cluster web UI. + * @param sparkHome location where Spark is installed on cluster nodes. + * @param jars collection of JARs to send to the cluster. These can be paths on the local file * system or HDFS, HTTP, HTTPS, or FTP URLs. - * @param environment Environment variables to set on worker nodes. + * @param environment environment variables to set on worker nodes. 
--- End diff -- I am fine if there is not one, too, as long as it can be kept coherent and this can be decided here too.
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008467 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -923,15 +971,13 @@ class SparkContext(config: SparkConf) extends Logging { /** * Load data from a flat binary file, assuming the length of each record is constant. * - * @note We ensure that the byte array for each record in the resulting RDD + * @note we ensure that the byte array for each record in the resulting RDD --- End diff -- I am not sure of making this lower-cased because we will see this as the start of a sentence - scaladoc ![2016-12-06 1 02 12](https://cloud.githubusercontent.com/assets/6477701/20912647/38ccf9e4-bbb4-11e6-8346-42cd6297d075.png) - javadoc ![2016-12-06 1 02 05](https://cloud.githubusercontent.com/assets/6477701/20912648/38d2160e-bbb4-11e6-8d72-1c4480ca2276.png)
[GitHub] spark pull request #16137: [SPARK-18708][CORE] Improvement/improve docs in s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16137#discussion_r91008183 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1417,27 +1551,31 @@ class SparkContext(config: SparkConf) extends Logging { /** * Add a file to be downloaded with this Spark job on every node. - * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported - * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, - * use `SparkFiles.get(fileName)` to find its download location. + * + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, --- End diff -- nit: we could make this single-spaced here (`URI. To`), and likewise in the same instances, at least for the lines this PR changes.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128 **[Test build #3468 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3468/consoleFull)** for PR 16128 at commit [`8d4ca5e`](https://github.com/apache/spark/commit/8d4ca5e5d58c01050ac3ca13e4e9b004f67c3009).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16142 @srowen If I have understood what you mean correctly, the **"log rotation"** is different from the **"job event log clean up"**. The "job event log" is replayed to build the Spark history UI. Right???
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16128 Merged build finished. Test FAILed.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16128 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69705/ Test FAILed.
[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16128 **[Test build #69705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69705/consoleFull)** for PR 16128 at commit [`8d4ca5e`](https://github.com/apache/spark/commit/8d4ca5e5d58c01050ac3ca13e4e9b004f67c3009). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16159: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/16159#discussion_r91006592 --- Diff: project/SparkBuild.scala --- @@ -596,19 +596,17 @@ object Hive { } object Assembly { - import sbtassembly.AssemblyUtils._ - import sbtassembly.Plugin._ - import AssemblyKeys._ + import sbtassembly.AssemblyPlugin.autoImport._ val hadoopVersion = taskKey[String]("The version of hadoop that spark is compiled against.") - lazy val settings = assemblySettings ++ Seq( + lazy val settings = Seq( --- End diff -- Hi, @srowen Thanks for reviewing this PR. Yes, removing `assemblySettings ++` is on purpose. [Quote from [sbt-assembly/Migration.md](https://github.com/sbt/sbt-assembly/blob/master/Migration.md): "Remove `assemblySettings.` The settings are now auto injected to all projects with JvmPlugin" ]
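A minimal before/after sketch of the migration weiqingy quotes (hypothetical jar name; only the `autoImport` line and the dropped `assemblySettings ++` come from the Migration.md guidance):

```scala
// Old sbt-assembly (0.11.x): settings had to be imported and mixed in by hand.
//   import sbtassembly.Plugin._
//   import AssemblyKeys._
//   lazy val settings = assemblySettings ++ Seq(...)

// New sbt-assembly (0.14.x): AssemblyPlugin is an sbt AutoPlugin, so its
// default settings are injected automatically into every project that has
// the JvmPlugin; only the auto-imported keys need to be referenced.
import sbtassembly.AssemblyPlugin.autoImport._

lazy val settings = Seq(
  assemblyJarName in assembly := "example-assembly.jar"  // hypothetical jar name
)
```

This build-definition fragment is meant to be read in the context of `project/SparkBuild.scala`, not run standalone.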
[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15998
[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15998 thanks, merging to master/2.1!
[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r91006319 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -346,6 +346,31 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(new Path(partitionLocation) == defaultPartitionLocation) } + test("list partition names") { + val catalog = newBasicCatalog() + val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) + catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + + val partitionNames = catalog.listPartitionNames("db2", "tbl2") + assert(partitionNames == Seq("a=1/b=%25%3D", "a=1/b=2", "a=3/b=4")) + } + + test("list partition names with partial partition spec") { + val catalog = newBasicCatalog() + val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) + catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + + val partitionNames1 = catalog.listPartitionNames("db2", "tbl2", Some(Map("a" -> "1"))) + assert(partitionNames1 == Seq("a=1/b=%25%3D", "a=1/b=2")) --- End diff -- ok, maybe we should consider diverging from Hive here...
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16164 @srowen Indeed, it is not a normal case, and I found this problem when the streaming job went wrong. As you said, > one can compare the graphs visually. It may still mislead users in some cases, e.g. when 'scheduling delay' is shown in 'ms' and 'processing time' in 's' or 'min'.
[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r91006034 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -346,6 +346,31 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(new Path(partitionLocation) == defaultPartitionLocation) } + test("list partition names") { +val catalog = newBasicCatalog() +val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) +catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + +val partitionNames = catalog.listPartitionNames("db2", "tbl2") +assert(partitionNames == Seq("a=1/b=%25%3D", "a=1/b=2", "a=3/b=4")) + } + + test("list partition names with partial partition spec") { +val catalog = newBasicCatalog() +val newPart = CatalogTablePartition(Map("a" -> "1", "b" -> "%="), storageFormat) +catalog.createPartitions("db2", "tbl2", Seq(newPart), ignoreIfExists = false) + +val partitionNames1 = catalog.listPartitionNames("db2", "tbl2", Some(Map("a" -> "1"))) +assert(partitionNames1 == Seq("a=1/b=%25%3D", "a=1/b=2")) --- End diff -- Yeah, I tried Hive 1.2. It actually returns the weird value.
```
hive> create table partTab (col1 int, col2 int) partitioned by (pcol1 String, pcol2 String);
OK
hive> insert into table partTab partition(pcol1='1', pcol2='2') select 3, 4 from dummy;
OK
hive> insert into table partTab partition(pcol1='1', pcol2='%=') select 3, 4 from dummy;
OK
hive> show partitions partTab;
OK
pcol1=1/pcol2=%25%3D
pcol1=1/pcol2=2
hive> show partitions partTab PARTITION(pcol1=1);
OK
pcol1=1/pcol2=2
pcol1=1/pcol2=%25%3D
```
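The `%25%3D` names shown above come from Hive-style percent-escaping of special characters in partition values. As an illustration only, here is a minimal Python sketch of that naming scheme; it assumes URL-style percent-encoding, whereas Hive's actual escape character set differs slightly, and `partition_name` is a hypothetical helper, not Spark or Hive code:

```python
from urllib.parse import quote

def partition_name(spec):
    # Build a Hive-style partition name: percent-escape each value and
    # join the columns as key=value pairs separated by '/'.
    return "/".join(f"{k}={quote(str(v), safe='')}" for k, v in spec.items())

print(partition_name({"a": "1", "b": "%="}))  # a=1/b=%25%3D
```

This reproduces the escaping seen in the test expectations: the value `%=` becomes `%25%3D`.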
[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r91005791 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,27 @@ object LBFGS extends Logging { val bcW = data.context.broadcast(w) val localGradient = gradient - val (gradientSum, lossSum) = data.treeAggregate((Vectors.zeros(n), 0.0))( - seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) => -val l = localGradient.compute( - features, label, bcW.value, grad) -(grad, loss + l) - }, - combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) => -axpy(1.0, grad2, grad1) -(grad1, loss1 + loss2) - }) + // Given (current accumulated gradient, current loss) and (label, features) + // tuples, updates the current gradient and current loss + val seqOp = (c: (Vector, Double), v: (Double, Vector)) => +(c, v) match { + case ((grad, loss), (label, features)) => +val denseGrad = grad.toDense +val l = localGradient.compute(features, label, bcW.value, denseGrad) +(denseGrad, loss + l) +} + + // Adds two (gradient, loss) tuples + val combOp = (c1: (Vector, Double), c2: (Vector, Double)) => +(c1, c2) match { case ((grad1, loss1), (grad2, loss2)) => + val denseGrad1 = grad1.toDense --- End diff -- Meaning, when would the args ever not be dense? I agree, shouldn't be sparse at this stage, but doing this defensively seems fine since it's a no-op for dense.
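The seqOp/combOp pattern in the diff can be illustrated outside Spark. The following is a hypothetical Python sketch of the same (gradient, loss) aggregation shape, with plain lists standing in for MLlib vectors and a toy squared-error gradient standing in for MLlib's `Gradient` (neither is the actual LBFGS code):

```python
from functools import reduce

weights = [0.0, 0.0]  # current model weights (plays the role of bcW.value)

def seq_op(acc, point):
    # Fold one (label, features) point into the (gradient, loss) accumulator,
    # updating the gradient in place, like localGradient.compute does.
    grad, loss = acc
    label, features = point
    residual = sum(f * w for f, w in zip(features, weights)) - label
    for i, f in enumerate(features):
        grad[i] += residual * f
    return grad, loss + 0.5 * residual ** 2

def comb_op(a, b):
    # Merge two partition-local (gradient, loss) accumulators,
    # i.e. axpy(1.0, grad2, grad1) plus summing the losses.
    grad1, loss1 = a
    grad2, loss2 = b
    for i, g in enumerate(grad2):
        grad1[i] += g
    return grad1, loss1 + loss2

data = [(1.0, [1.0, 0.0]), (0.0, [0.0, 1.0])]
# Simulate two partitions, then combine -- analogous to treeAggregate.
part1 = reduce(seq_op, data[:1], ([0.0, 0.0], 0.0))
part2 = reduce(seq_op, data[1:], ([0.0, 0.0], 0.0))
grad, loss = part1 if not data[1:] else comb_op(part1, part2)
```

The defensive `.toDense` in the PR has no analogue here since Python lists are always "dense"; the point of the sketch is only the accumulator shape and the two-level fold.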
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16142 Yes, but the alternative is reimplementing an ad-hoc log rotation system here, which isn't great either. Are you saying the history server already manages logs? Pardon, I don't know it at all.
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16142 @srowen The Spark History Server may do the clean-up work, but the precondition is that we start it and keep it running. Besides, if applications arrive constantly, the event logs may still take up a large amount of storage space. This PR gives the system another chance to clean up before each application begins saving its event log. What's more, if you are more concerned with storage cost, this PR provides a 'space' mode to restrict the disk usage of the Spark event log. > It's something you often leave to a cron job or something to archive and clean up. IMHO, I do not think that is a reliable way to do this work; we would have to make sure it works and keeps working, just like the Spark History Server.
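A size-capped cleanup like the 'space' mode described above amounts to: delete the oldest event-log files until total usage falls under a limit. The following is a hypothetical Python sketch of that policy only; the PR's actual config names, file layout, and behavior may differ:

```python
import os
import tempfile

def enforce_space_limit(log_dir, max_bytes):
    # Collect (mtime, size, path) for every file in the log dir, oldest first.
    entries = sorted(
        (os.path.getmtime(p), os.path.getsize(p), p)
        for p in (os.path.join(log_dir, f) for f in os.listdir(log_dir))
        if os.path.isfile(p)
    )
    total = sum(size for _, size, _ in entries)
    # Drop the oldest logs until usage fits under the cap.
    for _, size, path in entries:
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
    return total

# Demo: two 10-byte "event logs"; a 15-byte cap forces the older one out.
log_dir = tempfile.mkdtemp()
for mtime, name in [(0, "app-old.log"), (100, "app-new.log")]:
    path = os.path.join(log_dir, name)
    with open(path, "wb") as f:
        f.write(b"x" * 10)
    os.utime(path, (mtime, mtime))
remaining = enforce_space_limit(log_dir, max_bytes=15)
```

Running such a check before each application starts writing its log is the "another chance to clean up" the comment describes; the trade-off versus an external cron job is exactly the point under discussion.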
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16146 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69707/ Test FAILed.
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16146 Merged build finished. Test FAILed.
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16146 **[Test build #69707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69707/consoleFull)** for PR 16146 at commit [`8672343`](https://github.com/apache/spark/commit/86723436ba2b711d0eb6f2de92f3651006e3bff4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #69709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69709/consoleFull)** for PR 16129 at commit [`b4a197a`](https://github.com/apache/spark/commit/b4a197ac09e19693f6dc0ce9d50c32ce5064786f).
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16164 CC @zsxwing because it works this way on purpose, so that one can compare the graphs visually. Usually these values aren't too different in scale; it's a problem here because scheduling delay is unusually large.
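The shared-scale behavior under discussion amounts to computing one Y-axis range across all the metric series so the graphs line up visually. A minimal sketch of that idea (a hypothetical helper, not the actual streaming UI code, which is JavaScript):

```python
def common_y_range(*series):
    # One (min, max) range spanning every series, so all graphs share a scale.
    lo = min(min(s) for s in series)
    hi = max(max(s) for s in series)
    return lo, hi

scheduling_delay_ms = [120000, 90000, 150000]  # unusually large delays
processing_time_ms = [400, 350, 500]
shared = common_y_range(scheduling_delay_ms, processing_time_ms)
```

With these numbers the shared range spans 350 to 150000 ms, so the processing-time curve flattens near zero; that readability problem when one metric dwarfs the others is exactly what SPARK-18732 raises.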
[GitHub] spark issue #16147: [SPARK-18718][TESTS] Skip some test failures due to path...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16147 Just FYI, I ran some more tests for each package myself and grepped `local-cluster` before submitting this PR, and it seems there are not many similar instances. If I face the same problems frequently, I will definitely try to work around them within the test code in the future.
[GitHub] spark issue #16164: [SPARK-18732][WEB-UI] The Y axis ranges of "schedulingDe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16164 **[Test build #69708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69708/consoleFull)** for PR 16164 at commit [`4d71250`](https://github.com/apache/spark/commit/4d712503cd413a94827bf41942d3b90dc52e4905).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16142 Hm, does Spark generally manage log rotation? I confess ignorance. It's something you often leave to a cron job or something to archive and clean up.
[GitHub] spark pull request #16128: [SPARK-18671][SS][TEST] Added tests to ensure sta...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16128#discussion_r91003428 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -1022,6 +1021,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest { val options = new FileStreamOptions(Map("maxfilespertrigger" -> "1")) assert(options.maxFilesPerTrigger == Some(1)) } + + test("FileStreamSource offset - read Spark 2.1.0 log format") { +val offset = readOffsetFromResource("file-source-offset-version-2.1.0.txt") --- End diff -- same comment as above.
[GitHub] spark pull request #16128: [SPARK-18671][SS][TEST] Added tests to ensure sta...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16128#discussion_r91003243 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -81,7 +81,14 @@ private object JsonUtils { */ def partitionOffsets(partitionOffsets: Map[TopicPartition, Long]): String = { val result = new HashMap[String, HashMap[Int, Long]]() -partitionOffsets.foreach { case (tp, off) => +implicit val ordering = new Ordering[TopicPartition] { + override def compare(x: TopicPartition, y: TopicPartition): Int = { +Ordering.Tuple2[String, Int].compare((x.topic, x.partition), (y.topic, y.partition)) + } +} +val partitions = partitionOffsets.keySet.toSeq.sorted // sort for more determinism +partitions.foreach { tp => --- End diff -- I want to sort by topic and partition together, so that partitions are ordered when the JSON is generated (currently they are not), which makes it hard to read.
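The sorting change above can be sketched outside Scala: order the (topic, partition) keys before building the nested map, so the generated JSON is deterministic and readable. A hypothetical Python sketch, with plain tuples standing in for Kafka's `TopicPartition` (not the actual `JsonUtils` code):

```python
import json

def partition_offsets_json(offsets):
    # offsets: {(topic, partition): offset}
    result = {}
    # Sort by (topic, partition) -- the analogue of the implicit Ordering
    # in the diff -- so the emitted JSON has a stable, readable key order.
    for topic, partition in sorted(offsets):
        result.setdefault(topic, {})[partition] = offsets[(topic, partition)]
    return json.dumps(result)

print(partition_offsets_json({("t", 1): 20, ("t", 0): 10}))
# {"t": {"0": 10, "1": 20}}
```

Without the sort, iteration order of the input map would leak into the output, which is the nondeterminism the review comment is addressing.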