[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/11181

[SPARK-13296][SQL] Move UserDefinedFunction into sql.expressions.

This is more consistent with how we structure the packages for window 
functions and UDAFs. I will have a follow-up pull request to remove 
UserDefinedPythonFunction, since it merely duplicates the existing PythonUDF 
class.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-13296

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11181


commit 1a42847e5eb735f7490224c194fc323bdad661fa
Author: Reynold Xin 
Date:   2016-02-12T09:13:33Z

[SPARK-13296][SQL] Move UserDefinedFunction into sql.expressions package.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183245123
  
MiMa will probably fail
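
For context: MiMa is the binary-compatibility checker run in Spark's build, and moving a public class out of its package removes the old class name from the binary API, so the check trips until an exclusion is added. A hedged sketch of the kind of entry such a move typically needs (hypothetical exclusion, not taken from this patch):

```scala
import com.typesafe.tools.mima.core._

// The old fully-qualified name disappears after the package move,
// so MiMa needs to be told the removal is intentional:
Seq(
  ProblemFilters.exclude[MissingClassProblem](
    "org.apache.spark.sql.UserDefinedFunction")
)
```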





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183260673
  
**[Test build #51181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51181/consoleFull)**
 for PR 10876 at commit 
[`374931a`](https://github.com/apache/spark/commit/374931a285806f92fac4a7483f3594362927cb50).





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714801
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/SQLBuilderTest.scala ---
@@ -50,10 +50,8 @@ abstract class SQLBuilderTest extends QueryTest with 
TestHiveSingleton {
  """.stripMargin)
 }
 
-val actualSQL = maybeSQL.get
-
 try {
-  assert(actualSQL === expectedSQL)
+  assert(generatedSql === expectedSQL)
--- End diff --

`generatedSQL`? Do we have any naming rules related to acronyms (`Sql` or 
`SQL`)?





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714791
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/SQLBuilderTest.scala ---
@@ -50,10 +50,8 @@ abstract class SQLBuilderTest extends QueryTest with 
TestHiveSingleton {
  """.stripMargin)
 }
 
-val actualSQL = maybeSQL.get
-
 try {
-  assert(actualSQL === expectedSQL)
+  assert(generatedSql === expectedSQL)
--- End diff --

I think both are acceptable according to Java conventions. If the acronym 
is longer than a certain number of characters, then the preference is to camelCase it.
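
As a small illustration of the two spellings under discussion (hypothetical identifiers, not from the patch), both are legal; the convention cited above only concerns which reads better:

```scala
// Two styles for an acronym inside an identifier:
object AcronymNaming {
  val generatedSql: String = "SELECT 1"   // acronym treated as a word and camel-cased
  val generatedSQL: String = generatedSql // acronym kept fully upper-case
}
```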







[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714855
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/SQLBuilderTest.scala ---
@@ -50,10 +50,8 @@ abstract class SQLBuilderTest extends QueryTest with 
TestHiveSingleton {
  """.stripMargin)
 }
 
-val actualSQL = maybeSQL.get
-
 try {
-  assert(actualSQL === expectedSQL)
+  assert(generatedSql === expectedSQL)
--- End diff --

But I'm going to change this one to be consistent with `expectedSQL`.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183225951
  
**[Test build #51177 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51177/consoleFull)**
 for PR 9893 at commit 
[`c099fd5`](https://github.com/apache/spark/commit/c099fd54386cd399cbf1b76d977df0e89f1e37ca).





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11171#issuecomment-183226591
  
Updated.

cc @gatorsmile unfortunately you will have to rebase your pull requests, 
although it should be easy to do.






[GitHub] spark pull request: [Docs] Update cache()'s storage level to be co...

2016-02-12 Thread wjur
Github user wjur commented on the pull request:

https://github.com/apache/spark/pull/11172#issuecomment-183226607
  
@mateiz, I'm not sure who is responsible for the docs, but it seems that 
you were the last one to edit the affected lines. I would be grateful if you 
could take a look at the PR. The inconsistency was introduced by PR #2686 (if 
that matters).






[GitHub] spark pull request: SPARK-12729 PhantomReferences to replace Final...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11140#issuecomment-183227808
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-12729 PhantomReferences to replace Final...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11140#issuecomment-183227812
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51175/
Test FAILed.





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183256955
  
**[Test build #51180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51180/consoleFull)**
 for PR 11181 at commit 
[`2911532`](https://github.com/apache/spark/commit/291153294feceb52c95a810a93a425a2f6822d9e).





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714511
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
@@ -37,157 +39,137 @@ import 
org.apache.spark.sql.execution.datasources.LogicalRelation
 class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends 
Logging {
   def this(df: DataFrame) = this(df.queryExecution.analyzed, df.sqlContext)
 
-  def toSQL: Option[String] = {
+  def toSQL: String = {
 val canonicalizedPlan = Canonicalizer.execute(logicalPlan)
-val maybeSQL = try {
-  toSQL(canonicalizedPlan)
-} catch { case cause: UnsupportedOperationException =>
-  logInfo(s"Failed to build SQL query string because: 
${cause.getMessage}")
-  None
-}
-
-if (maybeSQL.isDefined) {
+try {
+  val generatedSQL = toSQL(canonicalizedPlan)
   logDebug(
 s"""Built SQL query string successfully from given logical plan:
-   |
-   |# Original logical plan:
-   |${logicalPlan.treeString}
-   |# Canonicalized logical plan:
-   |${canonicalizedPlan.treeString}
-   |# Built SQL query string:
-   |${maybeSQL.get}
+|
+|# Original logical plan:
+|${logicalPlan.treeString}
+|# Canonicalized logical plan:
+|${canonicalizedPlan.treeString}
+|# Generated SQL:
+|$generatedSQL
--- End diff --

Nit: Please revert the indentation change.
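
For readers skimming the diff above: the substantive change is that `toSQL` now returns `String` and rethrows on failure instead of returning `Option[String]`. A minimal self-contained sketch of that error-handling pattern (hypothetical stand-ins with no Spark dependency; the real builder operates on logical plans and logs via `logDebug`):

```scala
import scala.util.control.NonFatal

object ToSqlSketch {
  // Hypothetical mini "plan": supported plans carry their SQL rendering.
  final case class Plan(sql: Option[String])

  // New-style API: return the SQL string directly; failures propagate
  // as exceptions (after a chance to log) rather than becoming None.
  def toSQL(plan: Plan): String =
    try {
      val generatedSQL = plan.sql.getOrElse(
        throw new UnsupportedOperationException("unsupported plan"))
      // the real SQLBuilder logDebug-s the plan and generated SQL here
      generatedSQL
    } catch {
      case NonFatal(e) =>
        // log the failure, then surface it to the caller
        throw e
    }
}
```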





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714530
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
@@ -37,157 +39,137 @@ import 
org.apache.spark.sql.execution.datasources.LogicalRelation
 class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends 
Logging {
   def this(df: DataFrame) = this(df.queryExecution.analyzed, df.sqlContext)
 
-  def toSQL: Option[String] = {
+  def toSQL: String = {
 val canonicalizedPlan = Canonicalizer.execute(logicalPlan)
-val maybeSQL = try {
-  toSQL(canonicalizedPlan)
-} catch { case cause: UnsupportedOperationException =>
-  logInfo(s"Failed to build SQL query string because: 
${cause.getMessage}")
-  None
-}
-
-if (maybeSQL.isDefined) {
+try {
+  val generatedSQL = toSQL(canonicalizedPlan)
   logDebug(
 s"""Built SQL query string successfully from given logical plan:
-   |
-   |# Original logical plan:
-   |${logicalPlan.treeString}
-   |# Canonicalized logical plan:
-   |${canonicalizedPlan.treeString}
-   |# Built SQL query string:
-   |${maybeSQL.get}
+|
+|# Original logical plan:
+|${logicalPlan.treeString}
+|# Canonicalized logical plan:
+|${canonicalizedPlan.treeString}
+|# Generated SQL:
+|$generatedSQL
  """.stripMargin)
-} else {
+  generatedSQL
+} catch { case NonFatal(e) =>
   logDebug(
 s"""Failed to build SQL query string from given logical plan:
-   |
+|
|# Original logical plan:
-   |${logicalPlan.treeString}
-   |# Canonicalized logical plan:
-   |${canonicalizedPlan.treeString}
+|${logicalPlan.treeString}
+|# Canonicalized logical plan:
+|${canonicalizedPlan.treeString}
--- End diff --

Nit: Indentation is off.





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714504
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
@@ -37,157 +39,137 @@ import 
org.apache.spark.sql.execution.datasources.LogicalRelation
 class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends 
Logging {
   def this(df: DataFrame) = this(df.queryExecution.analyzed, df.sqlContext)
 
-  def toSQL: Option[String] = {
+  def toSQL: String = {
 val canonicalizedPlan = Canonicalizer.execute(logicalPlan)
-val maybeSQL = try {
-  toSQL(canonicalizedPlan)
-} catch { case cause: UnsupportedOperationException =>
-  logInfo(s"Failed to build SQL query string because: 
${cause.getMessage}")
-  None
-}
-
-if (maybeSQL.isDefined) {
+try {
+  val generatedSQL = toSQL(canonicalizedPlan)
   logDebug(
 s"""Built SQL query string successfully from given logical plan:
-   |
-   |# Original logical plan:
-   |${logicalPlan.treeString}
-   |# Canonicalized logical plan:
-   |${canonicalizedPlan.treeString}
-   |# Built SQL query string:
-   |${maybeSQL.get}
+|
+|# Original logical plan:
+|${logicalPlan.treeString}
+|# Canonicalized logical plan:
+|${canonicalizedPlan.treeString}
+|# Generated SQL:
+|$generatedSQL
  """.stripMargin)
-} else {
+  generatedSQL
+} catch { case NonFatal(e) =>
   logDebug(
 s"""Failed to build SQL query string from given logical plan:
-   |
+|
|# Original logical plan:
-   |${logicalPlan.treeString}
-   |# Canonicalized logical plan:
-   |${canonicalizedPlan.treeString}
+|${logicalPlan.treeString}
+|# Canonicalized logical plan:
+|${canonicalizedPlan.treeString}
  """.stripMargin)
+  throw e
 }
-
-maybeSQL
   }
 
-  private def projectToSQL(
-  projectList: Seq[NamedExpression],
-  child: LogicalPlan,
-  isDistinct: Boolean): Option[String] = {
-for {
-  childSQL <- toSQL(child)
-  listSQL = projectList.map(_.sql).mkString(", ")
-  maybeFrom = child match {
-case OneRowRelation => " "
-case _ => " FROM "
-  }
-  distinct = if (isDistinct) " DISTINCT " else " "
-} yield s"SELECT$distinct$listSQL$maybeFrom$childSQL"
-  }
+  private def toSQL(node: LogicalPlan): String = node match {
+case Distinct(p: Project) =>
+  projectToSQL(p, isDistinct = true)
 
-  private def aggregateToSQL(
-  groupingExprs: Seq[Expression],
-  aggExprs: Seq[Expression],
-  child: LogicalPlan): Option[String] = {
-val aggSQL = aggExprs.map(_.sql).mkString(", ")
-val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
-val maybeGroupBy = if (groupingSQL.isEmpty) "" else " GROUP BY "
-val maybeFrom = child match {
-  case OneRowRelation => " "
-  case _ => " FROM "
-}
+case p: Project =>
+  projectToSQL(p, isDistinct = false)
 
-toSQL(child).map { childSQL =>
-  s"SELECT $aggSQL$maybeFrom$childSQL$maybeGroupBy$groupingSQL"
-}
-  }
+case p: Aggregate =>
+  aggregateToSQL(p)
+
+case p: Limit =>
+  s"${toSQL(p.child)} LIMIT ${p.limitExpr.sql}"
 
-  private def toSQL(node: LogicalPlan): Option[String] = node match {
-case Distinct(Project(list, child)) =>
-  projectToSQL(list, child, isDistinct = true)
-
-case Project(list, child) =>
-  projectToSQL(list, child, isDistinct = false)
-
-case Aggregate(groupingExprs, aggExprs, child) =>
-  aggregateToSQL(groupingExprs, aggExprs, child)
-
-case Limit(limit, child) =>
-  for {
-childSQL <- toSQL(child)
-limitSQL = limit.sql
-  } yield s"$childSQL LIMIT $limitSQL"
-
-case Filter(condition, child) =>
-  for {
-childSQL <- toSQL(child)
-whereOrHaving = child match {
-  case _: Aggregate => "HAVING"
-  case _ => "WHERE"
-}
-conditionSQL = condition.sql
-  } yield s"$childSQL $whereOrHaving $conditionSQL"
-
-case Union(children) if children.length > 1 =>
-  val childrenSql = children.map(toSQL(_))
-  if (childrenSql.exists(_.isEmpty)) {
-None
-  } else {
-

[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11171#discussion_r52714696
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/SQLBuilderTest.scala ---
@@ -50,10 +50,8 @@ abstract class SQLBuilderTest extends QueryTest with 
TestHiveSingleton {
  """.stripMargin)
 }
 
-val actualSQL = maybeSQL.get
-
 try {
-  assert(actualSQL === expectedSQL)
+  assert(generatedSql === expectedSQL)
--- End diff --

`generatedSQL`? Do we have any naming rules related to acronyms (`Sql` or 
`SQL`)?





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-12 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-183248930
  
@yinxusen I'll be away for Spark summit east. Gonna work on this again when 
I'm back. Thanks.





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-12 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-183249963
  
Never mind, take your time.

On Friday, February 12, 2016, DB Tsai wrote:

> @yinxusen  I'll be away for Spark summit
> east. Gonna work on this again when I'm back. Thanks.
>
> —
> Reply to this email directly or view it on GitHub
> .
>


-- 
Cheers
---
Xusen Yin(尹绪森)
LinkedIn: https://cn.linkedin.com/in/xusenyin






[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11171#issuecomment-183257659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51178/
Test PASSed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183261650
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183261517
  
**[Test build #51177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51177/consoleFull)**
 for PR 9893 at commit 
[`c099fd5`](https://github.com/apache/spark/commit/c099fd54386cd399cbf1b76d977df0e89f1e37ca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-12729 PhantomReferences to replace Final...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11140#issuecomment-183227646
  
**[Test build #51175 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51175/consoleFull)**
 for PR 11140 at commit 
[`837252a`](https://github.com/apache/spark/commit/837252a74ec87e8f1ac07e80406bf0410c9088d7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12705] [SQL] push missing attributes fo...

2016-02-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11153#discussion_r52717321
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -572,98 +572,64 @@ class Analyzer(
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
 
-  case s @ Sort(_, _, child) if !s.resolved && child.resolved =>
-val (newOrdering, missingResolvableAttrs) = 
collectResolvableMissingAttrs(s.order, child)
-
-if (missingResolvableAttrs.isEmpty) {
-  val unresolvableAttrs = s.order.filterNot(_.resolved)
-  logDebug(s"Failed to find $unresolvableAttrs in 
${child.output.mkString(", ")}")
-  s // Nothing we can do here. Return original plan.
-} else {
-  // Add the missing attributes into projectList of Project/Window 
or
-  //   aggregateExpressions of Aggregate, if they are in the 
inputSet
-  //   but not in the outputSet of the plan.
-  val newChild = child transformUp {
-case p: Project =>
-  p.copy(projectList = p.projectList ++
-missingResolvableAttrs.filter((p.inputSet -- 
p.outputSet).contains))
-case w: Window =>
-  w.copy(projectList = w.projectList ++
-missingResolvableAttrs.filter((w.inputSet -- 
w.outputSet).contains))
-case a: Aggregate =>
-  val resolvableAttrs = 
missingResolvableAttrs.filter(a.groupingExpressions.contains)
-  val notResolvedAttrs = 
resolvableAttrs.filterNot(a.aggregateExpressions.contains)
-  val newAggregateExpressions = a.aggregateExpressions ++ 
notResolvedAttrs
-  a.copy(aggregateExpressions = newAggregateExpressions)
-case o => o
-  }
-
+  case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
+val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
+val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
+val missingAttrs = requiredAttrs -- child.outputSet
+if (missingAttrs.nonEmpty) {
   // Add missing attributes and then project them away after the 
sort.
   Project(child.output,
-Sort(newOrdering, s.global, newChild))
+Sort(newOrder, s.global, addMissingAttr(child, missingAttrs)))
+} else if (newOrder != order) {
+  s.copy(order = newOrder)
+} else {
+  s
 }
 }
 
 /**
- * Traverse the tree until resolving the sorting attributes
- * Return all the resolvable missing sorting attributes
- */
-@tailrec
-private def collectResolvableMissingAttrs(
-ordering: Seq[SortOrder],
-plan: LogicalPlan): (Seq[SortOrder], Seq[Attribute]) = {
+  * Add the missing attributes into projectList of Project/Window or 
aggregateExpressions of
+  * Aggregate.
+  */
+private def addMissingAttr(plan: LogicalPlan, missingAttrs: 
AttributeSet): LogicalPlan = {
+  if (missingAttrs.isEmpty) {
+return plan
+  }
   plan match {
-// Only Windows and Project have projectList-like attribute.
-case un: UnaryNode if un.isInstanceOf[Project] || 
un.isInstanceOf[Window] =>
-  val (newOrdering, missingAttrs) = 
resolveAndFindMissing(ordering, un, un.child)
-  // If missingAttrs is non empty, that means we got it and return 
it;
-  // Otherwise, continue to traverse the tree.
-  if (missingAttrs.nonEmpty) {
-(newOrdering, missingAttrs)
-  } else {
-collectResolvableMissingAttrs(ordering, un.child)
-  }
+case p: Project =>
+  val missing = missingAttrs -- p.child.outputSet
+  Project(p.projectList ++ missingAttrs, addMissingAttr(p.child, 
missing))
+case w: Window =>
+  val missing = missingAttrs -- w.child.outputSet
+  w.copy(projectList = w.projectList ++ missingAttrs,
+child = addMissingAttr(w.child, missing))
 case a: Aggregate =>
-  val (newOrdering, missingAttrs) = 
resolveAndFindMissing(ordering, a, a.child)
-  // For Aggregate, all the order by columns must be specified in 
group by clauses
-  if (missingAttrs.nonEmpty &&
-  missingAttrs.forall(ar => 
a.groupingExpressions.exists(_.semanticEquals(ar {
-(newOrdering, missingAttrs)
-  } else {
-// If missingAttrs is empty, we are unable to 
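
The (truncated) diff above resolves sort attributes the child plan does not expose by adding them below the Sort and projecting them away afterwards: `Project(child.output, Sort(newOrder, ..., addMissingAttr(child, missingAttrs)))`. A toy model of that strategy over plain Scala collections (hypothetical names, no Catalyst types):

```scala
object SortMissingAttrSketch {
  // A "row" keyed by column name; `output` is what the child plan exposes.
  type Row = Map[String, Int]

  // To sort by a column the child does not expose: widen the child with
  // the missing column (addMissingAttr), sort, then project it away again.
  def sortThenProject(rows: Seq[Row], output: Seq[String], orderBy: String): Seq[Row] = {
    val widened = output ++ Seq(orderBy).filterNot(output.contains)
    val withMissing = rows.map(r => widened.map(k => k -> r(k)).toMap)
    // Sort over the widened child, then Project(child.output, ...) on top.
    withMissing.sortBy(_(orderBy)).map(r => output.map(k => k -> r(k)).toMap)
  }
}
```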

[GitHub] spark pull request: [SPARK-12705] [SQL] push missing attributes fo...

2016-02-12 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11153#issuecomment-183238642
  
LGTM except one comment





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183251557
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183251551
  
**[Test build #51179 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51179/consoleFull)**
 for PR 11181 at commit 
[`1a42847`](https://github.com/apache/spark/commit/1a42847e5eb735f7490224c194fc323bdad661fa).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12363][Mllib] Remove setRun and fix Pow...

2016-02-12 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10539#issuecomment-183254208
  
@viirya I sent you an update at https://github.com/viirya/spark-1/pull/2. 
Could you review and merge it if it looks good to you? That should fix the PIC 
implementation. Thanks!





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183261652
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51177/
Test FAILed.





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183263016
  
**[Test build #51181 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51181/consoleFull)**
 for PR 10876 at commit 
[`374931a`](https://github.com/apache/spark/commit/374931a285806f92fac4a7483f3594362927cb50).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183263040
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183263041
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51181/
Test FAILed.





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2016-02-12 Thread steveloughran
Github user steveloughran closed the pull request at:

https://github.com/apache/spark/pull/6935





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11171#issuecomment-183227971
  
**[Test build #51178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51178/consoleFull)**
 for PR 11171 at commit 
[`cfccf21`](https://github.com/apache/spark/commit/cfccf213a1bb89f68553f883c205e6b626f5be67).





[GitHub] spark pull request: [SPARK-12705] [SQL] push missing attributes fo...

2016-02-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11153#issuecomment-183228279
  
cc @cloud-fan for review too





[GitHub] spark pull request: [SPARK-12503][SPARK-12505] Limit pushdown in U...

2016-02-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11121#discussion_r52715983
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -91,6 +91,14 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the maximum number of rows that this plan may compute.
+   *
+   * Any operator that a Limit can be pushed past should override this function (e.g., Union).
+   * Any operator that can push through a Limit should override this function (e.g., Project).
+   */
+  def maxRows: Option[Expression] = None
--- End diff --

Are we going to handle a non-literal `maxRows` in the future? If not, maybe defining it as `Option[Long]` would be better?
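To make the suggestion concrete, here is a hedged, standalone sketch (hypothetical mini plan classes for illustration only, not Spark's actual `LogicalPlan` API) of how an `Option[Long]`-typed `maxRows` could propagate through operators such as Union:

```scala
// Hypothetical mini logical plan; illustrates the Option[Long] suggestion only.
sealed trait Plan { def maxRows: Option[Long] = None }

case object Table extends Plan // leaf with unknown row count

case class Limit(n: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(n)
}

case class Union(children: Seq[Plan]) extends Plan {
  // The bound is known only when every child's bound is known.
  override def maxRows: Option[Long] =
    if (children.forall(_.maxRows.isDefined)) Some(children.flatMap(_.maxRows).sum)
    else None
}
```

With a literal-only `Option[Long]`, the optimizer can compare the bound against a pushed-down limit directly, without evaluating expressions.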





[GitHub] spark pull request: [SPARK-12503][SPARK-12505] Limit pushdown in U...

2016-02-12 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11121#issuecomment-183244562
  
LGTM except one comment





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183251558
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51179/
Test FAILed.





[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-02-12 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-183255063
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-12 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52722338
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -289,24 +301,20 @@ class Word2Vec extends Serializable with Logging {
 val expTable = sc.broadcast(createExpTable())
 val bcVocab = sc.broadcast(vocab)
 val bcVocabHash = sc.broadcast(vocabHash)
-
-val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
-  new Iterator[Array[Int]] {
-def hasNext: Boolean = iter.hasNext
-
-def next(): Array[Int] = {
-  val sentence = ArrayBuilder.make[Int]
-  var sentenceLength = 0
-  while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-val word = bcVocabHash.value.get(iter.next())
-word match {
-  case Some(w) =>
-sentence += w
-sentenceLength += 1
-  case None =>
-}
+// each partition is a collection of sentences,
+// which will be translated into arrays of word indices
+val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter =>
+  // Each sentence will map to 0 or more Array[Int]
+  sentenceIter.flatMap { sentence => {
+  // Sentence of words, some of which map to a word index
+  val wordIndexes = sentence.flatMap(bcVocabHash.value.get)
+  if (wordIndexes.nonEmpty) {
--- End diff --

That is because `"".split(" ") = Array("")`, which has nothing to do with 
`grouped`.
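A quick standalone check of the behavior described above (this is plain Scala/Java `String.split` semantics, independent of Spark):

```scala
// "".split(" ") yields Array(""), i.e. an empty line still produces one (empty) token.
// Any resulting empty index array comes from the vocabulary lookup filtering
// unknown words out, not from grouped.
val emptyLineTokens = "".split(" ")

val vocab = Map("spark" -> 0, "fast" -> 1)
// Unknown words simply drop out of the flatMap over the Option-returning lookup.
val indexes = "spark is fast".split(" ").flatMap(vocab.get)
```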





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-12 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52722467
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -289,24 +301,20 @@ class Word2Vec extends Serializable with Logging {
 val expTable = sc.broadcast(createExpTable())
 val bcVocab = sc.broadcast(vocab)
 val bcVocabHash = sc.broadcast(vocabHash)
-
-val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
-  new Iterator[Array[Int]] {
-def hasNext: Boolean = iter.hasNext
-
-def next(): Array[Int] = {
-  val sentence = ArrayBuilder.make[Int]
-  var sentenceLength = 0
-  while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-val word = bcVocabHash.value.get(iter.next())
-word match {
-  case Some(w) =>
-sentence += w
-sentenceLength += 1
-  case None =>
-}
+// each partition is a collection of sentences,
+// which will be translated into arrays of word indices
+val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter =>
+  // Each sentence will map to 0 or more Array[Int]
+  sentenceIter.flatMap { sentence => {
+  // Sentence of words, some of which map to a word index
+  val wordIndexes = sentence.flatMap(bcVocabHash.value.get)
+  if (wordIndexes.nonEmpty) {
+// break wordIndexes into chunks of maxSentenceLength when it is longer
+val sentenceSplit = wordIndexes.grouped(maxSentenceLength)
+sentenceSplit.map(_.toArray)
--- End diff --

`sentenceSplit` should be an `Iterator[Array[Int]]`. So this line might be 
unnecessary.
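For reference, a small standalone sketch of the point about `grouped` (assuming `wordIndexes` is an `Array[Int]` here):

```scala
// grouped on an Array[Int] already yields Iterator[Array[Int]],
// so a trailing .map(_.toArray) would be redundant in that case.
val wordIndexes: Array[Int] = Array(1, 2, 3, 4, 5)
val chunks: Iterator[Array[Int]] = wordIndexes.grouped(2)
// Materializing shows the chunking: Array(1, 2), Array(3, 4), Array(5)
val materialized = chunks.toList
```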





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183262309
  
**[Test build #51180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51180/consoleFull)**
 for PR 11181 at commit 
[`2911532`](https://github.com/apache/spark/commit/291153294feceb52c95a810a93a425a2f6822d9e).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183262342
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51180/
Test FAILed.





[GitHub] spark pull request: [SPARK-13296][SQL] Move UserDefinedFunction in...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11181#issuecomment-183262340
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-02-12 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-183240342
  
@mengxr Actually, I have already updated this PR after #10216 got merged.





[GitHub] spark pull request: [SPARK-13263] [SQL] SQL Generation Support for...

2016-02-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11148#discussion_r52715605
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
@@ -119,6 +120,18 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: 
SQLContext) extends Loggi
 limitSQL = limit.sql
   } yield s"$childSQL LIMIT $limitSQL"
 
+// TABLESAMPLE is part of the tableSource clause in the parser,
+// and thus we must handle it with a subquery.
+case Sample(lb, ub, withReplacement, _, child @ Subquery(alias, 
grandChild))
--- End diff --

this seems pretty brittle. i wonder if we can change sample to make it very 
clear this is either a table sample, or a normal sample.






[GitHub] spark pull request: [Documentation] Added pygments.rb dependancy

2016-02-12 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11180#issuecomment-183255397
  
@amitdev The PR also contains ml-guide changes. Could you remove that 
commit from your branch?





[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-02-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10889





[GitHub] spark pull request: [SPARK-13154][PYTHON] Add linting for pydocs

2016-02-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11109





[GitHub] spark pull request: [SPARK-13154][PYTHON] Add linting for pydocs

2016-02-12 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11109#issuecomment-183261151
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183265195
  
**[Test build #51182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51182/consoleFull)**
 for PR 6042 at commit 
[`437aaea`](https://github.com/apache/spark/commit/437aaeac220d9ffeaa671c96e453c4254c376a1e).





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183265613
  
**[Test build #51182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51182/consoleFull)**
 for PR 6042 at commit 
[`437aaea`](https://github.com/apache/spark/commit/437aaeac220d9ffeaa671c96e453c4254c376a1e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12746][ML] ArrayType(_, true) should al...

2016-02-12 Thread Earthson
Github user Earthson commented on the pull request:

https://github.com/apache/spark/pull/10697#issuecomment-183275317
  
@mengxr ok, I'll have a look:)





[GitHub] spark pull request: [SPARK-13172] [CORE] [SQL] Stop using RichExce...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11182#issuecomment-183275157
  
**[Test build #51184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51184/consoleFull)**
 for PR 11182 at commit 
[`a18092b`](https://github.com/apache/spark/commit/a18092bc1a5863994436dd88f9fd04c59706ac76).





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52728479
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -0,0 +1,148 @@
+---
+layout: global
+title: Collaborative Filtering - spark.ml
+displayTitle: Collaborative Filtering - spark.ml
+---
+
+* Table of contents
+{:toc}
+
+## Collaborative filtering 
+
+[Collaborative 
filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
+is commonly used for recommender systems.  These techniques aim to fill in 
the
+missing entries of a user-item association matrix.  `spark.ml` currently 
supports
+model-based collaborative filtering, in which users and products are 
described
+by a small set of latent factors that can be used to predict missing 
entries.
+`spark.ml` uses the [alternating least squares
+(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
+algorithm to learn these latent factors. The implementation in `spark.ml` 
has the
+following parameters:
+
+* *numBlocks* is the number of blocks the users and items will be 
partitioned into in order to parallelize computation (defaults to 10).
+* *rank* is the number of latent factors in the model (defaults to 10).
+* *maxIter* is the maximum number of iterations to run (defaults to 10).
+* *regParam* specifies the regularization parameter in ALS (defaults to 
1.0).
+* *implicitPrefs* specifies whether to use the *explicit feedback* ALS 
variant or one adapted for
+  *implicit feedback* data (defaults to `false` which means using 
*explicit feedback*).
+* *alpha* is a parameter applicable to the implicit feedback variant of 
ALS that governs the
+  *baseline* confidence in preference observations (defaults to 1.0).
+* *nonnegative* specifies whether or not to use nonnegative constraints 
for least squares (defaults to `false`).
+
+### Explicit vs. implicit feedback
+
+The standard approach to matrix factorization based collaborative 
filtering treats 
+the entries in the user-item matrix as *explicit* preferences given by the 
user to the item.
+
+It is common in many real-world use cases to only have access to *implicit 
feedback* (e.g. views,
+clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to 
deal with such data is taken
+from
+[Collaborative Filtering for Implicit Feedback 
Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
+Essentially instead of trying to model the matrix of ratings directly, 
this approach treats the data
+as a combination of binary preferences and *confidence values*. The 
ratings are then related to the
+level of confidence in observed user preferences, rather than explicit 
ratings given to items.  The
+model then tries to find latent factors that can be used to predict the 
expected preference of a
+user for an item.
+
+### Scaling of the regularization parameter
+
+We scale the regularization parameter `regParam` in solving each least 
squares problem by
+the number of ratings the user generated in updating user factors,
+or the number of ratings the product received in updating product factors.
+This approach is named "ALS-WR" and discussed in the paper
+"[Large-Scale Parallel Collaborative Filtering for the Netflix 
Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
+It makes `regParam` less dependent on the scale of the dataset.
+So we can apply the best parameter learned from a sampled subset to the 
full dataset
+and expect similar performance.
+
+## Examples
+
+
+
+
+In the following example, we load rating data from the
+[MovieLens dataset](http://grouplens.org/datasets/movielens/), each row
--- End diff --

Do people need to download this now? which file?
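As a side note on the ALS-WR regularization scaling described in the quoted doc, a hedged standalone sketch (plain Scala with hypothetical numbers; this is not Spark's implementation, only the scaling rule it describes):

```scala
// ALS-WR scales the L2 penalty in each user's least-squares subproblem by
// that user's rating count, making regParam transfer across dataset sizes.
val regParam = 0.1
val ratingsPerUser = Map("heavyRater" -> 200, "lightRater" -> 5)

// Effective penalty applied when updating each user's latent factor vector.
val effectivePenalty = ratingsPerUser.map { case (user, n) => user -> regParam * n }
```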





[GitHub] spark pull request: Added missing utility method

2016-02-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11173#issuecomment-183266153
  
I don't think we want yet another huge overload. Can't you pass an empty 
map of offsets? if not, let's just support that. @koeninger 





[GitHub] spark pull request: [SPARK-13172] [CORE] [SQL] Stop using RichExce...

2016-02-12 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/11182

[SPARK-13172] [CORE] [SQL] Stop using RichException.getStackTrace it is 
deprecated

Replace `getStackTraceString` with `Utils.exceptionString`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-13172

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11182


commit a18092bc1a5863994436dd88f9fd04c59706ac76
Author: Sean Owen 
Date:   2016-02-12T10:49:35Z

Replace getStackTraceString with Utils.exceptionString







[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52728448
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -0,0 +1,148 @@
+---
+layout: global
+title: Collaborative Filtering - spark.ml
+displayTitle: Collaborative Filtering - spark.ml
+---
+
+* Table of contents
+{:toc}
+
+## Collaborative filtering
+
+[Collaborative filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
+is commonly used for recommender systems. These techniques aim to fill in the
+missing entries of a user-item association matrix. `spark.ml` currently supports
+model-based collaborative filtering, in which users and products are described
+by a small set of latent factors that can be used to predict missing entries.
+`spark.ml` uses the [alternating least squares
+(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
+algorithm to learn these latent factors. The implementation in `spark.ml` has the
+following parameters:
+
+* *numBlocks* is the number of blocks the users and items will be partitioned into in order to parallelize computation (defaults to 10).
+* *rank* is the number of latent factors in the model (defaults to 10).
+* *maxIter* is the maximum number of iterations to run (defaults to 10).
+* *regParam* specifies the regularization parameter in ALS (defaults to 1.0).
+* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for
+  *implicit feedback* data (defaults to `false` which means using *explicit feedback*).
+* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the
+  *baseline* confidence in preference observations (defaults to 1.0).
+* *nonnegative* specifies whether or not to use nonnegative constraints for least squares (defaults to `false`).
+
+### Explicit vs. implicit feedback
+
+The standard approach to matrix factorization based collaborative filtering treats
+the entries in the user-item matrix as *explicit* preferences given by the user to the item.
+
+It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
+clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to deal with such data is taken
+from
+[Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
+Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
+as a combination of binary preferences and *confidence values*. The ratings are then related to the
+level of confidence in observed user preferences, rather than explicit ratings given to items. The
+model then tries to find latent factors that can be used to predict the expected preference of a
+user for an item.
+
+### Scaling of the regularization parameter
+
+We scale the regularization parameter `regParam` in solving each least squares problem by
+the number of ratings the user generated in updating user factors,
+or the number of ratings the product received in updating product factors.
+This approach is named "ALS-WR" and discussed in the paper
+"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
+It makes `regParam` less dependent on the scale of the dataset.
+So we can apply the best parameter learned from a sampled subset to the full dataset
+and expect similar performance.
+
+## Examples
+
+
+
+
+In the following example, we load rating data from the
+[MovieLens dataset](http://grouplens.org/datasets/movielens/), each row
+consisting of a user, a movie, a rating and a timestamp.
+We then train an ALS model which assumes, by default, that the ratings are
+explicit (`implicitPrefs` is `false`).
+We evaluate the recommendation model by measuring the root-mean-square error of
+rating prediction.
+
+Refer to the [`ALS` Scala docs](api/scala/index.html#org.apache.spark.ml.recommendation.ALS)
+for more details on the API.
+
+{% include_example scala/org/apache/spark/examples/ml/ALSExample.scala %}
+
+If the rating matrix is derived from another source of information (e.g. it is
--- End diff --

Nit: you changed e.g. to i.e. below. Either is arguably fine, but keep it consistent.
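The ALS recipe in the quoted doc (alternately fix one factor matrix and solve a regularized least-squares problem for the other) can be sketched without Spark. The following is a minimal dense NumPy toy, not Spark's implementation: it ignores the *numBlocks* partitioning, treats every matrix entry as observed, and all names and sizes are illustrative.

```python
import numpy as np

def als_explicit(R, rank=2, max_iter=20, reg_param=0.1, seed=0):
    """Alternating least squares on a small dense explicit-feedback matrix R."""
    rng = np.random.default_rng(seed)
    n_items = R.shape[1]
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    eye = reg_param * np.eye(rank)
    for _ in range(max_iter):
        # Fix V and solve (V^T V + regParam*I) u = V^T r for every user row.
        U = np.linalg.solve(V.T @ V + eye, V.T @ R.T).T
        # Fix U and solve the symmetric problem for every item column.
        V = np.linalg.solve(U.T @ U + eye, U.T @ R).T
    return U, V

# Toy 3x3 "ratings" matrix; rank-2 latent factors reconstruct it closely.
R = np.array([[5.0, 3.0, 0.5],
              [4.0, 2.5, 1.0],
              [1.0, 1.0, 5.0]])
U, V = als_explicit(R)
print(np.abs(U @ V.T - R).max())  # small residual
```

The *rank*, *maxIter*, and *regParam* arguments mirror the documented parameters of the same names.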




[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52728349
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -0,0 +1,148 @@
+Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
+as a combination of binary preferences and *confidence values*. The ratings are then related to the
--- End diff --

This might just be my own way of wording it, but the input is construed as 
some kind of _strength_ value in implicit data. It's inherently count-like 
(e.g. additive) which is how it differs from ratings. The idea of confidence is 
pretty much an implementation detail. I would not say that "ratings are related 
to.." anything in this model; there are no rating-like quantities. It's not 
predicting the strength of a preference, really, but how much it's likely to 
exist.
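A tiny sketch of this reinterpretation, following the Hu, Koren & Volinsky paper cited in the doc: the count-like strength becomes a binary preference plus a confidence c = 1 + alpha * r, so nothing rating-like is ever predicted. The data and variable names here are illustrative only.

```python
import numpy as np

alpha = 1.0                                 # the documented default
counts = np.array([0.0, 1.0, 5.0, 40.0])    # e.g. click counts per (user, item)
preference = (counts > 0).astype(float)     # binary: did any interaction occur?
confidence = 1.0 + alpha * counts           # count-like strength -> confidence
print(preference)  # values 0, 1, 1, 1
print(confidence)  # values 1, 2, 6, 41
```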





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52728386
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -0,0 +1,148 @@
+### Scaling of the regularization parameter
+
+We scale the regularization parameter `regParam` in solving each least squares problem by
+the number of ratings the user generated in updating user factors,
+or the number of ratings the product received in updating product factors.
+This approach is named "ALS-WR" and discussed in the paper
+"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
+It makes `regParam` less dependent on the scale of the dataset.
+So we can apply the best parameter learned from a sampled subset to the full dataset
--- End diff --

Nit: "... dataset, so that we can ..."
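The ALS-WR scaling being discussed amounts to one multiplication; this illustrative sketch (the helper name is made up) shows why a `regParam` tuned on a sample transfers to the full dataset: the penalty in each least-squares solve grows with that user's or item's rating count.

```python
import numpy as np

def scaled_reg(reg_param, num_ratings):
    # Effective regularization in the least-squares solve for each user
    # (or item): regParam times that user's (item's) rating count.
    return reg_param * np.asarray(num_ratings, dtype=float)

# A light user and a heavy user get proportionally scaled penalties.
print(scaled_reg(0.1, [10, 200]))  # approximately [1.0, 20.0]
```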





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52739112
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -0,0 +1,148 @@
+In the following example, we load rating data from the
+[MovieLens dataset](http://grouplens.org/datasets/movielens/), each row
--- End diff --

Nope, it's in the `data` folder; the link just tells people where we got the dataset from.





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10411#issuecomment-183324222
  
@srowen thanks for the review, will make the necessary changes.





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183324518
  
**[Test build #51187 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51187/consoleFull)**
 for PR 6042 at commit 
[`d320fd2`](https://github.com/apache/spark/commit/d320fd20afa9873f13794cb0976c7602d6278ee0).





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183265622
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51182/
Test FAILed.





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52728138
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -0,0 +1,148 @@
+### Explicit vs. implicit feedback
+
+The standard approach to matrix factorization based collaborative filtering treats
+the entries in the user-item matrix as *explicit* preferences given by the user to the item.
--- End diff --

Worth giving "ratings" as the canonical example of explicit feedback?





[GitHub] spark pull request: [SPARK-13263] [SQL] SQL Generation Support for...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11148#discussion_r52728653
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
@@ -119,6 +120,18 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         limitSQL = limit.sql
       } yield s"$childSQL LIMIT $limitSQL"
 
+    // TABLESAMPLE is part of tableSource clause in the parser,
+    // and thus we must handle it with subquery.
+    case Sample(lb, ub, withReplacement, _, child @ Subquery(alias, grandChild))
--- End diff --

Agree with @rxin. In particular, I don't think `MetastoreRelation` is wrapped in a `Subquery` in this case.





[GitHub] spark pull request: [SPARK-13172] [CORE] [SQL] Stop using RichExce...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11182#issuecomment-183314618
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183314217
  
**[Test build #51185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51185/consoleFull)**
 for PR 6042 at commit 
[`ac4bc97`](https://github.com/apache/spark/commit/ac4bc97b2cfcfc9823f33ae2e35e65b223d952dd).





[GitHub] spark pull request: [SPARK-13172] [CORE] [SQL] Stop using RichExce...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11182#issuecomment-183313926
  
**[Test build #51184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51184/consoleFull)**
 for PR 11182 at commit 
[`a18092b`](https://github.com/apache/spark/commit/a18092bc1a5863994436dd88f9fd04c59706ac76).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12363][Mllib] Remove setRun and fix Pow...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10539#issuecomment-183315958
  
**[Test build #51186 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51186/consoleFull)**
 for PR 10539 at commit 
[`d749f6d`](https://github.com/apache/spark/commit/d749f6d081507877f4f479d09b4357c793219d87).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12363][Mllib] Remove setRun and fix Pow...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10539#issuecomment-183315965
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51186/
Test FAILed.





[GitHub] spark pull request: [SPARK-12363][Mllib] Remove setRun and fix Pow...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10539#issuecomment-183315960
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183265619
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183277552
  
**[Test build #51183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51183/consoleFull)**
 for PR 10876 at commit 
[`525d1f2`](https://github.com/apache/spark/commit/525d1f2f00dfd37f69b1eae27cc2005e5a5ce0fe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183277864
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13263] [SQL] SQL Generation Support for...

2016-02-12 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11148#discussion_r52729051
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/LogicalPlanToSQLSuite.scala ---
@@ -145,6 +145,16 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
     checkHiveQl("SELECT COUNT(DISTINCT id) FROM t0")
   }
 
+  test("TABLESAMPLE") {
+    checkHiveQl("SELECT * FROM t0 TABLESAMPLE(100 PERCENT) s")
--- End diff --

The test tables created in this suite are all Parquet tables, which by default are 
converted from `MetastoreRelation` to `ParquetRelation` wrapped in a `Subquery`. 
We should probably rename such tables, e.g., `t0` to `parquet_t0`, and add real 
text-format `MetastoreRelation` test tables via explicit HiveQL DDLs.





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183277869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51183/
Test PASSed.





[GitHub] spark pull request: [SPARK-13172] [CORE] [SQL] Stop using RichExce...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11182#issuecomment-183314626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51184/
Test FAILed.





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52738715
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala ---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.evaluation.RegressionEvaluator
+import org.apache.spark.ml.recommendation.ALS
+// $example off$
+import org.apache.spark.sql.SQLContext
+// $example on$
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.DoubleType
+// $example off$
+
+object ALSExample {
+
+  // $example on$
+  case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)
+  object Rating {
+    def parseRating(str: String): Rating = {
+      val fields = str.split("::")
+      assert(fields.size == 4)
+      Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
+    }
+  }
+  // $example off$
+
+  def main(args: Array[String]) {
+    val conf = new SparkConf().setAppName("ALSExample")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import sqlContext.implicits._
+
+    // $example on$
+    val ratings = sc.textFile("data/mllib/als/sample_movielens_ratings.txt")
--- End diff --

Nope, the file that was removed is `sample_movielens_movies.txt`, as it was only 
used in `MovieLens.scala`, which has also been removed; cf. the discussion on the JIRA.





[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...

2016-02-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11133#issuecomment-183266743
  
@devaraj-kavali this might be a legitimate failure. Does it pass for you?





[GitHub] spark pull request: [SPARK-13186][Streaming]Migrate away from Sync...

2016-02-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11104#issuecomment-183267563
  
@huaxingao can you rebase this?





[GitHub] spark pull request: [SPARK-12962] [SQL] [PySpark] PySpark support ...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10876#issuecomment-183268139
  
**[Test build #51183 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51183/consoleFull)**
 for PR 10876 at commit 
[`525d1f2`](https://github.com/apache/spark/commit/525d1f2f00dfd37f69b1eae27cc2005e5a5ce0fe).





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52727772
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala ---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.evaluation.RegressionEvaluator
+import org.apache.spark.ml.recommendation.ALS
+// $example off$
+import org.apache.spark.sql.SQLContext
+// $example on$
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.DoubleType
+// $example off$
+
+object ALSExample {
+
+  // $example on$
+  case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)
+  object Rating {
+    def parseRating(str: String): Rating = {
+      val fields = str.split("::")
+      assert(fields.size == 4)
+      Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
+    }
+  }
+  // $example off$
+
+  def main(args: Array[String]) {
+    val conf = new SparkConf().setAppName("ALSExample")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import sqlContext.implicits._
+
+    // $example on$
+    val ratings = sc.textFile("data/mllib/als/sample_movielens_ratings.txt")
--- End diff --

It looks like this file was removed though, right? Is it because we can't 
distribute even a sample of it?





[GitHub] spark pull request: [Docs] Update cache()'s storage level to be co...

2016-02-12 Thread wjur
Github user wjur closed the pull request at:

https://github.com/apache/spark/pull/11172





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52727882
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+
+// $example on$
+import java.io.Serializable;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.evaluation.RegressionEvaluator;
+import org.apache.spark.ml.recommendation.ALS;
+import org.apache.spark.ml.recommendation.ALSModel;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.types.DataTypes;
+// $example off$
+
+public class JavaALSExample {
+
+  // $example on$
+  public static class Rating implements Serializable {
+    private int userId;
+    private int movieId;
+    private float rating;
+    private long timestamp;
+
+    public int getUserId() {
+      return userId;
+    }
+
+    public void setUserId(int userId) {
--- End diff --

To keep the example simpler, do you really need setters instead of just 
constructor args? I'm personally used to constructor args with final fields as 
the default.
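To make the suggestion concrete, here is a minimal sketch of the constructor-args variant with final fields; the class name `ImmutableRating` is mine, not from the PR. One caveat worth checking: `SQLContext.createDataFrame(JavaRDD, Class)` reflects on JavaBean getters/setters, so the mutable bean form in the PR may still be required there.

```java
import java.io.Serializable;

// Sketch of the review suggestion: an immutable Rating with final fields
// set through constructor args instead of setters.
public class ImmutableRating implements Serializable {
    private final int userId;
    private final int movieId;
    private final float rating;
    private final long timestamp;

    public ImmutableRating(int userId, int movieId, float rating, long timestamp) {
        this.userId = userId;
        this.movieId = movieId;
        this.rating = rating;
        this.timestamp = timestamp;
    }

    public int getUserId() { return userId; }
    public int getMovieId() { return movieId; }
    public float getRating() { return rating; }
    public long getTimestamp() { return timestamp; }

    // Parses a "userId::movieId::rating::timestamp" line, failing fast
    // with an exception on malformed input.
    public static ImmutableRating parseRating(String str) {
        String[] fields = str.split("::");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Bad rating line: " + str);
        }
        return new ImmutableRating(
            Integer.parseInt(fields[0]), Integer.parseInt(fields[1]),
            Float.parseFloat(fields[2]), Long.parseLong(fields[3]));
    }
}
```

With final fields the class is safely shareable across threads and there is no partially-initialized state to reason about.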





[GitHub] spark pull request: [SPARK-12247] [ML] [DOC] Documentation for spa...

2016-02-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10411#discussion_r52727907
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+
+// $example on$
+import java.io.Serializable;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.evaluation.RegressionEvaluator;
+import org.apache.spark.ml.recommendation.ALS;
+import org.apache.spark.ml.recommendation.ALSModel;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.types.DataTypes;
+// $example off$
+
+public class JavaALSExample {
+
+  // $example on$
+  public static class Rating implements Serializable {
+    private int userId;
+    private int movieId;
+    private float rating;
+    private long timestamp;
+
+    public int getUserId() {
+      return userId;
+    }
+
+    public void setUserId(int userId) {
+      this.userId = userId;
+    }
+
+    public int getMovieId() {
+      return movieId;
+    }
+
+    public void setMovieId(int movieId) {
+      this.movieId = movieId;
+    }
+
+    public float getRating() {
+      return rating;
+    }
+
+    public void setRating(float rating) {
+      this.rating = rating;
+    }
+
+    public long getTimestamp() {
+      return timestamp;
+    }
+
+    public void setTimestamp(long timestamp) {
+      this.timestamp = timestamp;
+    }
+
+    public static Rating parseRating(String str) {
+      String[] fields = str.split("::");
+      assert(fields.length == 4);
--- End diff --

You don't want to add `assert`s in Java
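Since the thread doesn't show the replacement, here is a minimal sketch of the usual alternative: fail fast with an explicit exception, because JVM assertions run only when the `-ea` flag is passed and are silently skipped otherwise. The `parseFields` helper name is my own, not from the PR.

```java
public class RatingParser {
    // Hypothetical helper (not from the PR): validates input with an
    // explicit exception instead of `assert`, so the check still runs in
    // production where JVM assertions (-ea) are disabled by default.
    public static String[] parseFields(String str) {
        String[] fields = str.split("::");
        if (fields.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 '::'-separated fields but got " + fields.length + ": " + str);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parseFields("1::31::2.5::1424380312");
        System.out.println(fields[2]);  // prints 2.5
    }
}
```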





[GitHub] spark pull request: [Docs] Update cache()'s storage level to be co...

2016-02-12 Thread wjur
Github user wjur commented on the pull request:

https://github.com/apache/spark/pull/11172#issuecomment-183272710
  
That's confusing... @srowen, you're correct. RDDs are not DataFrames. I'm 
really sorry for taking your time.





[GitHub] spark pull request: [SPARK-12363][Mllib] Remove setRun and fix Pow...

2016-02-12 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10539#issuecomment-183309759
  
@mengxr The update looks good; I've merged it. Thanks. Since you modified the 
test cases in `PowerIterationClusteringSuite`, a minor question: were the 
original test cases in the suite not working even when using `Graph.apply` to 
replace `GraphImpl.fromExistingRDDs`?





[GitHub] spark pull request: [SPARK-12363][Mllib] Remove setRun and fix Pow...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10539#issuecomment-183315690
  
**[Test build #51186 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51186/consoleFull)**
 for PR 10539 at commit 
[`d749f6d`](https://github.com/apache/spark/commit/d749f6d081507877f4f479d09b4357c793219d87).





[GitHub] spark pull request: [Docs] Update cache()'s storage level to be co...

2016-02-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11172#issuecomment-183265666
  
I don't think that's correct:

```
  /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
  def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

  /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
  def cache(): this.type = persist()

```






[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183315146
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183315149
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51185/
Test FAILed.





[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183315131
  
**[Test build #51185 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51185/consoleFull)**
 for PR 6042 at commit 
[`ac4bc97`](https://github.com/apache/spark/commit/ac4bc97b2cfcfc9823f33ae2e35e65b223d952dd).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Added missing utility method

2016-02-12 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/11173#issuecomment-183330790
  
`KafkaCluster` methods are public now, so you can do this yourself.
I think it's fine to just have two overloads: the easy mode and the flexible 
mode.





[GitHub] spark pull request: [SPARK-13267] [Web UI] document the ?param arg...

2016-02-12 Thread steveloughran
Github user steveloughran commented on the pull request:

https://github.com/apache/spark/pull/11152#issuecomment-183331075
  

Endpoint and meaning (query parameters indented under their endpoint):

/applications
    A list of all applications
    ?status=[completed|running]    list only applications in the chosen state
    ?minDate=[date]    earliest date/time to list. Examples:
        ?minDate=2015-02-10
        ?minDate=2015-02-03T16:42:40.000GMT
    ?maxDate=[date]    latest date/time to list; uses the same format as minDate

/applications/[app-id]/jobs
    A list of all jobs for a given application
    ?status=[complete|succeeded|failed]    list only jobs in the given state

/applications/[app-id]/jobs/[job-id]
    Details for the given job

/applications/[app-id]/stages
    A list of all stages for a given application

/applications/[app-id]/stages/[stage-id]
    A list of all attempts for the given stage
    ?status=[active|complete|pending|failed]    list only stages in the given state

/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]
    Details for the given stage attempt

/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskSummary
    Summary metrics of all tasks in the given stage attempt
    ?quantiles    summarize the metrics with the given quantiles.
        Example: ?quantiles=0.01,0.5,0.99

/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList
    A list of all tasks for the given stage attempt
    ?offset=[offset]&length=[len]    list only tasks in the given range.
        Example: ?offset=10&length=50
    ?sortBy=[runtime|-runtime]    sort the tasks

/applications/[app-id]/executors
    A list of all executors for the given application

/applications/[app-id]/storage/rdd
    A list of stored RDDs for the given application

/applications/[app-id]/storage/rdd/[rdd-id]
    Details for the storage status of a given RDD

/applications/[app-id]/logs
    Download the event logs for all attempts of the given application as a zip file

/applications/[app-id]/[attempt-id]/logs
    Download the event logs for the specified attempt of the given application as a zip file



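As a quick illustration of how the `taskList` query parameters combine (multiple parameters are joined with `&`), here is a hedged sketch; the helper class name and the base URL are assumptions for illustration, not part of the documented API:

```java
public class StatusApiUrl {
    // Illustrative helper (not part of Spark): builds the taskList endpoint
    // URL from the path segments and query parameters listed above.
    // Query parameters are joined with '&'.
    public static String taskList(String base, String appId, int stageId,
                                  int attemptId, int offset, int length) {
        return String.format(
            "%s/applications/%s/stages/%d/%d/taskList?offset=%d&length=%d",
            base, appId, stageId, attemptId, offset, length);
    }

    public static void main(String[] args) {
        // "http://localhost:18080/api/v1" is an assumed history server base URL.
        System.out.println(taskList("http://localhost:18080/api/v1",
            "app-20160212", 1, 0, 10, 50));
    }
}
```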



[GitHub] spark pull request: [SPARK-6761][SQL] Approximate quantile for Dat...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6042#issuecomment-183356232
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11701][SPARK-13054] dynamic allocation ...

2016-02-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/10951#issuecomment-183358808
  
Actually, I see now that a couple of the failures are due to throwing the 
exception for commit denied, so I'll look at those tests more closely.





[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9571#issuecomment-183332471
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51188/
Test FAILed.





[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

2016-02-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9571#issuecomment-183332463
  
**[Test build #51188 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51188/consoleFull)**
 for PR 9571 at commit 
[`de1769b`](https://github.com/apache/spark/commit/de1769b85b7a9ac3fc160eaf7d465176e95d7100).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

2016-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9571#issuecomment-183332469
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [Documentation] Added pygments.rb dependancy

2016-02-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11180#issuecomment-183351400
  
@amitdev can you connect this to 
https://issues.apache.org/jira/browse/SPARK-13300 in the title? It might be the 
same thing.

It looks like many of the examples aren't rendered on the current live site:
http://spark.apache.org/examples.html




