[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17742
  
**[Test build #76258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76258/testReport)** for PR 17742 at commit [`8eab55b`](https://github.com/apache/spark/commit/8eab55bccd51706d45e0ccb2281114df4310899c).





[GitHub] spark issue #17793: [SPARK-20484][MLLIB] Add documentation to ALS code

2017-04-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17793
  
+1 for this change. I'll try to take a look sometime, but maybe after the 
QA period. Also cc @MLnick.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17556
  
I don't mind the weighted midpoints. However, if for a continuous feature we find that many points share the exact same value, using midpoints assumes we may see test-set points that are close to, but not exactly at, those values. Since our training data was clustered at those particular values, that may not be a good assumption. I could live with either method, but I have a slight preference for matching the other libraries.
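
For illustration only, here is a minimal Scala sketch (hypothetical counts, not taken from the PR) contrasting the plain midpoint with the weighted midpoint under discussion:

```scala
// Two adjacent distinct feature values with their sample counts;
// the training data is heavily clustered at 2.0.
val (v1, c1) = (1.0, 2)
val (v2, c2) = (2.0, 8)

val plainMidpoint = (v1 + v2) / 2.0                              // 1.5
val weightedMidpoint = (v1 * c1 + v2 * c2) / (c1 + c2).toDouble  // 1.8

// The weighted threshold is pulled toward the value where the training
// samples cluster, which is exactly the assumption debated above.
```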





[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17540
  
Personally, I'm fine with this patch; my only concern is that we should have a follow-up for nested query execution ASAP. We should also revert https://github.com/apache/spark/pull/17540#discussion_r112601926, which is just a hack for the test: metrics without a linked SparkPlan are useless, so we should fix the test instead.





[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-27 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17556#discussion_r113855186
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends Logging {
       // sort distinct values
       val valueCounts = valueCountMap.toSeq.sortBy(_._1).toArray
 
-      // if possible splits is not enough or just enough, just return all possible splits
+      def weightedMean(pre: (Double, Int), cur: (Double, Int)): Double = {
+        val (preValue, preCount) = pre
+        val (curValue, curCount) = cur
+        (preValue * preCount + curValue * curCount) / (preCount.toDouble + curCount)
+      }
+
       val possibleSplits = valueCounts.length - 1
-      if (possibleSplits <= numSplits) {
-        valueCounts.map(_._1).init
+      if (possibleSplits == 0) {
+        // constant feature
+        Array.empty[Double]
+
--- End diff --

remove this line





[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-27 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17556#discussion_r113855243
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -138,9 +169,10 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
         Array(2), Gini, QuantileStrategy.Sort,
         0, 0, 0.0, 0, 0
       )
-      val featureSamples = Array(0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2).map(_.toDouble)
+      val featureSamples = Array(0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2).map(_.toDouble)
       val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
-      assert(splits === Array(1.0))
+      val expSplits = Array((1.0 * 1 + 2.0 * 15) / (1 + 15))  // = (1.9375)
--- End diff --

just call them `expectedSplits`





[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-27 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17556#discussion_r113854473
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -112,9 +138,11 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
         Array(5), Gini, QuantileStrategy.Sort,
         0, 0, 0.0, 0, 0
       )
-      val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
+      val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3).map(_.toDouble)
       val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
-      assert(splits === Array(1.0, 2.0))
+      val expSplits = Array((1.0 * 2 + 2.0 * 8) / (2 + 8),
+        (2.0 * 8 + 3.0 * 2) / (8 + 2)) // = (1.8, 2.2)
--- End diff --

I don't think the comments are necessary; the actual values don't mean much.





[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-27 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17556#discussion_r113855209
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1037,7 +1051,10 @@ private[spark] object RandomForest extends Logging {
           // makes the gap between currentCount and targetCount smaller,
           // previous value is a split threshold.
           if (previousGap < currentGap) {
-            splitsBuilder += valueCounts(index - 1)._1
+            val pre = valueCounts(index - 1)
+            val cur = valueCounts(index)
+
--- End diff --

remove this line





[GitHub] spark pull request #17785: [SPARK-20493][R] De-duplicate parse logics for D...

2017-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113855222
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging {
     def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
   }
 
-  def getSQLDataType(dataType: String): DataType = {
-    dataType match {
-      case "byte" => org.apache.spark.sql.types.ByteType
-      case "integer" => org.apache.spark.sql.types.IntegerType
-      case "float" => org.apache.spark.sql.types.FloatType
-      case "double" => org.apache.spark.sql.types.DoubleType
-      case "numeric" => org.apache.spark.sql.types.DoubleType
-      case "character" => org.apache.spark.sql.types.StringType
-      case "string" => org.apache.spark.sql.types.StringType
-      case "binary" => org.apache.spark.sql.types.BinaryType
-      case "raw" => org.apache.spark.sql.types.BinaryType
-      case "logical" => org.apache.spark.sql.types.BooleanType
-      case "boolean" => org.apache.spark.sql.types.BooleanType
-      case "timestamp" => org.apache.spark.sql.types.TimestampType
-      case "date" => org.apache.spark.sql.types.DateType
-      case r"\Aarray<(.+)${elemType}>\Z" =>
-        org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType))
-      case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" =>
-        if (keyType != "string" && keyType != "character") {
-          throw new IllegalArgumentException("Key type of a map must be string or character")
-        }
-        org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType))
-      case r"\Astruct<(.+)${fieldsStr}>\Z" =>
-        if (fieldsStr(fieldsStr.length - 1) == ',') {
-          throw new IllegalArgumentException(s"Invalid type $dataType")
-        }
-        val fields = fieldsStr.split(",")
-        val structFields = fields.map { field =>
-          field match {
-            case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" =>
-              createStructField(fieldName, fieldType, true)
-
-            case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-          }
-        }
-        createStructType(structFields)
-      case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-    }
-  }
-
   def createStructField(name: String, dataType: String, nullable: Boolean): StructField = {
-    val dtObj = getSQLDataType(dataType)
+    val dtObj = CatalystSqlParser.parseDataType(dataType)
--- End diff --

Yea, however, for those types we can't create the field, because the check via [checkType](https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/schema.R#L129-L187) fails when the type is not in [`PRIMITIVE_TYPES`](https://github.com/apache/spark/blob/bc0a0e6392c4e729d8f0e4caffc0bd05adb0d950/R/pkg/R/types.R#L21-L39), as below:

```r
> structField("_col", "character")
Error in checkType(type) : Unsupported type for SparkDataframe: character
> structField("_col", "logical")
Error in checkType(type) : Unsupported type for SparkDataframe: logical
> structField("_col", "numeric")
Error in checkType(type) : Unsupported type for SparkDataframe: numeric
> structField("_col", "raw")
Error in checkType(type) : Unsupported type for SparkDataframe: raw
```

I double-checked that this is the only place where we call `getSQLDataType`, so those cases look unreachable (I hope you can double-check this when you have some time, just in case I missed something).






[GitHub] spark pull request #17797: [SparkR][DOC]: Document LinearSVC in R programming...

2017-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17797





[GitHub] spark issue #17797: [SparkR][DOC]: Document LinearSVC in R programming guide

2017-04-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17797
  
merged to master/2.2





[GitHub] spark pull request #17785: [SPARK-20493][R] De-duplicate parse logics for D...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113854501
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
(same `getSQLDataType` removal hunk as quoted above, ending at `val dtObj = CatalystSqlParser.parseDataType(dataType)`)
--- End diff --

Thanks for looking into it. If I take the diff,
```
character
logical
numeric
raw
```
these are actually R native type names, which, if I have to guess, means it is intentional that we support R native types in structField as well as Scala/Spark types.

I'm not sure how much coverage we have for something like this, but will that still work with this change?





[GitHub] spark issue #17797: [SparkR][DOC]: Document LinearSVC in R programming guide

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17797
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76257/





[GitHub] spark issue #17797: [SparkR][DOC]: Document LinearSVC in R programming guide

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17797
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17797: [SparkR][DOC]: Document LinearSVC in R programming guide

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17797
  
**[Test build #76257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76257/testReport)** for PR 17797 at commit [`3a59cc2`](https://github.com/apache/spark/commit/3a59cc2a1741a2dae6f20fa71e689a0dcc16c835).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17503
  
I think the benefit of this would be speed at predict time, or model storage. @srowen, the nodes don't have to be equal to be merged; they just have to output the same prediction. Since this is a param that can be turned on or off, I don't see a problem.

That said, I'd be interested to know how much of an impact this makes. This is a semi-large change and probably not at the top of the list right now. Maybe @jkbradley can comment.
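
As a toy illustration (a simplified model, not the PR's actual node classes), a split is reducible exactly when both of its subtrees collapse to leaves with the same prediction:

```scala
// Hypothetical simplified tree; real Spark nodes also carry split and impurity info.
sealed trait SimpleNode
case class Leaf(prediction: Double) extends SimpleNode
case class Internal(left: SimpleNode, right: SimpleNode) extends SimpleNode

// Merge any split whose children reduce to leaves with equal predictions;
// every input reaches the same prediction before and after the merge.
def reduce(node: SimpleNode): SimpleNode = node match {
  case Internal(l, r) =>
    (reduce(l), reduce(r)) match {
      case (Leaf(p1), Leaf(p2)) if p1 == p2 => Leaf(p1)
      case (rl, rr) => Internal(rl, rr)
    }
  case leaf => leaf
}
```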





[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]: Long type has incorrect ser...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17640#discussion_r113853971
  
--- Diff: R/pkg/R/serialize.R ---
@@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) {
  Date = writeDate(con, object),
  POSIXlt = writeTime(con, object),
  POSIXct = writeTime(con, object),
+ bigint = writeDouble(con, object),
--- End diff --

I think this is different though: `PRIMITIVE_TYPES` is used when you create a schema with structField in R. In that case you can definitely define a column as bigint and then pass an R numeric value to it.




[GitHub] spark issue #17797: [SparkR][DOC]: Document LinearSVC in R programming guide

2017-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/17797
  
@felixcheung As I checked the SparkR programming guide, it seems that all the machine learning parts are links to existing documents. So I just added the link to the Linear SVM document and tested it. Thanks!





[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]: Long type has incorrect ser...

2017-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17640#discussion_r113853483
  
--- Diff: R/pkg/R/serialize.R ---
@@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) {
  Date = writeDate(con, object),
  POSIXlt = writeTime(con, object),
  POSIXct = writeTime(con, object),
+ bigint = writeDouble(con, object),
--- End diff --

I see. But as you mentioned, we don't know how to trigger the write path on the R side, because both bigint and double are `numeric`. I think we can just remove the test on the R side.





[GitHub] spark issue #17797: [SparkR][DOC]: Document LinearSVC in R programming guide

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17797
  
**[Test build #76257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76257/testReport)** for PR 17797 at commit [`3a59cc2`](https://github.com/apache/spark/commit/3a59cc2a1741a2dae6f20fa71e689a0dcc16c835).





[GitHub] spark pull request #17797: [SparkR][DOC]: Document LinearSVC in R programming...

2017-04-27 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/17797

[SparkR][DOC]: Document LinearSVC in R programming guide

## What changes were proposed in this pull request?

Add a link to svmLinear in the SparkR programming guide.

## How was this patch tested?

Built the doc manually and clicked the link to the document; it looks good.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17797


commit 3a59cc2a1741a2dae6f20fa71e689a0dcc16c835
Author: wangmiao1981 
Date:   2017-04-28T05:07:46Z

add link to linear svc







[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...

2017-04-27 Thread phatak-dev
Github user phatak-dev commented on the issue:

https://github.com/apache/spark/pull/16609
  
@gatorsmile sure. I will give a PR.





[GitHub] spark issue #17303: [SPARK-19112][CORE] add codec for ZStandard

2017-04-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17303
  
I ran quick benchmarks using a TPC-DS query (Q4), following the previous work in #10342.
Based on the results, it seems a bit early to implement this:
```
scaleFactor: 4
AWS instance: c4.4xlarge

-- zstd
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 53.315878375s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 53.468174668s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 57.282403146s

-- lz4
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 20.779643053s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 16.520911319s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.897124967s

-- snappy
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 21.13241203698s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 15.90886774398s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.789648712s

-- lzf
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 21.339518781s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 16.881225328s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.813455479s
```
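
For reference, a sketch of how a benchmark like this switches codecs between runs; `spark.io.compression.codec` is the existing config key, and `zstd` would only become a valid value once this PR's codec is registered:

```scala
import org.apache.spark.sql.SparkSession

// lz4 (the default), snappy, and lzf exist today; "zstd" assumes this PR is merged.
val spark = SparkSession.builder()
  .appName("codec-benchmark")
  .config("spark.io.compression.codec", "lz4")
  .getOrCreate()
```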





[GitHub] spark pull request #17785: [SPARK-20493][R] De-duplicate parse logics for D...

2017-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113851957
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
(same `getSQLDataType` removal hunk as quoted above, ending at `val dtObj = CatalystSqlParser.parseDataType(dataType)`)
--- End diff --

I just wrote up the details about this as best I could. Yes, I think this should target master, not 2.2.





[GitHub] spark pull request #17785: [SPARK-20493][R] De-duplicate parse logics for D...

2017-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113851718
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
(same `getSQLDataType` removal hunk as quoted above, ending at `val dtObj = CatalystSqlParser.parseDataType(dataType)`)
--- End diff --

To my knowledge, `getSQLDataType` supports the types below:

```
binary
boolean
byte
character
date
double
float
integer
logical
numeric
raw
string
timestamp
array<...>
struct<...>
map<...>
```

and these are required to be _case-sensitive_, whereas `parseDataType` supports ...

```
bigint
binary
boolean
byte
char
date
decimal
double
float
int
integer
long
short
smallint
string
timestamp
tinyint
varchar
array<...>
struct<...>
map<...>
```

and these look _case-insensitive_.

I think the initial intention for `getSQLDataType` was to support R type string conversions, but those paths look like unreachable code now, because we check the type strings before actually calling `getSQLDataType`, in [`checkType`](https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/schema.R#L129-L187).

If a type does not satisfy `!is.null(PRIMITIVE_TYPES[[type]])` (_case-sensitive_), `checkType` throws an error. The types that pass are:

```
bigint
binary
boolean
byte
date
decimal
double
float
int
integer
smallint
string
timestamp
tinyint
array<...>
map<...>
struct<...>
```

In short, I think there should not be a behaviour change for the types below (the intersection of `getSQLDataType` and `parseDataType`) ...

```
binary
string
double
float
boolean
timestamp
date
integer
byte
array<...>
map<...>
struct<...>
```

and these should be case-sensitive.

_Additionally_, we will support the types below (which are listed in R's [`PRIMITIVE_TYPES`](https://github.com/apache/spark/blob/bc0a0e6392c4e729d8f0e4caffc0bd05adb0d950/R/pkg/R/types.R#L21-L39) but which `getSQLDataType` did not support before):

```
tinyint
smallint
int
bigint
```

**Before**

```r
> structField("_col", "tinyint")
...
Error in handleErrors(returnStatus, conn) :
```
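
A minimal Scala sketch of the replacement path's behavior for the lists above (`CatalystSqlParser.parseDataType` is the method this PR switches to):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// DDL type names are case-insensitive here: both calls yield IntegerType.
CatalystSqlParser.parseDataType("int")
CatalystSqlParser.parseDataType("INT")

// Complex types parse recursively.
CatalystSqlParser.parseDataType("map<string,array<bigint>>")

// R-native aliases such as "logical" or "raw" are not DDL types, so a call
// like CatalystSqlParser.parseDataType("logical") throws a ParseException.
```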

[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]: Long type has incorrect ser...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17640#discussion_r113851363
  
--- Diff: R/pkg/R/serialize.R ---
@@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) {
  Date = writeDate(con, object),
  POSIXlt = writeTime(con, object),
  POSIXct = writeTime(con, object),
+ bigint = writeDouble(con, object),
--- End diff --

If you are referring to https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25, like it says, `names(PRIMITIVE_TYPES)` are Scala types whereas the values are equivalent R types, so `bigint` there is a Scala type, not an R type.





[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...

2017-04-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17781
  
cc @cloud-fan @sameeragarwal 





[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76256/





[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17774
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17774
  
**[Test build #76256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76256/testReport)** for PR 17774 at commit [`d4a7867`](https://github.com/apache/spark/commit/d4a7867d96aa7c4bbed9cbd03b0753adcf79db9d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17783#discussion_r113849536
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1478,6 +1481,13 @@ test_that("column functions", {
     lapply(
       list(list(x = 1, y = -1, z = -2), list(x = 2, y = 3,  z = 5)),
       as.environment))
+
+  df <- as.DataFrame(data.frame(is_true = c(TRUE, FALSE, NA)))
+  expect_equal(
+    collect(select(df, alias(SparkR::not(df$is_true), "is_false"))),
--- End diff --

do we need `SparkR::` here?





[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17783#discussion_r113849200
  
--- Diff: R/pkg/R/column.R ---
@@ -67,8 +67,7 @@ operators <- list(
   "+" = "plus", "-" = "minus", "*" = "multiply", "/" = "divide", "%%" = "mod",
   "==" = "equalTo", ">" = "gt", "<" = "lt", "!=" = "notEqual", "<=" = "leq", ">=" = "geq",
   # we can not override `&&` and `||`, so use `&` and `|` instead
-  "&" = "and", "|" = "or", #, "!" = "unary_$bang"
-  "^" = "pow"
+  "&" = "and", "|" = "or", "^" = "pow"
--- End diff --

what happens with `#, `?





[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17783#discussion_r113849582
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1965,6 +1975,16 @@ test_that("filter() on a DataFrame", {
 
   # Test stats::filter is working
   #expect_true(is.ts(filter(1:100, rep(1, 3)))) # nolint
+
+  # test suites for %<=>%
--- End diff --

can you move this before the `# Test stats::filter is working` block?





[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17783#discussion_r113849285
  
--- Diff: R/pkg/R/column.R ---
@@ -302,3 +301,65 @@ setMethod("otherwise",
             jc <- callJMethod(x@jc, "otherwise", value)
             column(jc)
           })
+
+#' \%<=>\%
+#'
+#' Equality test that is safe for null values.
+#'
+#' Can be used, unlike standard equality operator, to perform null-safe joins.
+#' Equivalent to Scala \code{Column.<=>} and \code{Column.eqNullSafe}.
+#'
+#' @param x a Column
+#' @param value a value to compare
+#' @rdname eq_null_safe
+#' @name %<=>%
+#' @aliases %<=>%,Column-method
+#' @export
+#' @examples
+#' \dontrun{
+#' df1 <- createDataFrame(data.frame(
+#'   x = c(1, NA, 3, NA), y = c(2, 6, 3, NA)
+#' ))
+#'
+#' head(select(df1, df1$x == df1$y, df1$x %<=>% df1$y))
+#' ##   (x = y) (x <=> y)
+#' ## 1   FALSE     FALSE
+#' ## 2      NA     FALSE
+#' ## 3    TRUE      TRUE
+#' ## 4      NA      TRUE
+#'
+#' df2 <- createDataFrame(data.frame(y = c(3, NA)))
+#' count(join(df1, df2, df1$y == df2$y))
+#' ## [1] 1
+#'
+#' count(join(df1, df2, df1$y %<=>% df2$y))
+#' ## [1] 2
+#' }
+#' @note \%<=>\% since 2.3.0
+setMethod("%<=>%",
+          signature(x = "Column", value = "ANY"),
+          function(x, value) {
+            value <- if (class(value) == "Column") { value@jc } else { value }
+            jc <- callJMethod(x@jc, "eqNullSafe", value)
+            column(jc)
+          })
+
+#' !
+#'
+#' @rdname not
+#' @aliases !,Column-method
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(data.frame(x = c(-1, 0, 1)))
+#'
+#' head(select(df, !column("x") > 0))
+#' ##   (NOT (x > 0.0))
+#' ## 1            TRUE
+#' ## 2            TRUE
+#' ## 3           FALSE
+#' }
+#' @note ! since 2.3.0
+setMethod("!",
+          signature(x = "Column"),
+          function(x) not(x))
--- End diff --

maybe this should be single line?
```
setMethod("!", signature(x = "Column"), function(x) not(x))
```
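
For context, a sketch of the Scala API the wrapper delegates to (assuming an active `SparkSession` named `spark`); `<=>` is the operator form of `Column.eqNullSafe`:

```scala
import spark.implicits._

val df1 = Seq(Some(3.0), None).toDF("y")
val df2 = Seq(Some(3.0), None).toDF("y")

// Plain equality never matches the NULL rows.
df1.join(df2, df1("y") === df2("y")).count()  // 1

// Null-safe equality treats NULL <=> NULL as true.
df1.join(df2, df1("y") <=> df2("y")).count()  // 2
```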





[GitHub] spark issue #17786: [SPARK-20483] Mesos Coarse mode may starve other Mesos f...

2017-04-27 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17786
  
@mgummelt We tested this in our production env, and it solves our issue. 
Since it seems to be a trivial change, I made my judgement. Gonna wait for more 
feedback. Thanks.






[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17783#discussion_r113849107
  
--- Diff: R/pkg/R/column.R ---
(same `%<=>%` / `!` hunk as quoted above; this comment is anchored at the `setMethod("!",` line)
--- End diff --

Which lintr? The current release is 0.2.0?
But I don't think we have a pattern for including output in example docs.

I think you could try
```
#' #  (x = y) (x <=> y)
```

or
```
#'  (x = y) (x <=> y)
```








[GitHub] spark pull request #17785: [SPARK-20493][R] De-duplicate parse logics for D...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113847972
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
(same `getSQLDataType` removal hunk as quoted above, ending at `val dtObj = CatalystSqlParser.parseDataType(dataType)`)
--- End diff --

Is it:
> R's one is stricter because we are checking the types via regular expressions in R side ahead.





[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17774
  
**[Test build #76256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76256/testReport)** for PR 17774 at commit [`d4a7867`](https://github.com/apache/spark/commit/d4a7867d96aa7c4bbed9cbd03b0753adcf79db9d).





[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17774
  
Jenkins, ok to test





[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17774
  
Jenkins, test this please





[GitHub] spark issue #17785: [SPARK-20493][R] De-duplicate parse logics for DDL-like...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17785
  
**[Test build #76255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76255/testReport)** for PR 17785 at commit [`257e625`](https://github.com/apache/spark/commit/257e62571ed45b028a419d4c6f880572f97dc717).





[GitHub] spark issue #17786: [SPARK-20483] Mesos Coarse mode may starve other Mesos f...

2017-04-27 Thread dgshep
Github user dgshep commented on the issue:

https://github.com/apache/spark/pull/17786
  
Fair point. This felt like a succinct way to handle this corner case, but 
if it makes sense to harden the offer refusal code instead, I can update.
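
For illustration, a hedged sketch of what "hardening the offer refusal code" could look like with the Mesos scheduler driver API (the 120s value and the helper name are assumptions, not from this PR):

```
import org.apache.mesos.Protos.{Filters, Offer}
import org.apache.mesos.SchedulerDriver

// Decline an offer we cannot use with a long refusal filter, so other
// Mesos frameworks see the resources again sooner.
def declineUnusable(driver: SchedulerDriver, offer: Offer): Unit = {
  val filters = Filters.newBuilder().setRefuseSeconds(120).build()
  driver.declineOffer(offer.getId, filters)
}
```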






[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113847076
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala 
---
@@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging {
     def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
   }
 
-  def getSQLDataType(dataType: String): DataType = {
-    dataType match {
-      case "byte" => org.apache.spark.sql.types.ByteType
-      case "integer" => org.apache.spark.sql.types.IntegerType
-      case "float" => org.apache.spark.sql.types.FloatType
-      case "double" => org.apache.spark.sql.types.DoubleType
-      case "numeric" => org.apache.spark.sql.types.DoubleType
-      case "character" => org.apache.spark.sql.types.StringType
-      case "string" => org.apache.spark.sql.types.StringType
-      case "binary" => org.apache.spark.sql.types.BinaryType
-      case "raw" => org.apache.spark.sql.types.BinaryType
-      case "logical" => org.apache.spark.sql.types.BooleanType
-      case "boolean" => org.apache.spark.sql.types.BooleanType
-      case "timestamp" => org.apache.spark.sql.types.TimestampType
-      case "date" => org.apache.spark.sql.types.DateType
-      case r"\Aarray<(.+)${elemType}>\Z" =>
-        org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType))
-      case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" =>
-        if (keyType != "string" && keyType != "character") {
-          throw new IllegalArgumentException("Key type of a map must be string or character")
-        }
-        org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType))
-      case r"\Astruct<(.+)${fieldsStr}>\Z" =>
-        if (fieldsStr(fieldsStr.length - 1) == ',') {
-          throw new IllegalArgumentException(s"Invalid type $dataType")
-        }
-        val fields = fieldsStr.split(",")
-        val structFields = fields.map { field =>
-          field match {
-            case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" =>
-              createStructField(fieldName, fieldType, true)
-
-            case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-          }
-        }
-        createStructType(structFields)
-      case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-    }
-  }
-
   def createStructField(name: String, dataType: String, nullable: Boolean): StructField = {
-    val dtObj = getSQLDataType(dataType)
+    val dtObj = CatalystSqlParser.parseDataType(dataType)
--- End diff --

I haven't checked myself; what are the differences, if any, between 
`getSQLDataType` and `CatalystSqlParser.parseDataType`?
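
For reference, a quick sketch of what the Catalyst parser accepts (the output comments are from memory and worth double-checking; note that R-only aliases such as `character`, `logical` or `numeric` are not SQL type names, so presumably they would still need to be mapped on the R side):

```
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

CatalystSqlParser.parseDataType("int")                     // IntegerType
CatalystSqlParser.parseDataType("array<double>")           // ArrayType(DoubleType, true)
CatalystSqlParser.parseDataType("map<string,int>")         // MapType(StringType, IntegerType, true)
CatalystSqlParser.parseDataType("struct<a:int,b:string>")  // StructType(StructField(a, ...), ...)
CatalystSqlParser.parseDataType("character")               // throws ParseException: R-only alias
```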





[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17785#discussion_r113846935
  
--- Diff: R/pkg/R/utils.R ---
@@ -864,6 +864,14 @@ captureJVMException <- function(e, method) {
     # Extract the first message of JVM exception.
     first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
     stop(paste0(rmsg, "no such table - ", first), call. = FALSE)
+  } else if (any(grep("org.apache.spark.sql.catalyst.parser.ParseException: ", stacktrace))) {
+    msg <- strsplit(stacktrace, "org.apache.spark.sql.catalyst.parser.ParseException: ",
+                    fixed = TRUE)[[1]]
--- End diff --

indent





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17742
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76254/
Test PASSed.





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17742
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17742
  
**[Test build #76254 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76254/testReport)**
 for PR 17742 at commit 
[`206a023`](https://github.com/apache/spark/commit/206a023433805e8d55b0cb30eebde130b4245bf9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17130#discussion_r113846602
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -82,8 +81,8 @@ private[fpm] trait FPGrowthParams extends Params with 
HasPredictionCol {
   def getNumPartitions: Int = $(numPartitions)
 
   /**
-   * Minimal confidence for generating Association Rule.
-   * Note that minConfidence has no effect during fitting.
+   * Minimal confidence for generating Association Rule. MinConfidence will not affect the mining
--- End diff --

ping





[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17130#discussion_r113846530
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -268,12 +269,8 @@ class FPGrowthModel private[ml] (
     val predictUDF = udf((items: Seq[_]) => {
       if (items != null) {
         val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
--- End diff --

Let's update the PR/JIRA if a code change is required for the doc change;
otherwise, let's leave the code change as a separate PR?
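
For what it's worth, the refactored logic above is behavior-preserving; a self-contained sketch of what it computes:

```
// Apply each rule whose antecedent is contained in the transaction and
// collect consequent items the transaction does not already contain.
val rules = Seq((Set("a", "b"), Set("c")), (Set("a"), Set("d")))
val itemset = Set("a", "b")

val predicted = rules
  .filter { case (antecedent, _) => antecedent.forall(itemset.contains) }
  .flatMap { case (_, consequent) => consequent.filterNot(itemset.contains) }
  .distinct
// predicted == List("c", "d")
```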





[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17130#discussion_r113846563
  
--- Diff: docs/ml-frequent-pattern-mining.md ---
@@ -0,0 +1,87 @@
+---
+layout: global
+title: Frequent Pattern Mining
+displayTitle: Frequent Pattern Mining
+---
+
+Mining frequent items, itemsets, subsequences, or other substructures is 
usually among the
+first steps to analyze a large-scale dataset, which has been an active 
research topic in
+data mining for years.
+We refer users to Wikipedia's [association rule 
learning](http://en.wikipedia.org/wiki/Association_rule_learning)
+for more information.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## FP-Growth
+
+The FP-growth algorithm is described in the paper
+[Han et al., Mining frequent patterns without candidate 
generation](http://dx.doi.org/10.1145/335191.335372),
+where "FP" stands for frequent pattern.
+Given a dataset of transactions, the first step of FP-growth is to 
calculate item frequencies and identify frequent items.
+Different from 
[Apriori-like](http://en.wikipedia.org/wiki/Apriori_algorithm) algorithms 
designed for the same purpose,
+the second step of FP-growth uses a suffix tree (FP-tree) structure to 
encode transactions without generating candidate sets
+explicitly, which are usually expensive to generate.
+After the second step, the frequent itemsets can be extracted from the 
FP-tree.
+In `spark.mllib`, we implemented a parallel version of FP-growth called 
PFP,
+as described in [Li et al., PFP: Parallel FP-growth for query 
recommendation](http://dx.doi.org/10.1145/1454008.1454027).
+PFP distributes the work of growing FP-trees based on the suffixes of 
transactions,
+and hence is more scalable than a single-machine implementation.
+We refer users to the papers for more details.
+
+`spark.ml`'s FP-growth implementation takes the following 
(hyper-)parameters:
+
+* `minSupport`: the minimum support for an itemset to be identified as 
frequent.
+  For example, if an item appears 3 out of 5 transactions, it has a 
support of 3/5=0.6.
+* `minConfidence`: minimum confidence for generating Association Rule. 
Confidence is an indication of how often an
+  association rule has been found to be true. For example, if in the 
transactions itemset `X` appears 4 times, `X`
+  and `Y` co-occur only 2 times, the confidence for the rule `X => Y` is 
then 2/4 = 0.5. The parameter will not
+  affect the mining for frequent itemsets, but specify the minimum 
confidence for generating association rules
+  from frequent itemsets.
+* `numPartitions`: the number of partitions used to distribute the work. 
By default the param is not set, and
+  number of partitions of the input dataset is used.
+
+The `FPGrowthModel` provides:
+
+* `freqItemsets`: frequent itemsets in the format of 
DataFrame("items"[Array], "freq"[Long])
+* `associationRules`: association rules generated with confidence above 
`minConfidence`, in the format of 
+  DataFrame("antecedent"[Array], "consequent"[Array], 
"confidence"[Double]).
+* `transform`: For each transaction in itemsCol, the `transform` method 
will compare its items against the antecedents
--- End diff --

I mean, style it as code with backticks.
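
As an aside, a minimal usage sketch of the `spark.ml` FP-growth API documented above (the data and the `spark` session are assumed):

```
import org.apache.spark.ml.fpm.FPGrowth

val spark: org.apache.spark.sql.SparkSession = ???  // assumed to exist
import spark.implicits._

val dataset = spark.createDataset(Seq("1 2 5", "1 2 3 5", "1 2"))
  .map(_.split(" ")).toDF("items")

val model = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.5)
  .setMinConfidence(0.6)
  .fit(dataset)

model.freqItemsets.show()       // DataFrame("items"[Array], "freq"[Long])
model.associationRules.show()   // DataFrame("antecedent", "consequent", "confidence")
model.transform(dataset).show() // adds the prediction column described above
```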





[GitHub] spark issue #17796: [SPARK-20519][SQL][CORE]Modify to prevent some possible ...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17796
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17796: [SPARK-20519][SQL][CORE]Modify to prevent some po...

2017-04-27 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/17796

[SPARK-20519][SQL][CORE]Modify to prevent some possible runtime exceptions

Signed-off-by: liuxian 

## What changes were proposed in this pull request?

For some functions, a runtime exception may occur when the input parameter is null. This change guards against that; a hypothetical illustration is sketched below.
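
```
// Hypothetical illustration only (not the actual patch): fail fast with a
// clear message instead of hitting a NullPointerException later on.
def firstToken(line: String): String = {
  require(line != null, "line must not be null")
  line.split(",").head
}
```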

## How was this patch tested?
Existing unit tests, plus new unit tests where needed.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark wip_lx_0428

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17796


commit 572b18150dfcfe810d2687b3b8f622b98a4fd5c6
Author: liuxian 
Date:   2017-04-28T02:36:23Z

Modify to prevent some possible runtime exception

Signed-off-by: liuxian 







[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76251/
Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76251 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76251/testReport)**
 for PR 17765 at commit 
[`6e66638`](https://github.com/apache/spark/commit/6e666386c6ac54279063787b8d6cea618114fdcd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-27 Thread johnc1231
Github user johnc1231 commented on the issue:

https://github.com/apache/spark/pull/17459
  
@viirya Any more feedback on this? 





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17742
  
**[Test build #76254 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76254/testReport)**
 for PR 17742 at commit 
[`206a023`](https://github.com/apache/spark/commit/206a023433805e8d55b0cb30eebde130b4245bf9).





[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17781
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17781
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76252/
Test PASSed.





[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17781
  
**[Test build #76252 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76252/testReport)**
 for PR 17781 at commit 
[`7839a1b`](https://github.com/apache/spark/commit/7839a1bac8487cb1e1399f892b5dbca05fb42440).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the name of inter...

2017-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17596





[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...

2017-04-27 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17596
  
LGTM - merging to master/2.2





[GitHub] spark pull request #17795: [SPARK-20517][UI] Fix broken history UI download ...

2017-04-27 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/17795

[SPARK-20517][UI] Fix broken history UI download link

## What changes were proposed in this pull request?

The download link in the history server UI is constructed as:

```
  <a href="{{uiroot}}/api/v1/applications/{{id}}/{{num}}/logs">Download</a>
```

Here the `num` field represents the number of attempts, which does not match the REST API. In the REST API, if the attempt id does not exist the URL should be `api/v1/applications/<app id>/logs`; otherwise it should be `api/v1/applications/<app id>/<attempt id>/logs`. Using `<num>` to represent `<attempt id>` leads to the "no such app" issue.
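
To make the two shapes concrete, a small sketch (the helper name is assumed):

```
def logsUrl(appId: String, attemptId: Option[String]): String =
  attemptId match {
    case Some(attempt) => s"api/v1/applications/$appId/$attempt/logs"
    case None          => s"api/v1/applications/$appId/logs"
  }
```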

## How was this patch tested?

Manual verification.

CC @ajbozarth: can you please review this change, since you added this feature 
before? Thanks!


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-20517

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17795


commit 3fdba116b6e92802c3e9e89efa8827cef1a0d1f8
Author: jerryshao 
Date:   2017-04-28T02:22:46Z

Fix broken history UI download link

Change-Id: If6d86bb229f352065eccae3d8efa3bdaf9ba755a







[GitHub] spark issue #17795: [SPARK-20517][UI] Fix broken history UI download link

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17795
  
**[Test build #76253 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76253/testReport)**
 for PR 17795 at commit 
[`3fdba11`](https://github.com/apache/spark/commit/3fdba116b6e92802c3e9e89efa8827cef1a0d1f8).





[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

2017-04-27 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/17702
  
@HyukjinKwon Can you help me find an appropriate reviewer for this?





[GitHub] spark issue #17794: Supplement the new blockidsuite unit tests

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17794
  
Can one of the admins verify this patch?





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76250/
Test PASSed.





[GitHub] spark pull request #17794: Supplement the new blockidsuite unit tests

2017-04-27 Thread heary-cao
GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/17794

Supplement the new blockidsuite unit tests

## What changes were proposed in this pull request?

This PR adds new unit tests covering `ShuffleDataBlockId`, `ShuffleIndexBlockId`, `TempShuffleBlockId`, and `TempLocalBlockId`.

## How was this patch tested?

The new unit tests; a sketch of the style of assertion follows below.
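
```
import org.apache.spark.storage.{ShuffleDataBlockId, ShuffleIndexBlockId}

// A sketch of the kind of assertion such a suite adds (values assumed,
// not the actual test code):
val data = ShuffleDataBlockId(shuffleId = 4, mapId = 5, reduceId = 6)
assert(data.name == "shuffle_4_5_6.data")

val index = ShuffleIndexBlockId(shuffleId = 4, mapId = 5, reduceId = 6)
assert(index.name == "shuffle_4_5_6.index")
```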


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark blockidsuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17794


commit 22da759cf6026a21e22cc3ce182bc64e92535520
Author: caoxuewen 
Date:   2017-04-28T02:28:02Z

Supplement the new blockidsuite unit tests







[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76250 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76250/testReport)**
 for PR 17765 at commit 
[`f9342c9`](https://github.com/apache/spark/commit/f9342c9c6f8aad75d0578d0f62717ef2a651a0ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17793: [SPARK-20484][MLLIB] Add documentation to ALS code

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17793
  
Can one of the admins verify this patch?





[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17736
  
@cloud-fan Do you mean `SELECT \\abc`?

Spark 2.x: 

sql("select '\\abc'").show()

+---+
|abc|
+---+
|abc|
+---+

sql("select 'ab\\tc'").show()

++
|ab c|
++
|ab c|
++

sql("select 'ab\tc'").show()

++
|ab c|
++
|ab c|
++


Spark 1.6:

sql("select '\\abc'").show()

++
| _c0|
++
|\abc|
++

sql("select 'ab\\tc'").show()  // 1.6 doesn't perform unescape, so this 
doesn't work.

+-+
|  _c0|
+-+
|ab\tc|
+-+

sql("select 'ab\tc'").show()

++
| _c0|
++
|ab c|
++








[GitHub] spark pull request #17793: [SPARK-20484][MLLIB] Add documentation to ALS cod...

2017-04-27 Thread danielyli
GitHub user danielyli opened a pull request:

https://github.com/apache/spark/pull/17793

[SPARK-20484][MLLIB] Add documentation to ALS code

## What changes were proposed in this pull request?

This PR adds documentation to the ALS code.

## How was this patch tested?

Existing tests were used.

@mengxr @srowen 

This contribution is my original work.  I have the license to work on this 
project under the Spark project’s open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielyli/spark spark-20484

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17793


commit 4661ddbbe02265333f03becc1a0cd10b29fd4109
Author: Daniel Li 
Date:   2017-04-28T01:26:51Z

Add documentation for the `InBlock` class

commit 7d1491e27dc7d6dbe634f4274584dc1fe9a8ecae
Author: Daniel Li 
Date:   2017-04-28T01:41:05Z

Add documentation for the `OutBlock` data type

commit 2fdbcaa70f7d487cff4885ed87e7ee609aa6b24b
Author: Daniel Li 
Date:   2017-04-28T01:43:37Z

Add documentation for `partitionRatings` method

commit fb8f16df6c5b744a9312226493899ed09bf8d1ce
Author: Daniel Li 
Date:   2017-04-28T01:45:51Z

Add documentation for `ALS.train` method

commit 0a2edf0a09bdbb1ff81f1cde9a8c60b15ce2b68f
Author: Daniel Li 
Date:   2017-04-28T01:50:37Z

Add inline comments to `ALS.train` method







[GitHub] spark issue #17789: [SPARK-19525][CORE]Add RDD checkpoint compression suppor...

2017-04-27 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17789
  
To add: for non-streaming use cases this will definitely help, but was this a 
recent change for streaming? (Probably after @aramesh117 made the PR?)





[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17436
  
When should we free a column vector? One case is when the iterator is fully 
consumed; another is when we have a `LIMIT n` in the query and stop reading the 
iterator early. Are there any other cases?
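
One hedged way to cover the "stopped early" case would be to tie cleanup to the task rather than to iterator exhaustion; a sketch (listener-based cleanup is an assumption here, not this PR's code):

```
import org.apache.spark.TaskContext

// Free the vector when the task finishes, even if the iterator producing
// rows from it was abandoned by a LIMIT.
def registerCleanup(vector: AutoCloseable): Unit = {
  Option(TaskContext.get()).foreach { ctx =>
    ctx.addTaskCompletionListener { _ => vector.close() }
  }
}
```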





[GitHub] spark issue #17789: [SPARK-19525][CORE]Add RDD checkpoint compression suppor...

2017-04-27 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17789
  
I thought the main reason @aramesh117 did this PR was to enable compression for 
the Spark Streaming use case.
If compression is already enabled there, am I missing something?
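
For context, a sketch of enabling what SPARK-19525 proposes (the config key is from the JIRA; everything else here is assumed):

```
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("checkpoint-compress-demo")
  .set("spark.checkpoint.compress", "true")  // proposed key, default off
val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/checkpoints")

val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
rdd.checkpoint()  // checkpoint output would now go through a compression codec
rdd.count()       // materializes the RDD and writes the checkpoint
```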





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76249/
Test FAILed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76249 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76249/testReport)**
 for PR 17765 at commit 
[`915d67b`](https://github.com/apache/spark/commit/915d67b6f6b802e5644f031ef11a2ba49ceedc6d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17771: [SPARK-20471]Remove AggregateBenchmark testsuite warning...

2017-04-27 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/17771
  
@gatorsmile 
ok, 
please review it again.
thanks.





[GitHub] spark pull request #17792: [SPARK-20496][SS] Bug in KafkaWriter Looks at Una...

2017-04-27 Thread anabranch
Github user anabranch closed the pull request at:

https://github.com/apache/spark/pull/17792





[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...

2017-04-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17645#discussion_r113836900
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -42,15 +44,35 @@ import org.apache.spark.sql.functions.{col, lit}
 /** Params for linear SVM Classifier. */
 private[classification] trait LinearSVCParams extends ClassifierParams 
with HasRegParam
   with HasMaxIter with HasFitIntercept with HasTol with HasStandardization 
with HasWeightCol
-  with HasThreshold with HasAggregationDepth
+  with HasThreshold with HasAggregationDepth {
+
+  /**
+   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
+   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
+   * the hinge loss (a.k.a. L2 loss).
+   *
+   * @see <a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss (Wikipedia)</a>
+   *
+   * @group param
+   */
+  @Since("2.3.0")
+  final val lossFunction: Param[String] = new Param(this, "lossFunction", "Specifies the loss " +
--- End diff --

Sure, we can do it.
But I'm thinking we should perhaps do an integrated refactoring of the common 
optimization parameters some time in the future, either through shared params 
or another trait or abstract class.
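
For reference, the two losses named in the param doc above, for a label y in {-1, +1} and margin f(x), are:

```
\ell_{\mathrm{hinge}}(y, f(x)) = \max\bigl(0,\; 1 - y\,f(x)\bigr)
\qquad
\ell_{\mathrm{squared\_hinge}}(y, f(x)) = \max\bigl(0,\; 1 - y\,f(x)\bigr)^{2}
```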





[GitHub] spark pull request #17792: [SPARK-20496][SS] Bug in KafkaWriter Looks at Una...

2017-04-27 Thread anabranch
GitHub user anabranch opened a pull request:

https://github.com/apache/spark/pull/17792

[SPARK-20496][SS] Bug in KafkaWriter Looks at Unanalyzed Plans

## What changes were proposed in this pull request?

We didn't enforce analyzed plans in Spark 2.1 when writing out to Kafka.

## How was this patch tested?

New unit test.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anabranch/spark SPARK-20496

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17792


commit 5bafdc45d6493f2ea41cc4bce0faa5f93ff3162c
Author: Shixiong Zhu 
Date:   2016-12-23T23:38:41Z

[SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use 
ConcurrentHashMap to make it faster

## What changes were proposed in this pull request?

The time complexity of `ConcurrentHashMap`'s `remove` is O(1), whereas 
`ConcurrentLinkedQueue.remove(Object)` must scan the queue, i.e. O(n). Changing 
`ContextCleaner.referenceBuffer`'s type from `ConcurrentLinkedQueue` to 
`ConcurrentHashMap` will make the removal much faster.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu 

Closes #16390 from zsxwing/SPARK-18991.

(cherry picked from commit a848f0ba84e37fd95d0f47863ec68326e3296b33)
Signed-off-by: Shixiong Zhu 
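
A minimal sketch of the data-structure swap described above (the set-view shape is an assumption):

```
import java.util.Collections
import java.util.concurrent.ConcurrentHashMap

// A concurrent set backed by ConcurrentHashMap: add/remove are O(1),
// whereas ConcurrentLinkedQueue.remove(Object) scans the queue, O(n).
val referenceBuffer: java.util.Set[AnyRef] =
  Collections.newSetFromMap(new ConcurrentHashMap[AnyRef, java.lang.Boolean]())

val ref = new AnyRef
referenceBuffer.add(ref)     // O(1)
referenceBuffer.remove(ref)  // O(1)
```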

commit ca25b1e51f036fb837e3fe8218cb04d7360e049d
Author: Kousuke Saruta 
Date:   2016-12-24T13:02:58Z

[SPARK-18837][WEBUI] Very long stage descriptions do not wrap in the UI

## What changes were proposed in this pull request?

This issue was reported by wangyum.

In the AllJobsPage, JobPage and StagePage, the description length used to be 
limited, as shown below.

![ui-2 0 
0](https://cloud.githubusercontent.com/assets/4736016/21319673/8b225246-c651-11e6-9041-4fcdd04f4dec.gif)

But recently, the limitation seems to have been accidentally removed.

![ui-2 1 
0](https://cloud.githubusercontent.com/assets/4736016/21319825/104779f6-c652-11e6-8bfa-dfd800396352.gif)

The cause is that some tables no longer have the `sortable` class although they 
used to; the `sortable` class not only marks tables as sortable but also limits 
the width of their child `td` elements.
The reason some tables no longer have the `sortable` class is that another 
sorting mechanism was introduced by #13620 and #13708 along with the pagination feature.

To fix this issue, I've introduced new class `table-cell-width-limited` 
which limits the description cell width and the description is like what it was.

<img src="https://cloud.githubusercontent.com/assets/4736016/21320478/89141c7a-c654-11e6-8494-f8f91325980b.png">

## How was this patch tested?

Tested manually with my browser.

Author: Kousuke Saruta 

Closes #16338 from sarutak/SPARK-18837.

(cherry picked from commit f2ceb2abe9357942a51bd643683850efd1fc9df7)
Signed-off-by: Sean Owen 

commit ac7107fe70fcd0b584001c10dd624a4d8757109c
Author: Carson Wang 
Date:   2016-12-28T12:12:44Z

[MINOR][DOC] Fix doc of ForeachWriter to use writeStream

## What changes were proposed in this pull request?

Fix the document of `ForeachWriter` to use `writeStream` instead of `write` 
for a streaming dataset.

## How was this patch tested?
Docs only.

Author: Carson Wang 

Closes #16419 from carsonwang/FixDoc.

(cherry picked from commit 2a5f52a7146abc05bf70e65eb2267cd869ac4789)
Signed-off-by: Sean Owen 

commit 7197a7bc7061e2908b6430f494dba378378d5d02
Author: Sean Owen 
Date:   2016-12-28T12:17:33Z

[SPARK-18993][BUILD] Unable to build/compile Spark in IntelliJ due to 
missing Scala deps in spark-tags

## What changes were proposed in this pull request?

This adds back a direct dependency on Scala library classes from spark-tags 
because its Scala annotations need them.

## How was this patch tested?

Existing tests

Author: Sean Owen 

Closes #16418 from srowen/SPARK-18993.

(cherry picked from commit d7bce3bd31ec193274718042dc017706989d7563)
Signed-off-by: Sean Owen 

commit 80d583bd09de54890cddfcc0c6fd807d7200ea75
Author: Tathagata Das 
Date:   2016-12-28T20:11:25Z

[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming 
regarding watermarking and status

## What changes were proposed in this pull request?

[GitHub] spark pull request #17787: [SPARK-20496][SS] Bug in KafkaWriter Looks at Una...

2017-04-27 Thread anabranch
Github user anabranch closed the pull request at:

https://github.com/apache/spark/pull/17787





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread jtengyp
Github user jtengyp commented on the issue:

https://github.com/apache/spark/pull/17742
  
I did some tests with the PR.
Here is the cluster configuration:
3 workers, each with 10 cores and 30 GB of memory.
With the Netflix dataset (480,189 users and 17,770 movies), the 
`recommendProductsForUsers` time drops from 488.36 s to 60.93 s, 8x faster than 
the original method.

With a larger dataset (3.29 million users and 0.21 million products), the 
`recommendProductsForUsers` time drops from 48 h to 39 min, roughly 73x faster 
than the original method.
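
For reference, a minimal sketch of the call path being benchmarked (assumes an existing SparkContext `sc`; the input path and hyperparameters are assumptions, not from the benchmark):

```
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, product, rating) = line.split(",")
  Rating(user.toInt, product.toInt, rating.toDouble)
}

val model = ALS.train(ratings, rank = 10, iterations = 10, lambda = 0.01)
val top10PerUser = model.recommendProductsForUsers(10)  // RDD[(Int, Array[Rating])]
```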





[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17781
  
**[Test build #76252 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76252/testReport)**
 for PR 17781 at commit 
[`7839a1b`](https://github.com/apache/spark/commit/7839a1bac8487cb1e1399f892b5dbca05fb42440).





[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-04-27 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/17735
  
@brkyvz please take another look





[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-27 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/17540
  
@zsxwing and @cloud-fan, can you have another look at this? I'd really like 
to get it in.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76246/
Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76245/
Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76245 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76245/testReport)**
 for PR 17765 at commit 
[`6ab66e2`](https://github.com/apache/spark/commit/6ab66e202193d8bb6a942207fc42ee8fff580e9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76246 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76246/testReport)**
 for PR 17765 at commit 
[`992d68f`](https://github.com/apache/spark/commit/992d68fca1b10abff7e8539925a8af237155cc8e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-27 Thread kunalkhamar
Github user kunalkhamar commented on a diff in the pull request:

https://github.com/apache/spark/pull/17765#discussion_r113831182
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -825,6 +832,11 @@ class StreamExecution(
 }
   }
 
+  private def getBatchDescriptionString: String = {
+    val batchDescription = if (currentBatchId < 0) "init" else currentBatchId.toString
+    Option(name).map(_ + "<br/>").getOrElse("") +
+      s"id = $id<br/>runId = $runId<br/>batch = $batchDescription"
--- End diff --

@marmbrus @zsxwing @tdas Updated as per comments, the screenshots are in 
the PR description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

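For context on the diff above: the description string is only useful once it is
attached to the jobs a batch launches. Below is a minimal, standalone sketch of
how a query's jobs might be tagged with a group and description through the
public SparkContext API; the object name, query name, and sample ids are
illustrative assumptions, not taken from the PR:

    import java.util.UUID
    import org.apache.spark.{SparkConf, SparkContext}

    object JobGroupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("job-group-sketch"))
        val runId = UUID.randomUUID()
        val description =
          s"my-query<br/>id = ${UUID.randomUUID()}<br/>runId = $runId<br/>batch = init"
        // Tag every job launched from this thread with the query's runId as the
        // group id; interruptOnCancel = true lets cancelJobGroup interrupt tasks.
        sc.setJobGroup(runId.toString, description, interruptOnCancel = true)
        sc.parallelize(1 to 10).count() // appears under the group in the web UI
        // Cancelling by group id would stop all jobs launched for this query:
        // sc.cancelJobGroup(runId.toString)
        sc.stop()
      }
    }
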


[GitHub] spark issue #17790: [SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17790
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17790: [SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721

2017-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17790
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76244/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-27 Thread kunalkhamar
Github user kunalkhamar commented on a diff in the pull request:

https://github.com/apache/spark/pull/17765#discussion_r113830998
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -825,6 +832,11 @@ class StreamExecution(
 }
   }
 
+  private def getBatchDescriptionString: String = {
+    val batchDescription = if (currentBatchId < 0) "init" else currentBatchId.toString
+    Option(name).map(_ + " ").getOrElse("") +
+      s"[batch = $batchDescription,id = $id,runId = $runId]"
--- End diff --

Yes, updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

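For reference, a standalone sketch of the description format shown in the diff
above; `name`, `id`, `runId`, and `currentBatchId` are stand-ins for the
StreamExecution fields, populated here with sample values:

    import java.util.UUID

    object BatchDescriptionSketch {
      def main(args: Array[String]): Unit = {
        val name: String = "my-query" // null for an unnamed query
        val id = UUID.randomUUID()
        val runId = UUID.randomUUID()
        val currentBatchId: Long = -1L

        // Same logic as the diff: "init" before the first batch, else the batch id.
        val batchDescription = if (currentBatchId < 0) "init" else currentBatchId.toString
        val desc = Option(name).map(_ + " ").getOrElse("") +
          s"[batch = $batchDescription,id = $id,runId = $runId]"
        println(desc) // e.g. my-query [batch = init,id = <uuid>,runId = <uuid>]
      }
    }
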


[GitHub] spark issue #17790: [SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721

2017-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17790
  
**[Test build #76244 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76244/testReport)**
 for PR 17790 at commit 
[`ecfb8e3`](https://github.com/apache/spark/commit/ecfb8e3f276eeb276ed0a3293a68ff93a6f9e88e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


