[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17742 **[Test build #76258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76258/testReport)** for PR 17742 at commit [`8eab55b`](https://github.com/apache/spark/commit/8eab55bccd51706d45e0ccb2281114df4310899c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17793: [SPARK-20484][MLLIB] Add documentation to ALS code
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17793 +1 for this change. I'll try to take a look sometime, but maybe after the QA period. Also cc @MLnick.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17556 I don't mind the weighted midpoints. However, if many points of a continuous feature share the exact same value, using midpoints assumes the test set will contain points close to, but not exactly at, those values. Since the training data was clustered at those particular values, that may not be a good assumption. I could live with either method, but I have a slight preference for matching the other libraries.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 Personally, I'm fine with this patch; my only concern is that we should have a follow-up for nested query execution ASAP. We should also revert https://github.com/apache/spark/pull/17540#discussion_r112601926, which is just a hack for the test: metrics without a linked SparkPlan are useless, so we should just fix the test instead.
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r113855186

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
```
@@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends Logging {
       // sort distinct values
       val valueCounts = valueCountMap.toSeq.sortBy(_._1).toArray

-      // if possible splits is not enough or just enough, just return all possible splits
+      def weightedMean(pre: (Double, Int), cur: (Double, Int)): Double = {
+        val (preValue, preCount) = pre
+        val (curValue, curCount) = cur
+        (preValue * preCount + curValue * curCount) / (preCount.toDouble + curCount)
+      }
+
       val possibleSplits = valueCounts.length - 1
-      if (possibleSplits <= numSplits) {
-        valueCounts.map(_._1).init
+      if (possibleSplits == 0) {
+        // constant feature
+        Array.empty[Double]
+
```
--- End diff --

remove this line
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r113855243

--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
```
@@ -138,9 +169,10 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
         Array(2), Gini, QuantileStrategy.Sort, 0, 0, 0.0, 0, 0
       )
-      val featureSamples = Array(0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2).map(_.toDouble)
+      val featureSamples = Array(0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2).map(_.toDouble)
       val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
-      assert(splits === Array(1.0))
+      val expSplits = Array((1.0 * 1 + 2.0 * 15) / (1 + 15)) // = (1.9375)
```
--- End diff --

just call them `expectedSplits`
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r113854473

--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
```
@@ -112,9 +138,11 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
         Array(5), Gini, QuantileStrategy.Sort, 0, 0, 0.0, 0, 0
       )
-      val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
+      val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3).map(_.toDouble)
       val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
-      assert(splits === Array(1.0, 2.0))
+      val expSplits = Array((1.0 * 2 + 2.0 * 8) / (2 + 8),
+        (2.0 * 8 + 3.0 * 2) / (8 + 2)) // = (1.8, 2.2)
```
--- End diff --

don't think the comments are necessary. The actual values don't mean much.
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r113855209

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
```
@@ -1037,7 +1051,10 @@ private[spark] object RandomForest extends Logging {
           // makes the gap between currentCount and targetCount smaller,
           // previous value is a split threshold.
           if (previousGap < currentGap) {
-            splitsBuilder += valueCounts(index - 1)._1
+            val pre = valueCounts(index - 1)
+            val cur = valueCounts(index)
+
```
--- End diff --

remove this line
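To make the arithmetic in the review concrete: the split threshold between two adjacent distinct values is their count-weighted mean. This is an illustrative standalone sketch (in Python, with my own names; the PR itself is Scala) of that computation:

```python
def weighted_mean(pre, cur):
    """Count-weighted mean of two (value, count) pairs, used as a split threshold."""
    (pre_value, pre_count), (cur_value, cur_count) = pre, cur
    return (pre_value * pre_count + cur_value * cur_count) / (pre_count + cur_count)

# Distinct feature values with their occurrence counts, sorted by value.
value_counts = [(1.0, 2), (2.0, 8), (3.0, 2)]

# One threshold between each adjacent pair of distinct values.
splits = [weighted_mean(value_counts[i], value_counts[i + 1])
          for i in range(len(value_counts) - 1)]
print(splits)  # [1.8, 2.2]
```

These are the same values as the `expSplits` in the test diff above: (1.0*2 + 2.0*8)/10 = 1.8 and (2.0*8 + 3.0*2)/10 = 2.2.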
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113855222

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
```
@@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging {
     def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
   }

-  def getSQLDataType(dataType: String): DataType = {
-    dataType match {
-      case "byte" => org.apache.spark.sql.types.ByteType
-      case "integer" => org.apache.spark.sql.types.IntegerType
-      case "float" => org.apache.spark.sql.types.FloatType
-      case "double" => org.apache.spark.sql.types.DoubleType
-      case "numeric" => org.apache.spark.sql.types.DoubleType
-      case "character" => org.apache.spark.sql.types.StringType
-      case "string" => org.apache.spark.sql.types.StringType
-      case "binary" => org.apache.spark.sql.types.BinaryType
-      case "raw" => org.apache.spark.sql.types.BinaryType
-      case "logical" => org.apache.spark.sql.types.BooleanType
-      case "boolean" => org.apache.spark.sql.types.BooleanType
-      case "timestamp" => org.apache.spark.sql.types.TimestampType
-      case "date" => org.apache.spark.sql.types.DateType
-      case r"\Aarray<(.+)${elemType}>\Z" =>
-        org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType))
-      case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" =>
-        if (keyType != "string" && keyType != "character") {
-          throw new IllegalArgumentException("Key type of a map must be string or character")
-        }
-        org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType))
-      case r"\Astruct<(.+)${fieldsStr}>\Z" =>
-        if (fieldsStr(fieldsStr.length - 1) == ',') {
-          throw new IllegalArgumentException(s"Invalid type $dataType")
-        }
-        val fields = fieldsStr.split(",")
-        val structFields = fields.map { field =>
-          field match {
-            case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" =>
-              createStructField(fieldName, fieldType, true)
-
-            case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-          }
-        }
-        createStructType(structFields)
-      case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-    }
-  }
-
   def createStructField(name: String, dataType: String, nullable: Boolean): StructField = {
-    val dtObj = getSQLDataType(dataType)
+    val dtObj = CatalystSqlParser.parseDataType(dataType)
```
--- End diff --

Yea, however, for those types, we can't create that field because the check via [checkType](https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/schema.R#L129-L187) fails, as they are not in [`PRIMITIVE_TYPES`](https://github.com/apache/spark/blob/bc0a0e6392c4e729d8f0e4caffc0bd05adb0d950/R/pkg/R/types.R#L21-L39), as below:

```r
> structField("_col", "character")
Error in checkType(type) : Unsupported type for SparkDataframe: character
> structField("_col", "logical")
Error in checkType(type) : Unsupported type for SparkDataframe: logical
> structField("_col", "numeric")
Error in checkType(type) : Unsupported type for SparkDataframe: numeric
> structField("_col", "raw")
Error in checkType(type) : Unsupported type for SparkDataframe: raw
```

I double-checked that this is the only place where we called `getSQLDataType`, so those cases look unreachable (I hope you could double-check this when you have some time, just in case I missed something).
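For illustration, the case-sensitive table that `getSQLDataType` implemented, including the R-native aliases `numeric`, `character`, `logical`, and `raw`, can be mimicked in a few lines. This is a hypothetical Python sketch with invented names, not Spark's API; it only covers the flat mapping and the recursive `array<...>` case:

```python
# Flat type-name table mirroring the removed getSQLDataType (case-sensitive).
# The right-hand sides are just labels here, standing in for Spark DataType objects.
PRIMITIVES = {
    "byte": "ByteType", "integer": "IntegerType", "float": "FloatType",
    "double": "DoubleType", "numeric": "DoubleType",      # R-native alias
    "character": "StringType", "string": "StringType",
    "binary": "BinaryType", "raw": "BinaryType",          # R-native alias
    "logical": "BooleanType", "boolean": "BooleanType",
    "timestamp": "TimestampType", "date": "DateType",
}

def parse_type(s):
    """Resolve a type string, recursing into array<...> like getSQLDataType did."""
    if s in PRIMITIVES:
        return PRIMITIVES[s]
    if s.startswith("array<") and s.endswith(">"):
        return f"ArrayType({parse_type(s[len('array<'):-1])})"
    raise ValueError(f"Invalid type {s}")

print(parse_type("numeric"))           # DoubleType
print(parse_type("array<character>"))  # ArrayType(StringType)
```

Note that, exactly as discussed above, a lookup like `parse_type("bigint")` fails in this table even though `CatalystSqlParser.parseDataType` accepts it.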
[GitHub] spark pull request #17797: [SparkR][DOC]:Document LinearSVC in R programming...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17797
[GitHub] spark issue #17797: [SparkR][DOC]:Document LinearSVC in R programming guide
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17797 merged to master/2.2
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113854501

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
```
@@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging {
     def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
   }

-  def getSQLDataType(dataType: String): DataType = {
-    dataType match {
-      case "byte" => org.apache.spark.sql.types.ByteType
-      case "integer" => org.apache.spark.sql.types.IntegerType
-      case "float" => org.apache.spark.sql.types.FloatType
-      case "double" => org.apache.spark.sql.types.DoubleType
-      case "numeric" => org.apache.spark.sql.types.DoubleType
-      case "character" => org.apache.spark.sql.types.StringType
-      case "string" => org.apache.spark.sql.types.StringType
-      case "binary" => org.apache.spark.sql.types.BinaryType
-      case "raw" => org.apache.spark.sql.types.BinaryType
-      case "logical" => org.apache.spark.sql.types.BooleanType
-      case "boolean" => org.apache.spark.sql.types.BooleanType
-      case "timestamp" => org.apache.spark.sql.types.TimestampType
-      case "date" => org.apache.spark.sql.types.DateType
-      case r"\Aarray<(.+)${elemType}>\Z" =>
-        org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType))
-      case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" =>
-        if (keyType != "string" && keyType != "character") {
-          throw new IllegalArgumentException("Key type of a map must be string or character")
-        }
-        org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType))
-      case r"\Astruct<(.+)${fieldsStr}>\Z" =>
-        if (fieldsStr(fieldsStr.length - 1) == ',') {
-          throw new IllegalArgumentException(s"Invalid type $dataType")
-        }
-        val fields = fieldsStr.split(",")
-        val structFields = fields.map { field =>
-          field match {
-            case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" =>
-              createStructField(fieldName, fieldType, true)
-
-            case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-          }
-        }
-        createStructType(structFields)
-      case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-    }
-  }
-
   def createStructField(name: String, dataType: String, nullable: Boolean): StructField = {
-    val dtObj = getSQLDataType(dataType)
+    val dtObj = CatalystSqlParser.parseDataType(dataType)
```
--- End diff --

thanks for looking into it. If I take the diff,
```
character
logical
numeric
raw
```
these are actually R native type names, though; if I have to guess, supporting R native types in structField as well as Scala/Spark types is intentional. I'm not sure how much test coverage we have for something like this, but is that going to still work with this change?
[GitHub] spark issue #17797: [SparkR][DOC]:Document LinearSVC in R programming guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17797 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76257/
[GitHub] spark issue #17797: [SparkR][DOC]:Document LinearSVC in R programming guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17797 Merged build finished. Test PASSed.
[GitHub] spark issue #17797: [SparkR][DOC]:Document LinearSVC in R programming guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17797 **[Test build #76257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76257/testReport)** for PR 17797 at commit [`3a59cc2`](https://github.com/apache/spark/commit/3a59cc2a1741a2dae6f20fa71e689a0dcc16c835).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17503 I think the benefit of this would be speed at predict time or model storage. @srowen the nodes don't have to be equal to be merged, they just have to output the same prediction. Since this is a param that can be turned on or off, I don't see a problem. That said, I'd be interested to know how much of an impact this makes. This is a semi-large change and probably not at the top of the list right now. Maybe @jkbradley can comment.
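The idea under review (collapsing a split whose children all end up outputting the same prediction) can be sketched as a bottom-up merge. This is a minimal illustrative Python version with invented names, not Spark's DecisionTree implementation:

```python
class Node:
    """Minimal binary tree node: leaves carry a prediction, internal nodes a split."""
    def __init__(self, prediction=None, left=None, right=None):
        self.prediction = prediction  # set only for leaves
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None

def merge_redundant(node):
    """Collapse an internal node into a leaf when both children end up
    predicting the same value, making the split redundant."""
    if node.is_leaf():
        return node
    node.left = merge_redundant(node.left)
    node.right = merge_redundant(node.right)
    if (node.left.is_leaf() and node.right.is_leaf()
            and node.left.prediction == node.right.prediction):
        return Node(prediction=node.left.prediction)
    return node

# A tree where every leaf predicts 1.0, so the whole thing reduces to one leaf.
tree = Node(left=Node(prediction=1.0),
            right=Node(left=Node(prediction=1.0), right=Node(prediction=1.0)))
reduced = merge_redundant(tree)
print(reduced.is_leaf(), reduced.prediction)  # True 1.0
```

This also shows the point made above: the merged children need not be equal as nodes, only equal in the prediction they output, and the payoff is a smaller model to store and fewer comparisons at predict time.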
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113853971

--- Diff: R/pkg/R/serialize.R ---
```
@@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) {
          Date = writeDate(con, object),
          POSIXlt = writeTime(con, object),
          POSIXct = writeTime(con, object),
+         bigint = writeDouble(con, object),
```
--- End diff --

I think this is different, though: PRIMITIVE_TYPES is used when you create a schema with structField in R. In this case you can definitely define a column as bigint and then pass an R numeric value to it.
[GitHub] spark issue #17797: [SparkR][DOC]:Document LinearSVC in R programming guide
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17797 @felixcheung As I checked the SparkR programming guide, it seems that all the machine learning parts are links to existing documents. So I just added the link to the Linear SVM document and tested it. Thanks!
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113853483

--- Diff: R/pkg/R/serialize.R ---
```
@@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) {
          Date = writeDate(con, object),
          POSIXlt = writeTime(con, object),
          POSIXct = writeTime(con, object),
+         bigint = writeDouble(con, object),
```
--- End diff --

I see. But as you mentioned, we don't know how to trigger the write path on the R side, because both bigint and double are `numeric`. I think we can just remove the test on the R side.
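To show why `bigint` can safely reuse the double writer, here is a hypothetical Python analogue (names invented) of the type-name dispatch in `serialize.R`. R has no native 64-bit integer type, so a bigint column's values arrive as R `numeric`, and serializing them exactly like doubles is the only option on the write path:

```python
import struct

def write_double(value):
    # Serialize as a big-endian IEEE-754 double, analogous to SparkR's writeDouble.
    return struct.pack(">d", float(value))

# Type-name -> writer dispatch table, analogous to the switch in writeObject.
# "bigint" points at the same writer as "double": both are R numeric underneath.
writers = {
    "double": write_double,
    "bigint": write_double,
}

# 2^53, the largest integer a double represents exactly; larger bigints lose precision.
payload = writers["bigint"](9007199254740992)
print(len(payload))  # 8
```

This also illustrates the limitation the discussion hints at: from the R side the two types are indistinguishable, which is why there is no obvious way to trigger a bigint-specific write path in a test.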
[GitHub] spark issue #17797: [SparkR][DOC]:Document LinearSVC in R programming guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17797 **[Test build #76257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76257/testReport)** for PR 17797 at commit [`3a59cc2`](https://github.com/apache/spark/commit/3a59cc2a1741a2dae6f20fa71e689a0dcc16c835).
[GitHub] spark pull request #17797: [SparkR][DOC]:Document LinearSVC in R programming...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17797 [SparkR][DOC]:Document LinearSVC in R programming guide

## What changes were proposed in this pull request?

Add a link to svmLinear in the SparkR programming guide.

## How was this patch tested?

Built the doc manually and clicked the link to the document. It looks good.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark doc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17797.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17797

commit 3a59cc2a1741a2dae6f20fa71e689a0dcc16c835
Author: wangmiao1981
Date: 2017-04-28T05:07:46Z

    add link to linear svc
[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...
Github user phatak-dev commented on the issue: https://github.com/apache/spark/pull/16609 @gatorsmile sure. I will give a PR.
[GitHub] spark issue #17303: [SPARK-19112][CORE] add codec for ZStandard
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17303 I ran quick benchmarks using a TPCDS query (Q4) (I referred to the previous work in #10342). Based on the result, it seems a bit early to implement this;

```
scaleFactor: 4
AWS instance: c4.4xlarge

-- zstd
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 53.315878375s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 53.468174668s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 57.282403146s

-- lz4
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 20.779643053s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 16.520911319s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.897124967s

-- snappy
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 21.13241203698s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 15.90886774398s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.789648712s

-- lzf
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 21.339518781s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 16.881225328s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.813455479s
```
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113851957

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
```
@@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging {
     def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
   }

-  def getSQLDataType(dataType: String): DataType = {
-    dataType match {
-      case "byte" => org.apache.spark.sql.types.ByteType
-      case "integer" => org.apache.spark.sql.types.IntegerType
-      case "float" => org.apache.spark.sql.types.FloatType
-      case "double" => org.apache.spark.sql.types.DoubleType
-      case "numeric" => org.apache.spark.sql.types.DoubleType
-      case "character" => org.apache.spark.sql.types.StringType
-      case "string" => org.apache.spark.sql.types.StringType
-      case "binary" => org.apache.spark.sql.types.BinaryType
-      case "raw" => org.apache.spark.sql.types.BinaryType
-      case "logical" => org.apache.spark.sql.types.BooleanType
-      case "boolean" => org.apache.spark.sql.types.BooleanType
-      case "timestamp" => org.apache.spark.sql.types.TimestampType
-      case "date" => org.apache.spark.sql.types.DateType
-      case r"\Aarray<(.+)${elemType}>\Z" =>
-        org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType))
-      case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" =>
-        if (keyType != "string" && keyType != "character") {
-          throw new IllegalArgumentException("Key type of a map must be string or character")
-        }
-        org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType))
-      case r"\Astruct<(.+)${fieldsStr}>\Z" =>
-        if (fieldsStr(fieldsStr.length - 1) == ',') {
-          throw new IllegalArgumentException(s"Invalid type $dataType")
-        }
-        val fields = fieldsStr.split(",")
-        val structFields = fields.map { field =>
-          field match {
-            case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" =>
-              createStructField(fieldName, fieldType, true)
-
-            case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-          }
-        }
-        createStructType(structFields)
-      case _ => throw new IllegalArgumentException(s"Invalid type $dataType")
-    }
-  }
-
   def createStructField(name: String, dataType: String, nullable: Boolean): StructField = {
-    val dtObj = getSQLDataType(dataType)
+    val dtObj = CatalystSqlParser.parseDataType(dataType)
```
--- End diff --

I just wrote up the details about this as best I could. Yes, I think this should be targeted at master, not 2.2.
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113851718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging { def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*) } - def getSQLDataType(dataType: String): DataType = { -dataType match { - case "byte" => org.apache.spark.sql.types.ByteType - case "integer" => org.apache.spark.sql.types.IntegerType - case "float" => org.apache.spark.sql.types.FloatType - case "double" => org.apache.spark.sql.types.DoubleType - case "numeric" => org.apache.spark.sql.types.DoubleType - case "character" => org.apache.spark.sql.types.StringType - case "string" => org.apache.spark.sql.types.StringType - case "binary" => org.apache.spark.sql.types.BinaryType - case "raw" => org.apache.spark.sql.types.BinaryType - case "logical" => org.apache.spark.sql.types.BooleanType - case "boolean" => org.apache.spark.sql.types.BooleanType - case "timestamp" => org.apache.spark.sql.types.TimestampType - case "date" => org.apache.spark.sql.types.DateType - case r"\Aarray<(.+)${elemType}>\Z" => -org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType)) - case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" => -if (keyType != "string" && keyType != "character") { - throw new IllegalArgumentException("Key type of a map must be string or character") -} -org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType)) - case r"\Astruct<(.+)${fieldsStr}>\Z" => -if (fieldsStr(fieldsStr.length - 1) == ',') { - throw new IllegalArgumentException(s"Invalid type $dataType") -} -val fields = fieldsStr.split(",") -val structFields = fields.map { field => - field match { -case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" => - createStructField(fieldName, fieldType, true) - -case _ => throw new IllegalArgumentException(s"Invalid type $dataType") - } -} 
-createStructType(structFields) - case _ => throw new IllegalArgumentException(s"Invalid type $dataType") -} - } - def createStructField(name: String, dataType: String, nullable: Boolean): StructField = { -val dtObj = getSQLDataType(dataType) +val dtObj = CatalystSqlParser.parseDataType(dataType) --- End diff -- To my knowledge, `getSQLDataType` supports the types below: ``` binary boolean byte character date double float integer logical numeric raw string timestamp array<...> struct<...> map<...> ``` and these appear to be required to be _case-sensitive_, whereas `parseDataType` supports ... ``` bigint binary boolean byte char date decimal double float int integer long short smallint string timestamp tinyint varchar array<...> struct<...> map<...> ``` and these appear to be _case-insensitive_. I think the initial intention for `getSQLDataType` was to support R type string conversions, but it looks like unreachable code now because we were checking the type strings before actually calling `getSQLDataType` in [`checkType`](https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/schema.R#L129-L187). If a type does not satisfy `!is.null(PRIMITIVE_TYPES[[type]])` (_case-sensitive_), it throws an error. ``` bigint binary boolean byte date decimal double float int integer smallint string timestamp tinyint array<...> map<...> struct<...> ``` In short, I think there should not be a behaviour change for the types below (the intersection between `getSQLDataType` and `parseDataType`) ... ``` binary string double float boolean timestamp date integer byte array<...> map<...> struct<...> ``` and these should be case-sensitive. _Additionally_, we will support the types below (which are listed in R's [`PRIMITIVE_TYPES`](https://github.com/apache/spark/blob/bc0a0e6392c4e729d8f0e4caffc0bd05adb0d950/R/pkg/R/types.R#L21-L39) but `getSQLDataType` did not support before): ``` tinyint smallint int bigint ``` **Before** ```r > structField("_col", "tinyint") ...
Error in handleErrors(returnStatus, conn) :
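The case-sensitivity point above (R validates primitive type strings against a fixed whitelist before anything reaches the JVM) can be sketched as follows. This is a hypothetical Python rendering of the idea for illustration only, not the actual SparkR `checkType` code; the whitelist here is just the intersection of types discussed above.

```python
import re

# Hypothetical whitelist: primitive type strings are matched case-sensitively,
# mirroring the `!is.null(PRIMITIVE_TYPES[[type]])` check described above.
PRIMITIVE_TYPES = {
    "binary", "boolean", "byte", "date", "double", "float",
    "integer", "string", "timestamp",
}

def check_type(type_str):
    # Accept a primitive type, or recurse into a complex type like array<...>.
    if type_str in PRIMITIVE_TYPES:
        return True
    m = re.fullmatch(r"array<(.+)>", type_str)
    if m:
        return check_type(m.group(1))
    raise ValueError("Unsupported type for SparkDataframe: " + type_str)

print(check_type("array<integer>"))   # True
try:
    check_type("Integer")             # case-sensitive lookup rejects this
except ValueError as e:
    print(e)
```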
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113851363 --- Diff: R/pkg/R/serialize.R --- @@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) { Date = writeDate(con, object), POSIXlt = writeTime(con, object), POSIXct = writeTime(con, object), + bigint = writeDouble(con, object), --- End diff -- if you are referring to https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25 like it says, `names(PRIMITIVE_TYPES) are Scala types whereas # values are equivalent R types.` so `bigint` there is Scala type, not R type
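The naming convention being pointed out (keys are Scala/Spark SQL type strings, values are the equivalent R types) can be sketched like this. This is a hypothetical Python rendering for illustration, not the actual contents of `types.R`:

```python
# Keys are Spark/Scala SQL type strings; values are the equivalent R types.
# `bigint` mapping to R "numeric" is consistent with serializing it via
# writeDouble in the diff above.
PRIMITIVE_TYPES = {
    "tinyint":  "integer",
    "smallint": "integer",
    "integer":  "integer",
    "bigint":   "numeric",    # no 64-bit integer in base R, so a double
    "double":   "numeric",
    "string":   "character",
    "boolean":  "logical",
}

print(PRIMITIVE_TYPES["bigint"])  # numeric
```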
[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17781 cc @cloud-fan @sameeragarwal
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17774 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76256/ Test PASSed.
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17774 Merged build finished. Test PASSed.
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17774 **[Test build #76256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76256/testReport)** for PR 17774 at commit [`d4a7867`](https://github.com/apache/spark/commit/d4a7867d96aa7c4bbed9cbd03b0753adcf79db9d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17783#discussion_r113849536 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1478,6 +1481,13 @@ test_that("column functions", { lapply( list(list(x = 1, y = -1, z = -2), list(x = 2, y = 3, z = 5)), as.environment)) + + df <- as.DataFrame(data.frame(is_true = c(TRUE, FALSE, NA))) + expect_equal( +collect(select(df, alias(SparkR::not(df$is_true), "is_false"))), --- End diff -- we need `SparkR::` here?
[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17783#discussion_r113849200 --- Diff: R/pkg/R/column.R --- @@ -67,8 +67,7 @@ operators <- list( "+" = "plus", "-" = "minus", "*" = "multiply", "/" = "divide", "%%" = "mod", "==" = "equalTo", ">" = "gt", "<" = "lt", "!=" = "notEqual", "<=" = "leq", ">=" = "geq", # we can not override `&&` and `||`, so use `&` and `|` instead - "&" = "and", "|" = "or", #, "!" = "unary_$bang" - "^" = "pow" + "&" = "and", "|" = "or", "^" = "pow" --- End diff -- what happens with `#, `?
[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17783#discussion_r113849582 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1965,6 +1975,16 @@ test_that("filter() on a DataFrame", { # Test stats::filter is working #expect_true(is.ts(filter(1:100, rep(1, 3 # nolint + + # test suites for %<=>% --- End diff -- can you move this before `# Test stats::filter is working` block
[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17783#discussion_r113849285 --- Diff: R/pkg/R/column.R --- @@ -302,3 +301,65 @@ setMethod("otherwise", jc <- callJMethod(x@jc, "otherwise", value) column(jc) }) + +#' \%<=>\% +#' +#' Equality test that is safe for null values. +#' +#' Can be used, unlike standard equality operator, to perform null-safe joins. +#' Equivalent to Scala \code{Column.<=>} and \code{Column.eqNullSafe}. +#' +#' @param x a Column +#' @param value a value to compare +#' @rdname eq_null_safe +#' @name %<=>% +#' @aliases %<=>%,Column-method +#' @export +#' @examples +#' \dontrun{ +#' df1 <- createDataFrame(data.frame( +#' x = c(1, NA, 3, NA), y = c(2, 6, 3, NA) +#' )) +#' +#' head(select(df1, df1$x == df1$y, df1$x %<=>% df1$y)) +#' ## (x = y) (x <=> y) +#' ##1 FALSE FALSE +#' ##2 NA FALSE +#' ##3TRUE TRUE +#' ##4 NA TRUE +#' +#' df2 <- createDataFrame(data.frame(y = c(3, NA))) +#' count(join(df1, df2, df1$y == df2$y)) +#' ## [1] 1 +#' +#' count(join(df1, df2, df1$y %<=>% df2$y)) +#' ## [1] 2 +#' } +#' @note \%<=>\% since 2.3.0 +setMethod("%<=>%", + signature(x = "Column", value = "ANY"), + function(x, value) { +value <- if (class(value) == "Column") { value@jc } else { value } +jc <- callJMethod(x@jc, "eqNullSafe", value) +column(jc) + }) + +#' ! +#' +#' @rdname not +#' @aliases !,Column-method +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(data.frame(x = c(-1, 0, 1))) +#' +#' head(select(df, !column("x") > 0)) +#' ## (NOT (x > 0.0)) +#' ##1TRUE +#' ##2TRUE +#' ##3 FALSE +#' } +#' @note ! since 2.3.0 +setMethod("!", + signature(x = "Column"), + function(x) not(x)) --- End diff -- maybe this should be single line? ``` setMethod("!", signature(x = "Column"), function(x) not(x)) ```
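The null-safe equality semantics that `%<=>%` wraps (Spark's `Column.eqNullSafe`) can be illustrated with a small sketch. This is a Python model of the truth table from the roxygen example above, with `None` standing in for SQL `NULL`; it is not the SparkR implementation itself:

```python
def eq_null_safe(x, y):
    # <=> semantics: NULL <=> NULL is True, NULL <=> anything else is False,
    # and ordinary equality applies when neither side is NULL.
    if x is None and y is None:
        return True
    if x is None or y is None:
        return False
    return x == y

xs, ys = [1, None, 3, None], [2, 6, 3, None]
print([eq_null_safe(x, y) for x, y in zip(xs, ys)])
# [False, False, True, True]
```

This is also why the null-safe join in the example matches one more row than the plain `==` join: the `NA`/`NA` pair compares as equal instead of as unknown.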
[GitHub] spark issue #17786: [SPARK-20483] Mesos Coarse mode may starve other Mesos f...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/17786 @mgummelt We tested this in our production env, and it solves our issue. Since it seems to be a trivial change, I made my judgement. Gonna wait for more feedback. Thanks.
[GitHub] spark pull request #17783: [SPARK-20490][SPARKR][WIP] Add R wrappers for eqN...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17783#discussion_r113849107 --- Diff: R/pkg/R/column.R --- @@ -302,3 +301,65 @@ setMethod("otherwise", jc <- callJMethod(x@jc, "otherwise", value) column(jc) }) + +#' \%<=>\% +#' +#' Equality test that is safe for null values. +#' +#' Can be used, unlike standard equality operator, to perform null-safe joins. +#' Equivalent to Scala \code{Column.<=>} and \code{Column.eqNullSafe}. +#' +#' @param x a Column +#' @param value a value to compare +#' @rdname eq_null_safe +#' @name %<=>% +#' @aliases %<=>%,Column-method +#' @export +#' @examples +#' \dontrun{ +#' df1 <- createDataFrame(data.frame( +#' x = c(1, NA, 3, NA), y = c(2, 6, 3, NA) +#' )) +#' +#' head(select(df1, df1$x == df1$y, df1$x %<=>% df1$y)) +#' ## (x = y) (x <=> y) +#' ##1 FALSE FALSE +#' ##2 NA FALSE +#' ##3TRUE TRUE +#' ##4 NA TRUE +#' +#' df2 <- createDataFrame(data.frame(y = c(3, NA))) +#' count(join(df1, df2, df1$y == df2$y)) +#' ## [1] 1 +#' +#' count(join(df1, df2, df1$y %<=>% df2$y)) +#' ## [1] 2 +#' } +#' @note \%<=>\% since 2.3.0 +setMethod("%<=>%", + signature(x = "Column", value = "ANY"), + function(x, value) { +value <- if (class(value) == "Column") { value@jc } else { value } +jc <- callJMethod(x@jc, "eqNullSafe", value) +column(jc) + }) + +#' ! +#' +#' @rdname not +#' @aliases !,Column-method +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(data.frame(x = c(-1, 0, 1))) +#' +#' head(select(df, !column("x") > 0)) +#' ## (NOT (x > 0.0)) +#' ##1TRUE +#' ##2TRUE +#' ##3 FALSE +#' } +#' @note ! since 2.3.0 +setMethod("!", --- End diff -- which lintr? the current release is 0.2.0? but I don't think we have a pattern for including output in example doc. I think you could try ``` #' # (x = y) (x <=> y) ``` or ``` #' (x = y) (x <=> y) ```
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113847972 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging { def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*) } - def getSQLDataType(dataType: String): DataType = { -dataType match { - case "byte" => org.apache.spark.sql.types.ByteType - case "integer" => org.apache.spark.sql.types.IntegerType - case "float" => org.apache.spark.sql.types.FloatType - case "double" => org.apache.spark.sql.types.DoubleType - case "numeric" => org.apache.spark.sql.types.DoubleType - case "character" => org.apache.spark.sql.types.StringType - case "string" => org.apache.spark.sql.types.StringType - case "binary" => org.apache.spark.sql.types.BinaryType - case "raw" => org.apache.spark.sql.types.BinaryType - case "logical" => org.apache.spark.sql.types.BooleanType - case "boolean" => org.apache.spark.sql.types.BooleanType - case "timestamp" => org.apache.spark.sql.types.TimestampType - case "date" => org.apache.spark.sql.types.DateType - case r"\Aarray<(.+)${elemType}>\Z" => -org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType)) - case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" => -if (keyType != "string" && keyType != "character") { - throw new IllegalArgumentException("Key type of a map must be string or character") -} -org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType)) - case r"\Astruct<(.+)${fieldsStr}>\Z" => -if (fieldsStr(fieldsStr.length - 1) == ',') { - throw new IllegalArgumentException(s"Invalid type $dataType") -} -val fields = fieldsStr.split(",") -val structFields = fields.map { field => - field match { -case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" => - createStructField(fieldName, fieldType, true) - -case _ => throw new IllegalArgumentException(s"Invalid type $dataType") - } -} 
-createStructType(structFields) - case _ => throw new IllegalArgumentException(s"Invalid type $dataType") -} - } - def createStructField(name: String, dataType: String, nullable: Boolean): StructField = { -val dtObj = getSQLDataType(dataType) +val dtObj = CatalystSqlParser.parseDataType(dataType) --- End diff -- is it > R's one is stricter because we are checking the types via regular expressions in R side ahead.
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17774 **[Test build #76256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76256/testReport)** for PR 17774 at commit [`d4a7867`](https://github.com/apache/spark/commit/d4a7867d96aa7c4bbed9cbd03b0753adcf79db9d).
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17774 Jenkins, ok to test
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17774 Jenkins, test this please
[GitHub] spark issue #17785: [SPARK-20493][R] De-deuplicate parse logics for DDL-like...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17785 **[Test build #76255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76255/testReport)** for PR 17785 at commit [`257e625`](https://github.com/apache/spark/commit/257e62571ed45b028a419d4c6f880572f97dc717).
[GitHub] spark issue #17786: [SPARK-20483] Mesos Coarse mode may starve other Mesos f...
Github user dgshep commented on the issue: https://github.com/apache/spark/pull/17786 Fair point. This felt like a succinct way to handle this corner case, but if it makes sense to harden the offer refusal code instead, I can update.
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113847076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -92,48 +93,8 @@ private[sql] object SQLUtils extends Logging { def r: Regex = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*) } - def getSQLDataType(dataType: String): DataType = { -dataType match { - case "byte" => org.apache.spark.sql.types.ByteType - case "integer" => org.apache.spark.sql.types.IntegerType - case "float" => org.apache.spark.sql.types.FloatType - case "double" => org.apache.spark.sql.types.DoubleType - case "numeric" => org.apache.spark.sql.types.DoubleType - case "character" => org.apache.spark.sql.types.StringType - case "string" => org.apache.spark.sql.types.StringType - case "binary" => org.apache.spark.sql.types.BinaryType - case "raw" => org.apache.spark.sql.types.BinaryType - case "logical" => org.apache.spark.sql.types.BooleanType - case "boolean" => org.apache.spark.sql.types.BooleanType - case "timestamp" => org.apache.spark.sql.types.TimestampType - case "date" => org.apache.spark.sql.types.DateType - case r"\Aarray<(.+)${elemType}>\Z" => -org.apache.spark.sql.types.ArrayType(getSQLDataType(elemType)) - case r"\Amap<(.+)${keyType},(.+)${valueType}>\Z" => -if (keyType != "string" && keyType != "character") { - throw new IllegalArgumentException("Key type of a map must be string or character") -} -org.apache.spark.sql.types.MapType(getSQLDataType(keyType), getSQLDataType(valueType)) - case r"\Astruct<(.+)${fieldsStr}>\Z" => -if (fieldsStr(fieldsStr.length - 1) == ',') { - throw new IllegalArgumentException(s"Invalid type $dataType") -} -val fields = fieldsStr.split(",") -val structFields = fields.map { field => - field match { -case r"\A(.+)${fieldName}:(.+)${fieldType}\Z" => - createStructField(fieldName, fieldType, true) - -case _ => throw new IllegalArgumentException(s"Invalid type $dataType") - } -} 
-createStructType(structFields) - case _ => throw new IllegalArgumentException(s"Invalid type $dataType") -} - } - def createStructField(name: String, dataType: String, nullable: Boolean): StructField = { -val dtObj = getSQLDataType(dataType) +val dtObj = CatalystSqlParser.parseDataType(dataType) --- End diff -- haven't checked myself, what are the differences if any between `getSQLDataType` and `CatalystSqlParser.parseDataType`?
[GitHub] spark pull request #17785: [SPARK-20493][R] De-deuplicate parse logics for D...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17785#discussion_r113846935 --- Diff: R/pkg/R/utils.R --- @@ -864,6 +864,14 @@ captureJVMException <- function(e, method) { # Extract the first message of JVM exception. first <- strsplit(msg[2], "\r?\n\tat")[[1]][1] stop(paste0(rmsg, "no such table - ", first), call. = FALSE) + } else if (any(grep("org.apache.spark.sql.catalyst.parser.ParseException: ", stacktrace))) { +msg <- strsplit(stacktrace, "org.apache.spark.sql.catalyst.parser.ParseException: ", +fixed = TRUE)[[1]] --- End diff -- indent
[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76254/ Test PASSed.
[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17742 Merged build finished. Test PASSed.
[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17742 **[Test build #76254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76254/testReport)** for PR 17742 at commit [`206a023`](https://github.com/apache/spark/commit/206a023433805e8d55b0cb30eebde130b4245bf9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113846602 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -82,8 +81,8 @@ private[fpm] trait FPGrowthParams extends Params with HasPredictionCol { def getNumPartitions: Int = $(numPartitions) /** - * Minimal confidence for generating Association Rule. - * Note that minConfidence has no effect during fitting. + * Minimal confidence for generating Association Rule. MinConfidence will not affect the mining --- End diff -- ping
[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113846530 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -268,12 +269,8 @@ class FPGrowthModel private[ml] ( val predictUDF = udf((items: Seq[_]) => { if (items != null) { val itemset = items.toSet -brRules.value.flatMap(rule => - if (items != null && rule._1.forall(item => itemset.contains(item))) { -rule._2.filter(item => !itemset.contains(item)) - } else { -Seq.empty - }).distinct +brRules.value.filter(_._1.forall(itemset.contains)) + .flatMap(_._2.filter(!itemset.contains(_))).distinct --- End diff -- let's update the PR/JIRA if code change is required for the doc change. otherwise, let's leave code change as a separate PR?
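The prediction logic in the diff above (keep rules whose antecedent is contained in the transaction, then emit consequent items not already present, de-duplicated) can be sketched as follows. This is a Python illustration with made-up rule data, not the Scala `predictUDF` itself:

```python
# Each rule is (antecedent, consequent); both are sets of items.
rules = [({"a"}, {"b"}), ({"a", "b"}, {"c"}), ({"d"}, {"e"})]

def transform(items):
    itemset = set(items)
    predicted = set()
    for antecedent, consequent in rules:
        if antecedent <= itemset:              # rule fires: antecedent is a subset
            predicted |= consequent - itemset  # skip items already in the basket
    return sorted(predicted)

print(transform({"a", "b"}))  # ['c']  ("b" is already present; {"d"} never fires)
```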
[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113846563 --- Diff: docs/ml-frequent-pattern-mining.md --- @@ -0,0 +1,87 @@ +--- +layout: global +title: Frequent Pattern Mining +displayTitle: Frequent Pattern Mining +--- + +Mining frequent items, itemsets, subsequences, or other substructures is usually among the +first steps to analyze a large-scale dataset, which has been an active research topic in +data mining for years. +We refer users to Wikipedia's [association rule learning](http://en.wikipedia.org/wiki/Association_rule_learning) +for more information. + +**Table of Contents** + +* This will become a table of contents (this text will be scraped). +{:toc} + +## FP-Growth + +The FP-growth algorithm is described in the paper +[Han et al., Mining frequent patterns without candidate generation](http://dx.doi.org/10.1145/335191.335372), +where "FP" stands for frequent pattern. +Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items. +Different from [Apriori-like](http://en.wikipedia.org/wiki/Apriori_algorithm) algorithms designed for the same purpose, +the second step of FP-growth uses a suffix tree (FP-tree) structure to encode transactions without generating candidate sets +explicitly, which are usually expensive to generate. +After the second step, the frequent itemsets can be extracted from the FP-tree. +In `spark.mllib`, we implemented a parallel version of FP-growth called PFP, +as described in [Li et al., PFP: Parallel FP-growth for query recommendation](http://dx.doi.org/10.1145/1454008.1454027). +PFP distributes the work of growing FP-trees based on the suffixes of transactions, +and hence is more scalable than a single-machine implementation. +We refer users to the papers for more details. 
+
+`spark.ml`'s FP-growth implementation takes the following (hyper-)parameters:
+
+* `minSupport`: the minimum support for an itemset to be identified as frequent.
+  For example, if an item appears in 3 out of 5 transactions, it has a support of 3/5 = 0.6.
+* `minConfidence`: minimum confidence for generating an association rule. Confidence is an indication of how often an
+  association rule has been found to be true. For example, if in the transactions itemset `X` appears 4 times, and `X`
+  and `Y` co-occur only 2 times, the confidence for the rule `X => Y` is then 2/4 = 0.5. The parameter does not
+  affect the mining for frequent itemsets, but specifies the minimum confidence for generating association rules
+  from frequent itemsets.
+* `numPartitions`: the number of partitions used to distribute the work. By default the param is not set, and the
+  number of partitions of the input dataset is used.
+
+The `FPGrowthModel` provides:
+
+* `freqItemsets`: frequent itemsets in the format of DataFrame("items"[Array], "freq"[Long])
+* `associationRules`: association rules generated with confidence above `minConfidence`, in the format of
+  DataFrame("antecedent"[Array], "consequent"[Array], "confidence"[Double]).
+* `transform`: For each transaction in itemsCol, the `transform` method will compare its items against the antecedents

--- End diff --

I mean style it as code with backtick
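To make the two thresholds in the doc above concrete, here is a plain-Python sketch of how support and confidence are computed from raw transactions (`support`/`confidence` are illustrative helper names, not the Spark API):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of `antecedent => consequent`: co-occurrence count
    divided by the antecedent's occurrence count."""
    return support(antecedent + consequent, transactions) / support(antecedent, transactions)

# Mirrors the doc's example: 'x' appears in 4 of 5 transactions (support 0.8);
# 'x' and 'y' co-occur in 2 of those 4, so confidence of x => y is 2/4 = 0.5.
transactions = [["x"], ["x", "y"], ["x"], ["x", "y", "z"], ["z"]]
print(support(["x"], transactions))            # 0.8
print(confidence(["x"], ["y"], transactions))  # 0.5
```

An itemset is kept as frequent when its support is at least `minSupport`; a rule is emitted when its confidence is at least `minConfidence`.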
[GitHub] spark issue #17796: [SPARK-20519][SQL][CORE]Modify to prevent some possible ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17796 Can one of the admins verify this patch?
[GitHub] spark pull request #17796: [SPARK-20519][SQL][CORE]Modify to prevent some po...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/17796 [SPARK-20519][SQL][CORE] Modify to prevent some possible runtime exceptions

Signed-off-by: liuxian

## What changes were proposed in this pull request?

For some functions, a runtime exception may occur when the input parameter is null.

## How was this patch tested?

Existing unit tests, plus added unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark wip_lx_0428

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17796.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17796

commit 572b18150dfcfe810d2687b3b8f622b98a4fd5c6
Author: liuxian
Date: 2017-04-28T02:36:23Z

    Modify to prevent some possible runtime exception

    Signed-off-by: liuxian
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76251/ Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76251/testReport)** for PR 17765 at commit [`6e66638`](https://github.com/apache/spark/commit/6e666386c6ac54279063787b8d6cea618114fdcd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Any more feedback on this?
[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17742 **[Test build #76254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76254/testReport)** for PR 17742 at commit [`206a023`](https://github.com/apache/spark/commit/206a023433805e8d55b0cb30eebde130b4245bf9).
[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17781 Merged build finished. Test PASSed.
[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76252/ Test PASSed.
[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17781 **[Test build #76252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76252/testReport)** for PR 17781 at commit [`7839a1b`](https://github.com/apache/spark/commit/7839a1bac8487cb1e1399f892b5dbca05fb42440). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the name of inter...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17596
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17596 LGTM - merging to master/2.2
[GitHub] spark pull request #17795: [SPARK-20517][UI] Fix broken history UI download ...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/17795 [SPARK-20517][UI] Fix broken history UI download link

## What changes were proposed in this pull request?

The download link in the history server UI is concatenated with:

```
Download
```

Here the `num` field represents the number of attempts, which is not what the REST APIs expect. In the REST API, if the attempt id does not exist the URL should be `api/v1/applications//logs`, otherwise the URL should be `api/v1/applications///logs`. Using `` to represent `` will lead to the issue of "no such app".

## How was this patch tested?

Manual verification.

CC @ajbozarth can you please review this change, since you added this feature before? Thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-20517

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17795.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17795

commit 3fdba116b6e92802c3e9e89efa8827cef1a0d1f8
Author: jerryshao
Date: 2017-04-28T02:22:46Z

    Fix broken history UI download link

    Change-Id: If6d86bb229f352065eccae3d8efa3bdaf9ba755a
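The URL rule the PR describes can be sketched as follows (a hypothetical helper, not the actual Scala code in the history server): the attempt *id* segment is included only when one exists, and the attempt *count* is never used in the path.

```python
def event_log_url(app_id, attempt_id=None):
    """Build the history-server REST download URL for an application's logs.

    The attempt-id path segment is present only when the application
    actually has a named attempt; an attempt *count* must never be used here.
    """
    if attempt_id is None:
        return f"api/v1/applications/{app_id}/logs"
    return f"api/v1/applications/{app_id}/{attempt_id}/logs"

print(event_log_url("app-123"))       # api/v1/applications/app-123/logs
print(event_log_url("app-123", "1"))  # api/v1/applications/app-123/1/logs
```

Substituting the number of attempts where the attempt id belongs yields a path the REST API cannot resolve, hence the "no such app" failure described above.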
[GitHub] spark issue #17795: [SPARK-20517][UI] Fix broken history UI download link
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17795 **[Test build #76253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76253/testReport)** for PR 17795 at commit [`3fdba11`](https://github.com/apache/spark/commit/3fdba116b6e92802c3e9e89efa8827cef1a0d1f8).
[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/17702 @HyukjinKwon Can you help me find an appropriate reviewer for this?
[GitHub] spark issue #17794: Supplement the new blockidsuite unit tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17794 Can one of the admins verify this patch?
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76250/ Test PASSed.
[GitHub] spark pull request #17794: Supplement the new blockidsuite unit tests
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/17794 Supplement the new blockidsuite unit tests

## What changes were proposed in this pull request?

This PR adds new unit tests covering ShuffleDataBlockId, ShuffleIndexBlockId, TempShuffleBlockId, and TempLocalBlockId.

## How was this patch tested?

The new unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/heary-cao/spark blockidsuite

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17794.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17794

commit 22da759cf6026a21e22cc3ce182bc64e92535520
Author: caoxuewen
Date: 2017-04-28T02:28:02Z

    Supplement the new blockidsuite unit tests
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76250/testReport)** for PR 17765 at commit [`f9342c9`](https://github.com/apache/spark/commit/f9342c9c6f8aad75d0578d0f62717ef2a651a0ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17793: [SPARK-20484][MLLIB] Add documentation to ALS code
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17793 Can one of the admins verify this patch?
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736

@cloud-fan Do you mean `SELECT \\abc`?

Spark 2.x:

    sql("select '\\abc'").show()
    +---+
    |abc|
    +---+
    |abc|
    +---+

    sql("select 'ab\\tc'").show()
    +----+
    |ab c|
    +----+
    |ab c|
    +----+

    sql("select 'ab\tc'").show()
    +----+
    |ab c|
    +----+
    |ab c|
    +----+

Spark 1.6:

    sql("select '\\abc'").show()
    +----+
    | _c0|
    +----+
    |\abc|
    +----+

    sql("select 'ab\\tc'").show()  // 1.6 doesn't perform unescape, so this doesn't work.
    +-----+
    |  _c0|
    +-----+
    |ab\tc|
    +-----+

    sql("select 'ab\tc'").show()
    +----+
    | _c0|
    +----+
    |ab c|
    +----+
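The behavioral difference shown above is that Spark 2.x unescapes string literals at parse time (so `'\\abc'` comes out as `abc` and `'ab\\tc'` gets a real tab), while 1.6 left the characters as typed. An illustrative sketch of such an unescape pass (this is not Spark's actual parser; the escape table is deliberately minimal):

```python
import re

# Minimal escape table for the sketch; Spark's real table is larger.
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}

def unescape(literal):
    """Replace backslash escapes. An unrecognized escape drops the
    backslash and keeps the character, which is why '\\abc' shows up
    as 'abc' in the 2.x output above."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), literal)

print(unescape(r"\abc"))   # abc
print(unescape(r"ab\tc"))  # 'ab' and 'c' separated by a real tab
```

Running a regex like `\t` through such a pass before pattern compilation is what makes the same pattern string behave differently between 1.6 and 2.x.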
[GitHub] spark pull request #17793: [SPARK-20484][MLLIB] Add documentation to ALS cod...
GitHub user danielyli opened a pull request: https://github.com/apache/spark/pull/17793 [SPARK-20484][MLLIB] Add documentation to ALS code

## What changes were proposed in this pull request?

This PR adds documentation to the ALS code.

## How was this patch tested?

Existing tests were used.

@mengxr @srowen This contribution is my original work. I have the license to work on this project under the Spark project's open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danielyli/spark spark-20484

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17793.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17793

commit 4661ddbbe02265333f03becc1a0cd10b29fd4109
Author: Daniel Li
Date: 2017-04-28T01:26:51Z

    Add documentation for the `InBlock` class

commit 7d1491e27dc7d6dbe634f4274584dc1fe9a8ecae
Author: Daniel Li
Date: 2017-04-28T01:41:05Z

    Add documentation for the `OutBlock` data type

commit 2fdbcaa70f7d487cff4885ed87e7ee609aa6b24b
Author: Daniel Li
Date: 2017-04-28T01:43:37Z

    Add documentation for `partitionRatings` method

commit fb8f16df6c5b744a9312226493899ed09bf8d1ce
Author: Daniel Li
Date: 2017-04-28T01:45:51Z

    Add documentation for `ALS.train` method

commit 0a2edf0a09bdbb1ff81f1cde9a8c60b15ce2b68f
Author: Daniel Li
Date: 2017-04-28T01:50:37Z

    Add inline comments to `ALS.train` method
[GitHub] spark issue #17789: [SPARK-19525][CORE]Add RDD checkpoint compression suppor...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17789 To add: for non-streaming use cases this will definitely help - but was this a recent change for streaming? (probably after @aramesh117 made the PR?)
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17436 When shall we free a column vector? One is when the iterator is consumed up, another one is when we have a `LIMIT n` in the query and stop reading the iterator at some point. Are there any other cases?
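The two release points named above — full consumption and early termination under a `LIMIT` — can be sketched with a generator standing in for the batch iterator (names are illustrative, not the Spark column-vector API):

```python
class ColumnVector:
    """Stand-in for an off-heap buffer that must be released explicitly."""
    def __init__(self):
        self.freed = False
    def free(self):
        self.freed = True

def batches(vector, n):
    """Yield n batches; the finally block frees the vector both when the
    iterator is consumed up and when the consumer stops early."""
    try:
        for i in range(n):
            yield i
    finally:
        vector.free()

v = ColumnVector()
gen = batches(v, 5)
for b in gen:
    if b == 1:   # stop early, like `LIMIT n`
        break
gen.close()      # early-termination path: explicitly run the finally block
assert v.freed
```

The design point is that off-heap memory is invisible to GC, so the early-termination path needs an explicit hook (here `close()`); relying only on exhausting the iterator would leak the buffer under a `LIMIT`.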
[GitHub] spark issue #17789: [SPARK-19525][CORE]Add RDD checkpoint compression suppor...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17789 I thought the main reason @aramesh117 did this PR was for compression to be enabled for the Spark Streaming use case. If compression is already enabled, then am I missing something here?
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test FAILed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76249/ Test FAILed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76249/testReport)** for PR 17765 at commit [`915d67b`](https://github.com/apache/spark/commit/915d67b6f6b802e5644f031ef11a2ba49ceedc6d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17771: [SPARK-20471]Remove AggregateBenchmark testsuite warning...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/17771 @gatorsmile ok, please review it again. thanks.
[GitHub] spark pull request #17792: [SPARK-20496][SS] Bug in KafkaWriter Looks at Una...
Github user anabranch closed the pull request at: https://github.com/apache/spark/pull/17792
[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17645#discussion_r113836900

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -42,15 +44,35 @@ import org.apache.spark.sql.functions.{col, lit}

 /** Params for linear SVM Classifier. */
 private[classification] trait LinearSVCParams extends ClassifierParams with HasRegParam
   with HasMaxIter with HasFitIntercept with HasTol with HasStandardization with HasWeightCol
-  with HasThreshold with HasAggregationDepth
+  with HasThreshold with HasAggregationDepth {
+
+  /**
+   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
+   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
+   * the hinge loss (a.k.a. L2 loss).
+   *
+   * @see https://en.wikipedia.org/wiki/Hinge_loss Hinge loss (Wikipedia)
+   *
+   * @group param
+   */
+  @Since("2.3.0")
+  final val lossFunction: Param[String] = new Param(this, "lossFunction", "Specifies the loss " +

--- End diff --

Sure, we can do it. But I'm thinking maybe we should do an integrated refactoring of the common optimization parameters some time in the future, either through shared params or another trait or abstract class.
[GitHub] spark pull request #17792: [SPARK-20496][SS] Bug in KafkaWriter Looks at Una...
GitHub user anabranch opened a pull request: https://github.com/apache/spark/pull/17792 [SPARK-20496][SS] Bug in KafkaWriter Looks at Unanalyzed Plans ## What changes were proposed in this pull request? We didn't enforce analyzed plans in Spark 2.1 when writing out to Kafka. ## How was this patch tested? New unit test. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/anabranch/spark SPARK-20496 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17792.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17792 commit 5bafdc45d6493f2ea41cc4bce0faa5f93ff3162c Author: Shixiong ZhuDate: 2016-12-23T23:38:41Z [SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use ConcurrentHashMap to make it faster ## What changes were proposed in this pull request? The time complexity of ConcurrentHashMap's `remove` is O(1). Changing ContextCleaner.referenceBuffer's type from `ConcurrentLinkedQueue` to `ConcurrentHashMap's` will make the removal much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #16390 from zsxwing/SPARK-18991. (cherry picked from commit a848f0ba84e37fd95d0f47863ec68326e3296b33) Signed-off-by: Shixiong Zhu commit ca25b1e51f036fb837e3fe8218cb04d7360e049d Author: Kousuke Saruta Date: 2016-12-24T13:02:58Z [SPARK-18837][WEBUI] Very long stage descriptions do not wrap in the UI ## What changes were proposed in this pull request? This issue was reported by wangyum. In the AllJobsPage, JobPage and StagePage, the description length was limited before like as follows. ![ui-2 0 0](https://cloud.githubusercontent.com/assets/4736016/21319673/8b225246-c651-11e6-9041-4fcdd04f4dec.gif) But recently, the limitation seems to have been accidentally removed. 
![ui-2 1 0](https://cloud.githubusercontent.com/assets/4736016/21319825/104779f6-c652-11e6-8bfa-dfd800396352.gif) The cause is that some tables no longer have the `sortable` class although they used to, and the `sortable` class not only marks tables as sortable but also limits the width of their child `td` elements. Some tables lost the `sortable` class because another sorting mechanism was introduced by #13620 and #13708 along with the pagination feature. To fix this issue, I've introduced a new class, `table-cell-width-limited`, which limits the width of the description cell so the description is displayed as it was before. https://cloud.githubusercontent.com/assets/4736016/21320478/89141c7a-c654-11e6-8494-f8f91325980b.png ## How was this patch tested? Tested manually with my browser. Author: Kousuke Saruta Closes #16338 from sarutak/SPARK-18837. (cherry picked from commit f2ceb2abe9357942a51bd643683850efd1fc9df7) Signed-off-by: Sean Owen

commit ac7107fe70fcd0b584001c10dd624a4d8757109c Author: Carson Wang Date: 2016-12-28T12:12:44Z [MINOR][DOC] Fix doc of ForeachWriter to use writeStream ## What changes were proposed in this pull request? Fix the documentation of `ForeachWriter` to use `writeStream` instead of `write` for a streaming dataset. ## How was this patch tested? Docs only. Author: Carson Wang Closes #16419 from carsonwang/FixDoc. (cherry picked from commit 2a5f52a7146abc05bf70e65eb2267cd869ac4789) Signed-off-by: Sean Owen

commit 7197a7bc7061e2908b6430f494dba378378d5d02 Author: Sean Owen Date: 2016-12-28T12:17:33Z [SPARK-18993][BUILD] Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags ## What changes were proposed in this pull request? This adds back a direct dependency on Scala library classes from spark-tags because its Scala annotations need them. ## How was this patch tested? Existing tests Author: Sean Owen Closes #16418 from srowen/SPARK-18993.
(cherry picked from commit d7bce3bd31ec193274718042dc017706989d7563) Signed-off-by: Sean Owen commit 80d583bd09de54890cddfcc0c6fd807d7200ea75 Author: Tathagata Das Date: 2016-12-28T20:11:25Z [SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status ## What changes were proposed
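The SPARK-18991 commit above rests on a simple complexity fact: removing an arbitrary element from a linked queue requires a linear scan, while removing a key from a hash map is O(1) on average. A minimal Python sketch of that idea; the `CleanupTask` class and `buffer` variable are illustrative stand-ins, not Spark code:

```python
# Stand-in for ContextCleaner.referenceBuffer: a hash map used as a set
# gives O(1) average-time removal of any entry, whereas a linked queue
# (like the old ConcurrentLinkedQueue) must scan linearly to find it.
class CleanupTask:
    def __init__(self, name):
        self.name = name

tasks = [CleanupTask(f"rdd-{i}") for i in range(3)]

buffer = {}            # hash map used as a set of pending cleanup tasks
for t in tasks:
    buffer[t] = None

del buffer[tasks[1]]   # hash lookup + delete: no scan of the other entries
print(sorted(t.name for t in buffer))
```

The same trade-off holds in Java between `ConcurrentLinkedQueue.remove(Object)` (linear) and `ConcurrentHashMap.remove(key)` (constant on average), which is the whole content of the patch.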
[GitHub] spark pull request #17787: [SPARK-20496][SS] Bug in KafkaWriter Looks at Una...
Github user anabranch closed the pull request at: https://github.com/apache/spark/pull/17787 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...
Github user jtengyp commented on the issue: https://github.com/apache/spark/pull/17742 I did some tests with this PR. Here is the cluster configuration: 3 workers, each with 10 cores and 30 GB of memory. With the Netflix dataset (480,189 users and 17,770 movies), the recommendProductsForUsers time drops from 488.36 s to 60.93 s, 8x faster than the original method. With a larger dataset (3.29 million users and 0.21 million products), the recommendProductsForUsers time drops from 48 h to 39 min, 73x faster than the original method.
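For context, recommendProductsForUsers reduces to scoring every user-item pair from the two ALS factor matrices and keeping the top k items per user. A minimal NumPy sketch of that computation (toy matrices, not the Spark API) showing the dense-GEMM-plus-partial-sort structure that an optimized implementation can exploit:

```python
import numpy as np

# Toy factor matrices from a hypothetical ALS model: each row is a
# latent-factor vector (2 users, 3 items, rank 2).
user_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
item_factors = np.array([[0.9, 0.1],
                         [0.2, 0.8],
                         [0.5, 0.5]])

def recommend_top_k(user_factors, item_factors, k):
    # One matrix multiply scores all user-item pairs at once.
    scores = user_factors @ item_factors.T      # shape: (num_users, num_items)
    # A partial sort per row then extracts the k best item indices,
    # avoiding a full O(n log n) sort of every row.
    return np.argpartition(-scores, range(k), axis=1)[:, :k]

print(recommend_top_k(user_factors, item_factors, 2))
```

In Spark the same multiply is done block-by-block over partitioned factor RDDs so the full score matrix never materializes on one machine, but the per-block arithmetic is exactly this GEMM.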
[GitHub] spark issue #17781: [SPARK-20476] [SQL] Block users to create a table that u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17781 **[Test build #76252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76252/testReport)** for PR 17781 at commit [`7839a1b`](https://github.com/apache/spark/commit/7839a1bac8487cb1e1399f892b5dbca05fb42440).
[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 @brkyvz please take another look
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @zsxwing and @cloud-fan, can you have another look at this? I'd really like to get it in.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76246/
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76245/
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76245/testReport)** for PR 17765 at commit [`6ab66e2`](https://github.com/apache/spark/commit/6ab66e202193d8bb6a942207fc42ee8fff580e9c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76246/testReport)** for PR 17765 at commit [`992d68f`](https://github.com/apache/spark/commit/992d68fca1b10abff7e8539925a8af237155cc8e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...
Github user kunalkhamar commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113831182 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -825,6 +832,11 @@ class StreamExecution( } } + private def getBatchDescriptionString: String = { + val batchDescription = if (currentBatchId < 0) "init" else currentBatchId.toString + Option(name).map(_ + "<br/>").getOrElse("") + + s"id = $id<br/>runId = $runId<br/>batch = $batchDescription" --- End diff -- @marmbrus @zsxwing @tdas Updated as per comments, the screenshots are in the PR description.
[GitHub] spark issue #17790: [SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17790 Merged build finished. Test PASSed.
[GitHub] spark issue #17790: [SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17790 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76244/
[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...
Github user kunalkhamar commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113830998 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -825,6 +832,11 @@ class StreamExecution( } } + private def getBatchDescriptionString: String = { + val batchDescription = if (currentBatchId < 0) "init" else currentBatchId.toString + Option(name).map(_ + " ").getOrElse("") + + s"[batch = $batchDescription,id = $id,runId = $runId]" --- End diff -- Yes, updated.
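The logic in the diff above is small enough to restate. A Python rendition (illustrative only; the real method is the Scala `getBatchDescriptionString` quoted in the diff): the batch label is "init" before the first batch, and the optional query name becomes a prefix.

```python
# Illustrative Python version of the description-string logic from the
# Scala diff: "init" before the first batch, optional name as prefix.
def get_batch_description_string(name, current_batch_id, query_id, run_id):
    batch_description = "init" if current_batch_id < 0 else str(current_batch_id)
    prefix = f"{name} " if name is not None else ""
    return prefix + f"[batch = {batch_description},id = {query_id},runId = {run_id}]"

# A query named "myQuery" that has not yet run a batch (batch id -1):
print(get_batch_description_string("myQuery", -1, 7, 42))
```

This string is what the Spark UI shows as the job description for each streaming micro-batch, which is why the reviewers cared about its exact formatting.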
[GitHub] spark issue #17790: [SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17790 **[Test build #76244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76244/testReport)** for PR 17790 at commit [`ecfb8e3`](https://github.com/apache/spark/commit/ecfb8e3f276eeb276ed0a3293a68ff93a6f9e88e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.