[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464 **[Test build #71303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71303/testReport)** for PR 16464 at commit [`882c70d`](https://github.com/apache/spark/commit/882c70da32756e7603bd293b2ba010a585fdc0c5). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464 **[Test build #71302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71302/testReport)** for PR 16464 at commit [`b72592c`](https://github.com/apache/spark/commit/b72592ce02e9a8af518a103ab81a2dfe8a103d51).
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16565 LGTM except one comment
[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16512 Merged build finished. Test FAILed.
[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16512 **[Test build #71300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71300/testReport)** for PR 16512 at commit [`4fa1998`](https://github.com/apache/spark/commit/4fa19987433be48fa006e86b5f9e140f2c297c1c).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71300/ Test FAILed.
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16565 I checked the change history. Actually, you also backported https://github.com/apache/spark/pull/15111. Could you please update your PR description and PR title?
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464 **[Test build #71301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71301/testReport)** for PR 16464 at commit [`0134a26`](https://github.com/apache/spark/commit/0134a2693f6abfc51d0c11d693b97971072affaa).
[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16512 **[Test build #71300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71300/testReport)** for PR 16512 at commit [`4fa1998`](https://github.com/apache/spark/commit/4fa19987433be48fa006e86b5f9e140f2c297c1c).
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16565 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71290/ Test PASSed.
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16565 Merged build finished. Test PASSed.
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16565 **[Test build #71290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71290/consoleFull)** for PR 16565 at commit [`e2c2fae`](https://github.com/apache/spark/commit/e2c2fae70204a2f5891fdfd8d516c273b2d72648).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r95948100

Diff: R/pkg/R/mllib_clustering.R

```diff
@@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
           vocabSize <- callJMethod(jobj, "vocabSize")
           topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
           vocabulary <- callJMethod(jobj, "vocabulary")
+          trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
+          logPrior <- callJMethod(jobj, "logPrior")
```

End diff:

I think it's more appropriate to return `NULL` rather than `NaN` for a local LDA model, since `logPrior` does not exist there, rather than being not-a-number. BTW, I think we can return NULL directly according to `isDistributed`, and otherwise call the corresponding Scala methods. This should reduce the complexity of `LDAWrapper` and reduce communication between R and Scala.
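The dispatch the reviewer suggests — return NULL for a local model instead of round-tripping to the JVM for a NaN — can be sketched outside SparkR. The following Python toy (all class and method names are hypothetical, not SparkR's actual API) mirrors the idea of branching on `isDistributed` before any cross-language call:

```python
class LDAModelSummary:
    """Illustrative sketch of the suggested dispatch: only a distributed LDA
    model (optimizer = "em") defines logPrior, so a local model returns None
    (R's NULL) without any call into the JVM side."""

    def __init__(self, is_distributed, jvm_model=None):
        self.is_distributed = is_distributed
        self.jvm_model = jvm_model  # stand-in for the Scala-side wrapper

    def log_prior(self):
        if not self.is_distributed:
            return None  # the quantity does not exist for a local model
        return self.jvm_model.log_prior()  # cross-language call only when defined


class FakeDistributedWrapper:
    """Hypothetical stand-in for the JVM-side distributed model."""
    def log_prior(self):
        return -123.45


local_summary = LDAModelSummary(is_distributed=False)
dist_summary = LDAModelSummary(is_distributed=True,
                               jvm_model=FakeDistributedWrapper())
print(local_summary.log_prior())  # None: no JVM call was made
print(dist_summary.log_prior())
```

Branching on the single `is_distributed` flag keeps the wrapper simple: the local path never touches the JVM, which is exactly the reduction in R-to-Scala communication the comment is after.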
[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r95948289

Diff: R/pkg/R/mllib_clustering.R

```diff
@@ -388,6 +388,13 @@ setMethod("spark.lda", signature(data = "SparkDataFrame"),
 #'         \item{\code{topics}}{top 10 terms and their weights of all topics}
 #'         \item{\code{vocabulary}}{whole terms of the training corpus, NULL if libsvm format file
 #'               used as training set}
+#'         \item{\code{trainingLogLikelihood}}{Log likelihood of the observed tokens in the training set,
+#'               given the current parameter estimates:
+#'               log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)
+#'               It is only for \code{DistributedLDAModel} (i.e., optimizer = "em")}
```

End diff:

`\code{DistributedLDAModel}` should be converted to a plain-text description, since there is no class called `DistributedLDAModel` in SparkR.
[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16555 **[Test build #71299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71299/testReport)** for PR 16555 at commit [`7722c4e`](https://github.com/apache/spark/commit/7722c4e233a3ecb6d50db73e8a4040c1ab7dd1b2).
[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16555 cc @sameeragarwal @davies @rxin
[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16555 ok to test
[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r95947533

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala

```diff
@@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule

 /**
  * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * with a Project and add an alias for each output attribute by mapping the child output by index,
+ * if the view output doesn't have the same number of columns with the child output, throw an
+ * AnalysisException.
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case v @ View(_, output, child) if child.resolved =>
-      val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      if (output.length != child.output.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      val newOutput = output.zip(child.output).map {
+        case (attr, originAttr) =>
+          if (attr.dataType != originAttr.dataType) {
```

End diff:

cc @yhuai
[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r95947477

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala

```diff
@@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule

 /**
  * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * with a Project and add an alias for each output attribute by mapping the child output by index,
+ * if the view output doesn't have the same number of columns with the child output, throw an
+ * AnalysisException.
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case v @ View(_, output, child) if child.resolved =>
-      val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      if (output.length != child.output.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      val newOutput = output.zip(child.output).map {
+        case (attr, originAttr) =>
+          if (attr.dataType != originAttr.dataType) {
```

End diff:

```
hive> explain extended select * from testview;
OK
ABSTRACT SYNTAX TREE:

TOK_QUERY
  TOK_FROM
    TOK_TABREF
      TOK_TABNAME
        testview
  TOK_INSERT
    TOK_DESTINATION
      TOK_DIR
        TOK_TMP_FILE
    TOK_SELECT
      TOK_SELEXPR
        TOK_ALLCOLREF

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: testtable
          Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
          GatherStats: false
          Select Operator
            expressions: a (type: bigint), b (type: tinyint)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
            ListSink
```

**`expressions: a (type: bigint), b (type: tinyint)`**. I tried altering the columns in the underlying table to different types; the types of the view columns are always cast to match the altered types.
[GitHub] spark issue #16566: [SparkR]: add bisecting kmeans R wrapper
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16566 **[Test build #71298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71298/testReport)** for PR 16566 at commit [`2ad596e`](https://github.com/apache/spark/commit/2ad596e6f9adb0c3b037c3cc1a379c0019167f08).
[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r95947257

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala

```diff
@@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule

 /**
  * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * with a Project and add an alias for each output attribute by mapping the child output by index,
+ * if the view output doesn't have the same number of columns with the child output, throw an
+ * AnalysisException.
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case v @ View(_, output, child) if child.resolved =>
-      val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      if (output.length != child.output.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      val newOutput = output.zip(child.output).map {
+        case (attr, originAttr) =>
+          if (attr.dataType != originAttr.dataType) {
```

End diff:

It sounds like Hive just forcefully casts it.
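The control flow under review — zip the view's recorded output with the child's output by index, fail fast on an arity mismatch, and cast only where the types drifted — can be sketched outside Catalyst. This Python toy (plain tuples standing in for Catalyst attributes; the `("cast", ...)` marker is a hypothetical stand-in for wrapping an attribute in `Cast`) mirrors that flow:

```python
def alias_view_child(view_output, child_output):
    """Illustrative sketch of the index-based mapping in the diff above.
    Each element is a (name, dataType) pair. Raises on a column-count
    mismatch; marks a column for casting when its type differs from the
    type recorded for the view."""
    if len(view_output) != len(child_output):
        raise ValueError(
            "The view output %s doesn't have the same number of columns "
            "as the child output %s" % (view_output, child_output))
    new_output = []
    for (name, view_type), (_, child_type) in zip(view_output, child_output):
        if view_type != child_type:
            # The child column's type drifted (e.g. the underlying table was
            # altered): cast it back to the type recorded for the view.
            new_output.append(("cast", name, view_type))
        else:
            new_output.append((name, view_type))
    return new_output


# View recorded (a: bigint, b: tinyint); the underlying table now has b: int.
print(alias_view_child([("a", "bigint"), ("b", "tinyint")],
                       [("a", "bigint"), ("b", "int")]))
```

Mapping by index rather than by name is what makes the arity check mandatory: without it, a dropped or added column in the underlying table would silently pair the wrong attributes.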
[GitHub] spark pull request #16558: Fix missing close-parens for In filter's toString
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16558
[GitHub] spark issue #16558: Fix missing close-parens for In filter's toString
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16558 Alright, I'm going to merge this given JIRA is down ... merging into master/branch-2.1/branch-2.0.
[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16568 LGTM
[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16568 **[Test build #71297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71297/testReport)** for PR 16568 at commit [`cb0c6ce`](https://github.com/apache/spark/commit/cb0c6ce950373c7b8d1191282170e27f96ddd2bf).
[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16550 Thanks! Merging to master. This JIRA targets 2.2.0. Should we also merge it to Spark 2.1?
[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/16555 retest please.
[GitHub] spark pull request #16550: [SPARK-19178][SQL] convert string of large number...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16550
[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16500 **[Test build #71296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71296/testReport)** for PR 16500 at commit [`203e36c`](https://github.com/apache/spark/commit/203e36c80fb967ed0ba21ec51942bd5bb17cca7d).
[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16568 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71294/ Test FAILed.
[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16568 **[Test build #71294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71294/testReport)** for PR 16568 at commit [`93f3a41`](https://github.com/apache/spark/commit/93f3a414c41887d7be6938491f2fb70badfe95c7).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16568 Merged build finished. Test FAILed.
[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16568 **[Test build #71294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71294/testReport)** for PR 16568 at commit [`93f3a41`](https://github.com/apache/spark/commit/93f3a414c41887d7be6938491f2fb70badfe95c7).
[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16500 **[Test build #71295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71295/testReport)** for PR 16500 at commit [`11507cc`](https://github.com/apache/spark/commit/11507ccebd9c48b2e340e5c7baf5b4e0a81c771b).
[GitHub] spark pull request #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16568 [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

## What changes were proposed in this pull request?

Upgrade Netty to 4.0.43.Final to pick up the fix for https://github.com/netty/netty/issues/6153

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-18971

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16568.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16568

commit 93f3a414c41887d7be6938491f2fb70badfe95c7
Author: Shixiong Zhu
Date: 2017-01-13T06:45:21Z

    Upgrade Netty to 4.0.43.Final
[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 4 in Of...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/16555 Thanks @aray, good catch!
[GitHub] spark pull request #16500: [SPARK-19120] Refresh Metadata Cache After Loadin...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16500#discussion_r95944437

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -392,7 +392,9 @@ case class InsertIntoHiveTable(
       // Invalidate the cache.
       sqlContext.sharedState.cacheManager.invalidateCache(table)
-     sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+     if (partition.nonEmpty) {
+       sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+     }
--- End diff --

Agree
[GitHub] spark issue #10238: [SPARK-2750][WEB UI] Add https support to the Web UI
Github user LizzyMiao commented on the issue: https://github.com/apache/spark/pull/10238 @vanzin @WangTaoTheTonic @scwf can you provide a doc or something we can follow to enable HTTPS for our Spark web UI? Thank you!
[GitHub] spark pull request #16523: [SPARK-19142][SparkR]:spark.kmeans should take se...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16523
[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16550 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71287/
[GitHub] spark issue #16523: [SPARK-19142][SparkR]:spark.kmeans should take seed, ini...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16523 LGTM, merged into master. Thanks. We cannot update the JIRA since it's currently down for maintenance; will do so later.
[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16550 Merged build finished. Test PASSed.
[GitHub] spark issue #16567: [SPARK-19113][SS][Tests] Ignore StreamingQueryException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16567 **[Test build #71293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71293/testReport)** for PR 16567 at commit [`e5ed096`](https://github.com/apache/spark/commit/e5ed096bbf719bcd34d36e31485f939b633f43f4).
[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16550 **[Test build #71287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71287/testReport)** for PR 16550 at commit [`7448e8c`](https://github.com/apache/spark/commit/7448e8cff72c4510ab1b6f341c587a403779d5e9).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16567: [SPARK-19113][SS][Tests] Ignore StreamingQueryExc...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16567 [SPARK-19113][SS][Tests] Ignore StreamingQueryException thrown from awaitInitialization to avoid breaking tests

## What changes were proposed in this pull request?

`StreamExecution.awaitInitialization` may throw fatal errors and fail the test. This PR just ignores `StreamingQueryException` thrown from `awaitInitialization` so that we can verify the exception in the `ExpectFailure` action later. It's fine since `StopStream` or `ExpectFailure` will catch `StreamingQueryException` as well.

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-19113-2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16567.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16567

commit e5ed096bbf719bcd34d36e31485f939b633f43f4
Author: Shixiong Zhu
Date: 2017-01-13T06:22:41Z

    Ignore exception from awaitInitialization to avoid breaking tests
[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16564#discussion_r95942753

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       (1, 2), (1, 1), (2, 1), (2, 2))
   }

-  test("dropDuplicates should not change child plan output") {
-    val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-    checkDataset(
-      ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-      ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
--- End diff --

This may introduce other unknown issues, because as far as I can see the SQL rules that replace attributes currently don't handle `Alias`es.
[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16564#discussion_r95942576

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       (1, 2), (1, 1), (2, 1), (2, 2))
   }

-  test("dropDuplicates should not change child plan output") {
-    val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-    checkDataset(
-      ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-      ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
--- End diff --

It's in my first commit: https://github.com/apache/spark/pull/16564/commits/13f54a93c0cf31a38455e90aec722e890af980c6 I removed it because it's not a Structured Streaming issue.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15324 @jkbradley What's your opinion on whether GNB should be a separate classifier or a model type in the existing NB?
[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15671 @jkbradley Updated. Thanks for reviewing!
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71286/
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Merged build finished. Test PASSed.
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #71286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71286/testReport)** for PR 15505 at commit [`a4499a8`](https://github.com/apache/spark/commit/a4499a8da953d55b8909c1d17df794ca3f357c17).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16355 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71289/
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16355 Merged build finished. Test PASSed.
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16355 **[Test build #71289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71289/testReport)** for PR 16355 at commit [`138ab34`](https://github.com/apache/spark/commit/138ab3478fb8b0f4f4569bb3b0e66c04d3d5cac1).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16559: [WIP] Add expression index and test cases
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16559 We already have `GetArrayItem` and `GetMapValue`, and we have special parser rules to support them, e.g. `SELECT array_col[3], map_col['key']`. We can just treat `index` as an alias of `UnresolvedExtractValue`.
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16528 Merged build finished. Test FAILed.
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16528 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71292/
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16528 **[Test build #71292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71292/testReport)** for PR 16528 at commit [`2e1d378`](https://github.com/apache/spark/commit/2e1d378456011269ddb1fc451aaa9221ce4996a9).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71288/
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16528 **[Test build #71292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71292/testReport)** for PR 16528 at commit [`2e1d378`](https://github.com/apache/spark/commit/2e1d378456011269ddb1fc451aaa9221ce4996a9).
[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16542 Merged build finished. Test PASSed.
[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16542 **[Test build #71288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71288/testReport)** for PR 16542 at commit [`465ccc6`](https://github.com/apache/spark/commit/465ccc68368da50579c10fa1daf7f46809411670).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71284/
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71284/testReport)** for PR 16503 at commit [`aba406d`](https://github.com/apache/spark/commit/aba406d4833e7f01040a01f1d6e2b368da852f92).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16557 **[Test build #71291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71291/testReport)** for PR 16557 at commit [`397c26b`](https://github.com/apache/spark/commit/397c26b3498eed775621a83f122d1b2b517ba0ab).
[GitHub] spark issue #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter should...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16481 Sure, will do it.
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16565 **[Test build #71290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71290/consoleFull)** for PR 16565 at commit [`e2c2fae`](https://github.com/apache/spark/commit/e2c2fae70204a2f5891fdfd8d516c273b2d72648).
[GitHub] spark pull request #16500: [SPARK-19120] [SPARK-19121] Refresh Metadata Cach...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16500#discussion_r95939337

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -392,7 +392,9 @@ case class InsertIntoHiveTable(
     // Invalidate the cache.
     sqlContext.sharedState.cacheManager.invalidateCache(table)
-    sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+    if (partition.nonEmpty) {
+      sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+    }
--- End diff --

Let's revert it first; we should think about cache and refresh more thoroughly later.
[GitHub] spark pull request #16512: [SPARK-18335][SPARKR] createDataFrame to support ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16512#discussion_r95939102

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -196,6 +196,12 @@ test_that("create DataFrame from RDD", {
   expect_equal(dtypes(df), list(c("name", "string"), c("age", "int"), c("height", "float")))
   expect_equal(as.list(collect(where(df, df$name == "John"))),
                list(name = "John", age = 19L, height = 176.5))
+  expect_equal(getNumPartitions(toRDD(df)), 1)
--- End diff --

And so if this subtlety is significant, we could change to this. It's a slightly more involved change, but it would match Scala exactly.
```
splits <- unlist(lapply(0:(numSlices - 1), function(x) {
  start <- trunc((x * length) / numSlices)
  end <- trunc(((x + 1) * length) / numSlices)
  rep(start, end - start)
}))
```
And you get this sequence for `length <- 50, numSlices <- 22`:
```
 [1]  0  0  2  2  4  4  6  6  6  9  9 11 11 13 13 15 15 15 18 18 20 20 22 22 22
[26] 25 25 27 27 29 29 31 31 31 34 34 36 36 38 38 40 40 40 43 43 45 45 47 47 47
```
When calling split(), this sequence is passed through as.factor, so the numeric values themselves are not significant.
[GitHub] spark pull request #16512: [SPARK-18335][SPARKR] createDataFrame to support ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16512#discussion_r95938844

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -196,6 +196,12 @@ test_that("create DataFrame from RDD", {
   expect_equal(dtypes(df), list(c("name", "string"), c("age", "int"), c("height", "float")))
   expect_equal(as.list(collect(where(df, df$name == "John"))),
                list(name = "John", age = 19L, height = 176.5))
+  expect_equal(getNumPartitions(toRDD(df)), 1)
--- End diff --

Oops, I thought we were talking about `numSlices`. Great point about `positions`; here's what I'm seeing (it's going to be a bit long):
```
positions(50, 20)
(0,2)   0 1
(2,5)   2 3 4
(5,7)   5 6
(7,10)  7 8 9
(10,12) 10 11
(12,15) 12 13 14
(15,17) 15 16
(17,20) 17 18 19
(20,22) 20 21
(22,25) 22 23 24
(25,27) 25 26
(27,30) 27 28 29
(30,32) 30 31
(32,35) 32 33 34
(35,37) 35 36
(37,40) 37 38 39
(40,42) 40 41
(42,45) 42 43 44
(45,47) 45 46
(47,50) 47 48 49

sort(rep(1:20, each = 1, length.out = 50))
 [1]  1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8  9
[26]  9  9 10 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20
```
As you can see, `positions` attempts to evenly distribute the "extras".
```
positions(50, 24)
(0,2)   0 1
(2,4)   2 3
(4,6)   4 5
(6,8)   6 7
(8,10)  8 9
(10,12) 10 11
(12,14) 12 13
(14,16) 14 15
(16,18) 16 17
(18,20) 18 19
(20,22) 20 21
(22,25) 22 23 24
(25,27) 25 26
(27,29) 27 28
(29,31) 29 30
(31,33) 31 32
(33,35) 33 34
(35,37) 35 36
(37,39) 37 38
(39,41) 39 40
(41,43) 41 42
(43,45) 43 44
(45,47) 45 46
(47,50) 47 48 49

sort(rep(1:24, each = 1, length.out = 50))
 [1]  1  1  1  2  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 11 12
[26] 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24
```
You can see that when there are only two extras, it puts one in the middle and one at the end.
```
positions(50, 22)
(0,2)   0 1
(2,4)   2 3
(4,6)   4 5
(6,9)   6 7 8
(9,11)  9 10
(11,13) 11 12
(13,15) 13 14
(15,18) 15 16 17
(18,20) 18 19
(20,22) 20 21
(22,25) 22 23 24
(25,27) 25 26
(27,29) 27 28
(29,31) 29 30
(31,34) 31 32 33
(34,36) 34 35
(36,38) 36 37
(38,40) 38 39
(40,43) 40 41 42
(43,45) 43 44
(45,47) 45 46
(47,50) 47 48 49

sort(rep(1:22, each = 1, length.out = 50))
 [1]  1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  8  8  9  9 10
[26] 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22
```
When there are only a few extras, they are still roughly evenly spaced out.
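The truncation scheme discussed above can be sketched in a few lines of Python (an illustrative sketch of the integer-division approach; the standalone function name and signature are mine, not Spark's actual API surface):

```python
def positions(length, num_slices):
    """Half-open (start, end) index ranges per slice, computed by integer
    truncation so the 'extra' elements spread evenly across slices."""
    return [((i * length) // num_slices, ((i + 1) * length) // num_slices)
            for i in range(num_slices)]

# positions(50, 22) starts with (0, 2), (2, 4), (4, 6), (6, 9), ...
# matching the distribution shown in the comment above.
```

The key property is that every slice gets either `floor(length/num_slices)` or one more element, with no slice ever two elements larger than another.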
[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16564#discussion_r95938778

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     (1, 2), (1, 1), (2, 1), (2, 2))
   }
-  test("dropDuplicates should not change child plan output") {
-    val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-    checkDataset(
-      ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-      ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
--- End diff --

Do you have an end-to-end test showing that using the same id when aliasing causes trouble?
[GitHub] spark issue #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter should...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16481 I'll update JIRA once the service is back.
[GitHub] spark pull request #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16481
[GitHub] spark issue #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter should...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16481 LGTM, merging to master! It conflicts with branch-2.1; can you send a new PR? Thanks!
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16355 **[Test build #71289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71289/testReport)** for PR 16355 at commit [`138ab34`](https://github.com/apache/spark/commit/138ab3478fb8b0f4f4569bb3b0e66c04d3d5cac1).
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley thanks, I've updated the code based on your latest comments: I removed `k` and the verification for the setters.
[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95937613

--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -160,6 +162,17 @@ object KMeansSuite {
     spark.createDataFrame(rdd)
   }
+  def generateSparseData(spark: SparkSession, rows: Int, dim: Int, k: Int, seed: Int): DataFrame = {
+    val sc = spark.sparkContext
+    val random = new Random(seed)
+    val nnz = random.nextInt(dim)
+    val rdd = sc.parallelize(1 to rows)
+      .map(i => Vectors.sparse(dim, random.shuffle(0 to dim - 1).slice(0, nnz).sorted.toArray,
+        Array.fill(nnz)(random.nextInt(k).toDouble)))
--- End diff --

Done, removed `k`.
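The shuffle/slice/sort idiom used in the Scala test helper above (pick a random set of distinct, ordered indices for a sparse vector) can be sketched in Python; the function name and standalone form here are illustrative, not part of Spark:

```python
import random

def sparse_indices(dim, nnz, rng):
    """Choose nnz distinct column indices out of [0, dim), returned sorted,
    mirroring random.shuffle(0 to dim - 1).slice(0, nnz).sorted in the Scala helper."""
    return sorted(rng.sample(range(dim), nnz))
```

Sorting matters because sparse-vector formats typically require strictly increasing indices; sampling without replacement guarantees distinctness.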
[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95937532

--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala ---
@@ -51,6 +54,21 @@ class BisectingKMeansSuite
     assert(copiedModel.hasSummary)
   }
+  test("SPARK-16473: Verify Bisecting K-Means does not fail in edge case where" +
+    " one cluster is empty after split") {
+    val bkm = new BisectingKMeans().setK(k).setMinDivisibleClusterSize(4).setMaxIter(4)
+
+    assert(bkm.getK === k)
--- End diff --

Done, removed.
[GitHub] spark issue #16566: [SparkR]: add bisecting kmeans R wrapper
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16566
```
* checking Rd \usage sections ... WARNING
Duplicated \argument entries in documentation object 'fitted':
  'object' 'method' '...'
```
[GitHub] spark pull request #16500: [SPARK-19120] [SPARK-19121] Refresh Metadata Cach...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16500#discussion_r95937423

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -392,7 +392,9 @@ case class InsertIntoHiveTable(
     // Invalidate the cache.
     sqlContext.sharedState.cacheManager.invalidateCache(table)
-    sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+    if (partition.nonEmpty) {
+      sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+    }
--- End diff --

@cloud-fan @ericl @mallman For non-partitioned parquet/orc tables, we convert them to data source tables, so this path will not call `InsertIntoHiveTable`. I know it is a little confusing, but I am fine with keeping it unchanged.
[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15671 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71285/
[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15671 Merged build finished. Test PASSed.
[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15671 **[Test build #71285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71285/testReport)** for PR 15671 at commit [`c8188b0`](https://github.com/apache/spark/commit/c8188b03c49912ab2ee9f7dc0f5aae5a9ddc1a1c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16542 **[Test build #71288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71288/testReport)** for PR 16542 at commit [`465ccc6`](https://github.com/apache/spark/commit/465ccc68368da50579c10fa1daf7f46809411670).
[GitHub] spark issue #16523: [SPARK-19142][SparkR]:spark.kmeans should take seed, ini...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16523 Sounds good! @yanboliang, any more comments before we merge?
[GitHub] spark pull request #16542: [SPARK-18905][STREAMING] Fix the issue of removin...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/16542#discussion_r95935489

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala ---
@@ -200,19 +200,19 @@ class JobScheduler(val ssc: StreamingContext) extends Logging {
     job.setEndTime(completedTime)
     listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo))
     logInfo("Finished job " + job.id + " from job set of time " + jobSet.time)
-    if (jobSet.hasCompleted) {
-      jobSets.remove(jobSet.time)
-      jobGenerator.onBatchCompletion(jobSet.time)
-      logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
-        jobSet.totalDelay / 1000.0, jobSet.time.toString,
-        jobSet.processingDelay / 1000.0
-      ))
-      listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
-    }
     job.result match {
       case Failure(e) =>
         reportError("Error running job " + job, e)
       case _ =>
+        if (jobSet.hasCompleted) {
+          jobSets.remove(jobSet.time)
+          jobGenerator.onBatchCompletion(jobSet.time)
+          logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
+            jobSet.totalDelay / 1000.0, jobSet.time.toString,
+            jobSet.processingDelay / 1000.0
+          ))
+          listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
--- End diff --

Sure.
[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r95935452

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -116,6 +116,12 @@ case class Filter(condition: Expression, child: LogicalPlan)
       .filterNot(SubqueryExpression.hasCorrelatedSubquery)
     child.constraints.union(predicates.toSet)
   }
+
+  override lazy val statistics: Statistics = {
--- End diff --

OK, fixed.
[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16550 **[Test build #71287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71287/testReport)** for PR 16550 at commit [`7448e8c`](https://github.com/apache/spark/commit/7448e8cff72c4510ab1b6f341c587a403779d5e9).
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #71286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71286/testReport)** for PR 15505 at commit [`a4499a8`](https://github.com/apache/spark/commit/a4499a8da953d55b8909c1d17df794ca3f357c17).
[GitHub] spark pull request #15505: [SPARK-18890][CORE] Move task serialization from ...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15505#discussion_r95933009

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskDescription.scala ---
@@ -52,7 +55,43 @@ private[spark] class TaskDescription(
     val addedFiles: Map[String, Long],
     val addedJars: Map[String, Long],
     val properties: Properties,
-    val serializedTask: ByteBuffer) {
+    private var serializedTask_ : ByteBuffer) extends Logging {
--- End diff --

Another implementation: https://github.com/witgo/spark/commit/4fbf30a568ed61982e17757f9df3c35cb9d64871
[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16467 Merged build finished. Test PASSed.
[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71283/
[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16467 **[Test build #71283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71283/testReport)** for PR 16467 at commit [`6a1a415`](https://github.com/apache/spark/commit/6a1a4159f54397ef81baaf618e3b816866f589e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15671 **[Test build #71285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71285/testReport)** for PR 15671 at commit [`c8188b0`](https://github.com/apache/spark/commit/c8188b03c49912ab2ee9f7dc0f5aae5a9ddc1a1c).
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71284/testReport)** for PR 16503 at commit [`aba406d`](https://github.com/apache/spark/commit/aba406d4833e7f01040a01f1d6e2b368da852f92).
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @ash211 Thanks a lot for your comment. I've fixed the failing Scala style tests; running `./dev/scalastyle` now passes. Could you take another look?
[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] Improvement: filter ...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16547 Thanks for the feedback! Ah, sure, let me update accordingly.
[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/16527 I used the following code to log how long trimming stages/jobs takes:
```scala
/** If stages is too large, remove and garbage collect old stages */
private def trimStagesIfNecessary(stages: ListBuffer[StageInfo]) = synchronized {
  if (stages.size > retainedStages) {
    val start = System.currentTimeMillis()
    val toRemove = stages.size - retainedStages
    stages.take(toRemove).foreach { s =>
      stageIdToData.remove((s.stageId, s.attemptId))
      stageIdToInfo.remove(s.stageId)
    }
    stages.trimStart(toRemove)
    logInfo(s"Trim stages time consuming: ${System.currentTimeMillis() - start}")
  }
}

/** If jobs is too large, remove and garbage collect old jobs */
private def trimJobsIfNecessary(jobs: ListBuffer[JobUIData]) = synchronized {
  if (jobs.size > retainedJobs) {
    val start = System.currentTimeMillis()
    val toRemove = jobs.size - retainedJobs
    jobs.take(toRemove).foreach { job =>
      // Remove the job's UI data, if it exists
      jobIdToData.remove(job.jobId).foreach { removedJob =>
        // A null jobGroupId is used for jobs that are run without a job group
        val jobGroupId = removedJob.jobGroup.orNull
        // Remove the job group -> job mapping entry, if it exists
        jobGroupToJobIds.get(jobGroupId).foreach { jobsInGroup =>
          jobsInGroup.remove(job.jobId)
          // If this was the last job in this job group, remove the map entry for the job group
          if (jobsInGroup.isEmpty) {
            jobGroupToJobIds.remove(jobGroupId)
          }
        }
      }
    }
    jobs.trimStart(toRemove)
    logInfo(s"Trim jobs time consuming: ${System.currentTimeMillis() - start}")
  }
}
```
And the result is:
```
tail -f test-time-consuming.log | grep time
17/01/13 10:03:39 INFO JobProgressListener: Trim stages time consuming: 3
17/01/13 10:03:39 INFO JobProgressListener: Trim jobs time consuming: 4
17/01/13 10:03:39 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:03:47 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:03:47 INFO JobProgressListener: Trim jobs time consuming: 0
17/01/13 10:03:47 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:03:56 INFO JobProgressListener: Trim stages time consuming: 1
17/01/13 10:03:56 INFO JobProgressListener: Trim jobs time consuming: 0
17/01/13 10:03:56 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:04:04 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:04:04 INFO JobProgressListener: Trim jobs time consuming: 0
17/01/13 10:04:04 INFO JobProgressListener: Trim stages time consuming: 0
```
It may be fine to just change `retainedTasks`.
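The bounded-retention pattern being timed above (drop the oldest entries when a list exceeds its cap, with per-entry cleanup) can be sketched outside Spark as follows; this is a minimal illustrative sketch, and `on_remove` stands in for the per-entry cleanup such as `stageIdToData.remove`:

```python
import time

def trim_if_necessary(items, retained, on_remove):
    """Drop the oldest entries so at most `retained` remain, invoking a
    cleanup hook for each removed entry; returns seconds spent trimming."""
    if len(items) <= retained:
        return 0.0
    start = time.monotonic()
    to_remove = len(items) - retained
    for item in items[:to_remove]:
        on_remove(item)    # e.g. drop the per-stage / per-job UI data
    del items[:to_remove]  # analogous to ListBuffer.trimStart(toRemove)
    return time.monotonic() - start
```

As the timing numbers above suggest, each trim touches only the small surplus of old entries, so the cost stays near zero once the list hovers around its retention limit.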