[GitHub] spark issue #15122: [SPARK-17569] Make StructuredStreaming FileStreamSource ...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/15122 I looked into this. I think there are two ways you can intercept calls to HDFS. The first way is slightly hacky but pretty simple: FileSystem.addFileSystemForTesting is a package-private method that can be used to inject a mock file system. You can create an implementation of FilterFileSystem and register it for the "file" scheme; all accesses to the local file system will then go through your implementation. Of course, you could also use a mocking library, but that is not as clean, since FilterFileSystem is a public class. The second way is more robust and does not depend on any private APIs: create an implementation of FilterFileSystem that delegates to LocalFileSystem, e.g. call it MockFileSystem, whose getScheme returns "mockfs". You can then use mockfs:// paths when passing them to structured streaming. This is probably the more robust, generic solution. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
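The delegation pattern described above (a filter file system that wraps another file system and intercepts every call) can be sketched in plain Python; the class and method names below are illustrative stand-ins, not the actual Hadoop `FilterFileSystem` API:

```python
# Sketch of the delegation pattern behind FilterFileSystem. LocalFS and
# MockFS are hypothetical illustrations, not Hadoop classes.

class LocalFS:
    """Stands in for the real local file system."""
    def open(self, path):
        return f"contents of {path}"

class MockFS:
    """Wraps another file system and records every access, the way a
    FilterFileSystem subclass registered under a custom scheme would."""
    scheme = "mockfs"

    def __init__(self, underlying):
        self.underlying = underlying
        self.accesses = []

    def open(self, path):
        self.accesses.append(path)          # intercept the call...
        return self.underlying.open(path)   # ...then delegate

fs = MockFS(LocalFS())
fs.open("/tmp/input.txt")
assert fs.accesses == ["/tmp/input.txt"]
```

A test can then assert on `fs.accesses` to verify which paths the code under test touched, without ever hitting a real cluster.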
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15133 Merged build finished. Test PASSed.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15133 **[Test build #65552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65552/consoleFull)** for PR 15133 at commit [`339d5d4`](https://github.com/apache/spark/commit/339d5d4f7afb110e17b01e3355fb68ef6d12200d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65552/
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14959 Merged build finished. Test FAILed.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #65551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65551/consoleFull)** for PR 14959 at commit [`d85bf36`](https://github.com/apache/spark/commit/d85bf36850b7e97056889fbd273749e1d8144cc6). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65551/
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/15133 Also update this one accordingly: the default value of the app name is now random for both the session and the context. ![random](https://cloud.githubusercontent.com/assets/8075390/18613106/c5e3b420-7d8f-11e6-8763-9d7d16d2eafa.png)
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15133 **[Test build #65552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65552/consoleFull)** for PR 15133 at commit [`339d5d4`](https://github.com/apache/spark/commit/339d5d4f7afb110e17b01e3355fb68ef6d12200d).
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79297872 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -98,8 +98,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { ctx.identifier != null && ctx.identifier.getText.toLowerCase == "noscan") { AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString) -} else { +} else if (ctx.identifierSeq() == null) { --- End diff -- As mentioned in [the comment](https://github.com/apache/spark/pull/15090#r78687294), we are going to change the "ANALYZE" syntax in SqlBase.g4, i.e. make the identifierSeq non-optional, which is different from Hive. Is this ok? @rxin @hvanhovell
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #65551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65551/consoleFull)** for PR 14959 at commit [`d85bf36`](https://github.com/apache/spark/commit/d85bf36850b7e97056889fbd273749e1d8144cc6).
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/14959 @vanzin Thanks for your reviews. I just updated the PR, but I don't understand what the following statement means. Can you explain it? Thanks ``` Especially since the Scala SparkContext clones the original user config - and if I read your code correctly, you're not doing that here. ```
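The point about cloning can be illustrated with plain dictionaries (illustrative stand-ins, not the real SparkConf API): if the context mutates a config object shared with the caller, the caller's original config silently changes too.

```python
# Why cloning the user's config matters. Both functions are hypothetical
# sketches; only the clone-vs-no-clone distinction is the point.

def make_context_no_clone(conf):
    conf["spark.app.id"] = "app-123"   # context adds its own entries
    return conf

def make_context_with_clone(conf):
    conf = dict(conf)                  # clone first, as the Scala SparkContext does
    conf["spark.app.id"] = "app-123"
    return conf

user_conf = {"spark.app.name": "demo"}
make_context_no_clone(user_conf)
assert "spark.app.id" in user_conf      # leaked back into the caller's conf

user_conf = {"spark.app.name": "demo"}
make_context_with_clone(user_conf)
assert "spark.app.id" not in user_conf  # caller's original left untouched
```

Without the clone, reusing the same config to build a second context would pick up the first context's internal entries.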
[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15134 **[Test build #65550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65550/consoleFull)** for PR 15134 at commit [`7a5946d`](https://github.com/apache/spark/commit/7a5946d90e1d1816964baf724b4e3422ade99b3d).
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/15133 @andrewor14 What do you think about https://github.com/apache/spark/pull/15134? There we add a random UUID as the app name when creating a SparkContext if no name is defined.
[GitHub] spark pull request #15134: [SPARK-17580][CORE]Add random UUID as app name wh...
GitHub user phalodi opened a pull request: https://github.com/apache/spark/pull/15134 [SPARK-17580][CORE] Add random UUID as app name when app name is not defined while creating SparkContext ## What changes were proposed in this pull request? Assign a random UUID as the app name when no app name is defined while creating the SparkContext. SparkSession already behaves this way, so this makes the behaviour consistent. ## How was this patch tested? Ran all test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/phalodi/spark SPARK-17580 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15134.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15134 commit 7a5946d90e1d1816964baf724b4e3422ade99b3d Author: sandy Date: 2016-09-18T05:17:07Z add random UUID as app name while app name not define while creating spark context
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test FAILed.
[GitHub] spark issue #15122: [SPARK-17569] Make StructuredStreaming FileStreamSource ...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/15122 Can you test this by deleting the file on purpose, and see what kind of exceptions are thrown?
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65548/
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65548/consoleFull)** for PR 14971 at commit [`3376bd6`](https://github.com/apache/spark/commit/3376bd6a57a65fa004abd43237f8f3c87f07064a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15127: [SPARK-17571][SQL] AssertOnQuery.condition should always...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15127 **[Test build #65549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65549/consoleFull)** for PR 15127 at commit [`d013acf`](https://github.com/apache/spark/commit/d013acf3b8a258d12dbe61a2d348ccfc4f099fb6).
[GitHub] spark issue #15129: [SPARK-17546] [DEPLOY] start-* scripts should use hostna...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15129 LGTM.
[GitHub] spark pull request #15051: [SPARK-17499][SparkR][ML][MLLib] make the default...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r79297392 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,14 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note spark.mlp since 2.1.0 setMethod("spark.mlp", signature(data = "SparkDataFrame"), - function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, - tol = 0.5, stepSize = 1, seed = 1) { + function(data, layers, blockSize = 128, solver = "l-bfgs", maxIter = 100, + tol = 1E-6, stepSize = 0.03, seed = 0x7FFF) { +if (length(layers) <= 1) { + stop("layers vector require length > 0.") +} +if (any(sapply(layers, function(e) !is.numeric(e { --- End diff -- Oh, it's a clever way of using `as.integer(x) != x` to check whether a value is an integer. Here the mlp requires `layers` to be an integer vector; is it better to force the user to pass an integer vector, calling `stop` if they don't, or just to print a warning?
[GitHub] spark pull request #15051: [SPARK-17499][SparkR][ML][MLLib] make the default...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r79297229 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,14 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note spark.mlp since 2.1.0 setMethod("spark.mlp", signature(data = "SparkDataFrame"), - function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, - tol = 0.5, stepSize = 1, seed = 1) { + function(data, layers, blockSize = 128, solver = "l-bfgs", maxIter = 100, + tol = 1E-6, stepSize = 0.03, seed = 0x7FFF) { +if (length(layers) <= 1) { + stop("layers vector require length > 0.") +} +if (any(sapply(layers, function(e) !is.numeric(e { --- End diff -- You can use `numToInt` from https://github.com/apache/spark/blob/master/R/pkg/R/utils.R#L368 -- It'll print a warning if it's not an integer
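For readers unfamiliar with the R idiom under discussion, a Python analogue of the `as.integer(x) != x` check is below (`is_integral` is a hypothetical helper name, not part of SparkR):

```python
# Python analogue of R's `as.integer(x) != x` trick: a numeric value is
# "really" an integer exactly when truncating it loses nothing.

def is_integral(x):
    return float(x) == int(x)

layers = [3, 5.0, 2]            # 5.0 counts as integral
assert all(is_integral(e) for e in layers)
assert not is_integral(2.5)     # a fractional layer size would be rejected
```

The review question is whether such a check should `stop` (raise) or merely warn when the user passes a non-integral value.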
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15131 @shivaram Thanks for cc'ing me. I will try to take a close look at it today.
[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13513 Merged build finished. Test PASSed.
[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65547/
[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65547/consoleFull)** for PR 13513 at commit [`be1abfa`](https://github.com/apache/spark/commit/be1abfa0e902fca3ed945bfbb6e0573909d55e2b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14600: [SPARK-15899] [SQL] Fix the construction of the file pat...
Github user Praveenmail2him commented on the issue: https://github.com/apache/spark/pull/14600 Can anyone post sample usage for Spark 2.0? I'm still facing this exception.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79296707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import scala.collection.mutable + +import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier} +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, BasicColStats, Statistics} +import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.types._ + + +/** + * Analyzes the given columns of the given table in the current database to generate statistics, + * which will be used in query optimizations.
 + */ +case class AnalyzeColumnCommand( +tableIdent: TableIdentifier, +columnNames: Seq[String]) extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val sessionState = sparkSession.sessionState +val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent)) + +// check correctness of column names +val validColumns = mutable.MutableList[NamedExpression]() +val resolver = sessionState.conf.resolver +columnNames.foreach { col => + val exprOption = relation.resolve(col.split("\\."), resolver) + if (exprOption.isEmpty) { +throw new AnalysisException(s"Invalid column name: $col") + } + if (validColumns.map(_.exprId).contains(exprOption.get.exprId)) { +throw new AnalysisException(s"Duplicate column name: $col") + } + validColumns += exprOption.get +} + +relation match { + case catalogRel: CatalogRelation => +updateStats(catalogRel.catalogTable, + AnalyzeTableCommand.calculateTotalSize(sparkSession, catalogRel.catalogTable)) + + case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined => +updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes) + + case otherRelation => +throw new AnalysisException("ANALYZE TABLE is not supported for " + + s"${otherRelation.nodeName}.") +} + +def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = { + // Collect statistics per column. + // The first element in the result will be the overall row count, the following elements + // will be structs containing all column stats. + // The layout of each struct follows the layout of the BasicColStats.
+ val ndvMaxErr = sessionState.conf.ndvMaxError + val expressions = Count(Literal(1)).toAggregateExpression() +: +validColumns.map(ColumnStatsStruct(_, ndvMaxErr)) + val namedExpressions = expressions.map(e => Alias(e, e.toString)()) + val statsRow = Dataset.ofRows(sparkSession, Aggregate(Nil, namedExpressions, relation)) +.queryExecution.toRdd.collect().head + + // unwrap the result + val rowCount = statsRow.getLong(0) + val colStats = validColumns.zipWithIndex.map { case (expr, i) => +val colInfo = statsRow.getStruct(i + 1, ColumnStatsStruct.statsNumber) +val colStats = ColumnStatsStruct.unwrapRow(expr, colInfo) +(expr.name, colStats) + }.toMap + + val statistics = +Statistics(sizeInBytes = newTotalSize, rowCount = Some(rowCount), basicColStats = colStats) + sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics))) +
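The result layout described in the diff above (a single aggregate row whose first element is the overall row count, followed by one stats struct per analyzed column) can be sketched in Python, with tuples standing in for the structs. This is an illustration of the layout only, not the actual Spark implementation:

```python
# One aggregate pass produces [row_count, col0_stats, col1_stats, ...],
# where each per-column "struct" here is (num_nulls, min, max, ndv).

rows = [(1, "a"), (2, None), (2, "b")]

def column_stats(values):
    non_null = [v for v in values if v is not None]
    num_nulls = len(values) - len(non_null)
    return (num_nulls, min(non_null), max(non_null), len(set(non_null)))

stats_row = [len(rows)] + [column_stats([r[i] for r in rows]) for i in range(2)]
assert stats_row[0] == 3                    # overall row count comes first
assert stats_row[1] == (0, 1, 2, 2)         # column 0: nulls, min, max, ndv
assert stats_row[2] == (1, "a", "b", 2)     # column 1
```

In the real command the per-column struct is built by aggregate expressions (and ndv is approximate, via HyperLogLog++), but the unwrapping logic follows this positional layout.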
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65548/consoleFull)** for PR 14971 at commit [`3376bd6`](https://github.com/apache/spark/commit/3376bd6a57a65fa004abd43237f8f3c87f07064a).
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r7929 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala --- @@ -101,4 +101,47 @@ class StatisticsSuite extends QueryTest with SharedSQLContext { checkTableStats(tableName, expectedRowCount = Some(2)) } } + + test("test column-level statistics for data source table created in InMemoryCatalog") { +def checkColStats(colStats: BasicColStats, expectedColStats: BasicColStats): Unit = { + assert(colStats.dataType == expectedColStats.dataType) + assert(colStats.numNulls == expectedColStats.numNulls) + assert(colStats.max == expectedColStats.max) + assert(colStats.min == expectedColStats.min) + if (expectedColStats.ndv.isDefined) { +// ndv is an approximate value, so we just make sure we have the value +assert(colStats.ndv.get >= 0) --- End diff -- How to get the standard deviations?
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79296668 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -563,6 +563,13 @@ object SQLConf { .timeConf(TimeUnit.MILLISECONDS) .createWithDefault(10L) + val NDV_MAX_ERROR = +SQLConfigBuilder("spark.sql.ndv.maxError") --- End diff -- OK
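The `spark.sql.ndv.maxError` setting discussed above bounds the relative error of the approximate distinct count (ndv). The earlier test question about how loosely to assert on `ndv` comes down to what such a bound means; here is a minimal sketch of a relative-error check (a hypothetical helper for illustration, not Spark's API, shown in Python):

```python
def within_ndv_bound(estimate: int, exact: int, max_error: float) -> bool:
    """Check an approximate distinct-count estimate against a
    relative-error bound: |estimate - exact| <= max_error * exact."""
    return abs(estimate - exact) <= max_error * exact


# with a 5% bound, 97 is an acceptable estimate of 100 distinct values
print(within_ndv_bound(97, 100, 0.05))   # True
print(within_ndv_bound(90, 100, 0.05))   # False
```

A test could assert this bound instead of the weaker `ndv >= 0` check, given a known exact count for the fixture data.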
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79296634 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -87,19 +87,23 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { } /** - * Create an [[AnalyzeTableCommand]] command. This currently only implements the NOSCAN - * option (other options are passed on to Hive) e.g.: - * {{{ - * ANALYZE TABLE table COMPUTE STATISTICS NOSCAN; - * }}} + * Create an [[AnalyzeTableCommand]] command or an [[AnalyzeColumnCommand]] command. */ override def visitAnalyze(ctx: AnalyzeContext): LogicalPlan = withOrigin(ctx) { if (ctx.partitionSpec == null && ctx.identifier != null && ctx.identifier.getText.toLowerCase == "noscan") { - AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString) + AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier)) +} else if (ctx.identifierSeq() == null) { --- End diff -- Yeah, good idea
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79296629 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsColumnSuite.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.sql.{Date, Timestamp} + +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.catalyst.plans.logical.BasicColStats +import org.apache.spark.sql.execution.command.AnalyzeColumnCommand +import org.apache.spark.sql.types._ + +class StatisticsColumnSuite extends StatisticsTest { + + test("parse analyze column commands") { +val table = "table" +assertAnalyzeCommand( + s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key, value", + classOf[AnalyzeColumnCommand]) + +val noColumnError = intercept[AnalysisException] { + sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS") +} +assert(noColumnError.message == "Need to specify the columns to analyze. 
Usage: " + + "ANALYZE TABLE tbl COMPUTE STATISTICS FOR COLUMNS key, value") + +withTable(table) { + sql(s"CREATE TABLE $table (key INT, value STRING)") + val invalidColError = intercept[AnalysisException] { +sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS k") + } + assert(invalidColError.message == s"Invalid column name: k") + + val duplicateColError = intercept[AnalysisException] { +sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key, value, key") + } + assert(duplicateColError.message == s"Duplicate column name: key") + + withSQLConf("spark.sql.caseSensitive" -> "true") { +val invalidErr = intercept[AnalysisException] { + sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS keY") +} +assert(invalidErr.message == s"Invalid column name: keY") + } + + withSQLConf("spark.sql.caseSensitive" -> "false") { +val duplicateErr = intercept[AnalysisException] { + sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key, value, vaLue") +} +assert(duplicateErr.message == s"Duplicate column name: vaLue") + } +} + } + + test("basic statistics for integral type columns") { +val rdd = sparkContext.parallelize(Seq("1", null, "2", "3", null)).map { i => + if (i != null) Row(i.toByte, i.toShort, i.toInt, i.toLong) else Row(i, i, i, i) +} +val schema = StructType( + StructField(name = "c1", dataType = ByteType, nullable = true) :: +StructField(name = "c2", dataType = ShortType, nullable = true) :: +StructField(name = "c3", dataType = IntegerType, nullable = true) :: +StructField(name = "c4", dataType = LongType, nullable = true) :: Nil) +val expectedBasicStats = BasicColStats( + dataType = ByteType, numNulls = 2, max = Some(3), min = Some(1), ndv = Some(3)) --- End diff -- Can you explain more about this?
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15051 @felixcheung yeah, in fact 0x7FFF is not ideal because it is itself also a valid seed. And there is another problem: in Scala the seed is a `long`, but on the R side there seems to be no `long` type, so the seed value range on the R side is already smaller than on the Scala side. I think this is a minor problem, though, because an `int`-range seed is large enough to be used.
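The range mismatch described above can be stated concretely: R integers are 32-bit while a Scala seed is a 64-bit `Long`, so every R-side seed is representable on the Scala side, but not vice versa. A small illustration of the arithmetic (plain Python, names are for illustration only):

```python
# Illustration: R-side seed range vs Scala-side seed range.
INT32_MAX = 2**31 - 1  # largest value an R integer can hold
INT64_MAX = 2**63 - 1  # largest value of a Scala Long seed


def fits_in_scala_seed(r_seed: int) -> bool:
    """Any R-side (32-bit) integer seed is representable as a Scala Long."""
    return -INT64_MAX - 1 <= r_seed <= INT64_MAX
```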
[GitHub] spark pull request #15051: [SPARK-17499][SparkR][ML][MLLib] make the default...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r79295910 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,14 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note spark.mlp since 2.1.0 setMethod("spark.mlp", signature(data = "SparkDataFrame"), - function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, - tol = 0.5, stepSize = 1, seed = 1) { + function(data, layers, blockSize = 128, solver = "l-bfgs", maxIter = 100, + tol = 1E-6, stepSize = 0.03, seed = 0x7FFF) { +if (length(layers) <= 1) { + stop("layers vector require length > 0.") +} +if (any(sapply(layers, function(e) !is.numeric(e)))) { --- End diff -- layers should be integer, but in R it seems we can't distinguish a numeric vector from an integer vector? For both `layers<-c(1,2)` and `layers<-c(1.0, 2.0)`, `is.integer(layers[i])` returns `false` and `as.integer(layers)` succeeds for both, so is there some good way to check that it is an integer vector and not just a numeric vector?
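One common answer to the question above is to accept numeric values but verify each has no fractional part; in R that check is typically `all(layers == as.integer(layers))`. Sketched here in Python (as an illustration of the check, not SparkR's actual validation):

```python
def is_integerish(values):
    """True if every element is numeric with no fractional part,
    so [1, 2] and [1.0, 2.0] pass while [1.5] does not."""
    try:
        return all(float(v).is_integer() for v in values)
    except (TypeError, ValueError):
        # non-numeric elements (e.g. strings) fail the check
        return False
```

This sidesteps the storage-type question entirely: a doubles vector whose values are all whole numbers is accepted, which is usually what callers writing `c(3, 5, 2)` expect.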
[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65547/consoleFull)** for PR 13513 at commit [`be1abfa`](https://github.com/apache/spark/commit/be1abfa0e902fca3ed945bfbb6e0573909d55e2b).
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65546/ Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65546/consoleFull)** for PR 14971 at commit [`2f40c7f`](https://github.com/apache/spark/commit/2f40c7f5532c8b6e66c786f3b1506bd4efdcf711). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14643: [SPARK-17057][ML] ProbabilisticClassifierModels' predict...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/14643 @srowen You can take it over.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65546/consoleFull)** for PR 14971 at commit [`2f40c7f`](https://github.com/apache/spark/commit/2f40c7f5532c8b6e66c786f3b1506bd4efdcf711).
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15051 LGTM - just a question above and this: would 0x7FFF be a good placeholder value - is it possible to set seed to this in Scala?
[GitHub] spark pull request #15051: [SPARK-17499][SparkR][ML][MLLib] make the default...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r79294205 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,14 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note spark.mlp since 2.1.0 setMethod("spark.mlp", signature(data = "SparkDataFrame"), - function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, - tol = 0.5, stepSize = 1, seed = 1) { + function(data, layers, blockSize = 128, solver = "l-bfgs", maxIter = 100, + tol = 1E-6, stepSize = 0.03, seed = 0x7FFF) { +if (length(layers) <= 1) { + stop("layers vector require length > 0.") +} +if (any(sapply(layers, function(e) !is.numeric(e)))) { --- End diff -- just double checking - should layers be integer or numeric?
[GitHub] spark pull request #14338: [SPARK-16701] Make parameters configurable in Blo...
Github user lovexi closed the pull request at: https://github.com/apache/spark/pull/14338
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15093 Yep, saw that. I re-merged this, and yes during conflict resolution QuantileSummaries.scala comes up as a file added only in the master branch, but when I choose to not take the change in the IDE, I see it actually resulted in adding an empty file. I made sure that was not part of the commit and pushed again. Looks as intended now: https://github.com/apache/spark/commit/5fd354b2d628130a74c9d01adc7ab6bef65fbd9a
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15093 I reverted already!
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15093 Oh weird! no idea why that happened. Yeah I'll take care of it from here.
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15093 @HyukjinKwon @srowen This PR when merged into branch-2.0 somehow created an empty file QuantileSummaries.scala that is failing the lint test as the Apache license header does not exist - Commit - https://github.com/apache/spark/commit/a3bba372abce926351335d0a2936b70988f19b23 Empty file - https://github.com/apache/spark/blob/a3bba372abce926351335d0a2936b70988f19b23/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala I am not sure how exactly backporting a patch led to an empty file, but this does not seem right. I am reverting this commit in branch 2.0. Please make a new PR to fix this in branch 2.0 correctly.
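Stray zero-byte source files like the one above fail the license-header lint only after the fact; a pre-push scan can catch them earlier. A minimal sketch (a hypothetical helper, not part of Spark's tooling; shown in Python):

```python
import os


def find_empty_sources(root_dir, suffix=".scala"):
    """Return paths of zero-byte source files under root_dir,
    e.g. a stray empty QuantileSummaries.scala left by a bad merge."""
    return [
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root_dir)
        for name in names
        if name.endswith(suffix)
        and os.path.getsize(os.path.join(dirpath, name)) == 0
    ]
```

Running this over the repo root before pushing a backport would have flagged the empty file that the revert above had to clean up.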
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15131 It looks like `addFile` isn't working on Windows because we try to convert the windows file path into a URI and that fails. Not sure what the fix is in this case. cc @HyukjinKwon who worked on this for `hadoopFile`
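For reference, converting a Windows path into a file URI is sensitive to the drive letter and backslash separators; naive concatenation like `"file://" + path` produces a malformed URI. Python's `pathlib` shows the expected shape of a correct conversion (illustrative only; the SparkR fix itself would live in R/Scala):

```python
from pathlib import PureWindowsPath

# PureWindowsPath works on any OS, so this conversion can be
# exercised even on a non-Windows machine.
uri = PureWindowsPath(r"C:\tmp\data.txt").as_uri()
print(uri)  # file:///C:/tmp/data.txt
```

Note the three slashes and forward-slash separators: that is the form a URI parser accepts, and the likely point of failure when the raw `C:\...` path is handed to a URI constructor.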
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15133 Yeah `SparkSession` will be the new thing moving forward. `SparkContext` is kind of just a legacy thing.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/15133 @andrewor14 So, as you suggest, we should also change this in the SparkContext code, because right now we must set an app name when creating a SparkContext. If we generate a random UUID as the default app name when creating a SparkContext too, it will be consistent across all cases.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15133 We should probably just make it a random UUID in all cases to be consistent. I don't know if people check whether `spark.app.name` is set, so that might be a backward compatibility concern (though one that we kind of already broke with `SparkSession`).
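The random-UUID fallback discussed above is straightforward to sketch; this hypothetical helper (names are illustrative, not Spark's API; shown in Python) captures the proposed behavior of defaulting `spark.app.name` only when the user has not set it:

```python
import uuid


def default_app_name(configured=None):
    """Use the configured spark.app.name when present; otherwise
    fall back to a unique generated name, as SparkSession does."""
    if configured is not None:
        return configured
    return f"spark-{uuid.uuid4()}"
```

The backward-compatibility concern in the comment is exactly the `configured is None` branch: code that inspects `spark.app.name` would now see a generated value instead of an unset key.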
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15132 Merged build finished. Test PASSed.
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65544/ Test PASSed.
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15132 **[Test build #65544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65544/consoleFull)** for PR 15132 at commit [`9ff922b`](https://github.com/apache/spark/commit/9ff922bead9805b5d0b7dcb8f9d910e7202ed67b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15043: [SPARK-17491] Close serialization stream to fix w...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15043
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15043 I believe that this latest test failure is caused by a known flaky PySpark test, so I'm going to merge this now and will monitor tests afterwards.
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15131 Merged build finished. Test FAILed.
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15131 **[Test build #65539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65539/consoleFull)** for PR 15131 at commit [`d3dd380`](https://github.com/apache/spark/commit/d3dd3808e88b3f4ba5af683eb7d7709fcc2710f7). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15131 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65539/ Test FAILed.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65545/ Test PASSed.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15133 Merged build finished. Test PASSed.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15133 **[Test build #65545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65545/consoleFull)** for PR 15133 at commit [`eade2e2`](https://github.com/apache/spark/commit/eade2e2d5fbb757616a1265d1f2e196fe8799dd9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15043 Merged build finished. Test FAILed.
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15043 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65541/ Test FAILed.
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15043 **[Test build #65541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65541/consoleFull)** for PR 15043 at commit [`0d70774`](https://github.com/apache/spark/commit/0d70774e1db04edb46b312efc4b1646d7201fb03). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15133 **[Test build #65545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65545/consoleFull)** for PR 15133 at commit [`eade2e2`](https://github.com/apache/spark/commit/eade2e2d5fbb757616a1265d1f2e196fe8799dd9).
[GitHub] spark pull request #15133: [SPARK-17578][Docs] Add spark.app.name default va...
GitHub user phalodi opened a pull request: https://github.com/apache/spark/pull/15133 [SPARK-17578][Docs] Add spark.app.name default value for spark session ## What changes were proposed in this pull request? Modify spark.app.name configuration for spark session ## How was this patch tested? run all test cases and generate documentation ![appname](https://cloud.githubusercontent.com/assets/8075390/18609970/9eba2f2c-7d2c-11e6-8d3b-e45691db59b9.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/phalodi/spark SPARK-17578 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15133.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15133 commit eade2e2d5fbb757616a1265d1f2e196fe8799dd9 Author: sandy Date: 2016-09-17T17:43:41Z add spark.app.name default value for spark session
[GitHub] spark issue #15073: [SPARK-17518] [SQL] Block Users to Specify the Internal ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15073 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65538/ Test PASSed.
[GitHub] spark issue #15073: [SPARK-17518] [SQL] Block Users to Specify the Internal ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15073 Merged build finished. Test PASSed.
[GitHub] spark issue #15073: [SPARK-17518] [SQL] Block Users to Specify the Internal ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15073 **[Test build #65538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65538/consoleFull)** for PR 15073 at commit [`ef174c1`](https://github.com/apache/spark/commit/ef174c1fde3b872a2374d8b47b5a28eeb8a13321). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15051 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65542/ Test PASSed.
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15051 Merged build finished. Test PASSed.
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15051 **[Test build #65542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65542/consoleFull)** for PR 15051 at commit [`ce2c2f7`](https://github.com/apache/spark/commit/ce2c2f743e912225416a1f28b0e90d5d88ddaf49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65536/ Test PASSed.
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15093 Merged build finished. Test PASSed.
[GitHub] spark issue #15093: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15093 **[Test build #65536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65536/consoleFull)** for PR 15093 at commit [`8a3d293`](https://github.com/apache/spark/commit/8a3d293302ba87629a7a7247a7c3912e294e3752). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14981 I am referring to http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/ I don't think it is up to us being 'flexible' or not. I also don't actually see that a source vs binary distinction is drawn here either. Indeed there is a question whether even that is permitted. But I do not see any conclusive argument that this isn't permitted.
[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...
Github user lresende commented on the issue: https://github.com/apache/spark/pull/14981 The pointer is exactly your quote on the e-mail to legal-discuss: http://www.apache.org/legal/resolved.html#prohibited says: - CAN APACHE PROJECTS RELY ON COMPONENTS UNDER PROHIBITED LICENSES? **Apache projects cannot distribute any such components**. As with the previous question on platforms, the component can be relied on if the component's licence terms do not affect the Apache product's licensing. For example, using a GPL'ed tool during the build is OK. CAN APACHE PROJECTS RELY ON COMPONENTS WHOSE LICENSING AFFECTS THE APACHE PRODUCT? Apache projects cannot distribute any such components. **However, if the component is only needed for optional features, a project can provide the user with instructions on how to obtain and install the non-included work**. Optional means that the component is not required for standard use of the product or for the product to achieve a desirable level of quality. The question to ask yourself in this situation is: === And I am being flexible here, and agreeing that it is ok to have the source distribution with the kinesis and ganglia modules, as long as we don't publish them into maven and require the users to build with the respective profiles in order to gain access to these modules in their application.
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15132 **[Test build #65544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65544/consoleFull)** for PR 15132 at commit [`9ff922b`](https://github.com/apache/spark/commit/9ff922bead9805b5d0b7dcb8f9d910e7202ed67b).
[GitHub] spark issue #15097: [SPARK-17540][SparkR][Spark Core] fix SparkR array serde...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15097 Please add tests for this?
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15132 Merged build finished. Test FAILed.
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15132 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65543/ Test FAILed.
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15132 **[Test build #65543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65543/consoleFull)** for PR 15132 at commit [`9fa1a4f`](https://github.com/apache/spark/commit/9fa1a4f8c8d1027b9c39d087299eeac1ffa11348). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15132 **[Test build #65543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65543/consoleFull)** for PR 15132 at commit [`9fa1a4f`](https://github.com/apache/spark/commit/9fa1a4f8c8d1027b9c39d087299eeac1ffa11348).
[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14981 That isn't the conclusion I took from the discussion on legal-discuss - do you have a pointer? I took that it was at best ambiguous but not obviously prohibited to distribute these because they are optional wrt Spark.
[GitHub] spark pull request #15132: [SPARK-17510][STREAMING][KAFKA] config max rate o...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15132 [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis ## What changes were proposed in this pull request? Allow configuration of max rate on a per-topicpartition basis. ## How was this patch tested? Unit tests. The reporter (Jeff Nadler) said he could test on his workload, so let's wait on that report. You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-17510 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15132.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15132 commit b282fe1ba1245170e426b62fe7c543b2a26a6488 Author: cody koeninger Date: 2016-09-17T16:32:41Z [SPARK-17510][STREAMING][KAFKA] allow max rate on a per-partition basis commit 9fa1a4f8c8d1027b9c39d087299eeac1ffa11348 Author: cody koeninger Date: 2016-09-17T16:45:58Z [SPARK-17510][STREAMING][KAFKA] test max rate on a per-partition basis
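The per-topicpartition cap described in the PR amounts to bounding how many offsets each partition may contribute to a single batch. The following is a minimal, hypothetical sketch of that idea (the function and parameter names are illustrative, not the PR's actual code):

```python
def clamp_offsets(current, latest, max_rate_per_partition, batch_seconds):
    """Bound how far each topic-partition may advance in one batch.

    current / latest: dicts mapping (topic, partition) -> offset.
    max_rate_per_partition: per-partition caps in records/second, with an
    optional "default" entry used for partitions that have no explicit cap.
    """
    default_rate = max_rate_per_partition.get("default", float("inf"))
    bounded = {}
    for tp, end in latest.items():
        start = current[tp]
        rate = max_rate_per_partition.get(tp, default_rate)
        # A partition may consume at most rate * batch_seconds records,
        # and never past the latest available offset.
        bounded[tp] = min(end, start + rate * batch_seconds)
    return bounded
```

For a 2-second batch, a partition capped at 50 records/sec advances by at most 100 offsets, while uncapped partitions fall back to the default rate.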
[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...
Github user lresende commented on the issue: https://github.com/apache/spark/pull/14981 Yes, and this is the intent. It's ok to have these in the source release (similar to ganglia) but we don't publish them in the maven repository, and they become available only if people go and build them directly locally.
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15051 @felixcheung Now I added some tests using default parameters and compared the output predictions with the results generated by the Scala-side code. Thanks!
[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15051 **[Test build #65542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65542/consoleFull)** for PR 15051 at commit [`ce2c2f7`](https://github.com/apache/spark/commit/ce2c2f743e912225416a1f28b0e90d5d88ddaf49).
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15131 Merged build finished. Test FAILed.
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15131 **[Test build #65540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65540/consoleFull)** for PR 15131 at commit [`5c49428`](https://github.com/apache/spark/commit/5c49428738d8817f43f23c60f85850864845e7b9). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15131: [SPARK-17577][SparkR] SparkR support add files to Spark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15131 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65540/ Test FAILed.
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15043 I agree that we should add more off-heap tests, but I'd like to do it in another patch so that we can get this one merged faster to unblock the 2.0.1 RC. In terms of testing off-heap, I think that one of the best high-level tests / asserts would be to strengthen the `releaseUnrollMemory()` checks so that inappropriately releasing unroll memory _during_ a task throws an exception during tests. Today there are some circumstances where unroll memory can only be released at the end of a task (such as an iterator backed by an unrolled block that is only partially consumed before the task ends), so the calls to release unroll memory have been tolerant of too much memory being released (it just releases `min(actualMemory, requestedToRelease)`). However, this is only appropriate to do at the end of the task so we should strengthen the asserts to only allow it there; this would have caught the memory mode mixup that I fixed here. I'm going to retest this and if it passes tests then I'll merge to master and branch-2.0. I'll add the new tests described above in a followup.
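The tolerant-release behavior described above, releasing `min(actualMemory, requestedToRelease)`, being safe only at end of task, can be illustrated with a small sketch. This is a hypothetical model for exposition, not Spark's actual `MemoryStore` code:

```python
class UnrollMemoryPool:
    """Toy model of per-task unroll-memory bookkeeping."""

    def __init__(self):
        self.held = 0  # bytes of unroll memory currently held by the task

    def acquire(self, n):
        self.held += n

    def release(self, requested, task_ended=False):
        if requested > self.held and not task_ended:
            # The stricter check proposed in the comment: over-releasing
            # mid-task signals a bookkeeping (e.g. memory-mode) bug.
            raise AssertionError(
                f"released {requested} but only {self.held} held mid-task")
        # The tolerant behavior: never free more than is actually held.
        freed = min(self.held, requested)
        self.held -= freed
        return freed
```

At end of task an over-large release request is silently clamped (a partially consumed unrolled iterator may leave memory that is only freed then); mid-task the same request fails fast.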
[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14981 The issue is that this also removes the non-assembly artifact from the release. That does not seem to be strictly needed license-wise. It is easy and tidy though.
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15043 **[Test build #65541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65541/consoleFull)** for PR 15043 at commit [`0d70774`](https://github.com/apache/spark/commit/0d70774e1db04edb46b312efc4b1646d7201fb03).
[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...
Github user lresende commented on the issue: https://github.com/apache/spark/pull/14981 @srowen @rxin My understanding is that the mvn deploy is what takes care of actually publishing the files to the maven staging repository:

```
$MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests $PUBLISH_PROFILES deploy
./dev/change-scala-version.sh 2.10
$MVN -DzincPort=$ZINC_PORT -Dscala-2.10 --settings $tmp_settings \
  -DskipTests $PUBLISH_PROFILES clean deploy
```

So, the suggested fix to remove Kinesis from $PUBLISH_PROFILES should take care of making sure Kinesis won't show up in the maven staging repository for the release. @srowen Do you have other concerns?
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15043 Jenkins, retest this please.
[GitHub] spark issue #13324: [SPARK-15559][PYTHON][STREAMING] Add hash method for Top...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65537/ Test PASSed.