[GitHub] spark pull request: [SPARK-14362] [SQL] DDL Native Support: Drop V...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/12146 [SPARK-14362] [SQL] DDL Native Support: Drop View What changes were proposed in this pull request? This PR is to provide native support for the DDL command "DROP VIEW". The PR includes native parsing and native analysis. Based on the Hive DDL document for [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropView), `DROP VIEW` is defined as, **Syntax:** ```SQL DROP VIEW [IF EXISTS] [db_name.]view_name; ``` - removes metadata for the specified view. - it is illegal to use DROP TABLE on a view. - it is illegal to use DROP VIEW on a table. How was this patch tested? For verifying command parsing, added test cases in `spark/sql/hive/HiveDDLCommandSuite.scala` For verifying command analysis, added test cases in `spark/sql/hive/execution/HiveDDLSuite.scala` You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark dropView Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12146 commit 0575f8918ded9deae74b1612f53b53e9fcbea92a Author: gatorsmile Date: 2016-04-04T01:29:13Z improve the test cases. commit 65a5ceb652a001af99625809f03d39cb0a7eafec Author: gatorsmile Date: 2016-04-04T02:28:41Z drop view. commit 6c672ce0f1a3175b6bfaf37ebacfdf4324d851cf Author: gatorsmile Date: 2016-04-04T04:10:07Z added test cases for commands. commit 4e858d104a400045398792b6b0c470e6ca6a7b8d Author: gatorsmile Date: 2016-04-04T05:37:31Z added test cases for DDL execution --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
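The DROP TABLE/DROP VIEW asymmetry described above is common SQL-engine behavior, not specific to Hive or Spark. As an illustrative sketch (using SQLite via Python's stdlib `sqlite3`, not Spark's implementation), dropping a view removes only its metadata and `DROP TABLE` on a view is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("CREATE VIEW v AS SELECT id FROM t")

# DROP VIEW removes only the view's metadata; the base table survives.
conn.execute("DROP VIEW IF EXISTS v")

# Using DROP TABLE on a view is illegal, mirroring the Hive DDL rule above.
conn.execute("CREATE VIEW v AS SELECT id FROM t")
try:
    conn.execute("DROP TABLE v")
except sqlite3.Error as e:
    print("rejected:", e)
```

The error message text varies by engine; the point is that a view can only be removed with `DROP VIEW`, which is the analysis-time check this PR adds natively.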
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/11419#discussion_r58330615 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -0,0 +1,301 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.clustering + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.{Estimator, Model} +import org.apache.spark.ml.param.{IntParam, ParamMap, Params} +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.clustering.{GaussianMixture => MLlibGM, GaussianMixtureModel => MLlibGMModel} +import org.apache.spark.mllib.linalg._ +import org.apache.spark.mllib.stat.distribution.MultivariateGaussian +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.functions.{col, udf} +import org.apache.spark.sql.types.{IntegerType, StructType} + + +/** + * Common params for GaussianMixture and GaussianMixtureModel + */ +private[clustering] trait GaussianMixtureParams extends Params with HasMaxIter with HasFeaturesCol + with HasSeed with HasPredictionCol with HasProbabilityCol with HasTol { + + /** + * Set the number of clusters to create (k). Must be > 1. Default: 2. + * @group param + */ + @Since("2.0.0") + final val k = new IntParam(this, "k", "number of clusters to create", (x: Int) => x > 1) + + /** @group getParam */ + @Since("2.0.0") + def getK: Int = $(k) + + /** + * Validates and transforms the input schema. + * @param schema input schema + * @return output schema + */ + protected def validateAndTransformSchema(schema: StructType): StructType = { +SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT) +SchemaUtils.appendColumn(schema, $(predictionCol), IntegerType) +SchemaUtils.appendColumn(schema, $(probabilityCol), new VectorUDT) + } +} + +/** + * :: Experimental :: + * Model fitted by GaussianMixture. + * @param parentModel a model trained by spark.mllib.clustering.GaussianMixture. 
+ */ +@Since("2.0.0") +@Experimental +class GaussianMixtureModel private[ml] ( +@Since("2.0.0") override val uid: String, +private val parentModel: MLlibGMModel) + extends Model[GaussianMixtureModel] with GaussianMixtureParams with MLWritable { + + @Since("2.0.0") + override def copy(extra: ParamMap): GaussianMixtureModel = { +val copied = new GaussianMixtureModel(uid, parentModel) +copyValues(copied, extra).setParent(this.parent) + } + + @Since("2.0.0") + override def transform(dataset: DataFrame): DataFrame = { +val predUDF = udf((vector: Vector) => predict(vector)) +val probUDF = udf((vector: Vector) => predictProbability(vector)) +dataset.withColumn($(predictionCol), predUDF(col($(featuresCol)))) + .withColumn($(probabilityCol), probUDF(col($(featuresCol)))) + } + + @Since("2.0.0") + override def transformSchema(schema: StructType): StructType = { +validateAndTransformSchema(schema) + } + + private[clustering] def predict(features: Vector): Int = parentModel.predict(features) + + private[clustering] def predictProbability(features: Vector): Vector = { +Vectors.dense(parentModel.predictSoft(features)) + } + + @Since("2.0.0") + def weights: Array[Double] = parentModel.weights + + @Since("2.0.0") + def gaussians: Array[MultivariateGaussian] = parentModel.gaussians + + @Since("2.0.0") + override def write: MLWriter = new GaussianMixtureModel.GaussianMixtureModelWriter(this) + + private var trainingSummary: Option[GaussianMixtureSummary] = None + + private[clustering] def setSummary(summary: GaussianMixtureSummary): this.type = { +this.trainingSummary =
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58330599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -332,6 +332,9 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { case r: RunnableCommand => ExecutedCommand(r) :: Nil + case _: logical.ScriptTransformation => --- End diff -- Sure, will do. Thanks!
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58330509 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -332,6 +332,9 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { case r: RunnableCommand => ExecutedCommand(r) :: Nil + case _: logical.ScriptTransformation => --- End diff -- This is quite far down the line for catching such a problem. We can also throw an exception in the `CatalystSqlParser`'s `withScriptIOSchema` function. We would have to move the 'transform query spec' tests in the `PlanParserSuite` to Hive (`HiveQlSuite`).
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58330415 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala --- @@ -134,6 +134,18 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder { } /** + * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command. + */ + override def visitCreateFileFormat( + ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) { +if (ctx.storageHandler == null) { + typedVisit[CatalogStorageFormat](ctx.fileFormat) +} else { + typedVisit[CatalogStorageFormat](ctx.storageHandler) --- End diff -- Sure, can I change the return type of `visitStorageHandler` to `CatalogStorageFormat`?
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user ksakellis commented on a diff in the pull request: https://github.com/apache/spark/pull/12142#discussion_r58330267 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -249,6 +250,8 @@ trait SparkListener { * Called when other events like SQL-specific events are posted. */ def onOtherEvent(event: SparkListenerEvent) { } + + // WHENEVER WE ADD A METHOD HERE, PLEASE ALSO UPDATE SparkFirehoseListener. --- End diff -- Could we have both this abstract class and also an interface? That way the SparkFirehoseListener can just implement the interface instead of extending the abstract class. Not sure if that is a better situation or not - it does make it more flexible though.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58330212 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala --- @@ -134,6 +134,18 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder { } /** + * Create a [[CatalogStorageFormat]]. This is part of the [[CreateTableAsSelect]] command. + */ + override def visitCreateFileFormat( + ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) { +if (ctx.storageHandler == null) { + typedVisit[CatalogStorageFormat](ctx.fileFormat) +} else { + typedVisit[CatalogStorageFormat](ctx.storageHandler) --- End diff -- MINOR/NIT: you could call straight into `visitStorageHandler` here.
[GitHub] spark pull request: [SPARK-14042][CORE] Add custom coalescer suppo...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11865#issuecomment-205145273 @nezihyigitbasi, do you plan to add something similar for DF/DS API?
[GitHub] spark pull request: [DO_NOT_MERGE] Try to reproduce StateStoreSuit...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12145#issuecomment-205145178 **[Test build #54825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54825/consoleFull)** for PR 12145 at commit [`1aa9d57`](https://github.com/apache/spark/commit/1aa9d57d2e942d5ae69ac384a64287de1b6c4a93).
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12117#discussion_r58329961 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -175,13 +175,54 @@ case class DescribeDatabase( } } +/** + * The DDL command that creates a function. + * alias: the class name that implements the created function. + * resources: Jars, files, or archives which need to be added to the environment when the function + *is referenced for the first time by a session. + * isTemp: indicates if it is a temporary function. + */ +// TODO: Use Seq[FunctionResource] instead of Seq[(String, String)] for resources. case class CreateFunction( --- End diff -- can we move the function-related commands out, i.e. into functions.scala? This file is going to become thousands of lines long.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205144472 **[Test build #2736 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2736/consoleFull)** for PR 12142 at commit [`7112bf2`](https://github.com/apache/spark/commit/7112bf2fccebeea8b1d3e491743a102658121dac).
[GitHub] spark pull request: [DO_NOT_MERGE] Try to reproduce StateStoreSuit...
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/12145 [DO_NOT_MERGE] Try to reproduce StateStoreSuite.maintenance failure You can merge this pull request into a Git repository by running: $ git pull https://github.com/lw-lin/spark fix-maintainance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12145 commit 1aa9d57d2e942d5ae69ac384a64287de1b6c4a93 Author: Liwei Lin Date: 2016-04-04T05:36:17Z run 1000 times
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205143712 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205143679 **[Test build #54823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54823/consoleFull)** for PR 12142 at commit [`7112bf2`](https://github.com/apache/spark/commit/7112bf2fccebeea8b1d3e491743a102658121dac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205143714 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54823/ Test FAILed.
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12117#discussion_r58329423 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -183,11 +203,12 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { checkAnswer(sql("SHOW functions abc.abs"), Row("abs")) checkAnswer(sql("SHOW functions `abc`.`abs`"), Row("abs")) checkAnswer(sql("SHOW functions `abc`.`abs`"), Row("abs")) -checkAnswer(sql("SHOW functions `~`"), Row("~")) +// TODO: Re-enable this test after we fix SPARK-14335. +// checkAnswer(sql("SHOW functions `~`"), Row("~")) checkAnswer(sql("SHOW functions `a function doens't exist`"), Nil) -checkAnswer(sql("SHOW functions `weekofyea.*`"), Row("weekofyear")) +checkAnswer(sql("SHOW functions `weekofyea*`"), Row("weekofyear")) // this probably will failed if we add more function with `sha` prefixing. -checkAnswer(sql("SHOW functions `sha.*`"), Row("sha") :: Row("sha1") :: Row("sha2") :: Nil) +checkAnswer(sql("SHOW functions `sha*`"), Row("sha") :: Row("sha1") :: Row("sha2") :: Nil) --- End diff -- `*` is the wildcard for any character(s), not `.*`. So, I changed this test.
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12117#issuecomment-205143016 **[Test build #54824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54824/consoleFull)** for PR 12117 at commit [`9d39a83`](https://github.com/apache/spark/commit/9d39a835a25ea2a509fa490099a7923b70e8bcbd).
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12117#discussion_r58329395 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -61,8 +61,12 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { .filter(regex.matcher(_).matches()).map(Row(_)) } checkAnswer(sql("SHOW functions"), getFunctions(".*")) -Seq("^c.*", ".*e$", "log.*", ".*date.*").foreach { pattern => - checkAnswer(sql(s"SHOW FUNCTIONS '$pattern'"), getFunctions(pattern)) +Seq("^c*", "*e$", "log*", "*date*").foreach { pattern => --- End diff -- `*` is for any character(s), not `.*`. So, I changed this test.
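The `*`-style matching discussed in these two comments can be sketched as follows. This is a hypothetical stand-alone helper illustrating the semantics described above (each `*` matches any characters, matched case-insensitively against the full name), not Spark's actual `SHOW FUNCTIONS` implementation:

```python
import re

def filter_pattern(names, pattern):
    """Filter function names by a SHOW FUNCTIONS-style pattern, where '*'
    matches any characters and '|' separates alternative sub-patterns.
    (Hypothetical helper mirroring the semantics in the review comments.)"""
    result = []
    for sub in pattern.split("|"):
        # Translate the wildcard pattern into an anchored, case-insensitive regex.
        regex = re.compile("(?i)" + sub.strip().replace("*", ".*") + "$")
        result += [n for n in names if regex.match(n) and n not in result]
    return result

print(filter_pattern(["sha", "sha1", "sha2", "weekofyear"], "sha*"))
# → ['sha', 'sha1', 'sha2']
```

Under this reading, the old test patterns like `weekofyea.*` would be taken literally (a dot followed by any characters), which is why the tests above were changed to `weekofyea*` and `sha*`.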
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12117#issuecomment-205142877 @andrewor14 @viirya @hvanhovell This one is ready for review.
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12117#discussion_r58329324 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -112,4 +124,121 @@ class HiveSessionCatalog( metastoreCatalog.cachedDataSourceTables.getIfPresent(key) } + override def makeFunctionBuilder(funcName: String, className: String): FunctionBuilder = { +makeFunctionBuilder(funcName, Utils.classForName(className)) + } + + /** + * Construct a [[FunctionBuilder]] based on the provided class that represents a function. + */ + private def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = { +// When we instantiate hive UDF wrapper class, we may throw exception if the input +// expressions don't satisfy the hive UDF, such as type mismatch, input number +// mismatch, etc. Here we catch the exception and throw AnalysisException instead. +(children: Seq[Expression]) => { + try { +if (classOf[UDF].isAssignableFrom(clazz)) { + val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), children) + udf.dataType // Force it to check input data types. + udf +} else if (classOf[GenericUDF].isAssignableFrom(clazz)) { + val udf = HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), children) + udf.dataType // Force it to check input data types. + udf +} else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) { + val udaf = HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), children) + udaf.dataType // Force it to check input data types. + udaf +} else if (classOf[UDAF].isAssignableFrom(clazz)) { + val udaf = HiveUDAFFunction( +name, +new HiveFunctionWrapper(clazz.getName), +children, +isUDAFBridgeRequired = true) + udaf.dataType // Force it to check input data types. 
+ udaf +} else if (classOf[GenericUDTF].isAssignableFrom(clazz)) { + val udtf = HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), children) + udtf.elementTypes // Force it to check input data types. + udtf +} else { + throw new AnalysisException(s"No handler for Hive UDF '${clazz.getCanonicalName}'") +} + } catch { +case ae: AnalysisException => + throw ae +case NonFatal(e) => + val analysisException = +new AnalysisException(s"No handler for Hive UDF '${clazz.getCanonicalName}': $e") + analysisException.setStackTrace(e.getStackTrace) + throw analysisException + } +} + } + + // We have a list of Hive built-in functions that we do not support. So, we will check + // Hive's function registry and lazily load needed functions into our own function registry. + // Those Hive built-in functions are + // assert_true, collect_list, collect_set, compute_stats, context_ngrams, create_union, + // current_user ,elt, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field, + // histogram_numeric, in_file, index, inline, java_method, map_keys, map_values, + // matchpath, ngrams, noop, noopstreaming, noopwithmap, noopwithmapstreaming, + // parse_url, parse_url_tuple, percentile, percentile_approx, posexplode, reflect, reflect2, + // regexp, sentences, stack, std, str_to_map, windowingtablefunction, xpath, xpath_boolean, + // xpath_double, xpath_float, xpath_int, xpath_long, xpath_number, + // xpath_short, and xpath_string. + override def lookupFunction(name: String, children: Seq[Expression]): Expression = { +Try(super.lookupFunction(name, children)) match { + case Success(expr) => expr + case Failure(error) => +if (functionRegistry.functionExists(name)) { + // If the function actually exists in functionRegistry, it means that there is an + // error when we create the Expression using the given children. + // We need to throw the original exception. 
+ throw error +} else { + // This function is not in functionRegistry, let's try to load it as a Hive's + // built-in function. + val functionName = name.toLowerCase + // TODO: This may not really work for current_user because current_user is not evaluated + // with session info. + // We do not need to use executionHive at here because we only load + // Hive's builtin functions, which do not need current db. +
[GitHub] spark pull request: [SPARK-14123] [SQL] Handle CreateFunction/Drop...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12117#discussion_r58329279 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -491,40 +528,31 @@ class SessionCatalog( } /** - * Rename a function. - * - * If a database is specified in `oldName`, this will rename the function in that database. - * If no database is specified, this will first attempt to rename a temporary function with - * the same name, then, if that does not exist, rename the function in the current database. - * - * This assumes the database specified in `oldName` matches the one specified in `newName`. - */ - def renameFunction(oldName: FunctionIdentifier, newName: FunctionIdentifier): Unit = { -if (oldName.database != newName.database) { - throw new AnalysisException("rename does not support moving functions across databases") -} -val db = oldName.database.getOrElse(currentDb) -val oldBuilder = functionRegistry.lookupFunctionBuilder(oldName.funcName) -if (oldName.database.isDefined || oldBuilder.isEmpty) { - externalCatalog.renameFunction(db, oldName.funcName, newName.funcName) -} else { - val oldExpressionInfo = functionRegistry.lookupFunction(oldName.funcName).get - val newExpressionInfo = new ExpressionInfo( -oldExpressionInfo.getClassName, -newName.funcName, -oldExpressionInfo.getUsage, -oldExpressionInfo.getExtended) - functionRegistry.dropFunction(oldName.funcName) - functionRegistry.registerFunction(newName.funcName, newExpressionInfo, oldBuilder.get) -} - } - - /** * Return an [[Expression]] that represents the specified function, assuming it exists. * Note: This is currently only used for temporary functions. */ def lookupFunction(name: String, children: Seq[Expression]): Expression = { -functionRegistry.lookupFunction(name, children) +// TODO: Right now, the name can be qualified or not qualified. +// It will be better to get a FunctionIdentifier. 
+// TODO: Right now, we assume that name is not qualified! --- End diff -- Added in `org.apache.spark.sql.hive.UDFSuite`.
[GitHub] spark pull request: [SPARK-14360][SQL] QueryExecution.debug.codege...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12144#issuecomment-205139817 **[Test build #54822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54822/consoleFull)** for PR 12144 at commit [`35d8673`](https://github.com/apache/spark/commit/35d8673753296db909b0ed3d9e537328354bc63d).
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205139815 **[Test build #54823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54823/consoleFull)** for PR 12142 at commit [`7112bf2`](https://github.com/apache/spark/commit/7112bf2fccebeea8b1d3e491743a102658121dac).
[GitHub] spark pull request: [SPARK-14360][SQL] QueryExecution.debug.codege...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/12144 [SPARK-14360][SQL] QueryExecution.debug.codegen() to dump codegen ## What changes were proposed in this pull request? We recently added the ability to dump the generated code for a given query. However, the method is only available through an implicit after an import. It would slightly simplify things if it could be called directly on queryExecution. ## How was this patch tested? Manually tested in spark-shell. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-14360 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12144.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12144 commit 35d8673753296db909b0ed3d9e537328354bc63d Author: Reynold Xin Date: 2016-04-04T05:12:39Z [SPARK-14360][SQL] QueryExecution.debug.codegen() to dump codegen
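A toy sketch of the ergonomics this PR is after (the class and method bodies are illustrative, not Spark's implementation): exposing `debug.codegen()` as a member of the query-execution object itself removes the need to import an implicit conversion first.

```scala
// Illustrative stand-in for QueryExecution: the `debug` namespace is a
// nested object, so `qe.debug.codegen()` works with no extra import,
// instead of relying on an implicit class brought in via `import ..._`.
class QueryExecutionSketch(val plan: String) {
  object debug {
    def codegen(): String = s"/* generated code for: $plan */"
  }
}

val qe = new QueryExecutionSketch("Project [a]")
val dump = qe.debug.codegen() // callable directly, no import needed
```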
[GitHub] spark pull request: [SPARK-8171] [Web UI] Javascript based infinit...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/10910#discussion_r58327924 --- Diff: core/src/main/resources/org/apache/spark/ui/static/log-view.js --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +var baseParams + +var curLogLength +var startByte +var endByte +var totalLogLength + +var byteLength +var btnHeight = 30 + +function setLogScroll(oldHeight) { + $(".log-content").scrollTop($(".log-content")[0].scrollHeight - oldHeight); +} + +function tailLog() { + $(".log-content").scrollTop($(".log-content")[0].scrollHeight); +} + +function setLogData() { + $('#log-data').html("Showing " + curLogLength + " Bytes: " + startByte ++ " - " + endByte + " of " + totalLogLength); +} + +function disableMoreButton() { + $(".log-more-btn").attr("disabled", "disabled");; + $(".log-more-btn").html("Top of Log"); +} + +function noNewAlert() { + $(".no-new-alert").css("display", "block"); + window.setTimeout(function () {$(".no-new-alert").css("display", "none");}, 4000); +} + +function loadMore() { + var offset = Math.max(startByte - byteLength, 0); + var newLogLength = Math.min(curLogLength + byteLength, totalLogLength); + + $.ajax({ +type: "GET", +url: "/log" + baseParams + "=" + offset + "=" + byteLength, +success: function (data) { + var oldHeight = $(".log-content")[0].scrollHeight; + var dataInfo = data.substring(0, data.indexOf('\n')).match(/\d+/g); + var retStartByte = dataInfo[0]; + var retLogLength = dataInfo[2]; + + var cleanData = data.substring(data.indexOf('\n') + 1).trim(); + if (retStartByte == 0) { +cleanData = cleanData.substring(0, startByte); +disableMoreButton(); + } + $("pre", ".log-content").prepend(cleanData); + + curLogLength = curLogLength + (startByte - retStartByte); + startByte = retStartByte; + totalLogLength = retLogLength; + setLogScroll(oldHeight); + setLogData(); +} + }); +} + +function loadNew() { + $.ajax({ +type: "GET", +url: "/log" + baseParams, +success: function (data) { + var dataInfo = data.substring(0, data.indexOf('\n')).match(/\d+/g); + var newDataLen = dataInfo[2] - totalLogLength; + if (newDataLen != 0) { +$.ajax({ + type: "GET", + url: "/log" + baseParams + "=" + newDataLen, + success: function (data) { +var dataInfo = 
data.substring(0, data.indexOf('\n')).match(/\d+/g); +var retStartByte = dataInfo[0]; +var retEndByte = dataInfo[1]; +var retLogLength = dataInfo[2]; + +var cleanData = data.substring(data.indexOf('\n') + 1).trim(); +$("pre", ".log-content").append(cleanData); --- End diff -- Need to append `'\n' + cleanData`.
[GitHub] spark pull request: [SPARK-8171] [Web UI] Javascript based infinit...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/10910#discussion_r58327903 --- Diff: core/src/main/resources/org/apache/spark/ui/static/log-view.js --- (quoting the same new file as above; the relevant lines in `loadMore()`:)
+      var cleanData = data.substring(data.indexOf('\n') + 1).trim();
+      if (retStartByte == 0) {
+        cleanData = cleanData.substring(0, startByte);
+        disableMoreButton();
+      }
+      $("pre", ".log-content").prepend(cleanData);
--- End diff -- I think we should prepend `cleanData + '\n'`.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205133324 **[Test build #54821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54821/consoleFull)** for PR 12142 at commit [`05e4c5e`](https://github.com/apache/spark/commit/05e4c5ebd7c2d04610b7f9f7e53564d4fb6fe924). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-20516 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54821/ Test FAILed.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-20515 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8171] [Web UI] Javascript based infinit...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/10910#discussion_r58327788 --- Diff: core/src/main/resources/org/apache/spark/ui/static/log-view.js --- (quoting the same new file as above; the relevant line:)
+function disableMoreButton() {
+  $(".log-more-btn").attr("disabled", "disabled");;
--- End diff -- `;` is redundant.
[GitHub] spark pull request: [SPARK-14301][Examples] Java examples code mer...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12143#issuecomment-205133204 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-8171] [Web UI] Javascript based infinit...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/10910#discussion_r58327698 --- Diff: core/src/main/resources/org/apache/spark/ui/static/log-view.js --- (quoting the same new file as above; the relevant lines:)
+function setLogScroll(oldHeight) {
+  $(".log-content").scrollTop($(".log-content")[0].scrollHeight - oldHeight);
+}
+
+function tailLog() {
+  $(".log-content").scrollTop($(".log-content")[0].scrollHeight);
+}
--- End diff -- It's not good to repeat a query for the same DOM objects like `$(".log-content")`.
[GitHub] spark pull request: [SPARK-14301][Examples] Java examples code mer...
GitHub user yongtang opened a pull request: https://github.com/apache/spark/pull/12143 [SPARK-14301][Examples] Java examples code merge and clean up. ## What changes were proposed in this pull request? This fix tries to remove duplicate Java code in examples/mllib and examples/ml. The following changes have been made: ``` deleted: ml/JavaCrossValidatorExample.java (duplicate of JavaModelSelectionViaCrossValidationExample.java) deleted: ml/JavaTrainValidationSplitExample.java (duplicate of JavaModelSelectionViaTrainValidationSplitExample.java) deleted: ml/JavaSimpleTextClassificationPipeline.java (duplicate of JavaModelSelectionViaCrossValidationExample.java) deleted: ml/JavaDeveloperApiExample.java (conform to changes in scala/DeveloperApiExample.scala) deleted: mllib/JavaFPGrowthExample.java (duplicate of JavaSimpleFPGrowth.java) deleted: mllib/JavaLDAExample.java (duplicate of JavaLatentDirichletAllocationExample.java) deleted: mllib/JavaKMeans.java (merged with JavaKMeansExample.java) deleted: mllib/JavaLR.java (duplicate of JavaLinearRegressionWithSGDExample.java) updated: mllib/JavaKMeansExample.java (merged with mllib/JavaKMeans.java) ``` ## How was this patch tested? Existing tests passed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yongtang/spark SPARK-14301 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12143.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12143 commit 1fec5e4d0adb7fd4a5c1f36a967a02dcdb1cd6e5 Author: Yong Tang Date: 2016-04-03T03:35:31Z [SPARK-14301][Examples] Java examples code merge and clean up. This fix tries to remove duplicate Java code in examples/mllib and examples/ml. The following changes have been made: deleted: ml/JavaCrossValidatorExample.java (-> JavaModelSelectionViaCrossValidationExample.java) deleted: ml/JavaTrainValidationSplitExample.java (-> JavaModelSelectionViaTrainValidationSplitExample.java) deleted: ml/JavaSimpleTextClassificationPipeline.java (-> JavaModelSelectionViaCrossValidationExample.java) deleted: ml/JavaDeveloperApiExample.java (conform to changes in scala/DeveloperApiExample.scala) deleted: mllib/JavaFPGrowthExample.java (-> JavaSimpleFPGrowth.java) deleted: mllib/JavaLDAExample.java (-> JavaLatentDirichletAllocationExample.java) deleted: mllib/JavaKMeans.java (merged with JavaKMeansExample.java) deleted: mllib/JavaLR.java (-> JavaLinearRegressionWithSGDExample.java) updated: mllib/JavaKMeansExample.java (merged with mllib/JavaKMeans.java)
[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r58327554 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala ---
@@ -43,52 +43,52 @@ import org.apache.spark.sql.execution.aggregate.TypedAggregateExpression
  *
  * Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
  *
- * @tparam I The input type for the aggregation.
- * @tparam B The type of the intermediate value of the reduction.
- * @tparam O The type of the final output result.
+ * @tparam Input The input type for the aggregation.
+ * @tparam Buffer The type of the intermediate value of the reduction.
+ * @tparam Result The type of the final output result.
  * @since 1.6.0
  */
-abstract class Aggregator[-I, B, O] extends Serializable {
+abstract class Aggregator[-Input, Buffer, Result] extends Serializable {
--- End diff -- Maybe `IN`, `BUF`, and `OUT`; we have mostly used upper-case type parameters in Spark.
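The suggested renaming can be sketched with a minimal stand-in for the class (this is not Spark's actual `Aggregator`, just an illustration of the `IN`/`BUF`/`OUT` convention under discussion):

```scala
// Minimal typed-aggregation sketch: IN is the element type, BUF the
// intermediate reduction value, OUT the final result.
abstract class Aggregator[-IN, BUF, OUT] extends Serializable {
  def zero: BUF                      // initial buffer value
  def reduce(b: BUF, a: IN): BUF     // fold one input into the buffer
  def merge(b1: BUF, b2: BUF): BUF   // combine partial buffers
  def finish(reduction: BUF): OUT    // produce the final output
}

// Example instance: sum Ints into a Long buffer.
object SumAgg extends Aggregator[Int, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Int): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(r: Long): Long = r
}

val total = SumAgg.finish(List(1, 2, 3).foldLeft(SumAgg.zero)(SumAgg.reduce))
```

The contravariant `-IN` matters here: an `Aggregator[Any, ...]` can stand in wherever an `Aggregator[Int, ...]` is expected, which the single-letter `-I` already expressed less readably.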
[GitHub] spark pull request: [SPARK-8171] [Web UI] Javascript based infinit...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/10910#discussion_r58327525 --- Diff: core/src/main/resources/org/apache/spark/ui/static/log-view.js --- (quoting the same new file as above; the relevant lines:)
+var baseParams
+
+var curLogLength
+var startByte
+var endByte
+var totalLogLength
+
+var byteLength
+var btnHeight = 30
--- End diff -- It's better to append `;` to all the variable declarations above.
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205131833 cc @JoshRosen
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12142#issuecomment-205131573 **[Test build #54821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54821/consoleFull)** for PR 12142 at commit [`05e4c5e`](https://github.com/apache/spark/commit/05e4c5ebd7c2d04610b7f9f7e53564d4fb6fe924).
[GitHub] spark pull request: [SPARK-14358] Change SparkListener from a trai...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/12142 [SPARK-14358] Change SparkListener from a trait to an abstract class ## What changes were proposed in this pull request? Scala traits are difficult to maintain binary compatibility on, and as a result we had to introduce JavaSparkListener. In Spark 2.0 we can change SparkListener from a trait to an abstract class and then remove JavaSparkListener. ## How was this patch tested? Updated related unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-14358 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12142 commit 05e4c5ebd7c2d04610b7f9f7e53564d4fb6fe924 Author: Reynold Xin Date: 2016-04-04T04:19:32Z [SPARK-14358] Change SparkListener from a trait to an abstract class
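The binary-compatibility argument can be illustrated with a toy listener (the names are illustrative, not Spark's real `SparkListener` API). An abstract class can ship no-op default bodies, so subclasses compiled against an older release keep working when a later release adds a callback with a default body; in Scala 2.11-era trait encoding, trait method bodies are woven into each implementing class at compile time, so the same addition breaks previously compiled subclasses.

```scala
// Toy sketch: an abstract listener with no-op defaults, so subclasses
// override only the callbacks they care about.
abstract class ListenerSketch {
  def onJobStart(jobId: Int): Unit = { }
  def onJobEnd(jobId: Int): Unit = { }
  // A callback added in a later release with a default body like these
  // would not force existing compiled subclasses to change.
}

class CountingListener extends ListenerSketch {
  var started = 0
  override def onJobStart(jobId: Int): Unit = { started += 1 }
}

val listener = new CountingListener
listener.onJobStart(1)
listener.onJobEnd(1) // inherited no-op
```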
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12140
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12140#issuecomment-205128729 LGTM - merging in master.
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12136#issuecomment-205125718 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54820/ Test FAILed.
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12136#issuecomment-205125714 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12140#issuecomment-205125307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54818/ Test PASSed.
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12136#issuecomment-205125324 **[Test build #54820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54820/consoleFull)** for PR 12136 at commit [`e2aa616`](https://github.com/apache/spark/commit/e2aa616c342db09add70fb2002646476df887381). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12140#issuecomment-205125301 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12140#issuecomment-205124669 **[Test build #54818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54818/consoleFull)** for PR 12140 at commit [`1273d55`](https://github.com/apache/spark/commit/1273d55957f15529ad8c676dc586540e52a179fe). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` implicit class DebugQuery(query: Dataset[_]) extends Logging `
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-205122837 Agreed. Thanks @srowen. Reverted the calendar changes in DateTimeUtils in the most recent commit.
[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58325719

--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
```diff
@@ -296,10 +290,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
   protected val mapStatuses = new ConcurrentHashMap[Int, Array[MapStatus]]().asScala
   private val cachedSerializedStatuses = new ConcurrentHashMap[Int, Array[Byte]]().asScala

+  private val maxRpcMessageSize = RpcUtils.maxMessageSizeBytes(conf)
+
+  // Kept in sync with cachedSerializedStatuses explicitly
+  // This is required so that the Broadcast variable remains in scope until we remove
+  // the shuffleId explicitly or implicitly.
+  private val cachedSerializedBroadcast = new HashMap[Int, Broadcast[Array[Byte]]]()
+
+  // This is to prevent multiple serializations of the same shuffle - which happens when
+  // there is a request storm when shuffle start.
+  private val shuffleIdLocks = new ConcurrentHashMap[Int, AnyRef]()
+
+  // requests for map output statuses
+  private val mapOutputRequests = new LinkedBlockingQueue[GetMapOutputMessage]
+
+  // Thread pool used for handling map output status requests. This is a separate thread pool
+  // to ensure we don't block the normal dispatcher threads.
+  private val threadpool: ThreadPoolExecutor = {
+    val numThreads = conf.getInt("spark.shuffle.mapOutput.dispatcher.numThreads", 8)
+    val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "map-output-dispatcher")
+    for (i <- 0 until numThreads) {
+      pool.execute(new MessageLoop)
+    }
+    pool
+  }
+
+  def post(message: GetMapOutputMessage): Unit = {
+    mapOutputRequests.offer(message)
+  }
+
+  /** Message loop used for dispatching messages. */
+  private class MessageLoop extends Runnable {
+    override def run(): Unit = {
+      try {
+        while (true) {
+          try {
+            val data = mapOutputRequests.take()
+            if (data == PoisonPill) {
+              // Put PoisonPill back so that other MessageLoops can see it.
+              mapOutputRequests.offer(PoisonPill)
+              return
+            }
+            val context = data.context
+            val shuffleId = data.shuffleId
+            val hostPort = context.senderAddress.hostPort
+            logDebug("Handling request to send map output locations for shuffle " + shuffleId +
+              " to " + hostPort)
+            val mapOutputStatuses = getSerializedMapOutputStatuses(shuffleId)
```
--- End diff --

This is time-consuming and may cause a timeout.
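The dispatcher in this diff is a fixed pool of daemon threads draining one shared blocking queue, shut down by posting a sentinel "poison pill": the loop that takes the pill re-offers it before exiting so every sibling loop also sees it and stops. A self-contained sketch of that shutdown pattern (all class and field names here are illustrative, not Spark's):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoisonPillDispatcher {
    // Sentinel message signalling every message loop to exit.
    private static final String POISON_PILL = "__POISON_PILL__";

    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final ExecutorService pool;
    final AtomicInteger handled = new AtomicInteger();

    PoisonPillDispatcher(int numThreads) {
        pool = Executors.newFixedThreadPool(numThreads);
        for (int i = 0; i < numThreads; i++) {
            pool.execute(this::messageLoop);  // each thread runs the same loop
        }
    }

    void post(String message) {
        requests.offer(message);
    }

    private void messageLoop() {
        try {
            while (true) {
                String msg = requests.take();
                if (POISON_PILL.equals(msg)) {
                    // Put the pill back so the other loops can see it too, then exit.
                    requests.offer(POISON_PILL);
                    return;
                }
                handled.incrementAndGet();  // stand-in for real request handling
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void stop() {
        requests.offer(POISON_PILL);  // FIFO: queued work drains before the pill
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PoisonPillDispatcher d = new PoisonPillDispatcher(4);
        for (int i = 0; i < 100; i++) d.post("req-" + i);
        d.stop();
        System.out.println(d.handled.get());
    }
}
```

Because the queue is FIFO and the pill is offered last, all queued requests are handled before the loops terminate.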
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12141#issuecomment-205113031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54819/ Test PASSed.
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12141#issuecomment-205113029 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12141#issuecomment-205112805 **[Test build #54819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54819/consoleFull)** for PR 12141 at commit [`9c63aeb`](https://github.com/apache/spark/commit/9c63aebcef3603303f70ef5346475ac1dce86116).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58324785

--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
```diff
@@ -428,40 +503,93 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
     }
   }

+  private def removeBroadcast(bcast: Broadcast[_]): Unit = {
+    if (null != bcast) {
+      broadcastManager.unbroadcast(bcast.id,
+        removeFromDriver = true, blocking = false)
+    }
+  }
+
+  private def clearCachedBroadcast(): Unit = {
+    for (cached <- cachedSerializedBroadcast) removeBroadcast(cached._2)
+    cachedSerializedBroadcast.clear()
+  }
+
   def getSerializedMapOutputStatuses(shuffleId: Int): Array[Byte] = {
     var statuses: Array[MapStatus] = null
     var epochGotten: Long = -1
     epochLock.synchronized {
       if (epoch > cacheEpoch) {
         cachedSerializedStatuses.clear()
+        clearCachedBroadcast()
         cacheEpoch = epoch
       }
       cachedSerializedStatuses.get(shuffleId) match {
         case Some(bytes) =>
           return bytes
         case None =>
+          logDebug("cached status not found for : " + shuffleId)
           statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
           epochGotten = epoch
       }
     }
-    // If we got here, we failed to find the serialized locations in the cache, so we pulled
-    // out a snapshot of the locations as "statuses"; let's serialize and return that
-    val bytes = MapOutputTracker.serializeMapStatuses(statuses)
-    logInfo("Size of output statuses for shuffle %d is %d bytes".format(shuffleId, bytes.length))
-    // Add them into the table only if the epoch hasn't changed while we were working
-    epochLock.synchronized {
-      if (epoch == epochGotten) {
-        cachedSerializedStatuses(shuffleId) = bytes
+
+    var shuffleIdLock = shuffleIdLocks.get(shuffleId)
+    if (null == shuffleIdLock) {
+      val newLock = new Object()
+      // in general, this condition should be false - but good to be paranoid
+      val prevLock = shuffleIdLocks.putIfAbsent(shuffleId, newLock)
+      shuffleIdLock = if (null != prevLock) prevLock else newLock
+    }
+    val newbytes = shuffleIdLock.synchronized {
+
+      // double check to make sure someone else didn't serialize and cache the same
+      // mapstatus while we were waiting on the synchronize
+      epochLock.synchronized {
+        if (epoch > cacheEpoch) {
+          cachedSerializedStatuses.clear()
+          clearCachedBroadcast()
+          cacheEpoch = epoch
+        }
+        cachedSerializedStatuses.get(shuffleId) match {
+          case Some(bytes) =>
+            return bytes
+          case None =>
+            logDebug("shuffle lock cached status not found for : " + shuffleId)
+            statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
+            epochGotten = epoch
+        }
+      }
+
+      // If we got here, we failed to find the serialized locations in the cache, so we pulled
+      // out a snapshot of the locations as "statuses"; let's serialize and return that
+      val (bytes, bcast) = MapOutputTracker.serializeMapStatuses(statuses, broadcastManager,
+        isLocal, minSizeForBroadcast)
+      logInfo("Size of output statuses for shuffle %d is %d bytes".format(shuffleId, bytes.length))
+      // Add them into the table only if the epoch hasn't changed while we were working
+      epochLock.synchronized {
+        if (epoch == epochGotten) {
+          cachedSerializedStatuses(shuffleId) = bytes
+          if (null != bcast) cachedSerializedBroadcast(shuffleId) = bcast
+        } else {
+          logInfo("Epoch changed, not caching!")
+          removeBroadcast(bcast)
+        }
+      }
+      bytes
     }
-    bytes
+    newbytes
   }

   override def stop() {
+    mapOutputRequests.offer(PoisonPill)
+    threadpool.shutdown()
     sendTracker(StopMapOutputTracker)
     mapStatuses.clear()
     trackerEndpoint = null
     cachedSerializedStatuses.clear()
+    clearCachedBroadcast()
```
--- End diff --

Has `BroadcastManager` already stopped at this point? If so, the following log may be output: `SparkListenerBus has already stopped! Dropping event .` LiveListenerBus:
```scala
def post(event: SparkListenerEvent): Unit = {
  if (stopped.get) {
    // Drop further events to make `listenerThread` exit ASAP
    logError(s"$name has already stopped! Dropping event $event")
    return
  }
  val eventAdded =
```
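The diff above follows a check-lock-check pattern: consult the cache without a lock, then take a per-shuffle lock, re-check under it, and only then perform the expensive serialization, so a "request storm" for one shuffle serializes its statuses at most once while other shuffles proceed in parallel. A standalone sketch of that pattern (class and method names are illustrative, and `computeIfAbsent` stands in for the diff's explicit `putIfAbsent` dance):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Computes an expensive value at most once per key, even under concurrent
// requests, by double-checking the cache inside a per-key lock.
public class PerKeyOnceCache {
    private final ConcurrentHashMap<Integer, byte[]> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Integer, Object> locks = new ConcurrentHashMap<>();
    final AtomicInteger computations = new AtomicInteger();
    private final IntFunction<byte[]> expensive;

    PerKeyOnceCache(IntFunction<byte[]> expensive) {
        this.expensive = expensive;
    }

    byte[] get(int key) {
        byte[] cached = cache.get(key);   // first check: no lock at all
        if (cached != null) return cached;

        // One lock object per key, so unrelated keys never contend.
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            cached = cache.get(key);      // second check, under the lock
            if (cached != null) return cached;

            computations.incrementAndGet();
            byte[] value = expensive.apply(key);  // the costly work, done once
            cache.put(key, value);
            return value;
        }
    }
}
```

Repeated lookups of the same key return the cached array without re-running the expensive function.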
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12141#issuecomment-205108060 **[Test build #54819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54819/consoleFull)** for PR 12141 at commit [`9c63aeb`](https://github.com/apache/spark/commit/9c63aebcef3603303f70ef5346475ac1dce86116).
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12136#issuecomment-205108064 **[Test build #54820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54820/consoleFull)** for PR 12136 at commit [`e2aa616`](https://github.com/apache/spark/commit/e2aa616c342db09add70fb2002646476df887381).
[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/12141 [SPARK-14353] Dataset Time Window `window` API for R

## What changes were proposed in this pull request?

The `window` function was added to Dataset with [this PR](https://github.com/apache/spark/pull/12008). This PR adds the R API for this function. With this PR, SQL, Java, and Scala share the same APIs; users can call:
- `window(timeColumn, windowDuration)`
- `window(timeColumn, windowDuration, slideDuration)`
- `window(timeColumn, windowDuration, slideDuration, startTime)`

In Python and R, users can access all the APIs above, but in addition they can do:
- In R: `window(timeColumn, windowDuration, startTime=...)`

That is, they can provide the `startTime` without providing the `slideDuration`. In this case, we will generate tumbling windows.

## How was this patch tested?

Unit tests + manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brkyvz/spark R-windows

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12141.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12141

commit 9c63aebcef3603303f70ef5346475ac1dce86116
Author: Burak Yavuz
Date: 2016-04-04T02:39:41Z

R support for time windowing
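As the description notes, supplying `startTime` without `slideDuration` produces tumbling (non-overlapping) windows, so every timestamp lands in exactly one bucket. A minimal sketch of that bucketing arithmetic (illustrative only, not Spark's actual implementation; the class and method names are hypothetical):

```java
// Tumbling-window bucketing: with no slideDuration, each timestamp maps to
// exactly one window of length windowDuration, with window boundaries offset
// by startTime. All arguments are in the same unit (e.g. seconds).
public class TumblingWindow {
    static long windowStart(long timestamp, long windowDuration, long startTime) {
        // Distance from the most recent window boundary at or before timestamp.
        long offset = Math.floorMod(timestamp - startTime, windowDuration);
        return timestamp - offset;
    }

    public static void main(String[] args) {
        // 10-unit tumbling windows starting at offset 3: [3, 13), [13, 23), ...
        System.out.println(windowStart(17, 10, 3)); // 17 falls in [13, 23)
    }
}
```

`Math.floorMod` (rather than `%`) keeps the arithmetic correct for timestamps earlier than `startTime`, where the remainder would otherwise be negative.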
[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-205107476 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54817/ Test FAILed.
[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-205107474 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-205107412 **[Test build #54817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54817/consoleFull)** for PR 10382 at commit [`3bb5ea3`](https://github.com/apache/spark/commit/3bb5ea3007d89a58d9eb1925f3334c7700ab71e3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12140#issuecomment-205106285 **[Test build #54818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54818/consoleFull)** for PR 12140 at commit [`1273d55`](https://github.com/apache/spark/commit/1273d55957f15529ad8c676dc586540e52a179fe).
[GitHub] spark pull request: [SPARK-14356] Update spark.sql.execution.debug...
GitHub user mateiz opened a pull request: https://github.com/apache/spark/pull/12140 [SPARK-14356] Update spark.sql.execution.debug to work on Datasets

## What changes were proposed in this pull request?

Update DebugQuery to work on Datasets of any type, not just DataFrames.

## How was this patch tested?

Added unit tests, checked in spark-shell.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mateiz/spark debug-dataset

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12140.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12140

commit 1273d55957f15529ad8c676dc586540e52a179fe
Author: Matei Zaharia
Date: 2016-04-04T02:24:34Z

[SPARK-14356] Update spark.sql.execution.debug to work on Datasets
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205095821 Thank you, @rxin .
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12139
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205094931 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205088936 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205088940 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54815/ Test PASSed.
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205088457 **[Test build #54815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54815/consoleFull)** for PR 12139 at commit [`f173495`](https://github.com/apache/spark/commit/f173495288643f3ce1d28ee9d80e4c985e8803f5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12124#issuecomment-205087601 **[Test build #54816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54816/consoleFull)** for PR 12124 at commit [`b17cc7b`](https://github.com/apache/spark/commit/b17cc7b8cb33af7bebb444832a2b7fd9e961ea93).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12124#issuecomment-205087839 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12124#issuecomment-205087844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54816/ Test PASSed.
[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-205086680 **[Test build #54817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54817/consoleFull)** for PR 10382 at commit [`3bb5ea3`](https://github.com/apache/spark/commit/3bb5ea3007d89a58d9eb1925f3334c7700ab71e3).
[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-205086565 Jenkins retest this please
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12016
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12016#issuecomment-205086388 Merged to master, thanks
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205086218 Thank you, @srowen .
[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12124#issuecomment-205086068 **[Test build #54816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54816/consoleFull)** for PR 12124 at commit [`b17cc7b`](https://github.com/apache/spark/commit/b17cc7b8cb33af7bebb444832a2b7fd9e961ea93).
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205086070 LGTM
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12105#discussion_r58320790

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
```diff
@@ -59,6 +59,8 @@ object DateTimeUtils {
   final val toYearZero = to2001 + 7304850
   final val TimeZoneGMT = TimeZone.getTimeZone("GMT")

+  lazy val c = Calendar.getInstance(TimeZone.getTimeZone("GMT"))
```
--- End diff --

Yes, if SimpleDateFormat isn't used from multiple threads and you're sure it can't be, then caching it in the instance is OK. This, however, isn't thread-safe: the Calendar would be shared across threads.
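Since `SimpleDateFormat` carries a mutable internal `Calendar`, a single shared instance is unsafe under concurrent use. The usual JVM-side way to cache one per thread is a `ThreadLocal`; the sketch below illustrates that general pattern (it is not the fix this PR ended up adopting, and the class name is hypothetical):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Each thread lazily gets its own SimpleDateFormat: one allocation per thread
// instead of one per call, with no cross-thread sharing of mutable state.
public class SafeDateFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
            return sdf;
        });

    static String format(long epochMillis) {
        return FORMAT.get().format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(0L)); // epoch start, rendered in GMT
    }
}
```

The trade-off srowen describes still applies: if the format object is provably confined to one thread (as claimed for the generated code), a plain instance field is enough and the `ThreadLocal` indirection is unnecessary.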
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-205082821 The SimpleDateFormat declared in the generated code is not shared across multiple threads.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205076952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54814/ Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205076951 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205076894 **[Test build #54814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54814/consoleFull)** for PR 12134 at commit [`0b786c0`](https://github.com/apache/spark/commit/0b786c037c28f6ba633580d8492380ad59a6b1e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205075604 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205075605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54812/ Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205075558 **[Test build #54812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54812/consoleFull)** for PR 12134 at commit [`237b73b`](https://github.com/apache/spark/commit/237b73bee8f95feae8fb38d13b2e79c25ef70bc6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205074291 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205074292 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54811/ Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205074223 **[Test build #54811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54811/consoleFull)** for PR 12134 at commit [`6b1c9fb`](https://github.com/apache/spark/commit/6b1c9fb70ddfab957cceda0f65e401500347319e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12117#issuecomment-205070725 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12117#issuecomment-205070732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54813/ Test FAILed.
[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12117#issuecomment-205070488 **[Test build #54813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54813/consoleFull)** for PR 12117 at commit [`776c09a`](https://github.com/apache/spark/commit/776c09afd8f448c78b063de510b14434764d72cd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user koertkuipers commented on a diff in the pull request: https://github.com/apache/spark/pull/11980#discussion_r58319194 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -120,17 +120,19 @@ object RowEncoder { case StructType(fields) => val convertedFields = fields.zipWithIndex.map { case (f, i) => -val method = if (f.dataType.isInstanceOf[StructType]) { - "getStruct" +val x = extractorsFor( + GetExternalRowField(inputObject, i, externalDataTypeForInput(f.dataType), f.nullable), + f.dataType +) +if (f.nullable) { + If( --- End diff -- If we do the null check inside GetExternalRowField, then the code for serializerFor also needs to be pushed into it (to be inside the null check), and I cannot figure out how to do that.
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12139#issuecomment-205068265 **[Test build #54815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54815/consoleFull)** for PR 12139 at commit [`f173495`](https://github.com/apache/spark/commit/f173495288643f3ce1d28ee9d80e4c985e8803f5).
[GitHub] spark pull request: [SPARK-14355][BUILD] Fix typos in Exception/Te...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/12139 [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results ## What changes were proposed in this pull request? This issue contains the following 5 types of maintenance fixes over 59 files (+94 lines, -93 lines). - Fix typos (exception/log strings, testcase names, comments) in 44 lines. - Fix lint-java errors (MaxLineLength) in 6 lines. (New code after SPARK-14011) - Use diamond operators in 40 lines. (New code after SPARK-13702) - Fix redundant semicolons in 5 lines. - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala. ## How was this patch tested? Manual and pass the Jenkins tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-14355 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12139.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12139 commit f173495288643f3ce1d28ee9d80e4c985e8803f5 Author: Dongjoon Hyun Date: 2016-04-03T22:20:30Z [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results * Fix typos (exception strings, testcase names, comments) in 44 lines. * Fix lint-java errors (MaxLineLength) in 6 lines. * Use diamond operators in 40 lines. * Fix redundant semicolons in 5 lines. * Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala.
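One of the fix types in the PR above, the diamond operator, is worth a standalone illustration: since Java 7 the compiler infers generic type arguments at an instantiation site, so repeating them on the right-hand side is redundant. A small sketch (the class and method names are hypothetical, not from the PR):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondExample {
    // Builds a map using the Java 7+ diamond operator: the compiler infers the
    // type arguments from the declared type on the left-hand side.
    static Map<String, List<Integer>> buildConcise() {
        Map<String, List<Integer>> m = new HashMap<>();   // instead of: new HashMap<String, List<Integer>>()
        m.put("a", new ArrayList<>());                    // instead of: new ArrayList<Integer>()
        return m;
    }

    public static void main(String[] args) {
        // Both styles are semantically identical; the diamond form is just shorter.
        Map<String, List<Integer>> verbose = new HashMap<String, List<Integer>>(); // pre-Java-7 style
        System.out.println(buildConcise().containsKey("a")); // prints true
        System.out.println(verbose.isEmpty());               // prints true
    }
}
```

Modernizing the verbose form to the diamond form is a pure refactoring with no behavioral change, which is why this kind of cleanup PR can touch 40 lines safely.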
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-205060651 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-205060653 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54809/ Test PASSed.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12134#issuecomment-205060446 **[Test build #54814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54814/consoleFull)** for PR 12134 at commit [`0b786c0`](https://github.com/apache/spark/commit/0b786c037c28f6ba633580d8492380ad59a6b1e5).
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-205060396 **[Test build #54809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54809/consoleFull)** for PR 11980 at commit [`e9a9a30`](https://github.com/apache/spark/commit/e9a9a30e1804785d3534bea78cf2ce588f7fc51b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58318012 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala --- @@ -133,6 +133,16 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder { } } + override def visitCreateFileFormat( + ctx: CreateFileFormatContext): CatalogStorageFormat = withOrigin(ctx) { +// Create the predicate. --- End diff -- NP
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58318013 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala --- @@ -133,6 +133,16 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder { } } + override def visitCreateFileFormat( --- End diff -- Yeah. Multiple test cases failed. For example, ``` test("SPARK-8811: compatibility with array of struct in Hive") ``` in parquetSuites.scala
[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12134#discussion_r58317991 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala --- @@ -133,6 +133,16 @@ class HiveSqlAstBuilder extends SparkSqlAstBuilder { } } + override def visitCreateFileFormat( --- End diff -- Never mind, I found the tests which fail.