[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14476 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14476 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63212/ Test FAILed.
[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14476 **[Test build #63212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63212/consoleFull)** for PR 14476 at commit [`2093906`](https://github.com/apache/spark/commit/20939066b99fd5892a123177deafe24bfb7607d0).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14470: [SPARK-16863][ML] ProbabilisticClassifier.fit check thre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14470 **[Test build #63213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63213/consoleFull)** for PR 14470 at commit [`df5af72`](https://github.com/apache/spark/commit/df5af7247960e44281ec64bb141c7a499eaa80cd).
[GitHub] spark pull request #14474: [SPARK-16853][SQL] fixes encoder error in DataSet...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14474#discussion_r73465751

Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

```scala
@@ -184,6 +184,17 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       2, 3, 4)
   }

+  test("SPARK-16853: select, case class and tuple") {
```

how about `typed select that returns case class or tuple`?
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14467 Merged build finished. Test FAILed.
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14467 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63205/ Test FAILed.
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14467 **[Test build #63205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63205/consoleFull)** for PR 14467 at commit [`cc5f435`](https://github.com/apache/spark/commit/cc5f4352950f338afeecf1e4f5eaceae853b1520).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14476 **[Test build #63212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63212/consoleFull)** for PR 14476 at commit [`2093906`](https://github.com/apache/spark/commit/20939066b99fd5892a123177deafe24bfb7607d0).
[GitHub] spark pull request #14470: [SPARK-16863][ML] ProbabilisticClassifier.fit che...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/14470#discussion_r73465384

Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala

```scala
@@ -84,6 +84,13 @@ class DecisionTreeClassifier @Since("1.4.0") (
     val categoricalFeatures: Map[Int, Int] =
       MetadataUtils.getCategoricalFeatures(dataset.schema($(featuresCol)))
     val numClasses: Int = getNumClasses(dataset)
+
+    if (isDefined(thresholds)) {
+      require($(thresholds).length == numClasses, this.getClass.getSimpleName +
```

Because `ProbabilisticClassificationModel.transform` performs this check first, I just followed that style.
[GitHub] spark pull request #14470: [SPARK-16863][ML] ProbabilisticClassifier.fit che...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/14470#discussion_r73465400

Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala

```scala
@@ -101,6 +101,14 @@ class NaiveBayes @Since("1.5.0") (
   setDefault(modelType -> OldNaiveBayes.Multinomial)

   override protected def train(dataset: Dataset[_]): NaiveBayesModel = {
+    val numClasses: Int = getNumClasses(dataset)
```

Thanks, I will remove it.
[GitHub] spark issue #14486: [SQL][SPARK-16888] Implements eval method for expression...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14486 thanks, merging to master!
[GitHub] spark pull request #14486: [SQL][SPARK-16888] Implements eval method for exp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14486
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73465089

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala

```scala
@@ -19,50 +19,25 @@ package org.apache.spark.sql.execution.datasources

 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
 import org.apache.spark.sql.catalyst.expressions.Attribute
-import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.command.RunnableCommand
 import org.apache.spark.sql.types._

+case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])
+  extends LogicalPlan {
+  assert(tableDesc.provider.isDefined, "The table to be created must have a provider.")

-/**
- * Used to represent the operation of create table using a data source.
- *
- * @param allowExisting If it is true, we will do nothing when the table already exists.
- *                      If it is false, an exception will be thrown
- */
-case class CreateTableUsing(
-    tableIdent: TableIdentifier,
-    userSpecifiedSchema: Option[StructType],
-    provider: String,
-    temporary: Boolean,
-    options: Map[String, String],
-    partitionColumns: Array[String],
-    bucketSpec: Option[BucketSpec],
-    allowExisting: Boolean,
-    managedIfNoPath: Boolean) extends LogicalPlan with logical.Command {
-
-  override def output: Seq[Attribute] = Seq.empty
-  override def children: Seq[LogicalPlan] = Seq.empty
-}

+  if (query.isEmpty) {
+    assert(
+      mode == SaveMode.ErrorIfExists || mode == SaveMode.Ignore,
+      "create table without data insertion can only use ErrorIfExists or Ignore as SaveMode.")
+  }

-/**
- * A node used to support CTAS statements and saveAsTable for the data source API.
- * This node is a [[logical.UnaryNode]] instead of a [[logical.Command]] because we want the
- * analyzer to analyze the logical plan that will be used to populate the table,
- * so [[PreWriteCheck]] can detect cases that are not allowed.
- */
-case class CreateTableUsingAsSelect(
-    tableIdent: TableIdentifier,
-    provider: String,
-    partitionColumns: Array[String],
-    bucketSpec: Option[BucketSpec],
-    mode: SaveMode,
-    options: Map[String, String],
-    child: LogicalPlan) extends logical.UnaryNode {

   override def output: Seq[Attribute] = Seq.empty[Attribute]
+
+  override def children: Seq[LogicalPlan] = query.toSeq
```

This is great! Sometimes the plan of `query` may not be analyzed at the end; this resolves an existing bug.
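The SaveMode restriction in the new `CreateTable` plan above can be sketched in plain Python (not Spark's Scala; the function name is made up for illustration): when there is no query to insert, only `ErrorIfExists` and `Ignore` are meaningful save modes.

```python
# Illustrative sketch of the SaveMode assert in the CreateTable plan:
# CTAS (query present) may use any mode, while a bare CREATE TABLE
# may only use ErrorIfExists or Ignore.
VALID_MODES_WITHOUT_QUERY = {"ErrorIfExists", "Ignore"}

def check_create_table(mode, query):
    if query is None and mode not in VALID_MODES_WITHOUT_QUERY:
        raise AssertionError(
            "create table without data insertion can only use "
            "ErrorIfExists or Ignore as SaveMode.")
```

`Append` or `Overwrite` describe how to write data, so they only make sense when a query supplies rows to write.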
[GitHub] spark issue #14478: [SPARK-16875][SQL] Add args checking for DataSet randomS...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/14478 @srowen Right. I just added those checks for RDD.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464594

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

```scala
@@ -154,6 +274,21 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)

   def apply(plan: LogicalPlan): Unit = {
     plan.foreach {
+      case c @ CreateTable(tableDesc, mode, query) if c.resolved =>
+        // Since we are saving table metadata to metastore, we should make sure the table name
+        // and database name don't break some common restrictions, e.g. special chars except
+        // underscore are not allowed.
+        val pattern = Pattern.compile("[\\w_]+")
```

cc @hvanhovell, I think this is the only place where we need this check, as `CreateTable` is the only plan that can save table metadata into the metastore. What do you think?
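A rough Python equivalent of the `Pattern.compile("[\\w_]+")` name check quoted above (the helper name is hypothetical; Java's `\w` is ASCII-only, so `re.ASCII` is used to approximate it):

```python
import re

# Table and database names may only contain word characters
# (letters, digits, underscore) -- no dots, dashes or spaces.
_NAME_PATTERN = re.compile(r"[\w_]+", re.ASCII)

def is_valid_metastore_name(name: str) -> bool:
    # Require the whole name to match, so "my-table" or "a.b" are rejected.
    return _NAME_PATTERN.fullmatch(name) is not None
```

Note that `fullmatch` anchors the pattern at both ends, which is what the metastore restriction needs; a bare `match` would accept `"tbl.part"` because it matches only the prefix.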
[GitHub] spark issue #14482: [SPARK-16879][SQL] unify logical plans for CREATE TABLE ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14482 cc @gatorsmile, IIRC you have some PRs about error handling. After this PR we can have a central place for basic error handling; is it good enough for all the error cases you found?
[GitHub] spark pull request #14480: [MINOR][SQL] Fix minor formatting issue of SortAg...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14480
[GitHub] spark issue #14482: [SPARK-16879][SQL] unify logical plans for CREATE TABLE ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14482 **[Test build #63211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63211/consoleFull)** for PR 14482 at commit [`ec47911`](https://github.com/apache/spark/commit/ec479111f18257286e09723049badf402b1fad1a).
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464308

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

```scala
@@ -206,22 +206,22 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         // The relation in l is not an InsertableRelation.
         failAnalysis(s"$l does not allow insertion.")

-      case c: CreateTableUsingAsSelect =>
+      case CreateTable(tableDesc, mode, Some(query)) =>
```

Now this rule only checks whether the table is an input table of the query; it won't do anything for Hive serde tables.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464228

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

```scala
@@ -62,6 +66,122 @@ private[sql] class ResolveDataSource(sparkSession: SparkSession) extends Rule[Lo
 }

 /**
+ * Preprocess some DDL plans, e.g. [[CreateTable]], to do some normalization and checking.
+ */
+case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    // When we CREATE TABLE without specifying the table schema, we should fail the query if
+    // bucketing information is specified, as we can't infer bucketing from data files currently,
+    // and we should ignore the partition columns if it's specified, as we will infer it later, at
+    // runtime.
+    case c @ CreateTable(tableDesc, _, None) if tableDesc.schema.isEmpty =>
+      if (tableDesc.bucketSpec.isDefined) {
+        failAnalysis("Cannot specify bucketing information if the table schema is not specified " +
+          "when creating and will be inferred at runtime")
+      }
+
+      val partitionColumnNames = tableDesc.partitionColumnNames
+      if (partitionColumnNames.nonEmpty) {
+        // The table does not have a specified schema, which means that the schema will be inferred
+        // at runtime. So, we are not expecting partition columns and we will discover partitions
+        // at runtime. However, if there are specified partition columns, we simply ignore them and
+        // provide a warning message.
+        logWarning(
+          s"Specified partition columns (${partitionColumnNames.mkString(",")}) will be " +
+            s"ignored. The schema and partition columns of table ${tableDesc.identifier} will " +
+            "be inferred.")
+        c.copy(tableDesc = tableDesc.copy(partitionColumnNames = Nil))
+      } else {
+        c
+      }
+
+    // Here we normalize partition, bucket and sort column names, w.r.t. the case sensitivity
+    // config, and do various checks:
+    //   * column names in table definition can't be duplicated.
+    //   * partition, bucket and sort column names must exist in table definition.
+    //   * partition, bucket and sort column names can't be duplicated.
+    //   * can't use all table columns as partition columns.
+    //   * partition columns' type must be AtomicType.
+    //   * sort columns' type must be orderable.
```

cc @gatorsmile, I think all these checks are general and can be applied to hive serde tables too.
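The name-based checks in that comment block can be sketched in plain Python (not Spark's rule; the function name and error messages are made up), with names case-folded to mimic a case-insensitive resolver:

```python
# Illustrative sketch of the column-name checks on a table definition.
def check_column_names(columns, partition_columns):
    cols = [c.lower() for c in columns]
    parts = [p.lower() for p in partition_columns]
    # column names in table definition can't be duplicated
    if len(set(cols)) != len(cols):
        raise ValueError("duplicate column names in table definition")
    # partition column names must exist in table definition
    missing = [p for p in parts if p not in cols]
    if missing:
        raise ValueError("partition columns not found: %s" % missing)
    # partition column names can't be duplicated
    if len(set(parts)) != len(parts):
        raise ValueError("duplicate partition columns")
    # can't use all table columns as partition columns
    if parts and len(parts) == len(cols):
        raise ValueError("cannot use all columns as partition columns")
```

The type-based checks (partition columns must be `AtomicType`, sort columns must be orderable) need the resolved schema, so they are omitted from this sketch.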
[GitHub] spark issue #14480: [MINOR][SQL] Fix minor formatting issue of SortAggregate...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14480 Thanks, merging to master.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464090

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

```scala
@@ -420,45 +420,40 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {

   object DDLStrategy extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case c: CreateTableUsing if c.temporary && !c.allowExisting =>
-        logWarning(
-          s"CREATE TEMPORARY TABLE ${c.tableIdent.identifier} USING... is deprecated, " +
-            s"please use CREATE TEMPORARY VIEW viewName USING... instead")
-        ExecutedCommandExec(
-          CreateTempViewUsing(
-            c.tableIdent, c.userSpecifiedSchema, replace = true, c.provider, c.options)) :: Nil
-
-      case c: CreateTableUsing if !c.temporary =>
+      case CreateTable(tableDesc, mode, None) if tableDesc.provider.get == "hive" =>
```

no, the `provider` is always defined, see https://github.com/apache/spark/pull/14482/files#diff-ea32a127bbe0c2bab24b0bbc8c333982R30
[GitHub] spark issue #14482: [SPARK-16879][SQL] unify logical plans for CREATE TABLE ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14482 **[Test build #63209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63209/consoleFull)** for PR 14482 at commit [`108e385`](https://github.com/apache/spark/commit/108e3859d81391def31a381518a072a21f6c4567).
[GitHub] spark issue #14478: [SPARK-16875][SQL] Add args checking for DataSet randomS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14478 **[Test build #63210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63210/consoleFull)** for PR 14478 at commit [`4b0efea`](https://github.com/apache/spark/commit/4b0efea99f2d3e48124040b5f1cace28d603e386).
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #3201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3201/consoleFull)** for PR 14472 at commit [`a1e1b57`](https://github.com/apache/spark/commit/a1e1b578cd8f7aa45fd0db107b194c500284ae79).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14492 Merged build finished. Test PASSed.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63203/ Test PASSed.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14492 **[Test build #63203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63203/consoleFull)** for PR 14492 at commit [`ffc0e4a`](https://github.com/apache/spark/commit/ffc0e4a363968fa62a592f96e37669ca1bcbf099).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14492 More specifically, the Spark distribution has the jars needed by the launcher in `$SPARK_HOME/jars`, so basically this is extra code in Spark to support non-standard distributions.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135 **[Test build #63208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63208/consoleFull)** for PR 12135 at commit [`785a667`](https://github.com/apache/spark/commit/785a66703cfe4b2de29047994ab6b0bb38065c43).
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14492 But what's the goal of that? If there's nothing in `$SPARK_HOME/jars`, why not create a symlink instead to the location where the jars are? The change itself doesn't really cause any problems, I just don't understand the need.
[GitHub] spark pull request #12135: [SPARK-14352][SQL] approxQuantile should support ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12135#discussion_r73462027

--- Diff: python/pyspark/sql/dataframe.py ---

```
@@ -1181,18 +1181,33 @@ def approxQuantile(self, col, probabilities, relativeError):
         Space-efficient Online Computation of Quantile Summaries]]
         by Greenwald and Khanna.
 
-        :param col: the name of the numerical column
+        :param col: the name of the numerical column, or a list/tuple of
+            numerical columns.
         :param probabilities: a list of quantile probabilities
           Each number must belong to [0, 1].
           For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
         :param relativeError: The relative target precision to achieve
           (>= 0). If set to zero, the exact quantiles are computed, which
           could be very expensive. Note that values greater than 1 are
           accepted but give the same result as 1.
-        :return: the approximate quantiles at the given probabilities
+        :return: the approximate quantiles at the given probabilities. If
+            the input `col` is a string, the output is a list of float. If the
+            input `col` is a list or tuple of strings, the output is also a
+            list, but each element in it is a list of float, i.e., the output
+            is a list of list of float.
         """
-        if not isinstance(col, str):
-            raise ValueError("col should be a string.")
+        if not isinstance(col, (str, list, tuple)):
+            raise ValueError("col should be a string, list or tuple.")
+
+        isStr = isinstance(col, str)
```

--- End diff --

Thanks for helping to review this PR; it has been quite a while. The type of `col` determines the type of the return value. If I made `col = [col]` here, I would not know whether to return a `list` or a `list of list`.

Like this:

```
>>> dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
>>> dataset.stat.approxQuantile(['label'], [0.1, 0.2], 0.1)
[[0.0, 1.0]]
>>> dataset.stat.approxQuantile('label', [0.1, 0.2], 0.1)
[0.0, 1.0]
```
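The dispatch described above, wrapping a single column name into a list internally and unwrapping the result only when the caller passed a string, can be sketched in plain Python. This is a hedged illustration, not the real PySpark code: `_quantiles_for` is a hypothetical stand-in for the per-column quantile computation on the JVM side.

```python
def _quantiles_for(name, probabilities):
    # Hypothetical placeholder: return one float per requested probability.
    return [0.0 for _ in probabilities]

def approx_quantile(col, probabilities):
    """Mimic the col-type dispatch: str -> list of float,
    list/tuple of str -> list of list of float."""
    if not isinstance(col, (str, list, tuple)):
        raise ValueError("col should be a string, list or tuple.")

    is_str = isinstance(col, str)
    cols = [col] if is_str else list(col)

    results = [_quantiles_for(c, probabilities) for c in cols]
    # Unwrap only when the caller passed a single column name,
    # so the return type mirrors the input type.
    return results[0] if is_str else results
```

This shows why the `isStr` flag has to be captured before normalizing `col` to a list: after `col = [col]`, the two input shapes would be indistinguishable.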
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73461786

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```
@@ -233,12 +233,11 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         }
 
         PartitioningUtils.validatePartitionColumn(
-          c.child.schema, c.partitionColumns, conf.caseSensitiveAnalysis)
+          query.schema, tableDesc.partitionColumnNames, conf.caseSensitiveAnalysis)
 
         for {
-          spec <- c.bucketSpec
-          sortColumnName <- spec.sortColumnNames
-          sortColumn <- c.child.schema.find(_.name == sortColumnName)
+          spec <- tableDesc.bucketSpec
+          sortColumn <- tableDesc.schema.filter(spec.sortColumnNames.contains)
```

--- End diff --

Below is the logic for bucketed tables. If we do not plan to support Hive bucketed tables, maybe we should just issue an exception?
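One behavioral detail of the rewritten line: `tableDesc.schema.filter(spec.sortColumnNames.contains)` yields the matching columns in schema order, whereas the old per-name `find` loop followed the declared sort-column order. A small Python illustration of the difference (the column names are hypothetical):

```python
schema = ["a", "b", "c"]   # columns in table-schema order
sort_cols = ["c", "a"]     # declared sort-column order

# Old shape: iterate the sort columns, look each one up in the schema.
by_sort_order = [c for name in sort_cols for c in schema if c == name]

# New shape: keep schema columns whose name is a sort column.
by_schema_order = [c for c in schema if c in sort_cols]

print(by_sort_order)    # ['c', 'a']
print(by_schema_order)  # ['a', 'c']
```

Whether the reordering matters here depends on what the downstream check does with the columns, but it is worth noting during review.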
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14490 **[Test build #63207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63207/consoleFull)** for PR 14490 at commit [`79eaef7`](https://github.com/apache/spark/commit/79eaef7f55779e949d7e8dc0b4e4749d76f99c9f).
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73461643

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```
@@ -233,12 +233,11 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         }
 
         PartitioningUtils.validatePartitionColumn(
-          c.child.schema, c.partitionColumns, conf.caseSensitiveAnalysis)
+          query.schema, tableDesc.partitionColumnNames, conf.caseSensitiveAnalysis)
```

--- End diff --

`validatePartitionColumn` is for data source tables only. Right?
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14490 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63201/ Test FAILed.
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14490 Merged build finished. Test FAILed.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73461508

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```
@@ -206,22 +206,22 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
             // The relation in l is not an InsertableRelation.
             failAnalysis(s"$l does not allow insertion.")
 
-      case c: CreateTableUsingAsSelect =>
+      case CreateTable(tableDesc, mode, Some(query)) =>
```

--- End diff --

Previously, this was only applicable to data source tables. After this change, it also applies to CREATE HIVE TABLE AS SELECT, so some of the validations might not be right for Hive tables. We have to check them carefully, one by one.
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14490 retest this please
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14490 **[Test build #63201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63201/consoleFull)** for PR 14490 at commit [`79eaef7`](https://github.com/apache/spark/commit/79eaef7f55779e949d7e8dc0b4e4749d76f99c9f).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73460838

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

Yes, I just ran a test after cloning [scalariform](https://github.com/scala-ide/scalariform), which [scalastyle](https://github.com/scalastyle/scalastyle) uses, as below:

```
ScalaLexer.rawTokenise("@Override")
```

It seems this is split into different tokens:

![2016-08-04 1 29 11](https://cloud.githubusercontent.com/assets/6477701/17390365/78afcb96-5a47-11e6-9e9f-8d9a2d6c4ddf.png)

(BTW, maybe we should avoid writing `@Override` as-is.. I started to feel guilty for cc'ing him/her here and there.)
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14492 Sure. This change is for putting Spark jars in a different dir than the default dir in `spark/assembly` or `spark/jars`. So, in this case, the main class is not in `SPARK_JARS_DIR`.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14065 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63200/ Test PASSed.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14065 Merged build finished. Test PASSed.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14065 **[Test build #63200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63200/consoleFull)** for PR 14065 at commit [`127d85e`](https://github.com/apache/spark/commit/127d85ed54f057581a35c88fc7f85e1b8e13de38).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14474 **[Test build #63206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63206/consoleFull)** for PR 14474 at commit [`3d90a68`](https://github.com/apache/spark/commit/3d90a68d84ff55249fb50c463a2bc0674d6fc79b).
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14467 **[Test build #63205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63205/consoleFull)** for PR 14467 at commit [`cc5f435`](https://github.com/apache/spark/commit/cc5f4352950f338afeecf1e4f5eaceae853b1520).
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14474 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63204/ Test FAILed.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14474 **[Test build #63204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63204/consoleFull)** for PR 14474 at commit [`d9b5a40`](https://github.com/apache/spark/commit/d9b5a40d2d28d9e2fcc0f0605550f57b37634a0c).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14474 Merged build finished. Test FAILed.
[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/14473 @MLnick please have a look at this.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63202/ Test FAILed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Merged build finished. Test FAILed.
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73459086

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

I just reproduced this locally on my machine as well (it doesn't trigger with `@Override` or `\@Override`). One guess is that the token checker has split them into separate tokens?
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63202/consoleFull)** for PR 14452 at commit [`7fe57a0`](https://github.com/apache/spark/commit/7fe57a0666f5d5f489d5b09a6cc20f784611dcf8).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14428: [SPARK-16810] Refactor registerSinks with multiple const...
Github user lovexi commented on the issue: https://github.com/apache/spark/pull/14428 I think this one is ready for reviews. :)
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458908

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

BTW, I actually tried `RegexChecker` as well to grep this case, but then found I would have to come up with a complex regular expression for some exceptional cases, such as:

- `@Override` in comments

  ```scala
  /** ...
   *@Override
   *public void close(Throwable errorOrNull) {
   *    // close the connection
   *}
  ...
  ```

- `@Override` in codegen

  ```scala
  ...
  " @Override public String toString() { return \"" + toStringValue + "\"; }}"
  ...
  ```

So, I had to use `TokenChecker`.
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14156 That's the question indeed. I'm not sure, because the function that's supplied could be anything; I don't see how it could automatically be converted to a vectorized operation.
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 Yeah, currently it seems to add a little overhead (it does a copy), but I think it will take advantage of Breeze optimizations in the future, e.g., SIMD instructions or something?
[GitHub] spark pull request #14488: [SPARK-16826][SQL] Switch to java.net.URI for par...
Github user sylvinus commented on a diff in the pull request: https://github.com/apache/spark/pull/14488#discussion_r73458498

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

```
@@ -749,25 +749,44 @@ case class ParseUrl(children: Seq[Expression])
       Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
   }
 
-  private def getUrl(url: UTF8String): URL = {
+  private def getUrl(url: UTF8String): URI = {
     try {
-      new URL(url.toString)
+      new URI(url.toString)
     } catch {
-      case e: MalformedURLException => null
+      case e: URISyntaxException => null
     }
   }
 
-  private def getExtractPartFunc(partToExtract: UTF8String): URL => String = {
+  private def getExtractPartFunc(partToExtract: UTF8String): URI => String = {
+
+    // partToExtract match {
+    //   case HOST => _.toURL().getHost
+    //   case PATH => _.toURL().getPath
+    //   case QUERY => _.toURL().getQuery
+    //   case REF => _.toURL().getRef
+    //   case PROTOCOL => _.toURL().getProtocol
+    //   case FILE => _.toURL().getFile
+    //   case AUTHORITY => _.toURL().getAuthority
+    //   case USERINFO => _.toURL().getUserInfo
+    //   case _ => (url: URI) => null
+    // }
+
     partToExtract match {
       case HOST => _.getHost
-      case PATH => _.getPath
-      case QUERY => _.getQuery
-      case REF => _.getRef
-      case PROTOCOL => _.getProtocol
-      case FILE => _.getFile
-      case AUTHORITY => _.getAuthority
-      case USERINFO => _.getUserInfo
-      case _ => (url: URL) => null
+      case PATH => _.getRawPath
+      case QUERY => _.getRawQuery
+      case REF => _.getRawFragment
+      case PROTOCOL => _.getScheme
+      case FILE =>
+        (url: URI) =>
+          if (url.getRawQuery ne null) {
```

--- End diff --

It does seem so:

```
scala> new URL("http://example.com/path%20?query=x%20#hash%20").getQuery()
res1: String = query=x%20

scala> new URL("http://example.com/path%20?query=x%20#hash%20").getRef()
res2: String = hash%20
```
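For comparison, Python's `urllib.parse.urlsplit` behaves like the raw accessors discussed above: it leaves percent-escapes in the path, query, and fragment undecoded. This is just an analogous illustration in Python, not Spark code:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://example.com/path%20?query=x%20#hash%20")

# Like java.net.URI's getRaw* accessors, the percent-escapes survive intact.
print(parts.path)      # /path%20
print(parts.query)     # query=x%20
print(parts.fragment)  # hash%20
```

The distinction matters because `URI.getPath`/`getQuery` decode escapes, while the `getRaw*` variants used in the diff return the component exactly as it appeared in the input string.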
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458483

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

Hm, so this matches "@Override" but not "Override" now, but would reverse if the regex included "@"? That sounds flipped. @ isn't a special char. You're doubly sure that's right? It also seems like this is matching "Override" on a line alone, but should be looking for "@Override" anywhere.
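The token-splitting behavior discussed in this thread can be illustrated outside scalastyle: if `@` and `Override` are lexed as separate tokens, a per-token pattern `^Override$` fires, while a whole-line search would have to mention the `@` explicitly. A rough Python analogy follows; the two-token split is an assumption about scalariform's lexer based on the screenshot above:

```python
import re

line = "  @Override"
tokens = ["@", "Override"]   # assumed lexer output for "@Override"

# Per-token check, the way a TokenChecker-style rule applies its regex:
hit = any(re.match(r"^Override$", t) for t in tokens)
print(hit)  # True

# A whole-line regex check would instead need the "@" in the pattern:
print(re.search(r"@Override", line) is not None)   # True
print(re.search(r"^Override$", line) is not None)  # False
```

This would explain srowen's observation: the anchors `^` and `$` delimit a single token's text, not a line of source.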
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14474 **[Test build #63204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63204/consoleFull)** for PR 14474 at commit [`d9b5a40`](https://github.com/apache/spark/commit/d9b5a40d2d28d9e2fcc0f0605550f57b37634a0c).
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14492 @yhuai can you explain the use case you're trying to cover here with an example? `LAUNCH_CLASSPATH` is the classpath of the launcher process (the process that creates the command line to then run the `SparkSubmit` class). The launcher itself already adds `SPARK_DIST_CLASSPATH` to Spark's classpath: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java#L204

```
addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
addToClassPath(cp, getenv("YARN_CONF_DIR"));
addToClassPath(cp, getenv("SPARK_DIST_CLASSPATH"));
```
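The `addToClassPath` calls quoted above simply append each environment value, when set, to the classpath using the platform separator. A minimal Python sketch of that merging; the variable names come from the quoted snippet, but `build_classpath` itself is hypothetical, not a real Spark function:

```python
import os

def build_classpath(entries, env):
    """Append each non-empty env value to the classpath, in order,
    mirroring the shape of AbstractCommandBuilder.addToClassPath."""
    cp = list(entries)
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR", "SPARK_DIST_CLASSPATH"):
        value = env.get(var)
        if value:  # skip unset or empty variables
            cp.append(value)
    return os.pathsep.join(cp)
```

Since `SPARK_DIST_CLASSPATH` already lands on the application's classpath this way, the open question in the thread is only whether the *launcher* process itself also needs those entries.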
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14445 @rxin maybe we can still use this Jira Id by @maver1ck's comment?
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14156 I see, this copies x to y and then modifies y in place. OK. Is that more efficient? It seems like extra work, but does the transform method make up for it? Just checking whether this has actually been observed to speed things up.
[GitHub] spark pull request #14476: [SPARK-16867][SQL] createTable and alterTable in ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14476#discussion_r73458295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -82,7 +82,7 @@ abstract class ExternalCatalog { * Note: If the underlying implementation does not support altering a certain field, * this becomes a no-op. */ - def alterTable(db: String, tableDefinition: CatalogTable): Unit + def alterTable(tableDefinition: CatalogTable): Unit --- End diff -- Let's add a doc comment explaining that this does not support moving a table to another database, since a developer might try to use it that way just from looking at the API (tableDefinition has a db field).
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73458290 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -367,15 +368,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { throw new AnalysisException(s"Table $tableIdent already exists.") case _ => -val cmd = - CreateTableUsingAsSelect( -tableIdent, -source, - partitioningColumns.map(_.toArray).getOrElse(Array.empty[String]), -getBucketSpec, -mode, -extraOptions.toMap, -df.logicalPlan) +val tableDesc = CatalogTable( + identifier = tableIdent, + tableType = CatalogTableType.EXTERNAL, + storage = CatalogStorageFormat.empty.copy(properties = extraOptions.toMap), + schema = new StructType, + provider = Some(source), + partitionColumnNames = partitioningColumns.getOrElse(Nil), + bucketSpec = getBucketSpec +) +val cmd = CreateTable(tableDesc, mode, Some(df.logicalPlan)) --- End diff -- hmmm, do we have to use `Option` even though the parameter is guaranteed to be non-null? In this case we can't use `Option`, or the behaviour will change. Previously, if `df.logicalPlan` was null, that was a bug and we would throw an NPE somewhere. If we use `Option` here, we silently convert a CTAS into a plain CREATE TABLE, which is not expected.
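The hazard raised here, an `Option` silently swallowing an unexpected null, can be sketched in plain Java with `Optional` (this is an illustration, not Spark code; the `describe` helper is hypothetical):

```java
import java.util.Optional;

public class OptionalNullDemo {
    // Hypothetical stand-in for CreateTable: a present query means CTAS,
    // an absent query means a plain CREATE TABLE.
    static String describe(Optional<String> query) {
        return query.isPresent() ? "CTAS" : "CREATE TABLE";
    }

    public static void main(String[] args) {
        String plan = null; // an upstream bug: the plan should never be null

        // Optional.ofNullable hides the bug: null quietly becomes "absent",
        // so the CTAS is silently treated as a plain CREATE TABLE.
        System.out.println(describe(Optional.ofNullable(plan)));

        // Optional.of fails fast instead, surfacing the bug as an exception.
        try {
            describe(Optional.of(plan));
        } catch (NullPointerException e) {
            System.out.println("NPE surfaced the bug");
        }
    }
}
```

This is why wrapping a guaranteed-non-null parameter with `Optional.ofNullable` (or Scala's `Option(...)`) can mask bugs rather than document intent.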
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458250 --- Diff: scalastyle-config.xml --- @@ -250,6 +250,14 @@ This file is divided into 3 sections: Omit braces in case clauses. + + +^Override$ --- End diff -- I actually tried `^@Override$` first, but I found this Scala checker recognises the token as just `Override`.
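The behaviour described above, where the checker sees the token without its `@`, can be sketched with a plain regex check (an illustration of the anchored pattern from the diff, not the scalastyle implementation):

```java
import java.util.regex.Pattern;

public class TokenRegexDemo {
    public static void main(String[] args) {
        // The rule's pattern as quoted in the diff.
        Pattern p = Pattern.compile("^Override$");

        // If the checker hands the rule the bare token "Override"
        // (with the '@' already consumed), the pattern matches...
        System.out.println(p.matcher("Override").matches());

        // ...while the same input would not match "^@Override$"-style
        // anchoring against a token that lacks the '@'.
        System.out.println(p.matcher("@Override").matches());
    }
}
```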
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458156 --- Diff: scalastyle-config.xml --- @@ -250,6 +250,14 @@ This file is divided into 3 sections: Omit braces in case clauses. + + +^Override$ --- End diff -- Does this need to look for a leading "@" as well?
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 @srowen The `:=` operator in BDM simply copies one BDM into another, and it is widely used in the Breeze source. For example, see `DenseMatrix.copy` in Breeze: it first uses `DenseMatrix.create` to create a new matrix with the same dimensions (`val result = DenseMatrix.create(...)`), then uses `result := this` to copy itself into the matrix just created. The mechanism behind the `:=` operator for `DenseMatrix` is that `DenseMatrix` implements the `OpSet` trait. In the `DenseMatrix` source file in Breeze, at line 985, there is: implicit val setMV_D:OpSet.InPlaceImpl2[...] = new SetDMDVOp[Double]() So the implementation code is in the `SetDMDVOp` class, and we can see that `SetDMDVOp` does type specialization for the `Double` type, so the compiled code is efficient.
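The copy-then-modify-in-place pattern under discussion can be sketched without Breeze, using plain Java primitive arrays (the point of Breeze's `Double` specialization is the same: a tight primitive loop with no boxing and no extra allocation):

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class ApplyInPlaceDemo {
    // Copy x into y (the analogue of Breeze's `y := x`), then apply f to y in place.
    static void applyInPlace(double[] x, double[] y, DoubleUnaryOperator f) {
        System.arraycopy(x, 0, y, 0, x.length); // bulk primitive copy, no boxing
        for (int i = 0; i < y.length; i++) {
            y[i] = f.applyAsDouble(y[i]);       // in-place transform, no new array
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = new double[3];
        applyInPlace(x, y, v -> v * 2);
        System.out.println(Arrays.toString(y)); // y now holds 2x; x is untouched
    }
}
```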
[GitHub] spark issue #14477: [SPARK-16870][docs]Summary:add "spark.sql.broadcastTimeo...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14477 Any objection to documenting it @liancheng ?
[GitHub] spark pull request #14488: [SPARK-16826][SQL] Switch to java.net.URI for par...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14488#discussion_r73457928 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -749,25 +749,44 @@ case class ParseUrl(children: Seq[Expression]) Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) } - private def getUrl(url: UTF8String): URL = { + private def getUrl(url: UTF8String): URI = { try { - new URL(url.toString) + new URI(url.toString) } catch { - case e: MalformedURLException => null + case e: URISyntaxException => null } } - private def getExtractPartFunc(partToExtract: UTF8String): URL => String = { + private def getExtractPartFunc(partToExtract: UTF8String): URI => String = { +// partToExtract match { +// case HOST => _.toURL().getHost +// case PATH => _.toURL().getPath +// case QUERY => _.toURL().getQuery +// case REF => _.toURL().getRef +// case PROTOCOL => _.toURL().getProtocol +// case FILE => _.toURL().getFile +// case AUTHORITY => _.toURL().getAuthority +// case USERINFO => _.toURL().getUserInfo +// case _ => (url: URI) => null +// } + partToExtract match { case HOST => _.getHost - case PATH => _.getPath - case QUERY => _.getQuery - case REF => _.getRef - case PROTOCOL => _.getProtocol - case FILE => _.getFile - case AUTHORITY => _.getAuthority - case USERINFO => _.getUserInfo - case _ => (url: URL) => null + case PATH => _.getRawPath + case QUERY => _.getRawQuery + case REF => _.getRawFragment + case PROTOCOL => _.getScheme + case FILE => +(url: URI) => + if (url.getRawQuery ne null) { --- End diff -- Do we really need the 'raw' elements in each of these? I'd think they need to be parsed. Your tests show this code not parsing escapes. Is that how it behaved before? If so, OK.
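The 'raw' versus parsed distinction at issue is visible directly in `java.net.URI`: the `getRaw*` accessors preserve percent-escapes, while the plain getters decode them. A small check:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RawUriDemo {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("http://example.com/a%20b?q=x%26y#f%2Fg");

        // Path: raw keeps the escape, plain decodes it.
        System.out.println(uri.getRawPath());  // /a%20b
        System.out.println(uri.getPath());     // /a b

        // Query: %26 is an escaped '&'.
        System.out.println(uri.getRawQuery()); // q=x%26y
        System.out.println(uri.getQuery());    // q=x&y

        // Fragment (what URL calls the "ref"): %2F is an escaped '/'.
        System.out.println(uri.getRawFragment()); // f%2Fg
        System.out.println(uri.getFragment());    // f/g
    }
}
```

So choosing `getRawPath` over `getPath` is exactly the choice between Hive-style verbatim extraction and decoded components.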
[GitHub] spark pull request #14488: [SPARK-16826][SQL] Switch to java.net.URI for par...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14488#discussion_r73457856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -749,25 +749,44 @@ case class ParseUrl(children: Seq[Expression]) Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) } - private def getUrl(url: UTF8String): URL = { + private def getUrl(url: UTF8String): URI = { try { - new URL(url.toString) + new URI(url.toString) } catch { - case e: MalformedURLException => null + case e: URISyntaxException => null --- End diff -- Don't change this unless you need to make another change anyway, but this can be `case _: ...` I know it wasn't like that before
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73457639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -61,6 +64,38 @@ trait CheckAnalysis extends PredicateHelper { } } + private def checkColumnNames(tableDesc: CatalogTable): Unit = { +val colNames = tableDesc.schema.map(_.name) +val colNamesSet = colNames.toSet +checkDuplicatedColumnNames(colNames, colNamesSet, "table definition of " + tableDesc.identifier) + +def requireSubsetOfSchema(subColNames: Seq[String], colType: String): Unit = { + val subColNamesSet = subColNames.toSet + checkDuplicatedColumnNames(subColNames, subColNamesSet, colType) + if (!subColNamesSet.subsetOf(colNamesSet)) { +failAnalysis(s"$colType columns (${subColNames.mkString(", ")}) must be a subset of " + + s"schema (${colNames.mkString(", ")}) in table '${tableDesc.identifier}'") + } +} + +// Verify that the provided columns are part of the schema +requireSubsetOfSchema(tableDesc.partitionColumnNames, "partition") + requireSubsetOfSchema(tableDesc.bucketSpec.map(_.bucketColumnNames).getOrElse(Nil), "bucket") + requireSubsetOfSchema(tableDesc.bucketSpec.map(_.sortColumnNames).getOrElse(Nil), "sort") + } + + private def checkDuplicatedColumnNames( + colNames: Seq[String], + colNamesSet: Set[String], + colType: String): Unit = { +if (colNamesSet.size != colNames.length) { + val duplicateColumns = colNames.groupBy(identity).collect { +case (x, ys) if ys.length > 1 => quoteIdentifier(x) + } --- End diff -- we should, but the previous code doesn't consider case sensitivity either; we can do it in a follow-up.
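The duplicate-detection idiom in the diff (compare the set's size with the list's length, then group to name the offenders) can be sketched in Java, with a lowercased pass showing the case-sensitivity gap raised in the comment:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateColumnsDemo {
    // Returns the column names that occur more than once.
    static List<String> duplicates(List<String> colNames) {
        return colNames.stream()
                .collect(Collectors.groupingBy(n -> n, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "name", "ID", "name");

        // Case-sensitive check: only "name" is flagged; "id"/"ID" slips through.
        System.out.println(duplicates(cols));

        // Case-insensitive check (the proposed follow-up): normalize first,
        // and "id"/"ID" is also reported as a clash.
        List<String> lowered = cols.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
        System.out.println(duplicates(lowered));
    }
}
```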
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73457371 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -349,6 +384,27 @@ trait CheckAnalysis extends PredicateHelper { |${s.catalogTable.identifier} """.stripMargin) + case c @ CreateTable(tableDesc, mode, query) if c.resolved => +// Since we are saving table metadata to metastore, we should make sure the table name +// and database name don't break some common restrictions, e.g. special chars except +// underscore are not allowed. +val pattern = Pattern.compile("[\\w_]+") +if (!pattern.matcher(tableDesc.identifier.table).matches()) { + failAnalysis(s"Table name ${tableDesc.identifier.table} is not a valid name for " + +s"metastore, it only accepts table name containing characters, numbers and _.") +} +if (tableDesc.identifier.database.isDefined && + !pattern.matcher(tableDesc.identifier.database.get).matches()) { + failAnalysis(s"Database name ${tableDesc.identifier.table} is not a valid name for " + --- End diff -- `${tableDesc.identifier.table}` -> `${tableDesc.identifier.database.get}`
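The name validation quoted in the diff can be checked in isolation. One side note: in Java regexes `\w` already includes the underscore, so `[\w_]+` is equivalent to `\w+`:

```java
import java.util.regex.Pattern;

public class TableNameCheckDemo {
    // Same pattern as in the diff: word characters only (letters, digits, '_').
    static boolean isValidName(String name) {
        return Pattern.compile("[\\w_]+").matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my_table_1")); // valid
        System.out.println(isValidName("my-table"));   // invalid: '-' not allowed
        System.out.println(isValidName("db.table"));   // invalid: '.' not allowed
    }
}
```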
[GitHub] spark issue #14488: [SPARK-16826][SQL] Switch to java.net.URI for parse_url(...
Github user sylvinus commented on the issue: https://github.com/apache/spark/pull/14488 rebase done!
[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14491 OK, doesn't need a JIRA. Looks like there are similar occurrences in the Python code, and streaming docs.
[GitHub] spark issue #13738: [SPARK-11227][CORE] UnknownHostException can be thrown w...
Github user soldiershen commented on the issue: https://github.com/apache/spark/pull/13738 @sarutak got it. I added the HDFS conf file to specify the host. Thanks.
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14156 Is there reasonable evidence this speeds things up? I just want to make sure this does not make it slower. Help me understand the `:=` operator? I don't see how it helps compute y as a function of x here. I assume the method below can't use the same mechanism?
[GitHub] spark issue #14488: [SPARK-16826][SQL] Switch to java.net.URI for parse_url(...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14488 Description looks good. You can use `git rebase -i HEAD~4` or similar to `drop` the extra commit here. Pending that and tests passing, looks good.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14492 **[Test build #63203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63203/consoleFull)** for PR 14492 at commit [`ffc0e4a`](https://github.com/apache/spark/commit/ffc0e4a363968fa62a592f96e37669ca1bcbf099).
[GitHub] spark pull request #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/14492 [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPATH ## What changes were proposed in this pull request? To deploy Spark, it can be pretty convenient to put all jars (spark jars, hadoop jars, and other libs' jars) that we want to include in the classpath of Spark in the same dir, which may not be spark's assembly dir. So, I am proposing to also add SPARK_DIST_CLASSPATH to the LAUNCH_CLASSPATH. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-16887 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14492.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14492 commit ffc0e4a363968fa62a592f96e37669ca1bcbf099 Author: Yin Huai Date: 2016-08-04T02:57:47Z [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPATH
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14490 LGTM pending Jenkins.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #3201 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3201/consoleFull)** for PR 14472 at commit [`a1e1b57`](https://github.com/apache/spark/commit/a1e1b578cd8f7aa45fd0db107b194c500284ae79).
[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14491 Can one of the admins verify this patch?
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63197/ Test FAILed.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14472 Merged build finished. Test FAILed.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #63197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63197/consoleFull)** for PR 14472 at commit [`a1e1b57`](https://github.com/apache/spark/commit/a1e1b578cd8f7aa45fd0db107b194c500284ae79). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14491: [SPARK-16886] [EXAMPLES][SQL] structured streamin...
GitHub user ganeshchand opened a pull request: https://github.com/apache/spark/pull/14491 [SPARK-16886] [EXAMPLES][SQL] structured streaming network word count examples … ## What changes were proposed in this pull request? Fixed a minor code comment typo by replacing DataFrame with Dataset ## How was this patch tested? Run Locally You can merge this pull request into a Git repository by running: $ git pull https://github.com/ganeshchand/spark SPARK-16886 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14491.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14491 commit 8751e08b18b8f8a4467cf15f88076c0f93294fc2 Author: Ganesh Chand Date: 2016-08-04T02:29:40Z [SPARK-16886] [SQL] structured streaming network word count examples code comments Replaced Dataframe with Dataset in code comments
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14490 **[Test build #63201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63201/consoleFull)** for PR 14490 at commit [`79eaef7`](https://github.com/apache/spark/commit/79eaef7f55779e949d7e8dc0b4e4749d76f99c9f).
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63202/consoleFull)** for PR 14452 at commit [`7fe57a0`](https://github.com/apache/spark/commit/7fe57a0666f5d5f489d5b09a6cc20f784611dcf8).
[GitHub] spark pull request #12913: [SPARK-928][CORE] Add support for Unsafe-based se...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/12913#discussion_r73454617 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -399,6 +399,14 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext { assert(!ser2.getAutoReset) } + private def testBothUnsafeAndSafe(f: SparkConf => Unit): Unit = { --- End diff -- Yes, will update the PR today.
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14490 cc @srowen
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14490

[SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override)

## What changes were proposed in this pull request?

This PR adds rules to prevent the use of both Java annotations, `@Deprecated` and `@Override`.

- Java's `@Override`

It seems the Scala compiler simply ignores this annotation. This can be problematic when traits or abstract classes are inherited. The Scala compiler seems to require the `override` modifier only for members "that override some other **concrete member definition** in a parent class", but not for an **incomplete member definition** (such as one from a trait or abstract class); see http://www.scala-lang.org/files/archive/spec/2.11/05-classes-and-objects.html#override

For a simple example:

- Normal class - needs `override` modifier

```bash
scala> class A { def say = {}}
defined class A

scala> class B extends A { def say = {}}
:8: error: overriding method say in class A of type => Unit;
 method say needs `override' modifier
       class B extends A { def say = {}}
                               ^
```

- Trait - does not need `override` modifier

```bash
scala> trait A { def say }
defined trait A

scala> class B extends A { def say = {}}
defined class B
```

In short, in the latter case we can write an `@Override` annotation (which means nothing), and this might confuse engineers into believing that Java's annotation is working fine. It would be great if we prevented this potential confusion.

- Java's `@Deprecated`

When `@Deprecated` is used, the Scala compiler seems to recognise it correctly, but we use the Scala annotation `@deprecated` across the codebase.

## How was this patch tested?

Manually tested, by inserting both `@Override` and `@Deprecated`. This shows the error messages below:

```bash
Scalastyle checks failed at following occurrences:
[error] ... : @deprecated should be used instead of @java.lang.
```

```bash
Scalastyle checks failed at following occurrences:
[error] ... : override modifier should be used instead of @java.lang.Override.
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-16877

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14490.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14490

commit 79eaef7f55779e949d7e8dc0b4e4749d76f99c9f
Author: hyukjinkwon
Date: 2016-08-04T02:02:25Z

    Add rules for preventing to use Java annotations (Deprecated and Override)

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
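[Editor's note] The override/trait behaviour discussed in the PR can be sketched in one self-contained Scala file. This is an illustration, not code from the PR; the trait and class names are hypothetical:

```scala
// Java's @Override (java.lang.Override) compiles on a Scala member but the
// Scala compiler does not check that the member actually overrides anything.
trait Greeter {
  def say: String // incomplete (abstract) member definition
}

class PoliteGreeter extends Greeter {
  // `override` is optional here because `say` is abstract in Greeter;
  // Java's @Override also compiles here, silently meaning nothing.
  @Override
  def say: String = "hello"
}

class LoudGreeter extends PoliteGreeter {
  // Here `say` overrides a *concrete* member of the parent class,
  // so the Scala `override` modifier is mandatory.
  override def say: String = "HELLO"
}
```

Dropping `override` in `LoudGreeter` is a compile error, while dropping (or keeping) `@Override` in `PoliteGreeter` changes nothing, which is exactly the confusion the proposed Scalastyle rule targets.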
[GitHub] spark issue #14488: [SPARK-16826][SQL] Switch to java.net.URI for parse_url(...
Github user sylvinus commented on the issue: https://github.com/apache/spark/pull/14488

@rxin is that better?
[GitHub] spark issue #14279: [SPARK-16216][SQL] Write Timestamp and Date in ISO 8601 ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14279

Yes, as shown in https://github.com/apache/spark/pull/14279#issuecomment-236469454 (but we should still manually give the schema, as inferring `DateType` and `TimestampType` is not supported in JSON, and `DateType` is not inferred in CSV).
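[Editor's note] "Manually give the schema" here means passing an explicit `StructType` to the reader instead of relying on inference. A rough sketch, assuming Spark 2.x on the classpath; the field names and file path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("explicit-schema-example")
  .master("local[*]")
  .getOrCreate()

// Declare date/timestamp columns explicitly, since JSON inference
// would otherwise read them back as plain strings.
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("born", DateType),
  StructField("updated", TimestampType)
))

val df = spark.read.schema(schema).json("people.json")
df.printSchema()
```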
[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891

cc @mengxr @yanboliang Was this patch okay?
[GitHub] spark pull request #14479: [SPARK-16873] [Core] Fix SpillReader NPE when spi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14479
[GitHub] spark issue #14479: [SPARK-16873] [Core] Fix SpillReader NPE when spillFile ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14479

LGTM - merging in master/2.0/1.6.