[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...
Github user azmras commented on a diff in the pull request: https://github.com/apache/spark/pull/16429#discussion_r94367710
--- Diff: python/pyspark/serializers.py ---
@@ -382,18 +382,30 @@ def _hijack_namedtuple():
         return

     global _old_namedtuple  # or it will put in closure
+    global _old_namedtuple_kwdefaults  # or it will put in closure too

     def _copy_func(f):
         return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                   f.__defaults__, f.__closure__)

+    def _kwdefaults(f):
+        kargs = getattr(f, "__kwdefaults__", None)
--- End diff --
After applying the patch, can you run `sc.parallelize(range(100), 8)` and confirm that it works? For me it does not, and serialization of objects breaks. Thanks for your efforts.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70806/testReport)** for PR 16233 at commit [`19bc8eb`](https://github.com/apache/spark/commit/19bc8ebf27a54bf260e92dd3dd7114ded19cacfb).
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94367053
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
--- End diff --
When Hive support is not provided, we don't parse the plan from `CatalogTable.viewText`, so the child will be None. Do you have any suggestions on how we should update the param comment to make this clearer?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94366859
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
--- End diff --
It may look a little over-engineered for now, but it lets us decouple the planning of the query from the planning of the view, which allows us to cache resolved views in the future. So perhaps we'd better keep this.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94366640
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,94 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
--- End diff --
We use the param `defaultDatabase` to look up a view whose database part is empty. Now that we have added the `AnalysisContext`, I think the param can be removed, and we can always get the default database from `AnalysisContext.get.defaultDatabase`.
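For readers following this thread, here is a minimal, self-contained sketch of the thread-local `AnalysisContext` idea being discussed. The names and structure below are assumptions for illustration only, not the actual code in this PR:

```scala
// Hedged sketch: a thread-local context that carries the view's default database
// while that view's plan is being resolved. Illustrative only.
case class AnalysisContext(defaultDatabase: Option[String] = None)

object AnalysisContext {
  private val contexts = new ThreadLocal[AnalysisContext] {
    override def initialValue(): AnalysisContext = AnalysisContext()
  }

  def get: AnalysisContext = contexts.get()

  // Resolve `body` (e.g. a nested view's parsed plan) with the given default database in scope,
  // then restore the previous context so sibling relations are unaffected.
  def withDefaultDatabase[A](database: Option[String])(body: => A): A = {
    val previous = contexts.get()
    contexts.set(AnalysisContext(defaultDatabase = database))
    try body finally contexts.set(previous)
  }
}
```

With something like this, `resolveRelation` could read `AnalysisContext.get.defaultDatabase` instead of threading a `defaultDatabase` parameter through every call, which is the simplification proposed in the comment above.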
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 ping @srowen @jkbradley
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135 **[Test build #70805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70805/testReport)** for PR 12135 at commit [`1b2df22`](https://github.com/apache/spark/commit/1b2df228050857bc404892aa8aeeb997062795a3).
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16371 LGTM
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15880
Just for your reference, below is the conversion chart for MS SQL Server. It covers both implicit and explicit conversion rules.
![screenshot 2017-01-02 23 18 56](https://cloud.githubusercontent.com/assets/11567269/21601706/e822a07c-d141-11e6-8dcc-6328835a77dd.png)
Source: https://msdn.microsoft.com/en-us/library/ms191530.aspx
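As a concrete illustration of the kind of comparison this PR targets, the following spark-shell snippet (assuming a `spark` session is in scope; the exact casting behavior is precisely what is under discussion, so no output is asserted) shows where the implicit cast lands when a long column meets a string literal:

```scala
// Illustrative only: a LongType column compared against a string literal.
// The analyzed plan shows which side receives the implicit Cast; that choice
// is exactly what this PR changes.
val df = spark.range(5).toDF("id")   // id is LongType
df.createOrReplaceTempView("t")
spark.sql("SELECT * FROM t WHERE id = '3'").explain(true)
// Inspect the analyzed plan for Cast(id ...) vs Cast('3' ...) to see the chosen conversion.
```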
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94365473
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
+    child: Option[LogicalPlan] = None) extends LogicalPlan with MultiInstanceRelation {
--- End diff --
When will the child be `None`?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94365429
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }

+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the output of the view. The
+ * child will be defined if the view is resolved with Hive support, else it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary information to resolve the
+ *             view.
+ * @param output The output of a view operator, this is generated during planning the view, so that
+ *               we are able to decouple the output from the underlying structure.
+ * @param child The logical plan of a view operator, it should be non-empty if the view is resolved
+ *              with Hive support, else it should be None.
+ */
+case class View(
+    desc: CatalogTable,
+    output: Seq[Attribute],
--- End diff --
Why can't we just use `def output = child.output`? If we want to reorder the columns according to the original view schema, we can wrap the child in a `Project`, e.g.
```
// The relation is a view, so we wrap the relation by:
// 1. Add a [[View]] operator over the relation to keep track of the view desc;
// 2. Wrap the logical plan in a [[SubqueryAlias]] which tracks the name of the view.
val viewPlan = sparkSession.sessionState.sqlParser.parsePlan(viewText)
val child = View(
  desc = table,
  child = Some(Project(schema.map(f => UnresolvedAttribute(Seq(f.name))), viewPlan)))
SubqueryAlias(alias.getOrElse(table.identifier.table), child, Option(table.identifier))
```
[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15240 **[Test build #70804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70804/testReport)** for PR 15240 at commit [`cdab885`](https://github.com/apache/spark/commit/cdab8854466fe816663b4fa1a981e0654c526658).
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70803/testReport)** for PR 15324 at commit [`df29d10`](https://github.com/apache/spark/commit/df29d10e61afe2f5e43346679fe30041b9e46a8f).
[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15240 retest this please
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70799/testReport)** for PR 15324 at commit [`6e2d066`](https://github.com/apache/spark/commit/6e2d06624c2bf6c46b5bc319836a35b488b4b3e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70799/ Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70802/testReport)** for PR 15324 at commit [`8e0de62`](https://github.com/apache/spark/commit/8e0de623a9fc8a19f0704e7127cd4bc4573d1f59).
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70801/testReport)** for PR 16371 at commit [`15a10ee`](https://github.com/apache/spark/commit/15a10eebe272428841772a58d06f2e889d70b75c).
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94364696
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,94 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+      case u: UnresolvedRelation => resolveRelation(u)
+    }
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    //
+    // Note this is compatible with the views defined by older versions of Spark(before 2.2), which
+    // have empty defaultDatabase and all the relations in viewText have database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
--- End diff --
Where do we use the `defaultDatabase` parameter?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70800/testReport)** for PR 16233 at commit [`de4a80e`](https://github.com/apache/spark/commit/de4a80e5cd726b8b93c6cc8ac29bb8ec4504b370).
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/15730 @WeichenXu123 Thanks! Will take a look once I get back from vacation (in a week). Happy new year!
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94363919
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -88,19 +92,19 @@ abstract class Collect extends ImperativeAggregate {
 case class CollectList(
     child: Expression,
     mutableAggBufferOffset: Int = 0,
-    inputAggBufferOffset: Int = 0) extends Collect {
+    inputAggBufferOffset: Int = 0) extends Collect[ArrayBuffer[Any]] {

   def this(child: Expression) = this(child, 0, 0)

-  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): CollectList =
     copy(mutableAggBufferOffset = newMutableAggBufferOffset)

   override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate =
     copy(inputAggBufferOffset = newInputAggBufferOffset)

-  override def prettyName: String = "collect_list"
+  override def createAggregationBuffer(): ArrayBuffer[Any] = new ArrayBuffer[Any]()

-  override protected[this] val buffer: mutable.ArrayBuffer[Any] = mutable.ArrayBuffer.empty
--- End diff --
`mutable.ArrayBuffer.empty` looks better.
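For readers following the Collect refactoring in this thread, here is a minimal, self-contained sketch of the buffer lifecycle that `createAggregationBuffer` participates in. It mirrors the shape of the API quoted above but is not the real Catalyst interface; names and signatures are simplified for illustration:

```scala
import scala.collection.mutable

// Simplified sketch of a typed aggregation buffer for collect_list:
// create one ArrayBuffer per group, append input values during partial
// aggregation, and merge partial buffers during final aggregation.
class CollectListSketch {
  def createAggregationBuffer(): mutable.ArrayBuffer[Any] = mutable.ArrayBuffer.empty

  def update(buffer: mutable.ArrayBuffer[Any], value: Any): mutable.ArrayBuffer[Any] = {
    buffer += value      // each task grows its own partial buffer
    buffer
  }

  def merge(buffer: mutable.ArrayBuffer[Any], other: mutable.ArrayBuffer[Any]): mutable.ArrayBuffer[Any] = {
    buffer ++= other     // combine partial buffers from different partitions
    buffer
  }

  def eval(buffer: mutable.ArrayBuffer[Any]): Seq[Any] = buffer.toSeq
}
```

The point of the change under review is that the buffer becomes part of the aggregation state passed between these calls, which is what makes partial aggregation possible, instead of being a field on the expression itself.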
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94363879
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -88,19 +92,19 @@ abstract class Collect extends ImperativeAggregate {
 case class CollectList(
     child: Expression,
     mutableAggBufferOffset: Int = 0,
-    inputAggBufferOffset: Int = 0) extends Collect {
+    inputAggBufferOffset: Int = 0) extends Collect[ArrayBuffer[Any]] {

   def this(child: Expression) = this(child, 0, 0)

-  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): CollectList =
--- End diff --
unnecessary change
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94363844
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala ---
@@ -16,16 +16,16 @@
  */
 package org.apache.spark.sql.catalyst.optimizer

-import org.apache.spark.sql.catalyst.SimpleCatalystConf
+import org.apache.spark.sql.catalyst.{InternalRow, SimpleCatalystConf}
 import org.apache.spark.sql.catalyst.analysis.{Analyzer, EmptyFunctionRegistry}
 import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog}
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
-import org.apache.spark.sql.catalyst.expressions.{If, Literal}
-import org.apache.spark.sql.catalyst.expressions.aggregate.{CollectSet, Count}
+import org.apache.spark.sql.catalyst.expressions.{Expression, If, Literal}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{CollectSet, Count, ImperativeAggregate, TypedImperativeAggregate}
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Expand, LocalRelation, LogicalPlan}
-import org.apache.spark.sql.types.{IntegerType, StringType}
+import org.apache.spark.sql.types.{DataType, IntegerType, StringType}
--- End diff --
Please revert the unnecessary changes to the `import` statements.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16320 The test case coverage in the suite `CSVInferSchemaSuite.scala` looks haphazard; I am afraid future code changes could easily break the existing type inference rules. Could you improve it in a separate PR?
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70799/testReport)** for PR 15324 at commit [`6e2d066`](https://github.com/apache/spark/commit/6e2d06624c2bf6c46b5bc319836a35b488b4b3e2).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16320 Merged build finished. Test PASSed.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70795/ Test PASSed.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70795/testReport)** for PR 16320 at commit [`393d3a9`](https://github.com/apache/spark/commit/393d3a9ceaa6d92a301b5a2917e28d29518c1638).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15730 @brkyvz I updated the code and attached a screenshot of a run; waiting for your review, thanks!
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70798/testReport)** for PR 16320 at commit [`e59631b`](https://github.com/apache/spark/commit/e59631bd54872a03eaa63cc74d0e245300bbc781).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16320 Yep, I added the test case too. @gatorsmile
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user merlintang commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94361979
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
         "as 'COMPACT' WITH DEFERRED REBUILD")
       client.reset()
     }
+
+    test(s"$version: CREATE TABLE AS SELECT") {
+      withTable("tbl") {
+        sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+        assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+      }
+    }
+
+    test(s"$version: Delete the temporary staging directory and files after each insert") {
+      withTempDir { tmpDir =>
+        withTable("tab", "tbl") {
+          sqlContext.sql(
+            s"""
+               |CREATE TABLE tab(c1 string)
+               |location '${tmpDir.toURI.toString}'
+             """.stripMargin)
+
+          sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --
Sorry Xiao, one of my best friends is named Tao. :) Sorry. It is updated. Thanks again.
[GitHub] spark pull request #16255: [SPARK-18609][SQL]Fix when CTE with Join between ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16255#discussion_r94361717
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -200,6 +200,8 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
       case plan: Project if plan eq proj => plan.child
       case plan => plan transformExpressions {
         case a: Attribute if attrMap.contains(a) => attrMap(a)
+        case b: Alias if attrMap.exists(_._1.exprId == b.exprId)
+          && b.child.isInstanceOf[NamedExpression] => b.child
--- End diff --
What is the reasoning behind this? Why do we treat `Alias` differently here?
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #70797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70797/testReport)** for PR 15880 at commit [`821cca6`](https://github.com/apache/spark/commit/821cca6cd836f11ea917c89938f288f126d633ab).
[GitHub] spark issue #16448: [SPARK-19048] [SQL] Delete Partition Location when Dropp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16448 **[Test build #70796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70796/testReport)** for PR 16448 at commit [`5441f15`](https://github.com/apache/spark/commit/5441f15cc86f0f22dbe766d3bf553a5f8183dc2a).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16320 I assumed this one. Right?
```scala
val path = "/tmp/test1"
Seq(s"${Long.MaxValue}1", "2015-12-01 00:00:00", "1").toDF().coalesce(1).write.text(path)
spark.read.option("inferSchema", true).csv(path).printSchema()
```
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94361604
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --
Sure, you can try it.
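For reference, the parser branches quoted above distinguish roughly the following kinds of statements. These are illustrative only and assume a session that already has a table `tbl` with a regular column `col` and a struct column `structCol`; the two unsupported forms are expected to raise the `ParseException` messages shown in the diff:

```scala
// Illustrative statements for the branches in visitDescribeTable (hypothetical table/column names).
spark.sql("DESC EXTENDED tbl col")              // column-level describe: the case this PR adds
spark.sql("DESC tbl PARTITION (p = 1) col")     // per-partition column describe: ParseException
spark.sql("DESC tbl structCol.innerField")      // nested-field describe: ParseException
```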
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15880 retest this please
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16320 Could you please add the test case?
[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16337#discussion_r94360417 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/simple-in.sql.out --- @@ -0,0 +1,176 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +create temporary view t1 as select * from values + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("t2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("t1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("t2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("t1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("t1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("t3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("t3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp 
'2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("t3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("t3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94360355
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
         "as 'COMPACT' WITH DEFERRED REBUILD")
       client.reset()
     }
+
+    test(s"$version: CREATE TABLE AS SELECT") {
+      withTable("tbl") {
+        sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+        assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+      }
+    }
+
+    test(s"$version: Delete the temporary staging directory and files after each insert") {
+      withTempDir { tmpDir =>
+        withTable("tab", "tbl") {
+          sqlContext.sql(
+            s"""
+               |CREATE TABLE tab(c1 string)
+               |location '${tmpDir.toURI.toString}'
+             """.stripMargin)
+
+          sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --
You just want one column. Then, you can do it by
```scala
Seq(Tuple1("a")).toDF("value").registerTempTable("tbl")
```
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94360173
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
         "as 'COMPACT' WITH DEFERRED REBUILD")
       client.reset()
     }
+
+    test(s"$version: CREATE TABLE AS SELECT") {
+      withTable("tbl") {
+        sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+        assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+      }
+    }
+
+    test(s"$version: Delete the temporary staging directory and files after each insert") {
+      withTempDir { tmpDir =>
+        withTable("tab", "tbl") {
+          sqlContext.sql(
+            s"""
+               |CREATE TABLE tab(c1 string)
+               |location '${tmpDir.toURI.toString}'
+             """.stripMargin)
+
+          sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --
How about the following line?
```scala
Seq((1, "a")).toDF("key", "value").registerTempTable("tbl")
```
BTW, I am Xiao Li. : )
[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/16337#discussion_r94360158 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/simple-in.sql.out --- @@ -0,0 +1,176 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +create temporary view t1 as select * from values + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("t2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("t1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("t2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("t1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("t1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("t3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("t3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp 
'2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("t3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("t3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3
[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/16337#discussion_r94359988 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/simple-in.sql.out --- @@ -0,0 +1,176 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +create temporary view t1 as select * from values + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') --- End diff -- Sorry, I forgot to mention that I have made two changes to the data, so we need to re-run the DB2 verification test. 1. I removed the "=" in "date("2014-05-0=4")". 2. I changed t1g/t2g/t3g from 26 to 2600 (26E2). Thanks for checking. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user merlintang commented on a diff in the pull request: https://github.com/apache/spark/pull/15819#discussion_r94359244 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging { "as 'COMPACT' WITH DEFERRED REBUILD") client.reset() } + +test(s"$version: CREATE TABLE AS SELECT") { + withTable("tbl") { +sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a") +assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1))) + } +} + +test(s"$version: Delete the temporary staging directory and files after each insert") { + withTempDir { tmpDir => +withTable("tab", "tbl") { + sqlContext.sql( +s""" + |CREATE TABLE tab(c1 string) + |location '${tmpDir.toURI.toString}' + """.stripMargin) + + sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a") --- End diff -- thanks Tao, I have created a DataFrame and then registered it as a temp table, as follows: val df = sqlContext.createDataFrame((1 to 2).map(i => (i, "a"))).toDF("key", "value") df.select("value").repartition(1).registerTempTable("tbl") It works, but it feels a bit fuzzy. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
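For readers following this thread, a rough sketch (not the code from the patch) of how the registered temp table could drive the CTAS in the VersionsSuite test; `sqlContext` and `withTable` are assumed to be the suite's existing helpers, and the table name `tbl_src` is made up for illustration:

```scala
// Build a small DataFrame and expose it as a temp table (Spark 1.6 API).
val df = sqlContext.createDataFrame((1 to 2).map(i => (i, "a"))).toDF("key", "value")
df.select("value").repartition(1).registerTempTable("tbl_src")

// CTAS from the temp table instead of a literal SELECT 1, so the insert goes through
// a real write path and therefore creates (and should clean up) a staging directory.
sqlContext.sql("CREATE TABLE tbl AS SELECT value FROM tbl_src")
assert(sqlContext.table("tbl").count() == 2)
```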
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16456: [SPARK-18994] clean up the local directories for applica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16456 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16456: [SPARK-18994] clean up the local directories for ...
GitHub user liujianhuiouc opened a pull request: https://github.com/apache/spark/pull/16456 [SPARK-18994] clean up the local directories for the application in a Future by another thread ## What changes were proposed in this pull request? clean up the directories of the app asynchronously, in a Future block, so the deletion happens on another thread You can merge this pull request into a Git repository by running: $ git pull https://github.com/liujianhuiouc/spark-1 spark-18994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16456.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16456 commit 0351f2c5b875ed2da6a17cdff4ac690cf145bb6b Author: liujianhui Date: 2017-01-03T04:21:25Z [spark-18994] asyn to delete the app directories --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
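A minimal sketch of the asynchronous-cleanup idea described above; the executor and method names here are illustrative and are not the identifiers used in the actual patch:

```scala
import java.io.File
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// A dedicated single-thread executor, so slow disk I/O does not block the caller.
val cleanupExecutor = ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete()
}

// Kick off deletion of an application's local directories in a Future and return immediately.
def cleanupAppDirsAsync(dirs: Seq[File]): Future[Unit] = {
  val cleanup = Future(dirs.foreach(deleteRecursively))(cleanupExecutor)
  cleanup.onComplete {
    case Success(_) => println(s"Cleaned up ${dirs.size} application directories")
    case Failure(e) => println(s"Application directory cleanup failed: $e")
  }(cleanupExecutor)
  cleanup
}
```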
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16401 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70794/testReport)** for PR 15324 at commit [`cd4e68e`](https://github.com/apache/spark/commit/cd4e68ebd541734b96aba5c8199e4dd4f4504918). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70794/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16452: [ML] fix getThresholds logic error
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/16452 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16452 @sethah , thanks, I got it wrong. I will close it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16320 Thank you again, @cloud-fan and @HyukjinKwon . I updated the fallback datatype. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70795/testReport)** for PR 16320 at commit [`393d3a9`](https://github.com/apache/spark/commit/393d3a9ceaa6d92a301b5a2917e28d29518c1638). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- You're correct. I'll change into `StringType`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
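To make the proposed fallback concrete, here is a small self-contained sketch; `tightest` is a toy stand-in for Catalyst's `findTightestCommonType`, not the real implementation:

```scala
import org.apache.spark.sql.types._

// Toy stand-in: promote within a tiny ladder, return None when no common type exists.
def tightest(a: DataType, b: DataType): Option[DataType] = (a, b) match {
  case (x, y) if x == y => Some(x)
  case (NullType, x) => Some(x)
  case (x, NullType) => Some(x)
  case (IntegerType, LongType) | (LongType, IntegerType) => Some(LongType)
  case _ => None
}

// The behaviour under discussion: when the decimal inferred for this field cannot be
// merged with the type seen so far, fall back to StringType instead of NullType, so
// later rows cannot silently end up with a narrower (wrong) column type.
def mergeWithDecimal(typeSoFar: DataType, decimalForField: DataType): DataType =
  tightest(typeSoFar, decimalForField).getOrElse(StringType)

// e.g. mergeWithDecimal(TimestampType, DecimalType(20, 0)) == StringType
```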
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- Yes, otherwise, it might end up with an incorrect data type. For example, ```scala val path = "/tmp/test1" Seq(s"${Long.MaxValue}1", "2015-12-01 00:00:00", "1").toDF().coalesce(1).write.text(path) spark.read.option("inferSchema", true).csv(path).printSchema() ``` ``` root |-- _c0: integer (nullable = true) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16454 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70793/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358365 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- Thank you for review, @cloud-fan . I used `NullType` since `mergeRowTypes` does. ```scala def mergeRowTypes(first: Array[DataType], second: Array[DataType]): Array[DataType] = { first.zipAll(second, NullType, NullType).map { case (a, b) => findTightestCommonType(a, b).getOrElse(NullType) } } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16454 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16454 **[Test build #70793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70793/testReport)** for PR 16454 at commit [`80bba5e`](https://github.com/apache/spark/commit/80bba5ead0601f3ef4b05fff5391d07a61e06341). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16453 **[Test build #70792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70792/testReport)** for PR 16453 at commit [`4937b7d`](https://github.com/apache/spark/commit/4937b7dd731893ec4345a57db952cc8a35efd9b2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16453 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70792/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16453 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16452 @mpjlu This is the behavior I get: scala scala> import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.classification.LogisticRegression scala> val lr = new LogisticRegression() lr: org.apache.spark.ml.classification.LogisticRegression = logreg_2465e281c48e scala> lr.getThresholds java.util.NoSuchElementException: Failed to find a default value for thresholds ... So, it throws an exception when nothing is set, as intended it seems. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
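For context, a simplified model of the getter logic behind that behaviour. It strips away the Param machinery, so `thresholdSet`/`thresholdsSet` below stand in for `isSet(threshold)`/`isSet(thresholds)` (explicitly set values only; defaults do not count):

```scala
def getThresholds(
    thresholdSet: Option[Double],
    thresholdsSet: Option[Array[Double]]): Array[Double] = {
  (thresholdsSet, thresholdSet) match {
    // thresholds was set explicitly: return it as-is.
    case (Some(ts), _) => ts
    // only the binary threshold t was set: derive the two-class array (1 - t, t).
    case (None, Some(t)) => Array(1.0 - t, t)
    // neither was set: mirrors the NoSuchElementException shown in the REPL output above.
    case (None, None) =>
      throw new NoSuchElementException("Failed to find a default value for thresholds")
  }
}
```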
[GitHub] spark pull request #16438: [SPARK-19029] [SQL] Remove databaseName from Simp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16438 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r94358078 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +316,84 @@ final class ShuffleBlockFetcherIterator( */ override def next(): (BlockId, InputStream) = { numBlocksProcessed += 1 -val startFetchWait = System.currentTimeMillis() -currentResult = results.take() -val result = currentResult -val stopFetchWait = System.currentTimeMillis() -shuffleMetrics.incFetchWaitTime(stopFetchWait - startFetchWait) - -result match { - case SuccessFetchResult(_, address, size, buf, isNetworkReqDone) => -if (address != blockManager.blockManagerId) { - shuffleMetrics.incRemoteBytesRead(buf.size) - shuffleMetrics.incRemoteBlocksFetched(1) -} -bytesInFlight -= size -if (isNetworkReqDone) { - reqsInFlight -= 1 - logDebug("Number of requests in flight " + reqsInFlight) -} - case _ => -} -// Send fetch requests up to maxBytesInFlight -fetchUpToMaxBytes() -result match { - case FailureFetchResult(blockId, address, e) => -throwFetchFailedException(blockId, address, e) +var result: FetchResult = null +var input: InputStream = null +// Take the next fetched result and try to decompress it to detect data corruption, +// then fetch it one more time if it's corrupt, throw FailureFetchResult if the second fetch --- End diff -- @Tagar Spark doesn't use Netty's Snappy compression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
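In other words, shuffle blocks are compressed by Spark's own CompressionCodec, selected through Spark configuration rather than by a Netty-level Snappy handler. A minimal configuration sketch using the standard settings (nothing here is specific to this patch):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("shuffle-codec-example")
  .set("spark.shuffle.compress", "true")      // compress map outputs (default: true)
  .set("spark.io.compression.codec", "lz4")   // Spark-side codec: "lz4", "lzf" or "snappy"
```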
[GitHub] spark issue #16438: [SPARK-19029] [SQL] Remove databaseName from SimpleCatal...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16438 LGTM, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94357987 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[DescribeTableCommand]] logical plan. */ override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) { -// Describe column are not supported yet. Return null and let the parser decide -// what to do with this (create an exception or pass it on to a different system). if (ctx.describeColName != null) { - null + if (ctx.partitionSpec != null) { +throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx) + } else { +val columnName = ctx.describeColName.getText +if (columnName.contains(".")) { + throw new ParseException( +"DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx) --- End diff -- In this case, `formatted` becomes the table identifier. Should I postpone detection of the nested column to the run() method of DescColumnCommand? Then the existence of the table identifier would be checked first. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
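A hedged sketch of the alternative being asked about, written as if it lived inside Spark's sql package: keep the parser permissive and perform the nested-column check inside the command's run(), after the table lookup, so a missing table is reported first. The command name and shape below are hypothetical, not the code from this PR:

```scala
import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.RunnableCommand

// Hypothetical command shape; only the ordering of the two checks matters here.
case class DescribeColumnCommand(table: TableIdentifier, colName: String)
  extends RunnableCommand {

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // 1. Look up the table first, so a non-existent table is reported as such...
    val metadata = sparkSession.sessionState.catalog
      .getTempViewOrPermanentTableMetadata(table)
    // 2. ...and only then reject nested columns, which are not supported yet.
    if (colName.contains(".")) {
      throw new AnalysisException(
        "DESC TABLE COLUMN for an inner column of a nested type is not supported")
    }
    metadata.schema.find(_.name == colName) match {
      case Some(field) => Seq(Row(field.name, field.dataType.simpleString))
      case None => throw new AnalysisException(s"Column $colName does not exist")
    }
  }
}
```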
[GitHub] spark issue #16455: [MINOR][DOCS] Remove the duplicated word/ typo in Stream...
Github user neurons commented on the issue: https://github.com/apache/spark/pull/16455 @tdas could you accept this small PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94357821 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -85,7 +85,9 @@ private[csv] object CSVInferSchema { case NullType => tryParseInteger(field, options) case IntegerType => tryParseInteger(field, options) case LongType => tryParseLong(field, options) -case _: DecimalType => tryParseDecimal(field, options) +case _: DecimalType => + // DecimalTypes have different precisions and scales, so we try to find the common type. + findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType) --- End diff -- Looks like the fallback policy here is to use `StringType`, shoud we follow? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16455: [MINOR][DOCS] Remove the duplicated word/ typo in Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16455 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 @cloud-fan At the current stage, we have Filter, Agg, Join, Project, etc. Although there are only four plans, the `if` code is still repeated. Moreover, in the future, when we have other kinds of statistics, we can support more plans, e.g. Union. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
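Purely as a toy model of the shape being debated (not Catalyst code): the base node keeps the default size-only estimate, and each operator that has a smarter estimate guards it behind the cbo flag, which is the repeated `if` mentioned above:

```scala
// Toy model only: Statistics and plan nodes are simplified so the if (cboEnabled)
// pattern is visible; this is not the actual Catalyst API.
case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

trait PlanNode {
  def children: Seq[PlanNode]
  def cboEnabled: Boolean
  // Default estimation: multiply the children's sizes, like the old size-only behaviour.
  def stats: Statistics = Statistics(children.map(_.stats.sizeInBytes).product)
}

case class Scan(sizeInBytes: BigInt, rows: BigInt, cboEnabled: Boolean) extends PlanNode {
  def children: Seq[PlanNode] = Nil
  override def stats: Statistics = Statistics(sizeInBytes, Some(rows))
}

case class Filter(child: PlanNode, selectivity: Double, cboEnabled: Boolean) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
  override def stats: Statistics =
    if (cboEnabled) {
      // CBO path: scale both size and row count by the estimated selectivity.
      val base = child.stats
      Statistics(
        sizeInBytes = (BigDecimal(base.sizeInBytes) * selectivity).toBigInt,
        rowCount = base.rowCount.map(rc => (BigDecimal(rc) * selectivity).toBigInt))
    } else {
      super.stats // fall back to the default estimation
    }
}

// Filter(Scan(sizeInBytes = 1000, rows = 100, cboEnabled = true), 0.1, cboEnabled = true).stats
//   => Statistics(100, Some(10))
```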
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70794/testReport)** for PR 15324 at commit [`cd4e68e`](https://github.com/apache/spark/commit/cd4e68ebd541734b96aba5c8199e4dd4f4504918). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16455: [MINOR][DOCS] Remove the duplicated word/ typo in...
GitHub user neurons opened a pull request: https://github.com/apache/spark/pull/16455 [MINOR][DOCS] Remove the duplicated word/ typo in Streaming Docs ## What changes were proposed in this pull request? In the section **Handling Late Data and Watermarking** in the Structured Streaming Programming Guide, the word "received" occurs twice in a row. Fixed this typo. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/neurons/spark np.structure_streaming_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16455.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16455 commit 1ccbd79d18ff9c5914c1acda64dce7338a86670f Author: Niranjan Padmanabhan Date: 2017-01-03T03:41:21Z Remove the duplicated word --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16454 **[Test build #70793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70793/testReport)** for PR 16454 at commit [`80bba5e`](https://github.com/apache/spark/commit/80bba5ead0601f3ef4b05fff5391d07a61e06341). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16119 The test failure is caused by another issue. I've submitted another PR to fix it: #16454. Once that is fixed, this test will pass. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16454 [SPARK-19055][SQL][PySpark] Fix SparkSession initialization when SparkContext is stopped ## What changes were proposed in this pull request? During SparkSession initialization, we store the created SparkSession instance in a class variable _instantiatedContext. Next time we can use SparkSession.builder.getOrCreate() to retrieve the existing SparkSession instance. However, when the active SparkContext is stopped and we create a new SparkContext to use, the existing SparkSession is still associated with the stopped SparkContext, so operations on that existing SparkSession will fail. We need to detect such a case in SparkSession and renew the class variable _instantiatedContext if needed. ## How was this patch tested? New test added in PySpark. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 fix-pyspark-sparksession Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16454.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16454 commit 80bba5ead0601f3ef4b05fff5391d07a61e06341 Author: Liang-Chi Hsieh Date: 2017-01-03T03:06:21Z Fix SparkSession initialization when previous SparkContext is stopped and new SparkContext is created. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16448: [SPARK-19048] [SQL] Delete Partition Location when Dropp...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16448 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16448: [SPARK-19048] [SQL] Delete Partition Location whe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16448#discussion_r94357410 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -346,6 +346,46 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(new Path(partitionLocation) == defaultPartitionLocation) } + test("create/drop partitions in managed tables with location") { +val catalog = newBasicCatalog() +val table = CatalogTable( + identifier = TableIdentifier("tbl", Some("db1")), + tableType = CatalogTableType.MANAGED, + storage = CatalogStorageFormat(None, None, None, None, false, Map.empty), --- End diff -- nit: `CatalogStorageFormat.empty` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16404 Found a bug filed in JIRA: https://issues.apache.org/jira/browse/SPARK-19035. This PR does not resolve it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 > Then we need to modify all the existing implementation of statistics and do if(cboEnabled) test in each of them. That would be tedious. hm? I think we only need to do `if(cboEnabled)` for a few operators that will estimate statistics, e.g. Filter, Aggregate, Join, etc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16437: [SPARK-19028] [SQL] Fixed non-thread-safe functions used...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16437 @gatorsmile it conflicts with branch 2.0, please send a new PR, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16452 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16452 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70790/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16452 **[Test build #70790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70790/testReport)** for PR 16452 at commit [`eece313`](https://github.com/apache/spark/commit/eece313b9fe7048f2e9aa260d0e5f183529bac65). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16453 **[Test build #70792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70792/testReport)** for PR 16453 at commit [`4937b7d`](https://github.com/apache/spark/commit/4937b7dd731893ec4345a57db952cc8a35efd9b2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16453: [SPARK-19054][ML] Eliminate extra pass in NB
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16453 [SPARK-19054][ML] Eliminate extra pass in NB ## What changes were proposed in this pull request? eliminate unnecessary extra pass in NB's train ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark nb_getNC Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16453.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16453 commit 4937b7dd731893ec4345a57db952cc8a35efd9b2 Author: Zheng RuiFeng Date: 2017-01-03T02:52:53Z create pr --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
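A hedged sketch of the kind of change this implies (not the actual patch): fold class discovery into the same pass that aggregates the per-label sufficient statistics, for example with a single aggregateByKey, so no separate pass is needed just to find the number of classes:

```scala
import org.apache.spark.rdd.RDD

// One pass over (label, features) pairs yields, per label, the example count and
// the per-feature sums; the number of classes then falls out of the result map.
def aggregateByLabel(data: RDD[(Double, Array[Double])], numFeatures: Int)
  : Map[Double, (Long, Array[Double])] = {
  data
    .aggregateByKey((0L, Array.fill(numFeatures)(0.0)))(
      seqOp = { case ((cnt, sums), features) =>
        var i = 0
        while (i < numFeatures) { sums(i) += features(i); i += 1 }
        (cnt + 1L, sums)
      },
      combOp = { case ((c1, s1), (c2, s2)) =>
        var i = 0
        while (i < numFeatures) { s1(i) += s2(i); i += 1 }
        (c1 + c2, s1)
      })
    .collect()
    .toMap
}

// Usage sketch:
// val byLabel = aggregateByLabel(labeledData, numFeatures)
// val numClasses = byLabel.size  // or byLabel.keys.max.toInt + 1 for 0-based label indices
```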
[GitHub] spark issue #15829: [SPARK-18379][SQL] Make the parallelism of parallelParti...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15829 Sure. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16452 If neither threshold nor thresholds is set, master will return thresholds. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16452: [ML] fix getThresholds logic error
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16452 What is not right? Could you be more specific? The behavior for master branch seems to align with the comments, but maybe I'm missing it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70791/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70791/testReport)** for PR 15324 at commit [`a59c558`](https://github.com/apache/spark/commit/a59c558625ad6f640a5d417c97770e55f4583e14). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...
Github user Tagar commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r94356127 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +316,84 @@ final class ShuffleBlockFetcherIterator( */ override def next(): (BlockId, InputStream) = { numBlocksProcessed += 1 -val startFetchWait = System.currentTimeMillis() -currentResult = results.take() -val result = currentResult -val stopFetchWait = System.currentTimeMillis() -shuffleMetrics.incFetchWaitTime(stopFetchWait - startFetchWait) - -result match { - case SuccessFetchResult(_, address, size, buf, isNetworkReqDone) => -if (address != blockManager.blockManagerId) { - shuffleMetrics.incRemoteBytesRead(buf.size) - shuffleMetrics.incRemoteBlocksFetched(1) -} -bytesInFlight -= size -if (isNetworkReqDone) { - reqsInFlight -= 1 - logDebug("Number of requests in flight " + reqsInFlight) -} - case _ => -} -// Send fetch requests up to maxBytesInFlight -fetchUpToMaxBytes() -result match { - case FailureFetchResult(blockId, address, e) => -throwFetchFailedException(blockId, address, e) +var result: FetchResult = null +var input: InputStream = null +// Take the next fetched result and try to decompress it to detect data corruption, +// then fetch it one more time if it's corrupt, throw FailureFetchResult if the second fetch --- End diff -- Is netty/shuffle data compressed with the Snappy algorithm by default? If so, it might be a good idea to enable checksum checking at the Netty level too. https://netty.io/4.0/api/io/netty/handler/codec/compression/SnappyFramedDecoder.html > Note that by default, validation of the checksum header in each chunk is DISABLED for performance improvements. If performance is less of an issue, or if you would prefer the safety that checksum validation brings, please use the SnappyFramedDecoder(boolean) constructor with the argument set to true. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #70791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70791/testReport)** for PR 15324 at commit [`a59c558`](https://github.com/apache/spark/commit/a59c558625ad6f640a5d417c97770e55f4583e14). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org