[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110533216 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -43,17 +43,8 @@ case class LogicalRelation( com.google.common.base.Objects.hashCode(relation, output) } - override def sameResult(otherPlan: LogicalPlan): Boolean = { -otherPlan.canonicalized match { - case LogicalRelation(otherRelation, _, _) => relation == otherRelation - case _ => false -} - } - - // When comparing two LogicalRelations from within LogicalPlan.sameResult, we only need - // LogicalRelation.cleanArgs to return Seq(relation), since expectedOutputAttribute's - // expId can be different but the relation is still the same. - override lazy val cleanArgs: Seq[Any] = Seq(relation) + // Only care about relation when canonicalizing. + override def preCanonicalized: LogicalPlan = copy(catalogTable = None) --- End diff -- The builders of external data sources need to implement `equals` and `hashCode` if they want to utilize our cache management. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
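A minimal sketch of the point above, with hypothetical names (`PathRelation` is illustrative, not a Spark class): a relation that overrides `equals` and `hashCode` on the fields identifying its underlying data lets plan-level cache lookups recognize two independently constructed scans of the same source.

```scala
// Hypothetical relation class; the name and fields are illustrative only.
// Equality is defined on what identifies the data, not on object identity,
// so two independently constructed scans of the same source compare equal.
class PathRelation(val path: String, val format: String) {
  override def equals(other: Any): Boolean = other match {
    case r: PathRelation => path == r.path && format == r.format
    case _ => false
  }
  override def hashCode(): Int = 31 * path.hashCode + format.hashCode
}

val a = new PathRelation("/data/t1", "parquet")
val b = new PathRelation("/data/t1", "parquet")
println(a == b)  // true: a cache keyed on the relation can hit
```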
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75628/testReport)** for PR 17569 at commit [`10cf4be`](https://github.com/apache/spark/commit/10cf4be41d1de37115edc140e1421caf5b23336a).
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17568 @cloud-fan how about this check for 2.?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17569 LGTM
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110530705 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheckAndAssign = if (!returnNullable) { --- End diff -- +1
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17569 LGTM
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110530587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheckAndAssign = if (!returnNullable) { --- End diff -- how about just `assignResult`?
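As a rough illustration of the branch under discussion (a simplified string-based sketch, not Spark's actual `Invoke` codegen; the helper name follows the `assignResult` suggestion): when the callee is known to never return null, a plain assignment suffices, otherwise the generated Java must re-check the result and keep the null bit consistent.

```scala
// Simplified codegen sketch (assumed shape, not Spark's real API): decide
// whether the generated Java needs a post-call null check.
def assignResult(valueVar: String, isNullVar: String, call: String,
                 returnNullable: Boolean): String =
  if (!returnNullable) {
    // Callee is known non-null: assign directly, no extra branch emitted.
    s"$valueVar = $call;"
  } else {
    // Callee may return null: keep the null bit consistent with the result.
    s"""Object funcResult = $call;
       |if (funcResult == null) { $isNullVar = true; } else { $valueVar = funcResult; }""".stripMargin
  }

println(assignResult("value", "isNull", "obj.toString()", returnNullable = false))
```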
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110530518 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -262,17 +264,18 @@ object RowEncoder { input :: Nil) case _: DecimalType => - Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal])) + Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]), +returnNullable = false) case StringType => - Invoke(input, "toString", ObjectType(classOf[String])) + Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false) --- End diff -- ok let's keep the default value unchanged
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110530462 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -359,9 +359,59 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override protected def innerChildren: Seq[QueryPlan[_]] = subqueries /** - * Canonicalized copy of this query plan. + * Returns a plan where a best effort attempt has been made to transform `this` in a way + * that preserves the result but removes cosmetic variations (case sensitivity, ordering for + * commutative operations, expression id, etc.) + * + * Plans where `this.canonicalized == other.canonicalized` will always evaluate to the same + * result. + * + * Some nodes should overwrite this to provide proper canonicalize logic. + */ + lazy val canonicalized: PlanType = { +val canonicalizedChildren = children.map(_.canonicalized) +var id = -1 +preCanonicalized.mapExpressions { + case a: Alias => +id += 1 +// As the root of the expression, Alias will always take an arbitrary exprId, we need to +// normalize that for equality testing, by assigning expr id from 0 incrementally. The +// alias name doesn't matter and should be erased. +Alias(normalizeExprId(a.child), "")(ExprId(id), a.qualifier, isGenerated = a.isGenerated) + + case ar: AttributeReference if allAttributes.indexOf(ar.exprId) == -1 => +// Top level `AttributeReference` may also be used for output like `Alias`, we should +// normalize the epxrId too. +id += 1 +ar.withExprId(ExprId(id)) + + case other => normalizeExprId(other) +}.withNewChildren(canonicalizedChildren) + } + + /** + * Do some simple transformation on this plan before canonicalizing. Implementations can override + * this method to provide customer canonicalize logic without rewriting the whole logic. + */ - protected lazy val canonicalized: PlanType = this + protected def preCanonicalized: PlanType = this + + /** + * Normalize the exprIds in the given expression, by updating the exprId in `AttributeReference` + * with its referenced ordinal from input attributes. It's similar to `BindReferences` but we + * do not use `BindReferences` here as the plan may take the expression as a parameter with type + * `Attribute`, and replace it with `BoundReference` will cause error. + */ + protected def normalizeExprId[T <: Expression](e: T, input: AttributeSeq = allAttributes): T = { +e.transformUp { + case ar: AttributeReference => +val ordinal = input.indexOf(ar.exprId) +if (ordinal == -1) { + ar --- End diff -- No, actually this is unexpected: the attribute should either reference input attributes or represent new output at the top level. Keep it unchanged so that the equality check will fail later.
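The exprId normalization under review can be modeled in a few lines of plain Scala (a sketch under simplified assumptions, not the `QueryPlan` code itself): ids that resolve against the input are replaced by their ordinal, and unknown ids are left alone so that a later equality check fails, as the comment explains.

```scala
// Toy model of normalizeExprId: Attr stands in for AttributeReference, and the
// input is just the sequence of expression ids produced by the children.
case class Attr(id: Long)

def normalize(attrs: Seq[Attr], input: Seq[Long]): Seq[Attr] =
  attrs.map { a =>
    val ordinal = input.indexOf(a.id)
    // An id that references nothing in the input is kept unchanged on purpose,
    // so that equality between the canonicalized plans fails later.
    if (ordinal == -1) a else Attr(ordinal)
  }

// Two plans over the same data, assigned different arbitrary ids upstream,
// normalize to the same form:
println(normalize(Seq(Attr(101), Attr(103)), Seq(101, 102, 103)))  // List(Attr(0), Attr(2))
println(normalize(Seq(Attr(201), Attr(203)), Seq(201, 202, 203)))  // List(Attr(0), Attr(2))
```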
[GitHub] spark issue #17572: [SPARK-20260][MLLib] String interpolation required for e...
Github user vijaykramesh commented on the issue: https://github.com/apache/spark/pull/17572 @srowen fixed it in some more places. It seems like everywhere else the regexp matches, we actually want the $ in the output. Do you want me to squash the commits as well?
[GitHub] spark pull request #17576: Update Dataset to camel case (DataSet) to match D...
Github user kevinmcinerney closed the pull request at: https://github.com/apache/spark/pull/17576
[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...
Github user ioana-delaney commented on the issue: https://github.com/apache/spark/pull/17546 @cloud-fan Do you have any comments?
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Yo Sean, I already pushed the requested changes, in case this is the correct place to do so (I can just revert them if not). I added two new methods to allow tests: first, a method that finds all frequent items in a database; second, a method that actually cleans the database using those frequent items. Although I didn't end up using the first method, the pre-processing method is now much clearer to understand, so I left the new method in. Just tell me if I need to put that piece of code back. I also added tests for multiple types of sequence databases: specifically, when there is at most one item per itemset, when there can be multiple items per itemset, and when cleaning the database empties it. Together they should cover all cases. Of course, the new implementation passes the tests perfectly, and the old one doesn't. Everything else remained as is. Tell me if the way I did it was ok. I hope it's up to standards :)
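The two-step pre-processing described above (find the frequent items, then clean the database with them) can be sketched in plain Scala. This is an illustration of the idea only, not the PrefixSpan implementation, and it assumes the minimum support is given as an absolute sequence count.

```scala
// A sequence database: each sequence is a list of itemsets.
// An item is frequent if it appears in at least minCount distinct sequences.
def frequentItems(db: Seq[Seq[Seq[Int]]], minCount: Int): Set[Int] =
  db.flatMap(_.flatten.distinct)   // each item counted once per sequence
    .groupBy(identity)
    .collect { case (item, occurrences) if occurrences.size >= minCount => item }
    .toSet

// Drop infrequent items, then drop itemsets and sequences left empty.
def cleanDatabase(db: Seq[Seq[Seq[Int]]], minCount: Int): Seq[Seq[Seq[Int]]] = {
  val freq = frequentItems(db, minCount)
  db.map(_.map(_.filter(freq)).filter(_.nonEmpty)).filter(_.nonEmpty)
}

val db = Seq(
  Seq(Seq(1, 2), Seq(3)),   // items 1 and 3 are frequent; 2 appears once
  Seq(Seq(1), Seq(3, 4)),
  Seq(Seq(5)))              // emptied entirely by cleaning
println(cleanDatabase(db, minCount = 2))  // List(List(List(1), List(3)), List(List(1), List(3)))
```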
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user reggert commented on the issue: https://github.com/apache/spark/pull/15899 Strictly speaking, this doesn't just affect pair RDDs. It affects any RDD used in a `for` expression that involves a filter operation, which includes explicit `if` clauses as well as pattern matches.
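For context, the compiler desugars a filtering `for` expression into a `withFilter` call, which is why the method matters for any collection-like type, not just pair RDDs. A plain-collections illustration (no Spark dependency):

```scala
// A `for` expression with an `if` guard desugars to withFilter + map:
val pairs = Seq("a" -> 1, "b" -> 2, "c" -> 3)

val evens = for ((k, v) <- pairs if v % 2 == 0) yield k

// ...roughly what the compiler emits:
val evensDesugared =
  pairs.withFilter { case (_, v) => v % 2 == 0 }.map { case (k, _) => k }

println(evens)           // List(b)
println(evensDesugared)  // List(b)
```

A type without `withFilter` cannot be used with `if` guards in `for` expressions without falling back to a full `filter`, which is what the PR addresses for RDDs.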
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110524716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -267,7 +265,7 @@ case class FileSourceScanExec( val metadata = Map( "Format" -> relation.fileFormat.toString, -"ReadSchema" -> outputSchema.catalogString, +"requiredSchema" -> requiredSchema.catalogString, --- End diff -- This is also for display in `SparkPlanInfo`? Keep the original name `ReadSchema`?
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r110523942 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -180,9 +180,13 @@ class Dataset[T] private[sql]( // to happen right away to let these side effects take place eagerly. queryExecution.analyzed match { case c: Command => -LocalRelation(c.output, queryExecution.executedPlan.executeCollect()) --- End diff -- Yeah, the check I added to ensure we get the same results in the SQL tab has [several hundred failures](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75579/testReport/) that go through this. Looks like the path is almost always `spark.sql` when the SQL statement is a command like CTAS. I like your version and will update.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 Thanks for the review! I'll get the thrift-server tests fixed up next week.
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110523770 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -359,9 +359,59 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override protected def innerChildren: Seq[QueryPlan[_]] = subqueries /** - * Canonicalized copy of this query plan. + * Returns a plan where a best effort attempt has been made to transform `this` in a way + * that preserves the result but removes cosmetic variations (case sensitivity, ordering for + * commutative operations, expression id, etc.) + * + * Plans where `this.canonicalized == other.canonicalized` will always evaluate to the same + * result. + * + * Some nodes should overwrite this to provide proper canonicalize logic. + */ + lazy val canonicalized: PlanType = { +val canonicalizedChildren = children.map(_.canonicalized) +var id = -1 +preCanonicalized.mapExpressions { + case a: Alias => +id += 1 +// As the root of the expression, Alias will always take an arbitrary exprId, we need to +// normalize that for equality testing, by assigning expr id from 0 incrementally. The +// alias name doesn't matter and should be erased. +Alias(normalizeExprId(a.child), "")(ExprId(id), a.qualifier, isGenerated = a.isGenerated) + + case ar: AttributeReference if allAttributes.indexOf(ar.exprId) == -1 => +// Top level `AttributeReference` may also be used for output like `Alias`, we should +// normalize the epxrId too. +id += 1 +ar.withExprId(ExprId(id)) + + case other => normalizeExprId(other) +}.withNewChildren(canonicalizedChildren) + } + + /** + * Do some simple transformation on this plan before canonicalizing. Implementations can override + * this method to provide customer canonicalize logic without rewriting the whole logic. + */ - protected lazy val canonicalized: PlanType = this + protected def preCanonicalized: PlanType = this + + /** + * Normalize the exprIds in the given expression, by updating the exprId in `AttributeReference` + * with its referenced ordinal from input attributes. It's similar to `BindReferences` but we + * do not use `BindReferences` here as the plan may take the expression as a parameter with type + * `Attribute`, and replace it with `BoundReference` will cause error. + */ + protected def normalizeExprId[T <: Expression](e: T, input: AttributeSeq = allAttributes): T = { +e.transformUp { + case ar: AttributeReference => +val ordinal = input.indexOf(ar.exprId) +if (ordinal == -1) { + ar --- End diff -- No need to normalize exprIds in this case?
[GitHub] spark issue #17576: Update Dataset to camel case (DataSet) to match DataFram...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17576 Can one of the admins verify this patch?
[GitHub] spark pull request #17576: Update Dataset to camel case (DataSet) to match D...
GitHub user kevinmcinerney opened a pull request: https://github.com/apache/spark/pull/17576 Update Dataset to camel case (DataSet) to match DataFrames Shouldn't Datasets and DataFrames both be camel case for the ocd ppl out there? ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinmcinerney/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17576.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17576 commit 2e00ad22b1b57bb87914ec16582c033f84cf4a17 Author: Kevin Mc Inerney Date: 2017-04-08T17:54:19Z Update Dataset to camel case (DataSet) to match DataFrames Shouldn't Datasets and DataFrames both be camel case for the ocd ppl out there?
[GitHub] spark pull request #16820: [SPARK-19471] AggregationIterator does not initia...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16820#discussion_r110523513 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -448,6 +448,22 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { rand(Random.nextLong()), randn(Random.nextLong()) ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_)) } + + private def assertNoExceptions(c: Column): Unit = { +for (wholeStage <- Seq(true, false)) { + withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString)) { +spark.range(0, 5).toDF("a").agg(sum("a")).withColumn("v", c).collect() --- End diff -- I found that almost all the physical plans for joins have exactly the same issue. I will try to submit fixes for the joins one by one.
[GitHub] spark pull request #15567: [SPARK-14393][SQL] values generated by non-determ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15567#discussion_r110523451 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -205,10 +206,11 @@ case class FilterExec(condition: Expression, child: SparkPlan) protected override def doExecute(): RDD[InternalRow] = { val numOutputRows = longMetric("numOutputRows") -child.execute().mapPartitionsInternal { iter => +child.execute().mapPartitionsWithIndexInternal { (index, iter) => val predicate = newPredicate(condition, child.output) + predicate.initialize(0) --- End diff -- Just wondering why `FilterExec` is not using `index` to initialize the conditions?
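A sketch of why the partition index is passed to `initialize` at all (a simplified toy model, not Spark's `Predicate` API): seeding per-partition state with the index keeps every partition from replaying the same pseudo-random stream, which is what always passing `0`, as the quoted diff does, would cause.

```scala
import scala.util.Random

// Toy non-deterministic expression: per-partition state is created in
// initialize, seeded from a fixed seed plus the partition index.
class ToyNondeterministic(seed: Long) {
  private var rng: Random = null
  def initialize(partitionIndex: Int): Unit = rng = new Random(seed + partitionIndex)
  def eval(): Double = rng.nextDouble()
}

val p0 = new ToyNondeterministic(42L)
p0.initialize(0)
val p1 = new ToyNondeterministic(42L)
p1.initialize(1)
// Different partition indexes yield different streams; initializing both with
// index 0 would make the "random" values identical across partitions.
println(p0.eval() == p1.eval())
```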
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110523418 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -359,9 +359,59 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override protected def innerChildren: Seq[QueryPlan[_]] = subqueries /** - * Canonicalized copy of this query plan. + * Returns a plan where a best effort attempt has been made to transform `this` in a way + * that preserves the result but removes cosmetic variations (case sensitivity, ordering for + * commutative operations, expression id, etc.) + * + * Plans where `this.canonicalized == other.canonicalized` will always evaluate to the same + * result. + * + * Some nodes should overwrite this to provide proper canonicalize logic. + */ + lazy val canonicalized: PlanType = { +val canonicalizedChildren = children.map(_.canonicalized) +var id = -1 +preCanonicalized.mapExpressions { + case a: Alias => +id += 1 +// As the root of the expression, Alias will always take an arbitrary exprId, we need to +// normalize that for equality testing, by assigning expr id from 0 incrementally. The +// alias name doesn't matter and should be erased. +Alias(normalizeExprId(a.child), "")(ExprId(id), a.qualifier, isGenerated = a.isGenerated) + + case ar: AttributeReference if allAttributes.indexOf(ar.exprId) == -1 => +// Top level `AttributeReference` may also be used for output like `Alias`, we should +// normalize the epxrId too. +id += 1 +ar.withExprId(ExprId(id)) + + case other => normalizeExprId(other) +}.withNewChildren(canonicalizedChildren) + } + + /** + * Do some simple transformation on this plan before canonicalizing. Implementations can override + * this method to provide customer canonicalize logic without rewriting the whole logic. --- End diff -- `customer` -> `customized`
[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17350 Thanks!
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75627/ Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75627/testReport)** for PR 17569 at commit [`510fb53`](https://github.com/apache/spark/commit/510fb530ebf3d9235206cefe8e428bf3f8689cfc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/17546#discussion_r110522547 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -54,14 +54,12 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr private def reorder(plan: LogicalPlan, output: Seq[Attribute]): LogicalPlan = { val (items, conditions) = extractInnerJoins(plan) -// TODO: Compute the set of star-joins and use them in the join enumeration -// algorithm to prune un-optimal plan choices. val result = // Do reordering if the number of items is appropriate and join conditions exist. // We also need to check if costs of all items can be evaluated. if (items.size > 2 && items.size <= conf.joinReorderDPThreshold && conditions.nonEmpty && items.forall(_.stats(conf).rowCount.isDefined)) { -JoinReorderDP.search(conf, items, conditions, output) +JoinReorderDP(conf).search(conf, items, conditions, output) --- End diff -- Reverted.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Ok, should I create a new JIRA and push the additional tests there? Or is here completely fine, since it's related to the current change? Tell me, and I will get the change done asap :)
[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...
Github user zhaorongsheng commented on the issue: https://github.com/apache/spark/pull/17350 @gatorsmile Sorry for the late reply. I have checked all the functions' nullability settings and I didn't find any issues. Thanks~
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17575 Even a simplistic test of this case would give a lot more confidence that it's correct. If it means opening up a `private[spark]` method or two to make testing possible, that seems reasonable. I don't think it needs significant change. Something needs to exercise this code path.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110520996 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheck = if (!returnNullable) { --- End diff -- Sure, done.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Merged build finished. Test PASSed.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75626/
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75626/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/17546#discussion_r110520685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -736,6 +736,12 @@ object SQLConf { .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].") .createWithDefault(0.7) + val JOIN_REORDER_DP_STAR_FILTER = +buildConf("spark.sql.cbo.joinReorder.dp.star.filter") + .doc("Applies star-join filter heuristics to cost based join enumeration.") + .booleanConf + .createWithDefault(false) --- End diff -- Yeah, I also think we should keep the default false.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75627/testReport)** for PR 17569 at commit [`510fb53`](https://github.com/apache/spark/commit/510fb530ebf3d9235206cefe8e428bf3f8689cfc).
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Yes exactly, the current implementation adds too many unnecessary delimiters. With this one-line change, delimiters are only placed where needed. Currently there are no tests that verify the algorithm cleans the sequences correctly; I only found that inefficiency by printing values while I implemented other things on my local fork. If you want, I can add some tests, but that will require a small refactor to separate the cleaning part into its own method, since calling the current method would directly invoke the main algorithm. Two of the existing tests did cover cases where sequences of zeros were left, although not at pertinent places (Integer/String type, variable-size itemsets: cleaning a five at the end of the third sequence leaves two zeros instead of one). I can however vouch that the previous code worked just fine. The results of the old implementation and this one are the same, and they also correspond to the results I obtained from another standalone CP-based implementation. It's just that this code makes the pre-processing more efficient.
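For context, the cleaning step being discussed can be sketched as below. This is an illustrative reconstruction, not MLlib's actual code: it assumes PrefixSpan's internal encoding of a sequence as an `Array[Int]` that starts and ends with `0` and uses `0` as the itemset delimiter; the object and method names are invented.

```scala
// Illustrative sketch (not Spark's actual implementation) of cleaning a
// 0-delimited PrefixSpan sequence: infrequent items are dropped, itemsets
// that become empty are removed, and exactly one delimiter is emitted
// between the surviving itemsets -- so no redundant zeros are left behind.
object SequenceCleaner {
  def clean(sequence: Array[Int], frequent: Set[Int]): Array[Int] = {
    val itemsets = sequence
      // Split on the 0 delimiter into itemsets (built in reverse, then fixed up).
      .foldLeft(List(List.empty[Int])) { (acc, x) =>
        if (x == 0) Nil :: acc else (x :: acc.head) :: acc.tail
      }
      .map(_.reverse)
      .reverse
      .map(_.filter(frequent)) // keep only frequent items
      .filter(_.nonEmpty)      // drop itemsets that became empty
    // Re-join with a single delimiter between itemsets.
    itemsets.foldLeft(Array(0)) { (acc, itemset) => acc ++ itemset :+ 0 }
  }
}
```

For example, cleaning `Array(0, 1, 2, 0, 5, 0, 3, 0)` with frequent items `{1, 2, 3}` removes the infrequent `5` together with its now-redundant delimiter, yielding `Array(0, 1, 2, 0, 3, 0)`.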
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110519122 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -423,8 +423,15 @@ case class CatalogRelation( Objects.hashCode(tableMeta.identifier, output) } - /** Only compare table identifier. */ --- End diff -- Actually we should compare more: e.g. if the table schema is altered, the new table relation should not be considered the same as the old table relation, even after canonicalization. Also, it's tricky to remove the output of a plan during canonicalization, as the parent plan may rely on the output.
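The trade-off above can be modeled with a toy example. All names here are invented, not Spark's actual classes: the point is only that canonicalization may clear auxiliary metadata before comparing plans, but must keep anything that affects the result, such as the schema, so an altered table does not compare equal to the old one.

```scala
// Toy model of the canonicalization trade-off discussed above.
object CanonicalizationSketch {
  case class TableMeta(owner: String, createTimeMs: Long)

  case class Relation(table: String, schema: Seq[String], meta: Option[TableMeta]) {
    // Drop metadata that cannot change the query result...
    def preCanonicalized: Relation = copy(meta = None)
    // ...so that sameResult ignores it, while still distinguishing schemas.
    def sameResult(other: Relation): Boolean =
      preCanonicalized == other.preCanonicalized
  }
}
```

Two relations differing only in metadata compare equal; two relations with different schemas do not, which is the behavior the comment argues for.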
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17541 cc @gatorsmile any more comments?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75625/
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75625/testReport)** for PR 17569 at commit [`3080ac2`](https://github.com/apache/spark/commit/3080ac2230e2512d6de3f6aadfed0e31b3b7eed3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75624/
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Merged build finished. Test PASSed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75624/testReport)** for PR 17568 at commit [`0679ebe`](https://github.com/apache/spark/commit/0679ebe17ed6c4619a7aef64fd41c2f21ffd3c7a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 ping @cloud-fan Can you take a look at this? If you don't think this is the appropriate direction, please let me know.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75626/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110517840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheck = if (!returnNullable) { --- End diff -- nit: rename `postNullCheck`. It is actually not only a null check but also assigns the function result.
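The codegen branch under review can be sketched like this. The variable names (`funcResult`, `value`, `isNull`) echo the style of Spark's codegen, but this helper is illustrative only, not the real `Invoke` implementation: when the invoked method is declared non-nullable, the generated Java simply assigns the result; otherwise it must also keep the null bit consistent.

```scala
// Hedged sketch of generating the post-call assignment for Invoke-style
// codegen: elide the null check when the return value cannot be null.
object InvokeCodegenSketch {
  def assignFuncResult(
      funcResult: String,
      value: String,
      isNull: String,
      returnNullable: Boolean): String = {
    if (!returnNullable) {
      // Non-nullable return: no extra null check needed.
      s"$value = $funcResult;"
    } else {
      // Nullable return: re-check and set the null bit before assigning.
      s"""if ($funcResult == null) {
         |  $isNull = true;
         |} else {
         |  $value = $funcResult;
         |}""".stripMargin
    }
  }
}
```

This also makes viirya's naming nit concrete: the snippet both checks for null and performs the assignment, so a name like `assignFuncResult` describes it better than `postNullCheck`.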
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17574 retest this please.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Merged build finished. Test FAILed.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75623/
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75623/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen anything else I need to do here?
[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @srowen anything else I need to do here?
[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r110517523 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou new String(nonCircularBuffer, StandardCharsets.UTF_8) } } + + +/** + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols. + */ +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory { + private var hdfsHandler : URLStreamHandler = _ + + def createURLStreamHandler(protocol: String): URLStreamHandler = { +if (protocol.compareToIgnoreCase("hdfs") == 0) { --- End diff -- Sorry, missed this. There's nothing explicit in 2.8+ right now; don't hold your breath. If people do want to co-dev one, I'd be happy to help. There's no point in me implementing something that isn't useful or going to be used by downstream projects.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 Just for info, while trying to use the `sc` in the `KinesisBackedBlockRDD`:

```
- Basic reading from Kinesis *** FAILED ***
  org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2284)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2084)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  ...
Cause: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
  - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@60c1663c)
  - field (class: org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD, name: org$apache$spark$streaming$kinesis$KinesisBackedBlockRDD$$sc, type: class org.apache.spark.SparkContext)
  - object (class org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD, KinesisBackedBlockRDD[0] at BlockRDD at KinesisBackedBlockRDD.scala:90)
  - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class org.apache.spark.rdd.RDD)
  - object (class org.apache.spark.OneToOneDependency, org.apache.spark.OneToOneDependency@52a33c3f)
  - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
  - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@71ed560f)
  - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
  - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@52a33c3f))
  - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
  - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[1] at map at KinesisBackedBlockRDDSuite.scala:83)
  - field (class: org.apache.spark.rdd.RDD$$anonfun$collect$1, name: $outer, type: class org.apache.spark.rdd.RDD)
  - object (class org.apache.spark.rdd.RDD$$anonfun$collect$1, )
  - field (class: org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13, name: $outer, type: class org.apache.spark.rdd.RDD$$anonfun$collect$1)
  - object (class org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13, )
  at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2284)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2084)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
```
[GitHub] spark issue #17506: [SPARK-20189][DStream] Fix spark kinesis testcases to re...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17506 Is there anything else that can be done on this patch? The patch fixes all the deprecated-API test cases that try to use the AWS secret/id credentials instead of the builder.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 @brkyvz - thanks for taking the time to review the patch, appreciate it. Implemented all your suggestions: now passing a new map for the Kinesis configs and added a mechanism to use the builder for the configs. As for the Spark context, I wanted to use the SparkContext available in `KinesisBackedBlockRDD` directly as well (instead of creating a new config map), but the `sc` in `KinesisBackedBlockRDD` is not accessible there, and trying to use it causes serialization errors. Passing a separate config map looked like the only simple solution for accessing the Kinesis configs. The patch now does not use the `sc` at all and expects a kinesisConf to be passed to the `KinesisInputDStream` builder directly. Let me know your thoughts. Thanks again for the review comments.
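The failure mode in the stack trace quoted earlier in this thread is the classic closure-capture problem: a function that references a field of its enclosing object drags the whole object, here the SparkContext (via the `$outer` field in the trace), into the serialized task. A minimal, self-contained illustration using plain JVM serialization, with no Spark and all names invented:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureSketch {
  class FakeSparkContext // stands in for the non-serializable SparkContext

  class FakeRDD(val sc: FakeSparkContext, val waitTimeMs: Long)

  // A closure that keeps a reference to the whole RDD (as the compiled
  // lambda in the trace does via its $outer field) pulls in `sc` too,
  // so serializing it fails.
  class BadClosure(rdd: FakeRDD) extends (Long => Long) with Serializable {
    def apply(x: Long): Long = x + rdd.waitTimeMs
  }

  // Copying just the needed primitive keeps the closure serializable.
  class GoodClosure(waitTimeMs: Long) extends (Long => Long) with Serializable {
    def apply(x: Long): Long = x + waitTimeMs
  }

  // True if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }
}
```

This is also why passing a small, serializable config map to the builder (as the patch does) sidesteps the problem: tasks then capture only the map, never the context.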
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75625/testReport)** for PR 17569 at commit [`3080ac2`](https://github.com/apache/spark/commit/3080ac2230e2512d6de3f6aadfed0e31b3b7eed3).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110516074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -262,17 +264,18 @@ object RowEncoder { input :: Nil) case _: DecimalType => - Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal])) + Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]), +returnNullable = false) case StringType => - Invoke(input, "toString", ObjectType(classOf[String])) + Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false) --- End diff -- Here are the statistics for the 59 call sites of `Invoke()`: 18 where `dataType` is a primitive type, 21 where `returnNullable` is true (no specification at the call site), 19 where `returnNullable` is false, and 1 that sets a variable as `returnNullable`. What do you think?
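The flag under discussion controls whether the generated code guards the invocation result with a null check. A toy model of that decision, not Spark's actual codegen, might look like this:

```scala
// Toy model of the Invoke codegen decision discussed above (not Spark's
// actual code): when the invoked method is known never to return null
// (returnNullable = false), the generated snippet omits the post-call
// null check entirely.
final case class MiniInvoke(
    target: String,
    method: String,
    returnNullable: Boolean = true) {

  def genCode(resultVar: String): String = {
    val call = s"Object $resultVar = $target.$method();"
    if (returnNullable) {
      // Nullable result: guard it, as the pre-patch code always did.
      s"$call\nboolean ${resultVar}IsNull = ($resultVar == null);"
    } else {
      // Known non-null result (e.g. Decimal.toJavaBigDecimal): skip the check.
      call
    }
  }
}
```

With such a default, only call sites that explicitly pass `returnNullable = false` drop the check, which is why the statistics on how many sites set each value matter for choosing the default.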
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75620/ Test PASSed.
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17541 Merged build finished. Test PASSed.
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17541 **[Test build #75620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75620/testReport)** for PR 17541 at commit [`9305187`](https://github.com/apache/spark/commit/930518759489f64d96e439715872353e64d681a0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75624/testReport)** for PR 17568 at commit [`0679ebe`](https://github.com/apache/spark/commit/0679ebe17ed6c4619a7aef64fd41c2f21ffd3c7a).
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17574 Do we need it as a normal dependency? It looks like sql/core doesn't use it, and the build works without this dependency. Sorry if I am missing something.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17575 Can one of the admins verify this patch?
[GitHub] spark pull request #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-proc...
GitHub user Syrux opened a pull request: https://github.com/apache/spark/pull/17575 [SPARK-20265][MLlib] Improve Prefix'span pre-processing efficiency ## What changes were proposed in this pull request? Improve PrefixSpan pre-processing efficiency by preventing sequences of zero in the cleaned database. The efficiency gain is reflected in the following graph: https://postimg.org/image/9x6ireuvn/ ## How was this patch tested? Using MLlib's existing PrefixSpan tests and tests of my own on the 8 datasets shown in the graph. All results obtained were strictly the same as with the original implementation (without this change). dev/run-tests was also run; no errors were found. Author: Cyril de Vogelaere. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Syrux/spark SPARK-20265 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17575.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17575 commit 7af4945fbfb309f7a7784cba2b1fc4cb4945fba0 Author: Syrux Date: 2017-04-08T10:17:04Z [SPARK-20265][MLlib] Improve Prefix'span pre-processing efficiency
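The optimization's core idea, as I understand it (MLlib internally flattens each sequence into an `Array[Int]` with `0` delimiting itemsets), is that dropping infrequent items can leave empty itemsets, i.e. runs of consecutive zeros, in the cleaned database. A standalone sketch of collapsing those runs during cleaning, hypothetical and much simplified relative to the actual patch:

```scala
// Simplified, hypothetical sketch of the cleaning step: keep only frequent
// items and never emit two delimiter zeros in a row, so removed itemsets
// leave no empty "0,0" gaps behind in the cleaned database.
object CleanDb {
  def clean(seq: Array[Int], frequent: Set[Int]): Array[Int] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Int]
    var lastWasDelim = true // also suppresses a leading zero
    for (x <- seq) {
      if (x == 0) {
        if (!lastWasDelim) { out += 0; lastWasDelim = true }
      } else if (frequent(x)) {
        out += x
        lastWasDelim = false
      } // infrequent items are dropped entirely
    }
    // Drop a trailing delimiter left by a removed final itemset.
    while (out.nonEmpty && out.last == 0) out.remove(out.length - 1)
    out.toArray
  }
}
```

For example, cleaning `[1, 0, 2, 0, 3]` with `{1, 3}` frequent yields `[1, 0, 3]` rather than `[1, 0, 0, 3]`, so later projection passes never scan empty itemsets.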
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75623/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17574 retest this please.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17569 It seems there are places (e.g., `RowEncoder`) calling `isNullAt` which give `returnNullable` as true (the default value).
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/17567 OK, I see.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75621/ Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75621/testReport)** for PR 17569 at commit [`a39803a`](https://github.com/apache/spark/commit/a39803ab0f77124add833bebb3cb0353306aa1f2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17469 **[Test build #75622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75622/testReport)** for PR 17469 at commit [`bc03f3c`](https://github.com/apache/spark/commit/bc03f3c5799e749558696fef0723e592523fbcd9). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75622/ Test FAILed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17469 Merged build finished. Test FAILed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 Great! I'll still follow up with Shane & Josh re: @felixcheung triggering build as well.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 Yes, it seems so from your comment, @holdenk.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 retest this please
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17469 **[Test build #75622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75622/testReport)** for PR 17469 at commit [`bc03f3c`](https://github.com/apache/spark/commit/bc03f3c5799e749558696fef0723e592523fbcd9).
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 retest this please
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 I've e-mailed them since the Jenkins configuration is a bit too involved (and I'd need Shane to sign off on any Jenkins change anyways). Sorry this is slowing down your PR @map222 and thank you so much for your patience with us :)
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #3648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3648/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75619/ Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test FAILed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 Jenkins retest this please
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75619/testReport)** for PR 17569 at commit [`fc6caac`](https://github.com/apache/spark/commit/fc6caacf5fca8cd89b1e324540761ae23f88d9d1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513988 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -96,6 +96,16 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { checkDataset(dsBoolean.map(e => !e), false, true) } + test("mapPrimitiveArray") { --- End diff -- No, I have just added them to confirm this check works well.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -262,17 +264,18 @@ object RowEncoder { input :: Nil) case _: DecimalType => - Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal])) + Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]), +returnNullable = false) case StringType => - Invoke(input, "toString", ObjectType(classOf[String])) + Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false) --- End diff -- can we check how many places set `returnNullable` to true? If it's only a few, we can change the default value of `returnNullable` to false. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513952 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -96,6 +96,16 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { checkDataset(dsBoolean.map(e => !e), false, true) } + test("mapPrimitiveArray") { --- End diff -- do these tests fail before this PR?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75621/testReport)** for PR 17569 at commit [`a39803a`](https://github.com/apache/spark/commit/a39803ab0f77124add833bebb3cb0353306aa1f2).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513852 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -228,17 +228,13 @@ case class Invoke( s""" Object $funcResult = null; ${getFuncResult(funcResult, s"${obj.value}.$functionName($argString)")} -if ($funcResult == null) { - ${ev.isNull} = true; -} else { - ${ev.value} = (${ctx.boxedType(javaType)}) $funcResult; -} +${ev.value} = (${ctx.boxedType(javaType)}) $funcResult; """ } // If the function can return null, we do an extra check to make sure our null bit is still set // correctly. -val postNullCheck = if (ctx.defaultValue(dataType) == "null") { +val postNullCheck = if (ctx.defaultValue(dataType) == "null" && returnNullable) { --- End diff -- Yes, done
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 LGTM. @rdblue, the failed tests are Thrift server tests, which are hard to debug. You can run the Hive tests locally and see what failed (usually failing Thrift server tests mean we have failing Hive tests).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513816 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -356,7 +361,8 @@ object ScalaReflection extends ScalaReflection { udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt(), Nil, dataType = ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt())) -Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) +Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil, --- End diff -- I see, it is a UDT. I have checked `deserialize` only within the Spark runtime.
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r110513641 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -180,9 +180,13 @@ class Dataset[T] private[sql]( // to happen right away to let these side effects take place eagerly. queryExecution.analyzed match { case c: Command => -LocalRelation(c.output, queryExecution.executedPlan.executeCollect()) --- End diff -- Actually, do we need to do this? Most `Command`s are just local operations (talking with the metastore).
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r110513606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -180,9 +180,13 @@ class Dataset[T] private[sql]( // to happen right away to let these side effects take place eagerly. queryExecution.analyzed match { case c: Command => -LocalRelation(c.output, queryExecution.executedPlan.executeCollect()) --- End diff -- how about `LocalRelation(c.output, withAction("collect")(_.executeCollect()))`
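The idea behind the suggested `withAction("collect")(...)` wrapper can be illustrated with a minimal, self-contained sketch. This is NOT Spark's actual implementation — the real `Dataset.withAction` also posts SQL execution events and times the body — but it shows the pattern under discussion: wrapping the eager `executeCollect()` of a command so that start/end bookkeeping fires around it, which is what lets the UI attribute the resulting jobs to the action. The names `withAction` and `events` here are illustrative only.

```scala
import scala.collection.mutable.ArrayBuffer

// Records the callbacks that fire around each action (stand-in for posting
// execution start/end events to a listener bus).
val events = ArrayBuffer.empty[String]

// Hypothetical simplified withAction: run `body` with start/end bookkeeping,
// mirroring how Dataset.withAction wraps action bodies like executeCollect().
def withAction[T](name: String)(body: => T): T = {
  events += s"start:$name"
  try body
  finally events += s"end:$name"
}

// Eagerly "executing" a command and capturing its rows, analogous to
// LocalRelation(c.output, withAction("collect")(_.executeCollect())).
val rows = withAction("collect") { Seq("row1", "row2") }
```

The point of the pattern is that the side effects of a `Command` still happen eagerly, but now inside the same tracking scope as any other action.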
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 The `withNewExecutionId` was added at https://github.com/rxin/spark/commit/1b0317f64cfe99ff70580eeb99753cd0d31f849a#diff-89b9796aae086e790ddd9351f0db8115R134 . The execution id is used to track all jobs that belong to the same query, so I think it makes sense to call `withNewExecutionId` in action methods like `Dataset#collect` or `DataFrameWriter#insertInto`.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 No, you're correct. The tooling around Jenkins hasn't had enough love as of late, since there are plans to replace a lot of it, so newer committers aren't always added everywhere they need to be. I've got some access, so I can look and see if I can fix it, but if I can't, we'll have to wait for Josh or Shane (who have been very helpful) to update the config.