[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user nongli closed the pull request at: https://github.com/apache/spark/pull/11008 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-191427912 @nongli Can you close this PR?
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54522189

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,74 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   Long $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+       |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+       |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       |   if (shouldStop()) return;
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs)
+
+    s"""
+       | // Convert the input attributes to an UnsafeRow and add it to the sorter
+       | ${code.code}
--- End diff --

This may cause a performance regression: when Sort is on top of an Exchange (or another operator that produces UnsafeRow), we will create variables from the UnsafeRow, then create another UnsafeRow from those variables. See https://github.com/apache/spark/pull/11008#discussion_r53856345

@yhuai Should we revert this patch or fix this in a follow-up PR?
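To make the concern above concrete, here is a minimal Java sketch contrasting the two input paths. `PackedRow`, `SortInput`, `reproject`, and `passThrough` are hypothetical names, not Spark APIs, and the row is modeled as a plain `long[]` rather than a real `UnsafeRow`. It only shows that re-projecting allocates and copies a second row, while handing an already-serialized row straight to the sorter does not.

```java
import java.util.Arrays;

// Hypothetical stand-in for a binary row format (NOT Spark's UnsafeRow):
// every column value is packed into one long[] buffer.
final class PackedRow {
    final long[] words;

    PackedRow(long[] words) {
        this.words = words;
    }
}

final class SortInput {
    // The path questioned in the review: read each column of the incoming row into
    // a local variable, then pack those variables into a brand-new row.
    static PackedRow reproject(PackedRow in) {
        long[] columns = new long[in.words.length];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = in.words[i]; // "create variables from UnsafeRow"
        }
        // "create another UnsafeRow using these variables"
        return new PackedRow(Arrays.copyOf(columns, columns.length));
    }

    // A possible cheaper path when the child already emits packed rows (e.g. an Exchange):
    // hand the incoming row to the sorter unchanged. Assumes the sorter copies each
    // inserted row into its own memory, so reusing the incoming buffer is safe.
    static PackedRow passThrough(PackedRow in) {
        return in;
    }
}
```

The pass-through path rests on the stated assumption about the sorter copying inserted rows; that should be checked against `UnsafeExternalRowSorter` before relying on it.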
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11359
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-190386889 OK I am merging it.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-190385118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52192/ Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-190385115 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-190384539

**[Test build #52192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52192/consoleFull)** for PR 11359 at commit [`65ed647`](https://github.com/apache/spark/commit/65ed64708cec2f5944d3ca8e7921fce85d7b8b0d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-190337767 **[Test build #52192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52192/consoleFull)** for PR 11359 at commit [`65ed647`](https://github.com/apache/spark/commit/65ed64708cec2f5944d3ca8e7921fce85d7b8b0d).
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54458824

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
--- End diff --

Thanks, fixed.
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54458793

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    ctx.addMutableState(classOf[Long].getName, spillSizeBefore, "")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+       |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+       |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       |   if (shouldStop()) return;
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, useSubexprElimination = false)
--- End diff --

Couldn't think of anything, but I wasn't sure of the implications, so I just copied it as-is from https://github.com/apache/spark/pull/11008. Should I remove it?
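The generated code in the diff above runs the sort at most once (guarded by `needToSort`) and then emits rows until `shouldStop()` asks it to yield, resuming from the saved `sortedIter` on the next call. The Java sketch below models only that control flow, under simplifying assumptions: rows are `Long` values, `Collections.sort` stands in for the external sorter, and the hypothetical `batchSize` parameter stands in for `shouldStop()`. `SortProducer` is an illustrative class, not the code Spark generates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical model of the generated Sort control flow.
final class SortProducer {
    private final Iterator<Long> input;
    private final int batchSize;                      // stand-in for shouldStop()
    private final List<Long> buffer = new ArrayList<>();
    private boolean needToSort = true;                // sort only on the first call
    private Iterator<Long> sortedIter;                // a field, so emission can resume

    SortProducer(Iterator<Long> input, int batchSize) {
        this.input = input;
        this.batchSize = batchSize;
    }

    private boolean shouldStop() {
        return buffer.size() >= batchSize;
    }

    // One call plays the role of one processNext(): returns at most batchSize rows.
    List<Long> processNext() {
        buffer.clear();
        if (needToSort) {
            List<Long> all = new ArrayList<>();
            while (input.hasNext()) {
                all.add(input.next());                // addToSorter()
            }
            Collections.sort(all);                    // sorter.sort()
            sortedIter = all.iterator();
            needToSort = false;
        }
        while (sortedIter.hasNext()) {
            buffer.add(sortedIter.next());            // consume(outputRow)
            if (shouldStop()) {
                return new ArrayList<>(buffer);       // yield; resume here next call
            }
        }
        return new ArrayList<>(buffer);
    }
}
```

Keeping the sorted iterator as a field is what lets a later call pick up where the previous one stopped; a value used only inside the one-time sort block does not need to survive between calls.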
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54369578

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    ctx.addMutableState(classOf[Long].getName, spillSizeBefore, "")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+       |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+       |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       |   if (shouldStop()) return;
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, useSubexprElimination = false)
--- End diff --

Any reason to explicitly set subexpr to false?
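On the `useSubexprElimination = false` question: the projection here consists only of `BoundReference`s, one per column ordinal, so there is presumably nothing for subexpression elimination to share either way. The toy Java model below illustrates that reasoning under an explicit assumption: elimination only considers non-leaf expression trees that occur more than once. `Expr`, `ColRef`, `Add`, and `Cse` are hypothetical classes; Catalyst's real equivalence rules are more involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy expression tree (hypothetical, not Catalyst): a column reference or an addition.
abstract class Expr {
    abstract List<Expr> children();
}

final class ColRef extends Expr {
    final int ordinal;
    ColRef(int ordinal) { this.ordinal = ordinal; }
    List<Expr> children() { return List.of(); }
    @Override public boolean equals(Object o) {
        return o instanceof ColRef && ((ColRef) o).ordinal == ordinal;
    }
    @Override public int hashCode() { return Integer.hashCode(ordinal); }
}

final class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    List<Expr> children() { return List.of(left, right); }
    @Override public boolean equals(Object o) {
        return o instanceof Add && ((Add) o).left.equals(left) && ((Add) o).right.equals(right);
    }
    @Override public int hashCode() { return Objects.hash(left, right); }
}

final class Cse {
    // Number of distinct non-leaf subtrees that occur more than once across the
    // projection list: the only candidates elimination could compute once and reuse.
    static int commonSubexpressions(List<Expr> exprs) {
        Map<Expr, Integer> counts = new HashMap<>();
        for (Expr e : exprs) visit(e, counts);
        int shared = 0;
        for (int c : counts.values()) if (c > 1) shared++;
        return shared;
    }

    private static void visit(Expr e, Map<Expr, Integer> counts) {
        if (e.children().isEmpty()) return; // leaves (plain column reads) are not cached
        counts.merge(e, 1, Integer::sum);
        for (Expr child : e.children()) visit(child, counts);
    }
}
```

Under this model a list of plain column references yields zero candidates, so the flag would make no difference for this particular projection.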
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-190040188 LGTM
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54369543

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
--- End diff --

This can just be a local var. Just remove the ".addMutableState" below and fix line 141.
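A minimal Java sketch of the suggestion to make `spillSizeBefore` a local: the value is read only inside the one-time sort block, so a local `long` is enough and no field on the generated class is needed. `FakeTaskMetrics` and `SpillAccounting` are hypothetical stand-ins (not Spark's `TaskMetrics` or SQL metric classes), and the sort is modeled as a `Runnable` that may increase the spill counter.

```java
// Hypothetical stand-in for TaskMetrics: only the spill counter matters here.
final class FakeTaskMetrics {
    long memoryBytesSpilled;

    long memoryBytesSpilled() {
        return memoryBytesSpilled;
    }
}

final class SpillAccounting {
    long spillSize; // accumulated metric, analogous to the spillSize SQL metric

    // spillSizeBefore is read only within this method, so a local primitive is
    // enough; nothing needs to be kept as mutable state between calls.
    void sortAndRecordSpill(FakeTaskMetrics metrics, Runnable sort) {
        long spillSizeBefore = metrics.memoryBytesSpilled();
        sort.run(); // may spill to disk, increasing metrics.memoryBytesSpilled
        spillSize += metrics.memoryBytesSpilled() - spillSizeBefore;
    }
}
```

Measuring the delta around the sort means spills that happened earlier in the same task are not attributed to this operator.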
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-189683500 @nongli this should be ready for your pass.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-189608504 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-189608505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52109/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-189608444

**[Test build #52109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52109/consoleFull)** for PR 11359 at commit [`4651ce9`](https://github.com/apache/spark/commit/4651ce97ac94dc0d04b5dbb6e2ed355ba3d7abc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-189598960 **[Test build #52109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52109/consoleFull)** for PR 11359 at commit [`4651ce9`](https://github.com/apache/spark/commit/4651ce97ac94dc0d04b5dbb6e2ed355ba3d7abc7).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188970323 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51982/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188970314 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188969778

**[Test build #51982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51982/consoleFull)** for PR 11359 at commit [`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188933941 **[Test build #51982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51982/consoleFull)** for PR 11359 at commit [`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188933507 test this please
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188925921 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51978/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188925919 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188925649

**[Test build #51978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51978/consoleFull)** for PR 11359 at commit [`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188922487

Generated code:
```java
public Object generate(Object[] references) {
  return new GeneratedIterator(references);
}

/** Codegened pipeline for:
 * Sort [id#0L ASC], true, 0
 +- INPUT
 */
class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
  private Object[] references;
  private boolean sort_needToSort;
  private org.apache.spark.sql.execution.Sort sort_plan;
  private org.apache.spark.sql.execution.UnsafeExternalRowSorter sort_sorter;
  private org.apache.spark.executor.TaskMetrics sort_metrics;
  private scala.collection.Iterator sort_sortedIter;
  private scala.collection.Iterator inputadapter_input;
  private UnsafeRow sort_result;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder sort_holder;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter sort_rowWriter;
  private long sort_dataSize;
  private long sort_spillSize;
  private long sort_spillSizeBefore;

  public GeneratedIterator(Object[] references) {
    this.references = references;
  }

  public void init(scala.collection.Iterator inputs[]) {
    sort_needToSort = true;
    this.sort_plan = (org.apache.spark.sql.execution.Sort) references[0];
    sort_sorter = sort_plan.createSorter();
    sort_metrics = org.apache.spark.TaskContext.get().taskMetrics();

    inputadapter_input = inputs[0];
    sort_result = new UnsafeRow(1);
    this.sort_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(sort_result, 0);
    this.sort_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(sort_holder, 1);
  }

  private void sort_addToSorter() throws java.io.IOException {
    while (inputadapter_input.hasNext()) {
      InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
      /* input[0, bigint] */
      boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
      long inputadapter_value = inputadapter_isNull ? -1L : (inputadapter_row.getLong(0));
      // Convert the input attributes to an UnsafeRow and add it to the sorter

      sort_rowWriter.zeroOutNullBytes();

      if (inputadapter_isNull) {
        sort_rowWriter.setNullAt(0);
      } else {
        sort_rowWriter.write(0, inputadapter_value);
      }

      sort_sorter.insertRow(sort_result);
      if (shouldStop()) {
        return;
      }
    }
  }

  protected void processNext() throws java.io.IOException {
    if (sort_needToSort) {
      sort_addToSorter();
      sort_spillSizeBefore = sort_metrics.memoryBytesSpilled();
      sort_sortedIter = sort_sorter.sort();
      sort_dataSize += sort_sorter.getPeakMemoryUsage();
      sort_spillSize += sort_metrics.memoryBytesSpilled() - sort_spillSizeBefore;
      sort_metrics.incPeakExecutionMemory(sort_sorter.getPeakMemoryUsage());
      sort_needToSort = false;
    }

    while (sort_sortedIter.hasNext()) {
      UnsafeRow sort_outputRow = (UnsafeRow)sort_sortedIter.next();
      append(sort_outputRow.copy());
    }
  }
}
```
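The `processNext()` in the dump above sorts lazily: the first call drains the input into the sorter via `sort_addToSorter()`, calls `sort()` once, flips `sort_needToSort` to false, and then appends every sorted row to the output buffer. A minimal plain-Java sketch of that control flow, assuming an `ArrayList` plus `Collections.sort` in place of `UnsafeExternalRowSorter` and an `ArrayDeque` in place of `BufferedRowIterator`'s internal buffer; `DrainThenSortIterator` is a hypothetical name, not Spark API, and the metrics bookkeeping is omitted:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the generated GeneratedIterator: a buffered iterator
// that sorts its whole input the first time a row is requested.
class DrainThenSortIterator implements Iterator<Long> {
  private final Iterator<Long> input;                          // plays the role of inputadapter_input
  private final List<Long> sorter = new ArrayList<>();         // stand-in for UnsafeExternalRowSorter
  private final ArrayDeque<Long> buffer = new ArrayDeque<>();  // stand-in for append()'s buffer
  private boolean needToSort = true;                           // mirrors sort_needToSort

  DrainThenSortIterator(Iterator<Long> input) {
    this.input = input;
  }

  // Mirrors sort_addToSorter(): consume every upstream row.
  private void addToSorter() {
    while (input.hasNext()) {
      sorter.add(input.next());
    }
  }

  // Mirrors processNext(): sort exactly once, then move the sorted rows to the buffer.
  private void processNext() {
    if (needToSort) {
      addToSorter();
      Collections.sort(sorter);
      needToSort = false;
      buffer.addAll(sorter);
      sorter.clear(); // release the sort buffer once its rows have been handed off
    }
  }

  @Override
  public boolean hasNext() {
    if (buffer.isEmpty()) {
      processNext();
    }
    return !buffer.isEmpty();
  }

  @Override
  public Long next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.poll();
  }
}
```

Because the flag flips after the first call, later calls only hand out rows that are already buffered. The `.copy()` before `append` in the generated code is presumably needed because the sorter's iterator may reuse a single `UnsafeRow` object across `next()` calls; boxed `Long`s in this sketch sidestep that issue.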
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188921710 Maybe paste the generated code in the comment section so it doesn't get merged as part of the commit. Otherwise the commit description is super long. Thanks.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-18631 **[Test build #51978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51978/consoleFull)** for PR 11359 at commit [`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188733513 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188733520 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51953/ Test PASSed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188733131 **[Test build #51953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51953/consoleFull)** for PR 11359 at commit [`02aa3d0`](https://github.com/apache/spark/commit/02aa3d0c764d3387960d33e40f8f5fd88714d052).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188687269 **[Test build #51953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51953/consoleFull)** for PR 11359 at commit [`02aa3d0`](https://github.com/apache/spark/commit/02aa3d0c764d3387960d33e40f8f5fd88714d052).
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188686409 Thanks @rxin, added!
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188613970 @sameeragarwal you should update the PR description to actually include what this patch does (in addition to noting that it builds on an earlier PR). For codegen PRs, it would be great to paste in the generated code.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188595382 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51922/ Test FAILed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188595379 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188594830 **[Test build #51922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51922/consoleFull)** for PR 11359 at commit [`fa7c991`](https://github.com/apache/spark/commit/fa7c991a66d820e3757831747ca8dc5c2df1c634).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188569046 **[Test build #51922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51922/consoleFull)** for PR 11359 at commit [`fa7c991`](https://github.com/apache/spark/commit/fa7c991a66d820e3757831747ca8dc5c2df1c634).
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/11359

[SPARK-13123][SQL] Implement whole state codegen for sort

## What changes were proposed in this pull request?

This builds on @nongli's PR https://github.com/apache/spark/pull/11008, which actually implements the feature, and adds some unit tests to verify correctness.

## How was this patch tested?

Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark sort-codegen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11359.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #11359

commit 11e26c9201f939022a9750c4eb7986f63a37bd5f
Author: Nong Li
Date: 2016-02-02T00:31:05Z

[SPARK-13123][SQL] Implement whole state codegen for sort.

This does the simplest thing of just assembling a row on consume and driving the underlying external sorter object.

commit 564a5b362530f7618d17074bdb2ade33dd4ec6f5
Author: Nong Li
Date: 2016-02-02T19:52:38Z

Import order fixes.

commit 7f50b6a5e49fd20ef2847ff65834ef5774e6a832
Author: Sameer Agarwal
Date: 2016-02-25T00:41:55Z

Merge commit 'refs/pull/11008/head' of github.com:apache/spark into sort-codegen

commit d50ca8e031edcddc07b36c7014417c7060a84478
Author: Sameer Agarwal
Date: 2016-02-25T01:26:35Z

fix compile + tests

commit aceab91af9d2dfffce256630a7bcf8631a60a1dc
Author: Sameer Agarwal
Date: 2016-02-25T01:40:56Z

add unit test in WholeStageCodegenSuite

commit fa7c991a66d820e3757831747ca8dc5c2df1c634
Author: Sameer Agarwal
Date: 2016-02-25T02:07:47Z

add test in SQLMetricsSuite
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r53856345

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
--- End diff --

If the child can produce UnsafeRow (for example, Exchange), we should have a way to avoid unpacking it and packing it again, or we will see a regression (the generated version being slower than the non-generated one). I think we can pass the variable for the input row into `doConsume` (it could be null). It's better to do this after #11274; then we don't need to worry about whether we should create variables for the input or not.
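The suggestion above amounts to giving the consumer an optional, already-packed input row and only re-assembling a row from column values when that row is absent. A hypothetical plain-Java sketch of that branch, assuming `byte[]` as the packed-row format and a `List<byte[]>` as the sorter; `SorterConsumer` and its `consume(byte[], long[])` signature are illustrative only and are not Spark's `doConsume` API:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the consumer receives the upstream row when it already
// exists in the sorter's binary format (may be null) and only re-packs the
// individual column values when it does not.
class SorterConsumer {
  final List<byte[]> sorterInput = new ArrayList<>(); // stand-in for the external sorter
  int packCount = 0;                                  // rows that had to be re-assembled

  void consume(byte[] inputRow, long[] columns) {
    if (inputRow != null) {
      // The child (e.g. an Exchange) already produced a packed row: insert it as-is.
      sorterInput.add(inputRow);
      return;
    }
    // Otherwise assemble a row from the unpacked column values
    // (simplified: fixed 8-byte slots, no null bitmap).
    ByteBuffer buf = ByteBuffer.allocate(8 * columns.length);
    for (long c : columns) {
      buf.putLong(c);
    }
    packCount++;
    sorterInput.add(buf.array());
  }
}
```

The pass-through branch is the whole point: when the child's output is already in the sorter's format, the unpack-then-repack round trip that could make the generated path slower than the interpreted one is skipped.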
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r51834737

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
--- End diff --

It's OK here, as discussed offline. I just found using mutable state here as a way to pass variable names through pretty brittle. Maybe it would be good to have a more general abstraction for this in codegen, but it's not that big of a deal right now.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178826829 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178826831 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50584/ Test FAILed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178826530 **[Test build #50584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50584/consoleFull)** for PR 11008 at commit [`564a5b3`](https://github.com/apache/spark/commit/564a5b362530f7618d17074bdb2ade33dd4ec6f5).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178795947 **[Test build #50584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50584/consoleFull)** for PR 11008 at commit [`564a5b3`](https://github.com/apache/spark/commit/564a5b362530f7618d17074bdb2ade33dd4ec6f5).
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r51624075

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
--- End diff --

Why? This is the state that needs to be kept between the two member functions in this class.
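The state being debated is a generated variable name: `doProduce` allocates the sorter's name with `freshName`, and `doConsume` later has to emit `insertRow` calls against that same name. A hypothetical, heavily simplified Java analogue of that hand-off, assuming a toy `MiniCodegenContext` (not Spark's `CodegenContext`) whose `freshName` appends a counter; it also shows the ordering dependency that makes the field feel brittle:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy analogue of CodegenContext: hands out unique variable names
// and records class-level field declarations.
class MiniCodegenContext {
  private int counter = 0;
  final List<String> mutableState = new ArrayList<>();

  String freshName(String prefix) {
    return prefix + (counter++);
  }

  void addMutableState(String javaType, String name, String init) {
    mutableState.add("private " + javaType + " " + name + "; // init: " + init);
  }
}

// Hypothetical analogue of Sort's produce/consume pair. The sorter's generated
// name is created in doProduce and remembered in a field for doConsume, which is
// what `private var sorterVariable` does in the diff.
class MiniSort {
  private String sorterVariable; // only valid once doProduce has run

  String doProduce(MiniCodegenContext ctx) {
    sorterVariable = ctx.freshName("sorter");
    ctx.addMutableState("UnsafeExternalRowSorter", sorterVariable,
        sorterVariable + " = plan.createSorter();");
    // In the produce/consume model, child.produce(ctx, this) ends up calling back
    // into this operator's consume code; that callback is simulated directly here.
    String insert = doConsume("inputRow");
    return "void addToSorter() { " + insert + " }\n" + sorterVariable + ".sort();";
  }

  String doConsume(String rowVar) {
    if (sorterVariable == null) {
      // The ordering dependency: consume is meaningless before produce has run.
      throw new IllegalStateException("doProduce must run before doConsume");
    }
    return sorterVariable + ".insertRow(" + rowVar + ");";
  }
}
```

The field works because `doProduce` assigns it before triggering the child's code generation, but nothing in the types enforces that order; the explicit check in `doConsume` only makes a violation fail loudly.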
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178273018

```
public Object generate(Object[] references) {
  return new GeneratedIterator(references);
}

class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {

  private Object[] references;
  private boolean Sort_needToSort0;
  private org.apache.spark.sql.execution.Sort Sort_plan1;
  private org.apache.spark.sql.execution.UnsafeExternalRowSorter Sort_sorter2;
  private scala.collection.Iterator Sort_sortedIter3;
  private UnsafeRow Sort_result24;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder Sort_holder25;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter Sort_rowWriter26;

  private void Sort_addToSorter4() throws java.io.IOException {

    while (input.hasNext()) {
      InternalRow InputAdapter_row5 = (InternalRow) input.next();
      /* input[0, int] */
      boolean InputAdapter_isNull6 = InputAdapter_row5.isNullAt(0);
      int InputAdapter_value7 = InputAdapter_isNull6 ? -1 : (InputAdapter_row5.getInt(0));
      /* input[1, string] */
      boolean InputAdapter_isNull8 = InputAdapter_row5.isNullAt(1);
      UTF8String InputAdapter_value9 = InputAdapter_isNull8 ? null : (InputAdapter_row5.getUTF8String(1));

      /* (input[0, int] < 20) */
      /* input[0, int] */

      /* 20 */

      boolean Filter_value11 = false;
      Filter_value11 = InputAdapter_value7 < 20;
      if (!false && Filter_value11) {

        /* input[0, int] */

        /* input[1, string] */

        // Convert the input attributes to an UnsafeRow and add it to the sorter

        Sort_holder25.reset();

        Sort_rowWriter26.zeroOutNullBytes();

        /* input[0, int] */

        if (InputAdapter_isNull6) {
          Sort_rowWriter26.setNullAt(0);
        } else {
          Sort_rowWriter26.write(0, InputAdapter_value7);
        }

        /* input[1, string] */

        if (InputAdapter_isNull8) {
          Sort_rowWriter26.setNullAt(1);
        } else {
          Sort_rowWriter26.write(1, InputAdapter_value9);
        }
        Sort_result24.setTotalSize(Sort_holder25.totalSize());

        Sort_sorter2.insertRow(Sort_result24);

      }

    }

  }

  public GeneratedIterator(Object[] references) {
    this.references = references;
    Sort_needToSort0 = true;
    this.Sort_plan1 = (org.apache.spark.sql.execution.Sort) references[0];
    Sort_sorter2 = Sort_plan1.createSorter();

    Sort_result24 = new UnsafeRow(2);
    this.Sort_holder25 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(Sort_result24, 32);
    this.Sort_rowWriter26 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(Sort_holder25, 2);
  }

  protected void processNext() throws java.io.IOException {
    if (Sort_needToSort0) {
      Sort_addToSorter4();
      Sort_sortedIter3 = Sort_sorter2.sort();
      Sort_needToSort0 = false;
    }

    while (Sort_sortedIter3.hasNext()) {
      UnsafeRow Sort_outputRow29 = (UnsafeRow)Sort_sortedIter3.next();
      System.out.println(Sort_outputRow29);
      currentRow = Sort_outputRow29;
      return;
    }
  }
}
```
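In this earlier dump the plan also contains a `Filter` on `input[0, int] < 20`, and the generated `Sort_addToSorter4()` evaluates that predicate inline, so rows that fail it are never packed or inserted into the sorter. A hypothetical plain-Java sketch of the same fused loop over `int` values, assuming a non-null column and an `ArrayList` standing in for the sorter; `FusedFilterSort` is an illustrative name, not Spark code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the fused Filter -> Sort loop: the predicate is evaluated
// inside the loop that feeds the sorter, and only surviving values are inserted.
class FusedFilterSort {
  // Mirrors Sort_addToSorter4() followed by Sort_sorter2.sort().
  static List<Integer> filterThenSort(Iterator<Integer> input, int limit) {
    List<Integer> sorter = new ArrayList<>(); // stand-in for UnsafeExternalRowSorter
    while (input.hasNext()) {
      int value = input.next();               // assumes a non-null int column
      boolean passes = value < limit;         // the (input[0, int] < 20) predicate
      if (passes) {
        sorter.add(value);                    // "insertRow" only for surviving rows
      }
    }
    Collections.sort(sorter);
    return sorter;
  }
}
```

Evaluating the predicate inside the same loop that feeds the sorter avoids passing each row through a separate filter iterator first; removing that per-row operator overhead is the general motivation for whole-stage codegen, although this sketch does not measure it.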
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
GitHub user nongli opened a pull request:

https://github.com/apache/spark/pull/11008

[SPARK-13123][SQL] Implement whole state codegen for sort.

This does the simplest thing of just assembling a row on consume and driving the underlying external sorter object.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nongli/spark spark-13123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11008.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #11008

commit 11e26c9201f939022a9750c4eb7986f63a37bd5f
Author: Nong Li
Date: 2016-02-02T00:31:05Z

[SPARK-13123][SQL] Implement whole state codegen for sort.

This does the simplest thing of just assembling a row on consume and driving the underlying external sorter object.
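"Assembling a row on consume" corresponds to the `zeroOutNullBytes()` / `setNullAt()` / `write()` sequence on an `UnsafeRowWriter` in the generated code. A simplified sketch of that sequence, assuming a layout loosely modeled on the fixed-width region of `UnsafeRow` (a null bitmap padded to 64-bit words, then one 8-byte slot per field); `MiniRowWriter` is hypothetical, handles no variable-length data such as strings, and is not byte-compatible with Spark:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical, simplified row writer: a null bitmap (one 64-bit word per 64 fields)
// followed by one 8-byte slot per field. Variable-length values are not supported.
class MiniRowWriter {
  private final int bitsetBytes;
  private final ByteBuffer buf;

  MiniRowWriter(int numFields) {
    this.bitsetBytes = ((numFields + 63) / 64) * 8;
    this.buf = ByteBuffer.allocate(bitsetBytes + 8 * numFields).order(ByteOrder.LITTLE_ENDIAN);
  }

  // Like zeroOutNullBytes(): clear the null bitmap before writing a new row.
  void zeroOutNullBytes() {
    for (int i = 0; i < bitsetBytes; i++) {
      buf.put(i, (byte) 0);
    }
  }

  // Set the field's bit in the null bitmap and zero its slot.
  void setNullAt(int ordinal) {
    int wordOffset = (ordinal / 64) * 8;
    long word = buf.getLong(wordOffset);
    buf.putLong(wordOffset, word | (1L << (ordinal % 64)));
    buf.putLong(slotOffset(ordinal), 0L);
  }

  void write(int ordinal, long value) {
    buf.putLong(slotOffset(ordinal), value);
  }

  boolean isNullAt(int ordinal) {
    long word = buf.getLong((ordinal / 64) * 8);
    return (word & (1L << (ordinal % 64))) != 0;
  }

  long getLong(int ordinal) {
    return buf.getLong(slotOffset(ordinal));
  }

  int sizeInBytes() {
    return buf.capacity();
  }

  private int slotOffset(int ordinal) {
    return bitsetBytes + 8 * ordinal;
  }
}
```

In this sketch a one-column row, like the single `bigint` column in the later dump, takes 16 bytes: 8 for the null bitmap word and 8 for the value slot. The writer is reused for every input row, which is why the bitmap is cleared before each row is written.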
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178273316 **[Test build #50511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50511/consoleFull)** for PR 11008 at commit [`11e26c9`](https://github.com/apache/spark/commit/11e26c9201f939022a9750c4eb7986f63a37bd5f).
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178273558 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178273556 **[Test build #50511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50511/consoleFull)** for PR 11008 at commit [`11e26c9`](https://github.com/apache/spark/commit/11e26c9201f939022a9750c4eb7986f63a37bd5f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public final class UnsafeExternalRowSorter`
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11008#issuecomment-178273560 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50511/ Test FAILed.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11008#discussion_r51525091
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
--- End diff --
this is pretty ghetto... (although i understand maybe it's the simplest way to implement this)
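To make the review comment concrete: the mutable `sorterVariable` field exists so that the name chosen while generating the produce code can be reused later, when the consume code is generated. A minimal sketch of that pattern, assuming a hypothetical `ToyCodegenContext` (not Spark's `CodegenContext`) and a hypothetical `ToySortCodegen` operator; the emitted Java strings are illustrative only.

```scala
import scala.collection.mutable

// Hypothetical, stripped-down stand-in for a codegen context: it only hands
// out unique variable names per prefix. (Not Spark's CodegenContext.)
final class ToyCodegenContext {
  private val counters = mutable.Map.empty[String, Int]
  def freshName(prefix: String): String = {
    val id = counters.getOrElse(prefix, 0)
    counters(prefix) = id + 1
    s"$prefix$id"
  }
}

// Hypothetical operator showing the pattern under review: doProduce picks the
// sorter's generated variable name and stashes it in a mutable field so that
// doConsume, invoked later while generating the child's loop, can refer to it.
final class ToySortCodegen {
  // Name of sorter variable used in codegen; null until doProduce runs.
  private var sorterVariable: String = null

  def doProduce(ctx: ToyCodegenContext): String = {
    sorterVariable = ctx.freshName("sorter")
    s"$sorterVariable = plan.createSorter();"
  }

  def doConsume(rowVar: String): String = {
    // Only valid after doProduce: this implicit call-order dependency between
    // two callbacks is what makes the mutable field feel awkward.
    require(sorterVariable != null, "doProduce must run before doConsume")
    s"$sorterVariable.insertRow($rowVar);"
  }
}
```

Calling `doConsume` before `doProduce` fails the `require` check, while the reverse order emits `sorter0 = plan.createSorter();` followed by `sorter0.insertRow(row);`. Sharing the name through a field is simple, which matches the reviewer's concession, at the cost of that hidden ordering dependency.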