[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-03-02 Thread nongli
Github user nongli closed the pull request at:

https://github.com/apache/spark/pull/11008


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-03-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-191427912
  
@nongli Can you close this PR?





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11359#discussion_r54522189
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,74 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   Long $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+       |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+       |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       |   if (shouldStop()) return;
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs)
+
+    s"""
+       | // Convert the input attributes to an UnsafeRow and add it to the sorter
+       | ${code.code}
--- End diff --

This may cause a performance regression: when Sort is on top of Exchange (or
another operator that produces UnsafeRow), we will create variables from the
UnsafeRow, then create another UnsafeRow from those variables.

See https://github.com/apache/spark/pull/11008#discussion_r53856345

@yhuai Should we revert this patch or fix this in a follow-up PR?
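
To make the concern concrete, here is a minimal Java sketch of the two paths. `PackedRow` is a hypothetical stand-in for a packed binary row, not Spark's `UnsafeRow`, and both method names are assumptions made for illustration; the real path goes through `GenerateUnsafeProjection` in `doConsume`.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a packed binary row; this is not Spark's UnsafeRow. */
final class PackedRow {
    final long[] fields;

    PackedRow(long[] fields) {
        this.fields = fields;
    }
}

final class SortInputSketch {
    /** Unpack every field into variables, then build a second packed row from them. */
    static PackedRow unpackThenRepack(PackedRow in) {
        long[] vars = new long[in.fields.length];
        for (int i = 0; i < vars.length; i++) {
            vars[i] = in.fields[i]; // "create variables from the row"
        }
        // "create another row using these variables": a second allocation and copy
        return new PackedRow(Arrays.copyOf(vars, vars.length));
    }

    /** Reuse the row unchanged when the child already produced the packed format. */
    static PackedRow passThrough(PackedRow in) {
        return in;
    }
}
```

Both methods hand the sorter the same field values; the first one pays for an extra copy per row, which is the overhead the comment above is worried about.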





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11359





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-190386889
  
OK I am merging it.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-190385118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52192/





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-190385115
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-190384539
  
**[Test build #52192 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52192/consoleFull)**
 for PR 11359 at commit 
[`65ed647`](https://github.com/apache/spark/commit/65ed64708cec2f5944d3ca8e7921fce85d7b8b0d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-190337767
  
**[Test build #52192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52192/consoleFull)**
 for PR 11359 at commit 
[`65ed647`](https://github.com/apache/spark/commit/65ed64708cec2f5944d3ca8e7921fce85d7b8b0d).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/11359#discussion_r54458824
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
--- End diff --

Thanks, fixed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/11359#discussion_r54458793
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    ctx.addMutableState(classOf[Long].getName, spillSizeBefore, "")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+       |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+       |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       |   if (shouldStop()) return;
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, useSubexprElimination = false)
--- End diff --

I couldn't think of any reason, but I wasn't sure of the implications, so I
just copied it as-is from https://github.com/apache/spark/pull/11008. Should I
remove it?





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-28 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11359#discussion_r54369578
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    ctx.addMutableState(classOf[Long].getName, spillSizeBefore, "")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+       |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+       |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       |   if (shouldStop()) return;
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, useSubexprElimination = false)
--- End diff --

Any reason to explicitly set useSubexprElimination to false?
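
For context, subexpression elimination generally means evaluating a repeated subexpression once and reusing the result. The sketch below shows the idea in plain Java; `expensive`, `evalNaive`, and `evalShared` are hypothetical names chosen for illustration, and this is not the code Spark generates.

```java
/** Illustrative only: counts how often a shared subexpression is evaluated. */
final class SubexprSketch {
    static int calls = 0;

    /** Stand-in for a costly subexpression that appears twice in a larger expression. */
    static long expensive(long x) {
        calls++;
        return x * x + 1;
    }

    /** Without elimination: the repeated subexpression is evaluated twice. */
    static long evalNaive(long x) {
        return expensive(x) + expensive(x) * 2;
    }

    /** With elimination: evaluate once, keep the result in a local, reuse it. */
    static long evalShared(long x) {
        long common = expensive(x);
        return common + common * 2;
    }
}
```

Both variants return the same value; only the number of evaluations of the shared part differs.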





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-28 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-190040188
  
LGTM





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-28 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11359#discussion_r54369543
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,75 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val metrics = ctx.freshName("metrics")
+    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    val dataSize = metricTerm(ctx, "dataSize")
+    val spillSize = metricTerm(ctx, "spillSize")
+    val spillSizeBefore = ctx.freshName("spillSizeBefore")
--- End diff --

This can just be a local var. Just remove the ".addMutableState" below and 
fix line 141.
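
As a rough illustration of this suggestion, the value is read and consumed inside a single method, so it does not need to live in a class member. The Java sketch below is hypothetical: `SpillDeltaSketch`, its field, and the fixed 128-byte spill are assumptions for illustration, not Spark's generated code or its `TaskMetrics` API.

```java
/** Hypothetical shape of a generated iterator; all names are illustrative only. */
final class SpillDeltaSketch {
    // Stand-in for the task-level spill counter.
    private long memoryBytesSpilled = 0L;

    // Pretend that every sort spills 128 bytes.
    private void sort() {
        memoryBytesSpilled += 128L;
    }

    /** Returns how many bytes were spilled by this sort alone. */
    long sortAndReportSpill() {
        long spillSizeBefore = memoryBytesSpilled; // local variable, no member state needed
        sort();
        return memoryBytesSpilled - spillSizeBefore;
    }
}
```

Keeping the snapshot local also means no extra field has to be declared and initialized on the generated class.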





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-27 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-189683500
  
@nongli this should be ready for your pass.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-189608504
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-189608505
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52109/





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-189608444
  
**[Test build #52109 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52109/consoleFull)**
 for PR 11359 at commit 
[`4651ce9`](https://github.com/apache/spark/commit/4651ce97ac94dc0d04b5dbb6e2ed355ba3d7abc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-189598960
  
**[Test build #52109 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52109/consoleFull)**
 for PR 11359 at commit 
[`4651ce9`](https://github.com/apache/spark/commit/4651ce97ac94dc0d04b5dbb6e2ed355ba3d7abc7).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188970323
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51982/





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188970314
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188969778
  
**[Test build #51982 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51982/consoleFull)**
 for PR 11359 at commit 
[`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188933941
  
**[Test build #51982 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51982/consoleFull)**
 for PR 11359 at commit 
[`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188933507
  
test this please





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188925921
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51978/





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188925919
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188925649
  
**[Test build #51978 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51978/consoleFull)**
 for PR 11359 at commit 
[`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188922487
  
Generated code:

```java
public Object generate(Object[] references) {
  return new GeneratedIterator(references);
}

/** Codegened pipeline for:
 * Sort [id#0L ASC], true, 0
 * +- INPUT
 */
class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
  private Object[] references;
  private boolean sort_needToSort;
  private org.apache.spark.sql.execution.Sort sort_plan;
  private org.apache.spark.sql.execution.UnsafeExternalRowSorter sort_sorter;
  private org.apache.spark.executor.TaskMetrics sort_metrics;
  private scala.collection.Iterator sort_sortedIter;
  private scala.collection.Iterator inputadapter_input;
  private UnsafeRow sort_result;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder sort_holder;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter sort_rowWriter;
  private long sort_dataSize;
  private long sort_spillSize;
  private long sort_spillSizeBefore;

  public GeneratedIterator(Object[] references) {
    this.references = references;
  }

  public void init(scala.collection.Iterator inputs[]) {
    sort_needToSort = true;
    this.sort_plan = (org.apache.spark.sql.execution.Sort) references[0];
    sort_sorter = sort_plan.createSorter();
    sort_metrics = org.apache.spark.TaskContext.get().taskMetrics();

    inputadapter_input = inputs[0];
    sort_result = new UnsafeRow(1);
    this.sort_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(sort_result, 0);
    this.sort_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(sort_holder, 1);
  }

  private void sort_addToSorter() throws java.io.IOException {
    while (inputadapter_input.hasNext()) {
      InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
      /* input[0, bigint] */
      boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
      long inputadapter_value = inputadapter_isNull ? -1L : (inputadapter_row.getLong(0));
      // Convert the input attributes to an UnsafeRow and add it to the sorter

      sort_rowWriter.zeroOutNullBytes();

      if (inputadapter_isNull) {
        sort_rowWriter.setNullAt(0);
      } else {
        sort_rowWriter.write(0, inputadapter_value);
      }

      sort_sorter.insertRow(sort_result);
      if (shouldStop()) {
        return;
      }
    }
  }

  protected void processNext() throws java.io.IOException {
    if (sort_needToSort) {
      sort_addToSorter();
      sort_spillSizeBefore = sort_metrics.memoryBytesSpilled();
      sort_sortedIter = sort_sorter.sort();
      sort_dataSize += sort_sorter.getPeakMemoryUsage();
      sort_spillSize += sort_metrics.memoryBytesSpilled() - sort_spillSizeBefore;
      sort_metrics.incPeakExecutionMemory(sort_sorter.getPeakMemoryUsage());
      sort_needToSort = false;
    }

    while (sort_sortedIter.hasNext()) {
      UnsafeRow sort_outputRow = (UnsafeRow)sort_sortedIter.next();
      append(sort_outputRow.copy());
    }
  }
}
```
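
For readers skimming the generated class: `processNext()` drains the whole input into the sorter on its first call, sorts once, and only then starts emitting rows. Below is a minimal plain-Java sketch of that control flow. `LazySortIterator` is a hypothetical name, not a Spark class, and an in-memory `ArrayList` stands in for `UnsafeExternalRowSorter`, so spilling and metrics are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the generated iterator; not a Spark class.
final class LazySortIterator implements Iterator<Long> {
  private final Iterator<Long> input;
  private boolean needToSort = true;   // plays the role of sort_needToSort
  private Iterator<Long> sortedIter;   // plays the role of sort_sortedIter

  LazySortIterator(Iterator<Long> input) {
    this.input = input;
  }

  // Drains the input into a buffer, sorts it once, and remembers the result.
  private void sortIfNeeded() {
    if (!needToSort) {
      return;
    }
    List<Long> buffer = new ArrayList<>();
    while (input.hasNext()) {
      buffer.add(input.next());        // plays the role of sorter.insertRow(row)
    }
    Collections.sort(buffer);          // assumes non-null elements
    sortedIter = buffer.iterator();
    needToSort = false;
  }

  @Override
  public boolean hasNext() {
    sortIfNeeded();
    return sortedIter.hasNext();
  }

  @Override
  public Long next() {
    sortIfNeeded();
    return sortedIter.next();
  }
}
```

The flag makes the sort a one-time cost: every call after the first skips straight to the sorted iterator, which is the same reason the generated code sets `sort_needToSort = false` after calling `sort()`.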





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188921710
  
Maybe paste the generated code in the comment section so it doesn't get 
merged as part of the commit. Otherwise the commit description is super long. 
Thanks.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-18631
  
**[Test build #51978 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51978/consoleFull)**
 for PR 11359 at commit 
[`c953a60`](https://github.com/apache/spark/commit/c953a60f8c500daf9c768caa98f712a593a83f2e).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188733513
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188733520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51953/
Test PASSed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188733131
  
**[Test build #51953 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51953/consoleFull)**
 for PR 11359 at commit 
[`02aa3d0`](https://github.com/apache/spark/commit/02aa3d0c764d3387960d33e40f8f5fd88714d052).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188687269
  
**[Test build #51953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51953/consoleFull)**
 for PR 11359 at commit 
[`02aa3d0`](https://github.com/apache/spark/commit/02aa3d0c764d3387960d33e40f8f5fd88714d052).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188686409
  
Thanks @rxin, added!





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188613970
  
@sameeragarwal you should update the PR description to actually include 
what this patch does (in addition to noting that it was built on an earlier PR).

For codegen PRs, it would be great to paste in the generated code.






[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188595382
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51922/
Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188595379
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188594830
  
**[Test build #51922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51922/consoleFull)**
 for PR 11359 at commit 
[`fa7c991`](https://github.com/apache/spark/commit/fa7c991a66d820e3757831747ca8dc5c2df1c634).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188569046
  
**[Test build #51922 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51922/consoleFull)**
 for PR 11359 at commit 
[`fa7c991`](https://github.com/apache/spark/commit/fa7c991a66d820e3757831747ca8dc5c2df1c634).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/11359

[SPARK-13123][SQL] Implement whole state codegen for sort

## What changes were proposed in this pull request?

This builds on @nongli's PR https://github.com/apache/spark/pull/11008, 
which implements the feature, and adds unit tests to verify correctness.

## How was this patch tested?

Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark sort-codegen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11359.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11359


commit 11e26c9201f939022a9750c4eb7986f63a37bd5f
Author: Nong Li 
Date:   2016-02-02T00:31:05Z

[SPARK-13123][SQL] Implement whole state codegen for sort.

This does the simplest thing: it assembles a row on consume and drives the
underlying external sorter object.

commit 564a5b362530f7618d17074bdb2ade33dd4ec6f5
Author: Nong Li 
Date:   2016-02-02T19:52:38Z

Import order fixes.

commit 7f50b6a5e49fd20ef2847ff65834ef5774e6a832
Author: Sameer Agarwal 
Date:   2016-02-25T00:41:55Z

Merge commit 'refs/pull/11008/head' of github.com:apache/spark into 
sort-codegen

commit d50ca8e031edcddc07b36c7014417c7060a84478
Author: Sameer Agarwal 
Date:   2016-02-25T01:26:35Z

fix compile + tests

commit aceab91af9d2dfffce256630a7bcf8631a60a1dc
Author: Sameer Agarwal 
Date:   2016-02-25T01:40:56Z

add unit test in WholeStageCodegenSuite

commit fa7c991a66d820e3757831747ca8dc5c2df1c634
Author: Sameer Agarwal 
Date:   2016-02-25T02:07:47Z

add test in SQLMetricsSuite







[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-23 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r53856345
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+    val needToSort = ctx.freshName("needToSort")
+    ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+    // Initialize the class member variables. This includes the instance of the Sorter and
+    // the iterator to return sorted rows.
+    val thisPlan = ctx.addReferenceObj("plan", this)
+    sorterVariable = ctx.freshName("sorter")
+    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+      s"$sorterVariable = $thisPlan.createSorter();")
+    val sortedIterator = ctx.freshName("sortedIter")
+    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+    val addToSorter = ctx.freshName("addToSorter")
+    ctx.addNewFunction(addToSorter,
+      s"""
+        | private void $addToSorter() throws java.io.IOException {
+        |   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+        | }
+      """.stripMargin.trim)
+
+    val outputRow = ctx.freshName("outputRow")
+    s"""
+       | if ($needToSort) {
+       |   $addToSorter();
+       |   $sortedIterator = $sorterVariable.sort();
+       |   $needToSort = false;
+       | }
+       |
+       | while ($sortedIterator.hasNext()) {
+       |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+       |   ${consume(ctx, null, outputRow)}
+       | }
+     """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+    val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+      BoundReference(i, attr.dataType, attr.nullable)
+    }
+
+    ctx.currentVars = input
+    val code = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
--- End diff --

If the child can produce UnsafeRow (for example, Exchange), we should have 
a way to avoid unpacking and packing the row again, or we will see a regression 
(the generated version being slower than the non-generated one).

I think we can pass the variable for the input row into `doConsume`; it could be 
null. It's better to do this after #11274, so that we don't need to worry about 
whether we should create variables for the input or not.
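
To make the unpack-and-pack concern concrete, here is a minimal plain-Java sketch of the fast path being asked for: if the incoming row is already in the packed binary format, hand it to the sorter as is, and only pack it otherwise. All names here (`Row`, `GenericRow`, `PackedRow`, `RowPacker`) are hypothetical stand-ins for illustration, not Spark classes, and the rows are assumed to hold only non-null `long` fields.

```java
import java.nio.ByteBuffer;

// Hypothetical row abstraction; illustrative only, not Spark's InternalRow.
interface Row {
  long getLong(int ordinal);
}

// A row held as plain Java values (the "unpacked" form).
final class GenericRow implements Row {
  private final long[] values;

  GenericRow(long... values) {
    this.values = values;
  }

  @Override
  public long getLong(int ordinal) {
    return values[ordinal];
  }
}

// A row held in a flat byte buffer (the "packed" form, loosely like UnsafeRow).
final class PackedRow implements Row {
  private final ByteBuffer bytes;

  PackedRow(ByteBuffer bytes) {
    this.bytes = bytes;
  }

  @Override
  public long getLong(int ordinal) {
    return bytes.getLong(ordinal * Long.BYTES);
  }
}

final class RowPacker {
  int packCount = 0;  // number of times a row actually had to be packed

  PackedRow toPacked(Row row, int numFields) {
    if (row instanceof PackedRow) {
      return (PackedRow) row;          // fast path: no unpack and repack
    }
    ByteBuffer buf = ByteBuffer.allocate(numFields * Long.BYTES);
    for (int i = 0; i < numFields; i++) {
      buf.putLong(i * Long.BYTES, row.getLong(i));
    }
    packCount++;
    return new PackedRow(buf);
  }
}
```

In the codegen setting the equivalent decision would be made once at code-generation time rather than per row, which is why passing the input row variable (possibly null) into `doConsume` is enough to select the path.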





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r51834737
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
--- End diff --

It's OK here, as discussed offline. I just find using mutable state here as a 
way to pass variable names through pretty brittle. It may be good to have a more 
general abstraction for this in codegen, but it's not that big of a deal right now.
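
One possible shape for such an abstraction, sketched in plain Java under my own assumptions: the codegen context owns a small registry of generated variable names, so the produce side defines a name under a key and the consume side looks it up, instead of the operator holding a `var`. `NameRegistry` and its methods are hypothetical and are not part of Spark's `CodegenContext`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Spark's CodegenContext.
final class NameRegistry {
  private int counter = 0;
  private final Map<String, String> names = new HashMap<>();

  // Returns a unique identifier such as "sorter_0".
  String freshName(String prefix) {
    return prefix + "_" + counter++;
  }

  // Creates a fresh name and records it under the given key; defining twice is an error.
  String define(String key, String prefix) {
    if (names.containsKey(key)) {
      throw new IllegalStateException("already defined: " + key);
    }
    String name = freshName(prefix);
    names.put(key, name);
    return name;
  }

  // Returns the name recorded for the key; fails loudly if produce never defined it.
  String lookup(String key) {
    String name = names.get(key);
    if (name == null) {
      throw new IllegalStateException("not defined: " + key);
    }
    return name;
  }
}
```

The point of the explicit failure in `lookup` is that a consume call that runs before the matching produce call is reported immediately, rather than silently interpolating `null` into the generated source.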






[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178826829
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178826831
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50584/
Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178826530
  
**[Test build #50584 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50584/consoleFull)**
 for PR 11008 at commit 
[`564a5b3`](https://github.com/apache/spark/commit/564a5b362530f7618d17074bdb2ade33dd4ec6f5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178795947
  
**[Test build #50584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50584/consoleFull)**
 for PR 11008 at commit 
[`564a5b3`](https://github.com/apache/spark/commit/564a5b362530f7618d17074bdb2ade33dd4ec6f5).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-02 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r51624075
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
       sortedIterator
     }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
--- End diff --

Why? This is the state that needs to be kept between the two member 
functions in this class.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178273018
  
```
public Object generate(Object[] references) {
  return new GeneratedIterator(references);
}

class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {

  private Object[] references;
  private boolean Sort_needToSort0;
  private org.apache.spark.sql.execution.Sort Sort_plan1;
  private org.apache.spark.sql.execution.UnsafeExternalRowSorter Sort_sorter2;
  private scala.collection.Iterator Sort_sortedIter3;
  private UnsafeRow Sort_result24;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder Sort_holder25;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter Sort_rowWriter26;

  private void Sort_addToSorter4() throws java.io.IOException {

    while (input.hasNext()) {
      InternalRow InputAdapter_row5 = (InternalRow) input.next();
      /* input[0, int] */
      boolean InputAdapter_isNull6 = InputAdapter_row5.isNullAt(0);
      int InputAdapter_value7 = InputAdapter_isNull6 ? -1 : (InputAdapter_row5.getInt(0));
      /* input[1, string] */
      boolean InputAdapter_isNull8 = InputAdapter_row5.isNullAt(1);
      UTF8String InputAdapter_value9 = InputAdapter_isNull8 ? null : (InputAdapter_row5.getUTF8String(1));

      /* (input[0, int] < 20) */
      /* input[0, int] */

      /* 20 */

      boolean Filter_value11 = false;
      Filter_value11 = InputAdapter_value7 < 20;
      if (!false && Filter_value11) {

        /* input[0, int] */

        /* input[1, string] */

        // Convert the input attributes to an UnsafeRow and add it to the sorter

        Sort_holder25.reset();

        Sort_rowWriter26.zeroOutNullBytes();

        /* input[0, int] */

        if (InputAdapter_isNull6) {
          Sort_rowWriter26.setNullAt(0);
        } else {
          Sort_rowWriter26.write(0, InputAdapter_value7);
        }

        /* input[1, string] */

        if (InputAdapter_isNull8) {
          Sort_rowWriter26.setNullAt(1);
        } else {
          Sort_rowWriter26.write(1, InputAdapter_value9);
        }
        Sort_result24.setTotalSize(Sort_holder25.totalSize());

        Sort_sorter2.insertRow(Sort_result24);

      }

    }

  }

  public GeneratedIterator(Object[] references) {
    this.references = references;
    Sort_needToSort0 = true;
    this.Sort_plan1 = (org.apache.spark.sql.execution.Sort) references[0];
    Sort_sorter2 = Sort_plan1.createSorter();

    Sort_result24 = new UnsafeRow(2);
    this.Sort_holder25 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(Sort_result24, 32);
    this.Sort_rowWriter26 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(Sort_holder25, 2);
  }

  protected void processNext() throws java.io.IOException {
    if (Sort_needToSort0) {
      Sort_addToSorter4();
      Sort_sortedIter3 = Sort_sorter2.sort();
      Sort_needToSort0 = false;
    }

    while (Sort_sortedIter3.hasNext()) {
      UnsafeRow Sort_outputRow29 = (UnsafeRow)Sort_sortedIter3.next();
      System.out.println(Sort_outputRow29);
      currentRow = Sort_outputRow29;
      return;
    }
  }
}
```
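
A note on the per-row sequence in `Sort_addToSorter4()`: the row buffer and writer are reused for every input row, so the null bits are cleared before each row and every field is then either marked null or written. The plain-Java sketch below shows why the clearing step matters. `TinyRowWriter` is a hypothetical toy with `long` fields only and a single 64-bit null bitmap; it is not Spark's `UnsafeRowWriter`, and only the method names are borrowed from the generated code.

```java
// Hypothetical fixed-width row buffer with a null bitmap; not Spark's UnsafeRowWriter.
final class TinyRowWriter {
  private final long[] fields;
  private long nullBits;  // bit i set means field i is null

  TinyRowWriter(int numFields) {
    if (numFields < 0 || numFields > 64) {
      throw new IllegalArgumentException("numFields must be in [0, 64]");
    }
    this.fields = new long[numFields];
  }

  // Clears every null bit so that a reused buffer does not leak nulls from the previous row.
  void zeroOutNullBytes() {
    nullBits = 0L;
  }

  void setNullAt(int ordinal) {
    nullBits |= 1L << ordinal;
    fields[ordinal] = 0L;
  }

  void write(int ordinal, long value) {
    fields[ordinal] = value;
  }

  boolean isNullAt(int ordinal) {
    return (nullBits & (1L << ordinal)) != 0L;
  }

  long getLong(int ordinal) {
    return fields[ordinal];
  }

  // Mirrors the generated per-row sequence for nullable input columns.
  void writeRow(Long[] input) {
    zeroOutNullBytes();
    for (int i = 0; i < input.length; i++) {
      boolean isNull = input[i] == null;
      long value = isNull ? -1L : input[i];  // placeholder value, never stored when null
      if (isNull) {
        setNullAt(i);
      } else {
        write(i, value);
      }
    }
  }
}
```

Without the call to `zeroOutNullBytes()`, a field that was null in one row would still read as null in the next row even after a value had been written to it, because `write` does not touch the bitmap.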




[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread nongli
GitHub user nongli opened a pull request:

https://github.com/apache/spark/pull/11008

[SPARK-13123][SQL] Implement whole state codegen for sort.

This does the simplest thing: it assembles a row on consume and drives the
underlying external sorter object.
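
As a rough illustration of the approach described above (buffer every row on consume, sort once, then stream the result), here is a minimal sketch in plain Java. It is not the code from this pull request: the class `BufferingSort`, its methods, and the in-memory list standing in for the external sorter are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical miniature of a sort operator. An in-memory list stands in
// for the external sorter; none of these names are Spark's.
class BufferingSort<T> {
    private final Iterator<T> input;
    private final Comparator<T> order;
    private final List<T> sorter = new ArrayList<>();
    private boolean needToSort = true;
    private Iterator<T> sortedIter;

    BufferingSort(Iterator<T> input, Comparator<T> order) {
        this.input = input;
        this.order = order;
    }

    // Consume side: hand each upstream row to the sorter and do nothing else.
    private void consume(T row) {
        sorter.add(row);
    }

    // Returns the next row in sorted order, or null once the input is exhausted.
    T next() {
        if (needToSort) {
            // First call only: drain the upstream, then sort once.
            while (input.hasNext()) {
                consume(input.next());
            }
            sorter.sort(order);
            sortedIter = sorter.iterator();
            needToSort = false;
        }
        return sortedIter.hasNext() ? sortedIter.next() : null;
    }

    public static void main(String[] args) {
        BufferingSort<Integer> sort = new BufferingSort<>(
            Arrays.asList(3, 1, 2).iterator(), Comparator.<Integer>naturalOrder());
        for (Integer row = sort.next(); row != null; row = sort.next()) {
            System.out.println(row); // prints 1, 2, 3 on separate lines
        }
    }
}
```

The `needToSort` flag plays the same role as in the generated code quoted in this thread: the expensive drain-and-sort step runs on the first request only, and later calls just advance the sorted iterator.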

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nongli/spark spark-13123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11008.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11008


commit 11e26c9201f939022a9750c4eb7986f63a37bd5f
Author: Nong Li 
Date:   2016-02-02T00:31:05Z

[SPARK-13123][SQL] Implement whole state codegen for sort.

This does the simplest thing: it assembles a row on consume and drives the
underlying external sorter object.







[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178273316
  
**[Test build #50511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50511/consoleFull)** for PR 11008 at commit [`11e26c9`](https://github.com/apache/spark/commit/11e26c9201f939022a9750c4eb7986f63a37bd5f).





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178273558
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178273556
  
**[Test build #50511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50511/consoleFull)** for PR 11008 at commit [`11e26c9`](https://github.com/apache/spark/commit/11e26c9201f939022a9750c4eb7986f63a37bd5f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class UnsafeExternalRowSorter `





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11008#issuecomment-178273560
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50511/
Test FAILed.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11008#discussion_r51525091
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +98,63 @@ case class Sort(
   sortedIterator
 }
   }
+
+  override def upstream(): RDD[InternalRow] = {
+    child.asInstanceOf[CodegenSupport].upstream()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
--- End diff --

This is pretty hacky... (although I understand it may be the simplest way
to implement this)
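
To make the concern concrete, here is a small sketch of the pattern under discussion: a mutable field that one codegen step writes and a later step reads. It is an illustration only, in plain Java rather than the Scala of the diff, and every name in it (`SortCodegenSketch`, `freshName`, the emitted strings) is hypothetical and not taken from Spark.

```java
// Hypothetical miniature of a code-generating operator; not Spark's API.
class SortCodegenSketch {
    // Written by doProduce and read by doConsume. The two methods are
    // coupled through this field, so their call order matters.
    private String sorterVariable;
    private int counter = 0;

    // Returns a unique variable name for the generated code.
    private String freshName(String prefix) {
        return prefix + (counter++);
    }

    // Emits the declaration of the sorter and remembers its name.
    String doProduce() {
        sorterVariable = freshName("sorter");
        return "Object " + sorterVariable + " = createSorter();";
    }

    // Emits the per-row insert, relying on the name stored by doProduce.
    String doConsume(String rowVariable) {
        if (sorterVariable == null) {
            throw new IllegalStateException("doProduce must run before doConsume");
        }
        return sorterVariable + ".insertRow(" + rowVariable + ");";
    }

    public static void main(String[] args) {
        SortCodegenSketch op = new SortCodegenSketch();
        System.out.println(op.doProduce());
        System.out.println(op.doConsume("row"));
    }
}
```

The field keeps both method signatures simple, which is presumably why it is the easiest way to implement this, but the hidden ordering requirement between the two calls is the cost.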

