[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171240572
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171240574
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49305/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171214203
  
**[Test build #49300 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49300/consoleFull)**
 for PR 10029 at commit 
[`832db06`](https://github.com/apache/spark/commit/832db06a67980d1aa51bb8330ec86dbc1f1a869c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class Covariance(left: Expression, right: Expression) extends 
ImperativeAggregate`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49562972
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171214447
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171214450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49300/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171236853
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49309/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171236848
  
**[Test build #49309 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49309/consoleFull)**
 for PR 10029 at commit 
[`1806ced`](https://github.com/apache/spark/commit/1806ced8dc7bdd7d5f2909aa80a9700516564c32).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171239983
  
**[Test build #49305 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49305/consoleFull)**
 for PR 10029 at commit 
[`2f643d4`](https://github.com/apache/spark/commit/2f643d414d4ba0723d0c3323ea72246ec8e77706).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171236474
  
**[Test build #49309 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49309/consoleFull)**
 for PR 10029 at commit 
[`1806ced`](https://github.com/apache/spark/commit/1806ced8dc7bdd7d5f2909aa80a9700516564c32).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171236852
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171214830
  
**[Test build #49305 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49305/consoleFull)**
 for PR 10029 at commit 
[`2f643d4`](https://github.com/apache/spark/commit/2f643d414d4ba0723d0c3323ea72246ec8e77706).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49562260
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(left: Expression, right: Expression) extends 
ImperativeAggregate
+with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: MutableRow, 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49562259
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(left: Expression, right: Expression) extends 
ImperativeAggregate
+with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: MutableRow, 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49562404
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(left: Expression, right: Expression) extends 
ImperativeAggregate
+with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: MutableRow, 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49562322
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171388415
  
LGTM, merging this into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10029


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49557554
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49557393
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171198669
  
@viirya Yeah, we can merge this in first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49554363
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171191441
  
@davies Understood. Since the time to come out a better codegen version for 
this is not sure, can we merge this first in? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171169101
  
ping @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49554701
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
--- End diff --

Should we support implicit conversion?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49555926
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49555900
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171193912
  
**[Test build #49300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49300/consoleFull)**
 for PR 10029 at commit 
[`832db06`](https://github.com/apache/spark/commit/832db06a67980d1aa51bb8330ec86dbc1f1a869c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-171184876
  
@viirya One question is that: we are moving to do more codegen staff (whole 
stage codegen), but the ImperativeAggregate can't support that, not sure will 
we do another codegen version for this or not. Right now, ImperativeAggregate 
version faster, because the generated code is still not the best, but we can 
improve.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49555736
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
--- End diff --

Oh, I see, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49554615
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(xAvgOffset, 0.0)
+buffer.setDouble(yAvgOffset, 0.0)
+buffer.setDouble(CkOffset, 0.0)
+buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(xAvgOffset)
+  var yAvg = buffer.getDouble(yAvgOffset)
+  var Ck = buffer.getDouble(CkOffset)
+  var count = buffer.getLong(countOffset)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(xAvgOffset, xAvg)
+  buffer.setDouble(yAvgOffset, yAvg)
+  buffer.setDouble(CkOffset, Ck)
+  buffer.setLong(countOffset, count)
+}
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: 

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49554635
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
--- End diff --

this could fit in one line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49555196
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
--- End diff --

`ImperativeAggregate` already supports implicit conversion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170482524
  
**[Test build #49128 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49128/consoleFull)**
 for PR 10029 at commit 
[`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49299893
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1
+  val mutableAggBufferOffsetPlus2 = mutableAggBufferOffset + 2
+  val mutableAggBufferOffsetPlus3 = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputAggBufferOffsetPlus1 = inputAggBufferOffset + 1
+  val inputAggBufferOffsetPlus2 = inputAggBufferOffset + 2
+  val inputAggBufferOffsetPlus3 = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(mutableAggBufferOffset, 0.0)
+buffer.setDouble(mutableAggBufferOffsetPlus1, 0.0)
+buffer.setDouble(mutableAggBufferOffsetPlus2, 0.0)
+buffer.setLong(mutableAggBufferOffsetPlus3, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(mutableAggBufferOffset)
+  var yAvg = buffer.getDouble(mutableAggBufferOffsetPlus1)
+  var Ck = buffer.getDouble(mutableAggBufferOffsetPlus2)
+  var count = buffer.getLong(mutableAggBufferOffsetPlus3)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(mutableAggBufferOffset, xAvg)
+  buffer.setDouble(mutableAggBufferOffsetPlus1, yAvg)
+  buffer.setDouble(mutableAggBufferOffsetPlus2, Ck)
+  

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49300964
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
--- End diff --

ImplicitCastInputTypes should do that for us. I added another test for 
integer input.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170558530
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170563859
  
**[Test build #49146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49146/consoleFull)**
 for PR 10029 at commit 
[`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170550776
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170550777
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49128/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170550666
  
**[Test build #49128 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49128/consoleFull)**
 for PR 10029 at commit 
[`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170576801
  
**[Test build #49146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49146/consoleFull)**
 for PR 10029 at commit 
[`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170576891
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170576895
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49146/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170614439
  
@davies please see if the updates are good for you. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170611553
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49150/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170611550
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170611002
  
**[Test build #49150 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49150/consoleFull)**
 for PR 10029 at commit 
[`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170578055
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-170578993
  
**[Test build #49150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49150/consoleFull)**
 for PR 10029 at commit 
[`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49217415
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
--- End diff --

These two do need to be passed in, see `CentralMomentAgg`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49217997
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1
+  val mutableAggBufferOffsetPlus2 = mutableAggBufferOffset + 2
+  val mutableAggBufferOffsetPlus3 = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputAggBufferOffsetPlus1 = inputAggBufferOffset + 1
+  val inputAggBufferOffsetPlus2 = inputAggBufferOffset + 2
+  val inputAggBufferOffsetPlus3 = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(mutableAggBufferOffset, 0.0)
+buffer.setDouble(mutableAggBufferOffsetPlus1, 0.0)
+buffer.setDouble(mutableAggBufferOffsetPlus2, 0.0)
+buffer.setLong(mutableAggBufferOffsetPlus3, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(mutableAggBufferOffset)
+  var yAvg = buffer.getDouble(mutableAggBufferOffsetPlus1)
+  var Ck = buffer.getDouble(mutableAggBufferOffsetPlus2)
+  var count = buffer.getLong(mutableAggBufferOffsetPlus3)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(mutableAggBufferOffset, xAvg)
+  buffer.setDouble(mutableAggBufferOffsetPlus1, yAvg)
+  buffer.setDouble(mutableAggBufferOffsetPlus2, Ck)
+  

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49216499
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -309,6 +309,46 @@ object functions extends LegacyFunctions {
 countDistinct(Column(columnName), columnNames.map(Column.apply) : _*)
 
   /**
+   * Aggregate function: returns the population covariance for two columns.
+   *
+   * @group agg_funcs
+   * @since 1.6.0
--- End diff --

2.0.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49217078
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
--- End diff --

Should this be true ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49217495
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
--- End diff --

Should this support all NumericType?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49217530
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
--- End diff --

Should this support all NumericType?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49217743
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1
--- End diff --

Can we have a better name for them?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2016-01-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49260274
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int,
+inputAggBufferOffset: Int)
+  extends ImperativeAggregate with Serializable {
+
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (left.dataType.isInstanceOf[DoubleType] && 
right.dataType.isInstanceOf[DoubleType]) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(
+s"covariance requires that both arguments are double type, " +
+  s"not (${left.dataType}, ${right.dataType}).")
+}
+  }
+
+  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update 
and merge
+  val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1
+  val mutableAggBufferOffsetPlus2 = mutableAggBufferOffset + 2
+  val mutableAggBufferOffsetPlus3 = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update 
and merge
+  val inputAggBufferOffsetPlus1 = inputAggBufferOffset + 1
+  val inputAggBufferOffsetPlus2 = inputAggBufferOffset + 2
+  val inputAggBufferOffsetPlus3 = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+buffer.setDouble(mutableAggBufferOffset, 0.0)
+buffer.setDouble(mutableAggBufferOffsetPlus1, 0.0)
+buffer.setDouble(mutableAggBufferOffsetPlus2, 0.0)
+buffer.setLong(mutableAggBufferOffsetPlus3, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val leftEval = left.eval(input)
+val rightEval = right.eval(input)
+
+if (leftEval != null && rightEval != null) {
+  val x = leftEval.asInstanceOf[Double]
+  val y = rightEval.asInstanceOf[Double]
+
+  var xAvg = buffer.getDouble(mutableAggBufferOffset)
+  var yAvg = buffer.getDouble(mutableAggBufferOffsetPlus1)
+  var Ck = buffer.getDouble(mutableAggBufferOffsetPlus2)
+  var count = buffer.getLong(mutableAggBufferOffsetPlus3)
+
+  val deltaX = x - xAvg
+  val deltaY = y - yAvg
+  count += 1
+  xAvg += deltaX / count
+  yAvg += deltaY / count
+  Ck += deltaX * (y - yAvg)
+
+  buffer.setDouble(mutableAggBufferOffset, xAvg)
+  buffer.setDouble(mutableAggBufferOffsetPlus1, yAvg)
+  buffer.setDouble(mutableAggBufferOffsetPlus2, Ck)
+  

[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160417691
  
**[Test build #46849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46849/consoleFull)**
 for PR 10029 at commit 
[`5a419f6`](https://github.com/apache/spark/commit/5a419f6747b6929b67614b670918106e5a18dceb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160417840
  
**[Test build #46849 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46849/consoleFull)**
 for PR 10029 at commit 
[`5a419f6`](https://github.com/apache/spark/commit/5a419f6747b6929b67614b670918106e5a18dceb).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`abstract class Covariance(`\n  * `case class CovSample(`\n  * `case class 
CovPopulation(`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160417843
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46849/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160417842
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160418449
  
**[Test build #46851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46851/consoleFull)**
 for PR 10029 at commit 
[`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160421565
  
**[Test build #46851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46851/consoleFull)**
 for PR 10029 at commit 
[`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`abstract class Covariance(`\n  * `case class CovSample(`\n  * `case class 
CovPopulation(`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160421583
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160421584
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46851/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160421665
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160422208
  
**[Test build #46853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46853/consoleFull)**
 for PR 10029 at commit 
[`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160433449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46853/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160433448
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp

2015-11-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10029#issuecomment-160433418
  
**[Test build #46853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46853/consoleFull)**
 for PR 10029 at commit 
[`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`abstract class Covariance(`\n  * `case class CovSample(`\n  * `case class 
CovPopulation(`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org