[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171240572 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171240574 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49305/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214203 **[Test build #49300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49300/consoleFull)** for PR 10029 at commit [`832db06`](https://github.com/apache/spark/commit/832db06a67980d1aa51bb8330ec86dbc1f1a869c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562972 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214447 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49300/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49309/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236848 **[Test build #49309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49309/consoleFull)** for PR 10029 at commit [`1806ced`](https://github.com/apache/spark/commit/1806ced8dc7bdd7d5f2909aa80a9700516564c32). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171239983 **[Test build #49305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49305/consoleFull)** for PR 10029 at commit [`2f643d4`](https://github.com/apache/spark/commit/2f643d414d4ba0723d0c3323ea72246ec8e77706). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236474 **[Test build #49309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49309/consoleFull)** for PR 10029 at commit [`1806ced`](https://github.com/apache/spark/commit/1806ced8dc7bdd7d5f2909aa80a9700516564c32). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236852 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214830 **[Test build #49305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49305/consoleFull)** for PR 10029 at commit [`2f643d4`](https://github.com/apache/spark/commit/2f643d414d4ba0723d0c3323ea72246ec8e77706). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate +with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1: MutableRow,
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562259 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate +with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1: MutableRow,
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate +with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1: MutableRow,
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562322 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171388415 LGTM, merging this into master, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10029 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49557554 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49557393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171198669 @viirya Yeah, we can merge this in first. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49554363 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171191441 @davies Understood. Since the time to come out a better codegen version for this is not sure, can we merge this first in? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171169101 ping @davies --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49554701 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) --- End diff -- Should we support implicit conversion? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49555926 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49555900 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171193912 **[Test build #49300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49300/consoleFull)** for PR 10029 at commit [`832db06`](https://github.com/apache/spark/commit/832db06a67980d1aa51bb8330ec86dbc1f1a869c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171184876 @viirya One question is that: we are moving to do more codegen staff (whole stage codegen), but the ImperativeAggregate can't support that, not sure will we do another codegen version for this or not. Right now, ImperativeAggregate version faster, because the generated code is still not the best, but we can improve. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49555736 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) --- End diff -- Oh, I see, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49554615 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49554635 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, --- End diff -- this could fit in one line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49555196 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) --- End diff -- `ImperativeAggregate` already supports implicit conversion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170482524 **[Test build #49128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49128/consoleFull)** for PR 10029 at commit [`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49299893 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1 + val mutableAggBufferOffsetPlus2 = mutableAggBufferOffset + 2 + val mutableAggBufferOffsetPlus3 = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputAggBufferOffsetPlus1 = inputAggBufferOffset + 1 + val inputAggBufferOffsetPlus2 = inputAggBufferOffset + 2 + val inputAggBufferOffsetPlus3 = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(mutableAggBufferOffset, 0.0) +buffer.setDouble(mutableAggBufferOffsetPlus1, 0.0) +buffer.setDouble(mutableAggBufferOffsetPlus2, 0.0) +buffer.setLong(mutableAggBufferOffsetPlus3, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(mutableAggBufferOffset) + var yAvg = buffer.getDouble(mutableAggBufferOffsetPlus1) + var Ck = buffer.getDouble(mutableAggBufferOffsetPlus2) + var count = buffer.getLong(mutableAggBufferOffsetPlus3) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(mutableAggBufferOffset, xAvg) + buffer.setDouble(mutableAggBufferOffsetPlus1, yAvg) + buffer.setDouble(mutableAggBufferOffsetPlus2, Ck) +
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49300964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) --- End diff -- ImplicitCastInputTypes should do that for us. I added another test for integer input. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170558530 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170563859 **[Test build #49146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49146/consoleFull)** for PR 10029 at commit [`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170550776 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170550777 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49128/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170550666 **[Test build #49128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49128/consoleFull)** for PR 10029 at commit [`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170576801 **[Test build #49146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49146/consoleFull)** for PR 10029 at commit [`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170576891 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170576895 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49146/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170614439 @davies please see if the updates are good for you. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170611553 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49150/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170611550 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170611002 **[Test build #49150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49150/consoleFull)** for PR 10029 at commit [`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170578055 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-170578993 **[Test build #49150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49150/consoleFull)** for PR 10029 at commit [`45d9f0c`](https://github.com/apache/spark/commit/45d9f0c12d9b461460b9c38899fbbd7bb089460e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49217415 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, --- End diff -- These two do need to be passed in, see `CentralMomentAgg` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49217997 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1 + val mutableAggBufferOffsetPlus2 = mutableAggBufferOffset + 2 + val mutableAggBufferOffsetPlus3 = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputAggBufferOffsetPlus1 = inputAggBufferOffset + 1 + val inputAggBufferOffsetPlus2 = inputAggBufferOffset + 2 + val inputAggBufferOffsetPlus3 = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(mutableAggBufferOffset, 0.0) +buffer.setDouble(mutableAggBufferOffsetPlus1, 0.0) +buffer.setDouble(mutableAggBufferOffsetPlus2, 0.0) +buffer.setLong(mutableAggBufferOffsetPlus3, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(mutableAggBufferOffset) + var yAvg = buffer.getDouble(mutableAggBufferOffsetPlus1) + var Ck = buffer.getDouble(mutableAggBufferOffsetPlus2) + var count = buffer.getLong(mutableAggBufferOffsetPlus3) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(mutableAggBufferOffset, xAvg) + buffer.setDouble(mutableAggBufferOffsetPlus1, yAvg) + buffer.setDouble(mutableAggBufferOffsetPlus2, Ck) +
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49216499 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -309,6 +309,46 @@ object functions extends LegacyFunctions { countDistinct(Column(columnName), columnNames.map(Column.apply) : _*) /** + * Aggregate function: returns the population covariance for two columns. + * + * @group agg_funcs + * @since 1.6.0 --- End diff -- 2.0.0 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49217078 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false --- End diff -- Should this be true ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49217495 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType --- End diff -- Should this support all NumericType? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49217530 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) --- End diff -- Should this support all NumericType? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49217743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1 --- End diff -- Can we have a better name for them? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49260274 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. + * + */ +abstract class Covariance( +left: Expression, +right: Expression, +mutableAggBufferOffset: Int, +inputAggBufferOffset: Int) + extends ImperativeAggregate with Serializable { + + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val mutableAggBufferOffsetPlus1 = mutableAggBufferOffset + 1 + val mutableAggBufferOffsetPlus2 = mutableAggBufferOffset + 2 + val mutableAggBufferOffsetPlus3 = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputAggBufferOffsetPlus1 = inputAggBufferOffset + 1 + val inputAggBufferOffsetPlus2 = inputAggBufferOffset + 2 + val inputAggBufferOffsetPlus3 = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(mutableAggBufferOffset, 0.0) +buffer.setDouble(mutableAggBufferOffsetPlus1, 0.0) +buffer.setDouble(mutableAggBufferOffsetPlus2, 0.0) +buffer.setLong(mutableAggBufferOffsetPlus3, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(mutableAggBufferOffset) + var yAvg = buffer.getDouble(mutableAggBufferOffsetPlus1) + var Ck = buffer.getDouble(mutableAggBufferOffsetPlus2) + var count = buffer.getLong(mutableAggBufferOffsetPlus3) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(mutableAggBufferOffset, xAvg) + buffer.setDouble(mutableAggBufferOffsetPlus1, yAvg) + buffer.setDouble(mutableAggBufferOffsetPlus2, Ck) +
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160417691 **[Test build #46849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46849/consoleFull)** for PR 10029 at commit [`5a419f6`](https://github.com/apache/spark/commit/5a419f6747b6929b67614b670918106e5a18dceb). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160417840 **[Test build #46849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46849/consoleFull)** for PR 10029 at commit [`5a419f6`](https://github.com/apache/spark/commit/5a419f6747b6929b67614b670918106e5a18dceb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_:\n * `abstract class Covariance(`\n * `case class CovSample(`\n * `case class CovPopulation(`\n --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160417843 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46849/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160417842 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160418449 **[Test build #46851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46851/consoleFull)** for PR 10029 at commit [`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160421565 **[Test build #46851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46851/consoleFull)** for PR 10029 at commit [`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_:\n * `abstract class Covariance(`\n * `case class CovSample(`\n * `case class CovPopulation(`\n --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160421583 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160421584 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46851/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160421665 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160422208 **[Test build #46853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46853/consoleFull)** for PR 10029 at commit [`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160433449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46853/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160433448 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-160433418 **[Test build #46853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46853/consoleFull)** for PR 10029 at commit [`f0c1838`](https://github.com/apache/spark/commit/f0c18380c29a2ebf3b18ec959f1b0854b4dbefbf). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_:\n * `abstract class Covariance(`\n * `case class CovSample(`\n * `case class CovPopulation(`\n --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org