[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17471 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78851/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r124716883

--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocksFailed.java ---

@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle.protocol;
+
+import com.google.common.base.Objects;
+import io.netty.buffer.ByteBuf;
+
+// Needed by ScalaDoc. See SPARK-7726
+import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
+
+/**
+ * This message is responded from shuffle service when client failed to "open blocks" due to
+ * some reason (e.g. the shuffle service is suffering from high memory cost).
+ */
+public class OpenBlocksFailed extends BlockTransferMessage {
+
+  public final int reason;
+
+  public OpenBlocksFailed(int reason) {
+    this.reason = reason;
+  }
+
+  @Override
+  protected Type type() { return Type.OPEN_BLOCKS_FAILED; }
+
+  @Override
+  public int hashCode() {
+    return Objects.hashCode(reason);
+  }
+
+  public String toString() {
+    String reasonStr = null;
+    switch (reason) {
+      case 1:
+        reasonStr = "shuffle service is suffering high memory cost";
+        break;
+      default:
+        reasonStr = "unknown";
+        break;
+    }
+    return Objects.toStringHelper(this)
+      .add("reason", reasonStr)
+      .toString();
+  }
+
+  @Override
+  public boolean equals(Object other) {
+    if (other != null && other instanceof OpenBlocksFailed) {
+      OpenBlocksFailed o = (OpenBlocksFailed) other;
+      return Objects.equal(reason, o.reason);

--- End diff --

nit: `this.reason == o.reason`?
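The nit is worth unpacking: Guava's `Objects.equal(reason, o.reason)` autoboxes both `int` fields into `Integer` objects before comparing them, while `this.reason == o.reason` compares the primitives directly. The `other != null` guard is also redundant, since `instanceof` already evaluates to false for null. A minimal standalone sketch of the suggested form (hypothetical `Msg` class for illustration, not the PR's code):

```java
// Hypothetical sketch of the reviewer's suggestion: compare the int field with
// ==, avoiding the autoboxing that Objects.equal performs on primitives.
public class EqualsNit {

    static final class Msg {
        final int reason;

        Msg(int reason) {
            this.reason = reason;
        }

        @Override
        public boolean equals(Object other) {
            // instanceof is false for null, so no separate null check is needed.
            if (other instanceof Msg) {
                Msg o = (Msg) other;
                return this.reason == o.reason;  // primitive comparison, no boxing
            }
            return false;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(reason);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Msg(1).equals(new Msg(1)));  // prints true
        System.out.println(new Msg(1).equals(null));        // prints false
    }
}
```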
[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17471 Merged build finished. Test PASSed.
[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17471 **[Test build #78851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78851/testReport)** for PR 17471 at commit [`6b94c2b`](https://github.com/apache/spark/commit/6b94c2b05adb26715087af778557934648a58b01).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18458 **[Test build #78869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78869/testReport)** for PR 18458 at commit [`c47b3a2`](https://github.com/apache/spark/commit/c47b3a249b51ab093181eaa82d965d6787176778).
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18463 retest this please
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78868/testReport)** for PR 18463 at commit [`466325d`](https://github.com/apache/spark/commit/466325d3fd353668583f3bde38ae490d9db0b189).
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18463 retest this please
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78867/testReport)** for PR 18463 at commit [`466325d`](https://github.com/apache/spark/commit/466325d3fd353668583f3bde38ae490d9db0b189).
[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18458#discussion_r124716253

--- Diff: R/pkg/R/functions.R ---

@@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"),
           column(jc)
         })

-#' from_json
-#'
-#' Parses a column containing a JSON string into a Column of \code{structType} with the specified
-#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}.
-#' If the string is unparseable, the Column will contains the value NA.
+#' @details
+#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType}
+#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set
+#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA.
 #'
-#' @param x Column containing the JSON string.
+#' @rdname column_collection_functions
 #' @param schema a structType object to use as the schema to use when parsing the JSON string.
 #' @param as.json.array indicating if input string is JSON array of objects or a single object.
-#' @param ... additional named properties to control how the json is parsed, accepts the same
-#'   options as the JSON data source.
-#'
-#' @family non-aggregate functions
-#' @rdname from_json
-#' @name from_json
-#' @aliases from_json,Column,structType-method
+#' @aliases from_json from_json,Column,structType-method
 #' @export
 #' @examples
+#'
 #' \dontrun{
-#' schema <- structType(structField("name", "string"),
-#' select(df, from_json(df$value, schema, dateFormat = "dd/MM/"))
-#' }
+#' df2 <- sql("SELECT named_struct('name', 'Bob') as people")
+#' df2 <- mutate(df2, people_json = to_json(df2$people))
+#' schema <- structType(structField("name", "string"))
+#' head(select(df2, from_json(df2$people_json, schema)))}

--- End diff --

Thanks for catching this. Added an example.
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r124715952

--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---

@@ -90,16 +96,28 @@ protected void handleMessage(
     try {
       OpenBlocks msg = (OpenBlocks) msgObj;
       checkAuth(client, msg.appId);
-      long streamId = streamManager.registerStream(client.getClientId(),
-        new ManagedBufferIterator(msg.appId, msg.execId, msg.blockIds));
-      if (logger.isTraceEnabled()) {
-        logger.trace("Registered streamId {} with {} buffers for client {} from host {}",
-          streamId,
-          msg.blockIds.length,
-          client.getClientId(),
-          getRemoteAddress(client.getChannel()));
+      // Return OpenBlocksFailed when memory usage is above the water mark.
+      long usage = memoryUsage.getMemoryUsage();
+      if (usage > memWaterMark) {
+        logger.warn("Memory usage({}) is above water mark({}), rejecting 'open blocks' request "
+          + "from client({}, {}).", usage, memWaterMark, client.getClientId(),
+          client.getSocketAddress());
+        callback.onSuccess(new OpenBlocksFailed(1).toByteBuffer());
+      } else {
+        logger.trace("Memory usage({}) is under water mark({}), accepting 'open blocks' "
+          + "request from client({}, {}).", usage, memWaterMark, client.getClientId(),
+          client.getSocketAddress());
+        long streamId = streamManager.registerStream(client.getClientId(),
+          new ManagedBufferIterator(msg.appId, msg.execId, msg.blockIds));
+        if (logger.isTraceEnabled()) {
+          logger.trace("Registered streamId {} with {} buffers for client {} from host {}",

--- End diff --

shall we merge this and the above log into one log entry?
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18463 retest this please
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78866/testReport)** for PR 18463 at commit [`466325d`](https://github.com/apache/spark/commit/466325d3fd353668583f3bde38ae490d9db0b189).
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r124715302

--- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ---

@@ -257,4 +257,31 @@ public Properties cryptoConf() {
     return CryptoUtils.toCryptoConf("spark.network.crypto.config.", conf.getAll());
   }

+  /**
+   * When memory usage of Netty is above this water mark, it's regarded as memory shortage.

--- End diff --

do we have a config for shuffle service JVM heap size? maybe we can use that.
[GitHub] spark issue #18450: [SPARK-21238][SQL] allow nested SQL execution
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18450 Merged build finished. Test PASSed.
[GitHub] spark issue #18450: [SPARK-21238][SQL] allow nested SQL execution
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78853/ Test PASSed.
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r124715080

--- Diff: common/network-common/src/main/java/org/apache/spark/network/util/PooledByteBufAllocatorWithMetrics.java ---

@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.util;
+
+import java.util.Iterator;
+import java.util.List;
+
+import io.netty.buffer.PoolArenaMetric;
+import io.netty.buffer.PoolChunkListMetric;
+import io.netty.buffer.PoolChunkMetric;
+import io.netty.buffer.PooledByteBufAllocator;
+
+/**
+ * A {@link PooledByteBufAllocator} providing some metrics.
+ */
+public class PooledByteBufAllocatorWithMetrics extends PooledByteBufAllocator {
+
+  public PooledByteBufAllocatorWithMetrics(
+      boolean preferDirect,
+      int nHeapArena,
+      int nDirectArena,
+      int pageSize,
+      int maxOrder,
+      int tinyCacheSize,
+      int smallCacheSize,
+      int normalCacheSize) {
+    super(preferDirect, nHeapArena, nDirectArena, pageSize, maxOrder, tinyCacheSize,
+      smallCacheSize, normalCacheSize);
+  }
+
+  public long offHeapUsage() {
+    return sumOfMetrics(directArenas());
+  }
+
+  public long onHeapUsage() {
+    return sumOfMetrics(heapArenas());
+  }
+
+  private long sumOfMetrics(List metrics) {
+    long sum = 0;
+    for (int i = 0; i < metrics.size(); i++) {
+      PoolArenaMetric metric = metrics.get(i);

--- End diff --

nit: it's better to use `Iterator` pattern here, as the input list may not be an indexed list and `list.get(i)` becomes `O(n)`.
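The iterator nit rests on a general point: `List.get(i)` is O(n) per call on a non-`RandomAccess` list such as `LinkedList`, so an indexed loop over such a list costs O(n²) overall, while a for-each loop drives the list's `Iterator` and stays O(n). A minimal sketch of the suggested pattern (hypothetical `IteratorSum` class for illustration, not Spark code):

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration of the reviewer's suggestion: traverse the list
// with a for-each loop (which uses the list's Iterator) instead of get(i).
public class IteratorSum {

    // O(n) for any List implementation, including LinkedList.
    static long sum(List<Long> metrics) {
        long total = 0;
        for (long value : metrics) {  // Iterator-based traversal
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // LinkedList is not RandomAccess: get(i) on it would be O(n) per call.
        List<Long> metrics = new LinkedList<>(Arrays.asList(3L, 4L, 5L));
        System.out.println(sum(metrics));  // prints 12
    }
}
```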
[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18458#discussion_r124715019

--- Diff: R/pkg/R/functions.R ---

@@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"),
           column(jc)
         })

-#' from_json
-#'
-#' Parses a column containing a JSON string into a Column of \code{structType} with the specified
-#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}.
-#' If the string is unparseable, the Column will contains the value NA.
+#' @details
+#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType}
+#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set
+#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA.

--- End diff --

Corrected the typo. Will consider updating `null` & `NA` in the future :)
[GitHub] spark issue #18450: [SPARK-21238][SQL] allow nested SQL execution
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18450 **[Test build #78853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78853/testReport)** for PR 18450 at commit [`f8e9901`](https://github.com/apache/spark/commit/f8e99013dffeffc2bfe37624b84dbf9736fed8b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78863/ Test FAILed.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78863/testReport)** for PR 18463 at commit [`488c287`](https://github.com/apache/spark/commit/488c2871e4589f1a469cff2dba1e962173eaf910).
 * This patch **fails due to an unknown error code, -10**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Merged build finished. Test FAILed.
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124714544

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---

@@ -267,10 +298,111 @@ class SQLMetricsSuite extends SparkFunSuite with SharedSQLContext {
     val df = df1.join(broadcast(df2), "key")
     testSparkPlanMetrics(df, 2, Map(
       1L -> ("BroadcastHashJoin", Map(
-        "number of output rows" -> 2L)))
+        "number of output rows" -> 2L,
+        "avg hash probe (min, med, max)" -> "\n(1, 1, 1)")))
     )
   }

+  test("BroadcastHashJoin metrics: track avg probe") {
+    // The executed plan looks like:
+    // Project [a#210, b#211, b#221]
+    // +- BroadcastHashJoin [a#210], [a#220], Inner, BuildRight
+    //    :- Project [_1#207 AS a#210, _2#208 AS b#211]
+    //    :  +- Filter isnotnull(_1#207)
+    //    :     +- LocalTableScan [_1#207, _2#208]
+    //    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, binary, true]))
+    //       +- Project [_1#217 AS a#220, _2#218 AS b#221]
+    //          +- Filter isnotnull(_1#217)
+    //             +- LocalTableScan [_1#217, _2#218]
+    //
+    // Assume the execution plan is
+    // WholeStageCodegen disabled:
+    // ... -> BroadcastHashJoin(nodeId = 1) -> Project(nodeId = 0)
+    //
+    // WholeStageCodegen enabled:
+    // ... ->
+    // WholeStageCodegen(nodeId = 0, Filter(nodeId = 4) -> Project(nodeId = 3) ->

--- End diff --

can you format it a little bit? to indicate that we only have a `WholeStageCodegen`, all other plans are the inner children of `WholeStageCodegen`.
[GitHub] spark issue #18448: [SPARK-20889][SparkR] Grouped documentation for MISC col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18448 **[Test build #78865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78865/testReport)** for PR 18448 at commit [`ff27f18`](https://github.com/apache/spark/commit/ff27f182b9055511d2fef59c6d66e113fcbef535).
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18301 @viirya ok let's add it back
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124714354

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---

@@ -367,6 +367,22 @@ class TungstenAggregationIterator(
     }
   }

+  TaskContext.get().addTaskCompletionListener(_ => {
+    // At the end of the task, update the task's peak memory usage. Since we destroy
+    // the map to create the sorter, their memory usages should not overlap, so it is safe
+    // to just use the max of the two.
+    val mapMemory = hashMap.getPeakMemoryUsedBytes
+    val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L)
+    val maxMemory = Math.max(mapMemory, sorterMemory)
+    val metrics = TaskContext.get().taskMetrics()
+    peakMemory += maxMemory
+    spillSize += metrics.memoryBytesSpilled - spillSizeBefore
+    metrics.incPeakExecutionMemory(maxMemory)

--- End diff --

makes sense
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18301 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78859/ Test FAILed.
[GitHub] spark pull request #18448: [SPARK-20889][SparkR] Grouped documentation for M...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18448#discussion_r124714226

--- Diff: R/pkg/R/functions.R ---

@@ -132,6 +132,27 @@ NULL
 #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))}
 NULL

+#' Miscellaneous functions for Column operations
+#'
+#' Miscellaneous functions defined for \code{Column}.
+#'
+#' @param x Column to compute on. In \code{sha2}, it is one of 224, 256, 384, or 512.
+#' @param y Column to compute on.
+#' @param ... additional columns.

--- End diff --

updated now.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18301 Merged build finished. Test FAILed.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18301 **[Test build #78859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78859/testReport)** for PR 18301 at commit [`9cbd627`](https://github.com/apache/spark/commit/9cbd627bed6279550a85aaf1d596f22c6b69bfc6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18448: [SPARK-20889][SparkR] Grouped documentation for M...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18448#discussion_r124714065 --- Diff: R/pkg/R/functions.R --- @@ -132,6 +132,27 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL +#' Miscellaneous functions for Column operations +#' +#' Miscellaneous functions defined for \code{Column}. +#' +#' @param x Column to compute on. In \code{sha2}, it is one of 224, 256, 384, or 512. +#' @param y Column to compute on. --- End diff -- I think roxygen automatically chooses the order of the arguments based on the order they appear in the file, and ignores the order we specify. So even if I move `y` before `x` here, in the generated doc, `x` will still appear before `y`. Indeed, as you can see from the screenshot, `...` appears before `y`.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Merged build finished. Test FAILed.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78861/testReport)** for PR 18463 at commit [`86bfa22`](https://github.com/apache/spark/commit/86bfa22d1f8d46e75dcc5f9085b7976365bc0e8f). * This patch **fails due to an unknown error code, -10**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78861/ Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNull method when containsN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18405 **[Test build #78864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78864/testReport)** for PR 18405 at commit [`0163e04`](https://github.com/apache/spark/commit/0163e04e9a5705fe963bad764704e6828161b374).
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78860/ Test PASSed.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Merged build finished. Test PASSed.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78860/testReport)** for PR 18463 at commit [`5d5b390`](https://github.com/apache/spark/commit/5d5b39077d49225df2603217dea7e8d978a22a76). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18416#discussion_r124712549 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -339,6 +340,28 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { LHMapClass(LHMap(1 -> 2)) -> LHMap("test" -> MapClass(Map(3 -> 4 } + test("arbitrary sets") { --- End diff -- Added a test for it.
[GitHub] spark pull request #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18416#discussion_r124712535 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -834,6 +834,140 @@ case class CollectObjectsToMap private( } } +object CollectObjectsToSet { + private val curId = new java.util.concurrent.atomic.AtomicInteger() + + /** + * Construct an instance of CollectObjectsToSet case class. + * + * @param function The function applied on the collection elements. + * @param inputData An expression that when evaluated returns a collection object. + * @param collClass The type of the resulting collection. + */ + def apply( + function: Expression => Expression, + inputData: Expression, + collClass: Class[_]): CollectObjectsToSet = { +val id = curId.getAndIncrement() +val loopValue = s"CollectObjectsToSet_loopValue$id" +val loopIsNull = s"CollectObjectsToSet_loopIsNull$id" +val arrayType = inputData.dataType.asInstanceOf[ArrayType] +val loopVar = LambdaVariable(loopValue, loopIsNull, arrayType.elementType) +CollectObjectsToSet( + loopValue, loopIsNull, function(loopVar), inputData, collClass) + } +} + +/** + * Expression used to convert a Catalyst Array to an external Scala `Set`. + * The collection is constructed using the associated builder, obtained by calling `newBuilder` + * on the collection's companion object. + * + * Notice that when we convert a Catalyst array which contains duplicated elements to an external + * Scala `Set`, the elements will be de-duplicated. 
+ *
+ * @param loopValue the name of the loop variable that is used when iterating over the value
+ *                  collection, and which is used as input for the `lambdaFunction`
+ * @param loopIsNull the nullability of the loop variable that is used when iterating over
+ *                   the value collection, and which is used as input for the
+ *                   `lambdaFunction`
+ * @param lambdaFunction A function that takes the `loopValue` as input, and is used as
+ *                       a lambda function to handle collection elements.
+ * @param inputData An expression that when evaluated returns an array object.
+ * @param collClass The type of the resulting collection.
+ */
+case class CollectObjectsToSet private(
+    loopValue: String,
+    loopIsNull: String,
+    lambdaFunction: Expression,
+    inputData: Expression,
+    collClass: Class[_]) extends Expression with NonSQLExpression {
+
+  override def nullable: Boolean = inputData.nullable
+
+  override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil
+
+  override def eval(input: InternalRow): Any =
+    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
+
+  override def dataType: DataType = ObjectType(collClass)
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    // The data with PythonUserDefinedType are actually stored with the data type of its sqlType.
+def inputDataType(dataType: DataType) = dataType match { + case p: PythonUserDefinedType => p.sqlType + case _ => dataType +} + +val arrayType = inputDataType(inputData.dataType).asInstanceOf[ArrayType] +val loopValueJavaType = ctx.javaType(arrayType.elementType) +ctx.addMutableState("boolean", loopIsNull, "") +ctx.addMutableState(loopValueJavaType, loopValue, "") +val genFunction = lambdaFunction.genCode(ctx) + +val genInputData = inputData.genCode(ctx) +val dataLength = ctx.freshName("dataLength") +val loopIndex = ctx.freshName("loopIndex") +val builderValue = ctx.freshName("builderValue") + +val getLength = s"${genInputData.value}.numElements()" +val getLoopVar = ctx.getValue(genInputData.value, arrayType.elementType, loopIndex) + +// Make a copy of the data if it's unsafe-backed +def makeCopyIfInstanceOf(clazz: Class[_ <: Any], value: String) = + s"$value instanceof ${clazz.getSimpleName}? $value.copy() : $value" +val genFunctionValue = + lambdaFunction.dataType match { +case StructType(_) => makeCopyIfInstanceOf(classOf[UnsafeRow], genFunction.value) +case ArrayType(_, _) => makeCopyIfInstanceOf(classOf[UnsafeArrayData], genFunction.value) +case MapType(_, _, _) => makeCopyIfInstanceOf(classOf[UnsafeMapData], genFunction.value) +case _ => genFunction.value + } + +val loopNullCheck = s"$loopIsNull = ${genInputData.value}.isNullAt($loopIndex);"
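The quoted `CollectObjectsToSet` emits generated Java, but its observable semantics — apply a lambda to each array element, let nulls pass through the null check, and collect the results into a set that silently de-duplicates (as the scaladoc warns) — can be sketched directly; `collect_objects_to_set` below is a hypothetical stand-in, not Spark's API:

```python
def collect_objects_to_set(input_array, lambda_fn):
    """Mirror of the codegen loop: map each element through the lambda
    (nulls pass through untouched) and collect into a de-duplicating set."""
    result = set()
    for element in input_array:
        result.add(None if element is None else lambda_fn(element))
    return result

# Duplicated elements in the Catalyst array collapse, as the scaladoc notes.
deduped = collect_objects_to_set([1, 2, 2, 3], lambda x: x * 10)
```

This is why the de-duplication warning matters: converting an array column to a `Set`-typed field is lossy whenever the array contains repeats.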
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18301 @rxin I just reverted it in previous commits. @cloud-fan should I revert it back?
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78863/testReport)** for PR 18463 at commit [`488c287`](https://github.com/apache/spark/commit/488c2871e4589f1a469cff2dba1e962173eaf910).
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 hey i didn't track super closely, but it is pretty important to show at least one more digit, e.g. 1.7, rather than just 2.
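The precision point is easy to see concretely: average probe counts cluster just above 1, so rounding to a whole number collapses meaningfully different values. A small illustration using plain string formatting, not Spark's actual metric code:

```python
avg_probes = [1.0, 1.7, 2.4]

# Whole-number rounding: 1.7 and 2.4 both render as "2" and become
# indistinguishable from each other.
rounded = [f"{p:.0f}" for p in avg_probes]

# One extra digit keeps the values apart.
one_decimal = [f"{p:.1f}" for p in avg_probes]
```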
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18301 **[Test build #78862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78862/testReport)** for PR 18301 at commit [`9a048f8`](https://github.com/apache/spark/commit/9a048f817b9a6499a64778c13141c9bc320cf2ab).
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78861/testReport)** for PR 18463 at commit [`86bfa22`](https://github.com/apache/spark/commit/86bfa22d1f8d46e75dcc5f9085b7976365bc0e8f).
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78860/testReport)** for PR 18463 at commit [`5d5b390`](https://github.com/apache/spark/commit/5d5b39077d49225df2603217dea7e8d978a22a76).
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028 Merged build finished. Test PASSed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78857/ Test PASSed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #78857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78857/testReport)** for PR 16028 at commit [`d84bb21`](https://github.com/apache/spark/commit/d84bb214908aea84421133958762bbf2a3e4f7d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNull method when containsN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18405 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78852/ Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNull method when containsN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18405 Merged build finished. Test FAILed.
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124710724 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -163,29 +178,45 @@ class SQLMetricsSuite extends SparkFunSuite with SharedSQLContext { val df2 = testData2.groupBy('a).count() val expected2 = Seq( Map("number of output rows" -> 4L, -"avg hashmap probe (min, med, max)" -> "\n(1, 1, 1)"), +"avg hash probe (min, med, max)" -> "\n(1, 1, 1)"), Map("number of output rows" -> 3L, -"avg hashmap probe (min, med, max)" -> "\n(1, 1, 1)")) +"avg hash probe (min, med, max)" -> "\n(1, 1, 1)")) testSparkPlanMetrics(df2, 1, Map( 2L -> ("HashAggregate", expected2(0)), 0L -> ("HashAggregate", expected2(1))) ) } test("Aggregate metrics: track avg probe") { -val random = new Random() -val manyBytes = (0 until 65535).map { _ => - val byteArrSize = random.nextInt(100) - val bytes = new Array[Byte](byteArrSize) - random.nextBytes(bytes) - (bytes, random.nextInt(100)) -} -val df = manyBytes.toSeq.toDF("a", "b").repartition(1).groupBy('a).count() -val metrics = getSparkPlanMetrics(df, 1, Set(2L, 0L)).get -Seq(metrics(2L)._2("avg hashmap probe (min, med, max)"), -metrics(0L)._2("avg hashmap probe (min, med, max)")).foreach { probes => - probes.toString.stripPrefix("\n(").stripSuffix(")").split(", ").foreach { probe => -assert(probe.toInt > 1) +// The executed plan looks like: +// HashAggregate(keys=[a#61], functions=[count(1)], output=[a#61, count#71L]) +// +- Exchange hashpartitioning(a#61, 5) +//+- HashAggregate(keys=[a#61], functions=[partial_count(1)], output=[a#61, count#76L]) +// +- Exchange RoundRobinPartitioning(1) +// +- LocalTableScan [a#61] +// +// Assume the execution plan is: +// Wholestage disabled: +// LocalTableScan(nodeId = 4) ->Exchange (nodeId = 3) -> HashAggregate(nodeId = 2) -> --- End diff -- I attached the tree string. This doc is used to show `nodeId` relations. 
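The test in the preceding comment compares metric strings rendered as `"\n(min, med, max)"`, stripping the wrapper before checking the probe counts. A small parser for that rendering, assuming the format is exactly as quoted in the test:

```python
def parse_probe_metric(metric: str) -> tuple:
    """Parse an 'avg hash probe (min, med, max)' value such as '\n(1, 1, 1)'
    into a tuple of ints, mirroring the stripPrefix/stripSuffix/split chain
    in the quoted Scala test."""
    inner = metric.strip().strip("()")
    return tuple(int(part) for part in inner.split(", "))
```

With such a parser, the test's assertion "every probe count exceeds 1" becomes a simple check over the returned tuple.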
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18405 **[Test build #78852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78852/testReport)** for PR 18405 at commit [`c998374`](https://github.com/apache/spark/commit/c998374cf68e4f8520b9b29fd40c3a4b652dbdb8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124710649
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---
@@ -367,6 +367,22 @@ class TungstenAggregationIterator(
     }
   }

+  TaskContext.get().addTaskCompletionListener(_ => {
+    // At the end of the task, update the task's peak memory usage. Since we destroy
+    // the map to create the sorter, their memory usages should not overlap, so it is safe
+    // to just use the max of the two.
+    val mapMemory = hashMap.getPeakMemoryUsedBytes
+    val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L)
+    val maxMemory = Math.max(mapMemory, sorterMemory)
+    val metrics = TaskContext.get().taskMetrics()
+    peakMemory += maxMemory
+    spillSize += metrics.memoryBytesSpilled - spillSizeBefore
+    metrics.incPeakExecutionMemory(maxMemory)
--- End diff --
hmm, the description of `peakExecutionMemory` in `TaskMetrics` is: "...The value of this accumulator should be approximately the sum of the peak sizes across all such data structures created in this task..." So it is designed to report the sum of the peak memory of the operators in the task. I think because the operators are not run in sequence but in an iterator way, it's reasonable to sum the peak memory, although the peaks of the operators might not occur at the same moment.
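Per the `TaskMetrics` description quoted in the comment above, `peakExecutionMemory` sums each operator's individual peak even though those peaks may occur at different moments, so the reported value is an upper bound on true concurrent usage. A sketch with hypothetical numbers:

```python
# Per-operator peak memory within one task (bytes); the peaks may be
# reached at different points in the task's lifetime.
operator_peaks = {"hash_map": 64_000_000, "external_sorter": 48_000_000}

# peakExecutionMemory-style accounting: sum of the individual peaks.
reported_peak = sum(operator_peaks.values())

# If the two structures never coexist (the map is destroyed before the
# sorter is built), the true concurrent peak is only the max.
actual_upper_bound = max(operator_peaks.values())

assert reported_peak >= actual_upper_bound  # sum can only over-report
```

This is the tension the review thread is resolving: within a single operator the diff uses max (non-overlapping lifetimes), while across operators the accumulator's documented contract is a sum.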
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Merged build finished. Test PASSed.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18463 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78858/ Test PASSed.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78858/testReport)** for PR 18463 at commit [`5d5b390`](https://github.com/apache/spark/commit/5d5b39077d49225df2603217dea7e8d978a22a76). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18463 retest this please
[GitHub] spark issue #18435: [SPARK-21225][CORE] Considering CPUS_PER_TASK when alloc...
Github user JackYangzg commented on the issue: https://github.com/apache/spark/pull/18435 @jerryshao Ok
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124709997 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -163,29 +178,45 @@ class SQLMetricsSuite extends SparkFunSuite with SharedSQLContext { val df2 = testData2.groupBy('a).count() val expected2 = Seq( Map("number of output rows" -> 4L, -"avg hashmap probe (min, med, max)" -> "\n(1, 1, 1)"), +"avg hash probe (min, med, max)" -> "\n(1, 1, 1)"), Map("number of output rows" -> 3L, -"avg hashmap probe (min, med, max)" -> "\n(1, 1, 1)")) +"avg hash probe (min, med, max)" -> "\n(1, 1, 1)")) testSparkPlanMetrics(df2, 1, Map( 2L -> ("HashAggregate", expected2(0)), 0L -> ("HashAggregate", expected2(1))) ) } test("Aggregate metrics: track avg probe") { -val random = new Random() -val manyBytes = (0 until 65535).map { _ => - val byteArrSize = random.nextInt(100) - val bytes = new Array[Byte](byteArrSize) - random.nextBytes(bytes) - (bytes, random.nextInt(100)) -} -val df = manyBytes.toSeq.toDF("a", "b").repartition(1).groupBy('a).count() -val metrics = getSparkPlanMetrics(df, 1, Set(2L, 0L)).get -Seq(metrics(2L)._2("avg hashmap probe (min, med, max)"), -metrics(0L)._2("avg hashmap probe (min, med, max)")).foreach { probes => - probes.toString.stripPrefix("\n(").stripSuffix(")").split(", ").foreach { probe => -assert(probe.toInt > 1) +// The executed plan looks like: +// HashAggregate(keys=[a#61], functions=[count(1)], output=[a#61, count#71L]) +// +- Exchange hashpartitioning(a#61, 5) +//+- HashAggregate(keys=[a#61], functions=[partial_count(1)], output=[a#61, count#76L]) +// +- Exchange RoundRobinPartitioning(1) +// +- LocalTableScan [a#61] +// +// Assume the execution plan is: +// Wholestage disabled: +// LocalTableScan(nodeId = 4) ->Exchange (nodeId = 3) -> HashAggregate(nodeId = 2) -> --- End diff -- tree string here please
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124710004 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -163,29 +178,45 @@ class SQLMetricsSuite extends SparkFunSuite with SharedSQLContext { val df2 = testData2.groupBy('a).count() val expected2 = Seq( Map("number of output rows" -> 4L, -"avg hashmap probe (min, med, max)" -> "\n(1, 1, 1)"), +"avg hash probe (min, med, max)" -> "\n(1, 1, 1)"), Map("number of output rows" -> 3L, -"avg hashmap probe (min, med, max)" -> "\n(1, 1, 1)")) +"avg hash probe (min, med, max)" -> "\n(1, 1, 1)")) testSparkPlanMetrics(df2, 1, Map( 2L -> ("HashAggregate", expected2(0)), 0L -> ("HashAggregate", expected2(1))) ) } test("Aggregate metrics: track avg probe") { -val random = new Random() -val manyBytes = (0 until 65535).map { _ => - val byteArrSize = random.nextInt(100) - val bytes = new Array[Byte](byteArrSize) - random.nextBytes(bytes) - (bytes, random.nextInt(100)) -} -val df = manyBytes.toSeq.toDF("a", "b").repartition(1).groupBy('a).count() -val metrics = getSparkPlanMetrics(df, 1, Set(2L, 0L)).get -Seq(metrics(2L)._2("avg hashmap probe (min, med, max)"), -metrics(0L)._2("avg hashmap probe (min, med, max)")).foreach { probes => - probes.toString.stripPrefix("\n(").stripSuffix(")").split(", ").foreach { probe => -assert(probe.toInt > 1) +// The executed plan looks like: +// HashAggregate(keys=[a#61], functions=[count(1)], output=[a#61, count#71L]) +// +- Exchange hashpartitioning(a#61, 5) +//+- HashAggregate(keys=[a#61], functions=[partial_count(1)], output=[a#61, count#76L]) +// +- Exchange RoundRobinPartitioning(1) +// +- LocalTableScan [a#61] +// +// Assume the execution plan is: +// Wholestage disabled: +// LocalTableScan(nodeId = 4) ->Exchange (nodeId = 3) -> HashAggregate(nodeId = 2) -> +// Exchange(nodeId = 1) -> HashAggregate(nodeId = 0) +// +// Wholestage enabled: +// LocalTableScan(nodeId = 6) -> 
Exchange(nodeId = 5) -> --- End diff -- ditto
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124709371 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala --- @@ -367,6 +367,22 @@ class TungstenAggregationIterator( } } + TaskContext.get().addTaskCompletionListener(_ => { +// At the end of the task, update the task's peak memory usage. Since we destroy +// the map to create the sorter, their memory usages should not overlap, so it is safe +// to just use the max of the two. +val mapMemory = hashMap.getPeakMemoryUsedBytes +val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L) +val maxMemory = Math.max(mapMemory, sorterMemory) +val metrics = TaskContext.get().taskMetrics() +peakMemory += maxMemory +spillSize += metrics.memoryBytesSpilled - spillSizeBefore +metrics.incPeakExecutionMemory(maxMemory) --- End diff -- not related to this PR, but `peakMemory` should pick the max memory usage among the operators in one task, instead of accumulating them?
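The suggestion above — take the max across operators rather than accumulate — can be sketched as follows. This is a hypothetical illustration, not Spark's actual metric code; the names `opPeaks`, `accumulated`, and `taskPeak` are made up for the example.

```scala
// Hypothetical sketch of the two semantics discussed above.
// If operators in a task never hold their memory at the same time,
// accumulating per-operator peaks overstates the task's footprint,
// while taking the max matches the meaning of "peak execution memory".
object PeakMemorySemantics {
  def main(args: Array[String]): Unit = {
    // Per-operator peak usage in bytes (e.g. hash map phase, sorter phase).
    val opPeaks = Seq(512L, 768L)

    val accumulated = opPeaks.sum // counts both phases as if concurrent
    val taskPeak = opPeaks.max    // the max-across-operators semantics

    println(s"accumulated=$accumulated, taskPeak=$taskPeak")
  }
}
```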
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124709196 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala --- @@ -367,6 +367,22 @@ class TungstenAggregationIterator( } } + TaskContext.get().addTaskCompletionListener(_ => { +// At the end of the task, update the task's peak memory usage. Since we destroy +// the map to create the sorter, their memory usages should not overlap, so it is safe +// to just use the max of the two. +val mapMemory = hashMap.getPeakMemoryUsedBytes +val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L) +val maxMemory = Math.max(mapMemory, sorterMemory) +val metrics = TaskContext.get().taskMetrics() +peakMemory += maxMemory +spillSize += metrics.memoryBytesSpilled - spillSizeBefore --- End diff -- ditto
[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r124709173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala --- @@ -367,6 +367,22 @@ class TungstenAggregationIterator( } } + TaskContext.get().addTaskCompletionListener(_ => { +// At the end of the task, update the task's peak memory usage. Since we destroy +// the map to create the sorter, their memory usages should not overlap, so it is safe +// to just use the max of the two. +val mapMemory = hashMap.getPeakMemoryUsedBytes +val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L) +val maxMemory = Math.max(mapMemory, sorterMemory) +val metrics = TaskContext.get().taskMetrics() +peakMemory += maxMemory --- End diff -- nit: it's more clear to call `set` here, instead of `+=`
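Since the completion listener runs once per task and `maxMemory` already covers the whole task, `set` states the intent more directly than `+=`. A minimal stand-in metric (hypothetical; `SimpleMetric` is not Spark's `SQLMetric`, though that class exposes similar add/set operations) makes the difference concrete:

```scala
// Minimal stand-in for an accumulator-style metric; not Spark's SQLMetric.
final class SimpleMetric {
  private var value: Long = 0L
  def add(v: Long): Unit = value += v // accumulate across calls
  def set(v: Long): Unit = value = v  // overwrite with a known final value
  def get: Long = value
}

object SetVsAdd {
  def main(args: Array[String]): Unit = {
    val peakMemory = new SimpleMetric
    val maxMemory = 1024L
    // In a once-per-task listener both yield the same number, but `set`
    // documents that this is the final value, not a running total.
    peakMemory.set(maxMemory)
    println(peakMemory.get)
  }
}
```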
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18458 Merged build finished. Test PASSed.
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18458 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78856/ Test PASSed.
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/13873 Hmm, I guess we need #16056 to fix nullability of `StaticInvoke`.
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18458 **[Test build #78856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78856/testReport)** for PR 18458 at commit [`664629d`](https://github.com/apache/spark/commit/664629dab0150d4db2ea7fcdc63d35f6694bad7f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18448: [SPARK-20889][SparkR] Grouped documentation for MISC col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18448 Merged build finished. Test PASSed.
[GitHub] spark issue #18448: [SPARK-20889][SparkR] Grouped documentation for MISC col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18448 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78854/ Test PASSed.
[GitHub] spark issue #18422: [SPARK-20889][SparkR] Grouped documentation for NONAGGRE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18422 Merged build finished. Test PASSed.
[GitHub] spark issue #18448: [SPARK-20889][SparkR] Grouped documentation for MISC col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18448 **[Test build #78854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78854/testReport)** for PR 18448 at commit [`203be11`](https://github.com/apache/spark/commit/203be118cd7bc8a4a919150cad5ac086f4c55c6f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18422: [SPARK-20889][SparkR] Grouped documentation for NONAGGRE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78855/ Test PASSed.
[GitHub] spark issue #18422: [SPARK-20889][SparkR] Grouped documentation for NONAGGRE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18422 **[Test build #78855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78855/testReport)** for PR 18422 at commit [`aff832e`](https://github.com/apache/spark/commit/aff832ef95192532f161fa89cb9f49a7cc1d2d08). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430 @jerryshao actually, this threading issue causes an infinite loop when we restart the history server and replay the event logs of Spark apps. You can see the jstack log in the attachments of SPARK-21223.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18301 **[Test build #78859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78859/testReport)** for PR 18301 at commit [`9cbd627`](https://github.com/apache/spark/commit/9cbd627bed6279550a85aaf1d596f22c6b69bfc6).
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18463 **[Test build #78858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78858/testReport)** for PR 18463 at commit [`5d5b390`](https://github.com/apache/spark/commit/5d5b39077d49225df2603217dea7e8d978a22a76).
[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18458#discussion_r124706483 --- Diff: R/pkg/R/functions.R --- @@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"), column(jc) }) -#' from_json -#' -#' Parses a column containing a JSON string into a Column of \code{structType} with the specified -#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}. -#' If the string is unparseable, the Column will contains the value NA. +#' @details +#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType} +#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set +#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA. #' -#' @param x Column containing the JSON string. +#' @rdname column_collection_functions #' @param schema a structType object to use as the schema to use when parsing the JSON string. #' @param as.json.array indicating if input string is JSON array of objects or a single object. -#' @param ... additional named properties to control how the json is parsed, accepts the same -#'options as the JSON data source. 
-#' -#' @family non-aggregate functions -#' @rdname from_json -#' @name from_json -#' @aliases from_json,Column,structType-method +#' @aliases from_json from_json,Column,structType-method #' @export #' @examples +#' #' \dontrun{ -#' schema <- structType(structField("name", "string"), -#' select(df, from_json(df$value, schema, dateFormat = "dd/MM/")) -#'} +#' df2 <- sql("SELECT named_struct('name', 'Bob') as people") +#' df2 <- mutate(df2, people_json = to_json(df2$people)) +#' schema <- structType(structField("name", "string")) +#' head(select(df2, from_json(df2$people_json, schema)))} --- End diff -- I think it's worthwhile to keep `dateFormat = "dd/MM/")` in the example
[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18458#discussion_r124706890 --- Diff: R/pkg/R/functions.R --- @@ -132,6 +132,35 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL +#' Collection functions for Column operations +#' +#' Collection functions defined for \code{Column}. +#' +#' @param x Column to compute on. Note the difference in the following methods: +#' \itemize{ +#' \item \code{to_json}: it is the column containing the struct or array of the structs. +#' \item \code{from_json}: it is the column containing the JSON string. +#' } +#' @param ... additional argument(s). In \code{to_json} and \code{from_json}, this contains +#'additional named properties to control how it is converted, accepts the same +#'options as the JSON data source. +#' @name column_collection_functions +#' @rdname column_collection_functions +#' @family collection functions +#' @examples +#' \dontrun{ +#' # Dataframe used throughout this doc +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) +#' tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp)) +#' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1))) +#' tmp2 <- mutate(tmp, v2 = explode(tmp$v1)) +#' head(tmp2) +#' head(select(tmp, posexplode(tmp$v1))) +#' head(select(tmp, sort_array(tmp$v1))) +#' head(select(tmp, sort_array(tmp$v1, FALSE)))} --- End diff -- nit, let's improve this? I think in sort_array we could be more clear, eg. `sort_array(tmp$v1, asc = FALSE)`
[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18458#discussion_r124706681 --- Diff: R/pkg/R/functions.R --- @@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"), column(jc) }) -#' from_json -#' -#' Parses a column containing a JSON string into a Column of \code{structType} with the specified -#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}. -#' If the string is unparseable, the Column will contains the value NA. +#' @details +#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType} +#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set +#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA. --- End diff -- btw, `will contains the value NA.` is very consistently documented. In this case this is right, but there are many others that say the value is `null` (note lower case), which isn't quite correct on the R side. Another project? :)
[GitHub] spark issue #18463: [WIP][SPARK-21093][R] Terminate R's worker processes in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18463 Apparently, I can only reproduce this with Jenkins for now. I tested on the environments below: - CentOS Linux release 7.3.1611 (Core) / R version 3.4.0 / Java(TM) SE Runtime Environment (build 1.8.0_101-b13) - macOS 10.12.3 (16D32) / R version 3.4.0 / Java(TM) SE Runtime Environment (build 1.8.0_45-b14) - Ubuntu 14.04 LTS / R version 3.3.1 / Java(TM) SE Runtime Environment (build 1.8.0_131-b11) At least, I checked that `SparkSQL functions: Spark package found ...` passes, which previously failed with an unknown error code, -10 - https://github.com/apache/spark/pull/18456 I ran this about 10 times each on macOS and CentOS and 3 times on Ubuntu, but I could not reproduce it.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18301 retest this please.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18301 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78848/ Test FAILed.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18301 **[Test build #78848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78848/testReport)** for PR 18301 at commit [`9cbd627`](https://github.com/apache/spark/commit/9cbd627bed6279550a85aaf1d596f22c6b69bfc6). * This patch **fails due to an unknown error code, -10**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18301 Merged build finished. Test FAILed.
[GitHub] spark pull request #18463: [WIP][SPARK-21093][R] Terminate R's worker proces...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/18463 [WIP][SPARK-21093][R] Terminate R's worker processes in the parent of R's daemon to prevent a leak ## What changes were proposed in this pull request? This is a retry for https://github.com/apache/spark/pull/18320 ## How was this patch tested? Manually tested. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-21093-retry Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18463.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18463 commit f2c15a3241583b9d692cf48b36e62d8b84cbc5dd Author: hyukjinkwon Date: 2017-06-25T18:05:57Z [SPARK-21093][R] Terminate R's worker processes in the parent of R's daemon to prevent a leak ## What changes were proposed in this pull request? `mcfork` in R appears to open a pipe ahead of time, but the existing logic does not properly close it when it is executed hot. This leads to failures of further forking due to the limit on the number of open files. This hot execution path is hit particularly by `gapply`/`gapplyCollect`. For unknown reasons, this happens more easily on CentOS and could be reproduced on Mac too. All the details are described in https://issues.apache.org/jira/browse/SPARK-21093 This PR proposes simply to terminate R's worker processes in the parent of R's daemon to prevent a leak. ## How was this patch tested? I ran the codes below on both CentOS and Mac with that configuration disabled/enabled. ```r df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) collect(gapply(df, "a", function(key, x) { x }, schema(df))) collect(gapply(df, "a", function(key, x) { x }, schema(df))) ... # 30 times ``` Also, now it passes R tests on CentOS as below: ``` SparkSQL functions: Spark package found in SPARK_HOME: .../spark .. .. .. .. .. ``` Author: hyukjinkwon Closes #18320 from HyukjinKwon/SPARK-21093. commit 9e907cbaa6d6b65e09008181b61747ffcb67d5d0 Author: hyukjinkwon Date: 2017-06-29T03:46:12Z Disable Scala/Python tests for debugging and print everything
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #78857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78857/testReport)** for PR 16028 at commit [`d84bb21`](https://github.com/apache/spark/commit/d84bb214908aea84421133958762bbf2a3e4f7d9).
[GitHub] spark pull request #18448: [SPARK-20889][SparkR] Grouped documentation for M...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18448#discussion_r124706110 --- Diff: R/pkg/R/functions.R --- @@ -132,6 +132,27 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL +#' Miscellaneous functions for Column operations +#' +#' Miscellaneous functions defined for \code{Column}. +#' +#' @param x Column to compute on. In \code{sha2}, it is one of 224, 256, 384, or 512. +#' @param y Column to compute on. --- End diff -- probably not in this PR ... but since `y` always goes first, should we flip this order? list `y` first?
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13873 Merged build finished. Test FAILed.
[GitHub] spark pull request #18448: [SPARK-20889][SparkR] Grouped documentation for M...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18448#discussion_r124706139 --- Diff: R/pkg/R/functions.R --- @@ -132,6 +132,27 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL +#' Miscellaneous functions for Column operations +#' +#' Miscellaneous functions defined for \code{Column}. +#' +#' @param x Column to compute on. In \code{sha2}, it is one of 224, 256, 384, or 512. +#' @param y Column to compute on. +#' @param ... additional columns. --- End diff -- nit: capital `Columns` to indicate type
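Applied to the roxygen block under review, the suggested convention (capitalizing `Columns` to indicate the type) would read roughly like this — a sketch, not the merged text:

```r
#' @param x Column to compute on. In \code{sha2}, it is one of 224, 256, 384, or 512.
#' @param y Column to compute on.
#' @param ... additional Columns.
NULL
```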
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13873 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78849/ Test FAILed.
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13873 **[Test build #78849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78849/testReport)** for PR 13873 at commit [`306b283`](https://github.com/apache/spark/commit/306b283f457f5718a152853df3aa854f7fba8ac2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124705885 --- Diff: R/pkg/R/functions.R --- @@ -132,23 +132,40 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL -#' lit +#' Non-aggregate functions for Column operations #' -#' A new \linkS4class{Column} is created to represent the literal value. -#' If the parameter is a \linkS4class{Column}, it is returned unchanged. +#' Non-aggregate functions defined for \code{Column}. #' -#' @param x a literal value or a Column. +#' @param x Column to compute on. In \code{lit}, it is a literal value or a Column. +#' In \code{monotonically_increasing_id}, it should be empty. +#' @param y Column to compute on. +#' @param ... additional argument(s). In \code{expr}, it contains an expression character --- End diff -- and so in all other cases in this group, `...` is expected for other columns. perhaps we can say `additional Columns`
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124705377 --- Diff: R/pkg/R/functions.R --- @@ -824,32 +835,23 @@ setMethod("initcap", column(jc) }) -#' is.nan -#' -#' Return true if the column is NaN, alias for \link{isnan} -#' -#' @param x Column to compute on. +#' @details +#' \code{is.nan}: Alias for \link{isnan}. --- End diff -- roxygen does this by text order, I think - doesn't it make this go first, before isnan? perhaps we swap the order of code?
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124704884 --- Diff: R/pkg/R/functions.R --- @@ -3554,21 +3493,17 @@ setMethod("grouping_id", column(jc) }) -#' input_file_name -#' -#' Creates a string column with the input file name for a given row +#' @details +#' \code{input_file_name}: Creates a string column with the input file name for a given row. #' -#' @rdname input_file_name -#' @name input_file_name -#' @family non-aggregate functions -#' @aliases input_file_name,missing-method +#' @rdname column_nonaggregate_functions +#' @aliases input_file_name input_file_name,missing-method #' @export #' @examples -#' \dontrun{ -#' df <- read.text("README.md") #' -#' head(select(df, input_file_name())) -#' } +#' \dontrun{ +#' tmp <- read.text("README.md") --- End diff -- why rename to `tmp` though?
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124705740 --- Diff: R/pkg/R/functions.R --- @@ -132,23 +132,40 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL -#' lit +#' Non-aggregate functions for Column operations #' -#' A new \linkS4class{Column} is created to represent the literal value. -#' If the parameter is a \linkS4class{Column}, it is returned unchanged. +#' Non-aggregate functions defined for \code{Column}. #' -#' @param x a literal value or a Column. +#' @param x Column to compute on. In \code{lit}, it is a literal value or a Column. +#' In \code{monotonically_increasing_id}, it should be empty. --- End diff -- and same for `input_file_name` - btw, "should be empty" might be a bit confusing? how about `In ..., Should be used with no argument.`?
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124704722 --- Diff: R/pkg/R/functions.R --- @@ -132,23 +132,40 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL -#' lit +#' Non-aggregate functions for Column operations #' -#' A new \linkS4class{Column} is created to represent the literal value. -#' If the parameter is a \linkS4class{Column}, it is returned unchanged. +#' Non-aggregate functions defined for \code{Column}. #' -#' @param x a literal value or a Column. +#' @param x Column to compute on. In \code{lit}, it is a literal value or a Column. +#' In \code{monotonically_increasing_id}, it should be empty. +#' @param y Column to compute on. +#' @param ... additional argument(s). In \code{expr}, it contains an expression character +#'object to be parsed. +#' @name column_nonaggregate_functions +#' @rdname column_nonaggregate_functions +#' @seealso coalesce,SparkDataFrame-method #' @family non-aggregate functions -#' @rdname lit -#' @name lit +#' @examples +#' \dontrun{ +#' # Dataframe used throughout this doc +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))} +NULL + +#' @details
+#' \code{lit}: A new \linkS4class{Column} is created to represent the literal value. --- End diff -- this format is actually kinda weird. let's fix it? I don't think we need to link to Column
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124705490 --- Diff: R/pkg/R/functions.R --- @@ -132,23 +132,40 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL -#' lit +#' Non-aggregate functions for Column operations #' -#' A new \linkS4class{Column} is created to represent the literal value. -#' If the parameter is a \linkS4class{Column}, it is returned unchanged. +#' Non-aggregate functions defined for \code{Column}. #' -#' @param x a literal value or a Column. +#' @param x Column to compute on. In \code{lit}, it is a literal value or a Column. +#' In \code{monotonically_increasing_id}, it should be empty. +#' @param y Column to compute on. +#' @param ... additional argument(s). In \code{expr}, it contains an expression character +#'object to be parsed. +#' @name column_nonaggregate_functions +#' @rdname column_nonaggregate_functions +#' @seealso coalesce,SparkDataFrame-method #' @family non-aggregate functions -#' @rdname lit -#' @name lit +#' @examples +#' \dontrun{ +#' # Dataframe used throughout this doc +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))} +NULL + +#' @details +#' \code{lit}: A new \linkS4class{Column} is created to represent the literal value. +#' If the parameter is a \linkS4class{Column}, it is returned unchanged. +#' +#' @rdname column_nonaggregate_functions #' @export -#' @aliases lit,ANY-method +#' @aliases lit lit,ANY-method #' @examples +#' #' \dontrun{ -#' lit(df$name) -#' select(df, lit("x")) -#' select(df, lit("2015-01-01")) -#'} +#' tmp <- mutate(df, v1 = lit(df$mpg), v2 = lit("x"), v3 = lit("2015-01-01"), +#' v4 = negate(df$mpg), v5 = expr('length(model)'), +#' v6 = greatest(df$vs, df$am), v7 = least(df$vs, df$am), +#' v8 = column("mpg")) --- End diff -- is there example for ``` nanvl(df$c, x) coalesce(df$c, df$d, df$e) ``` that I've missed? 
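The examples the reviewer asks after could be sketched roughly as follows (hypothetical: assumes a running SparkR session and numeric columns `c`, `d`, `e` on `df`; not part of the merged change):

```r
# Hypothetical continuation of the example block under review:
tmp <- mutate(df,
  v9  = nanvl(df$c, df$d),            # df$d wherever df$c is NaN, else df$c
  v10 = coalesce(df$c, df$d, df$e))   # first non-NA among c, d, e
```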
[GitHub] spark pull request #18422: [SPARK-20889][SparkR] Grouped documentation for N...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18422#discussion_r124705601 --- Diff: R/pkg/R/functions.R --- @@ -132,23 +132,40 @@ NULL #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} NULL -#' lit +#' Non-aggregate functions for Column operations #' -#' A new \linkS4class{Column} is created to represent the literal value. -#' If the parameter is a \linkS4class{Column}, it is returned unchanged. +#' Non-aggregate functions defined for \code{Column}. #' -#' @param x a literal value or a Column. +#' @param x Column to compute on. In \code{lit}, it is a literal value or a Column. +#' In \code{monotonically_increasing_id}, it should be empty. +#' @param y Column to compute on. +#' @param ... additional argument(s). In \code{expr}, it contains an expression character --- End diff -- `In \code{expr}, it contains an expression character` - this isn't quite right actually - it's in `x` in expr, not as ... parameter
[GitHub] spark pull request #18449: [SPARK-21237][SQL] Invalidate stats once table da...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18449