[GitHub] [spark] attilapiros commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
attilapiros commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-639990452

It is in the `unit-tests.log` on Jenkins too (this is the appender target; see [log4j.properties](https://github.com/apache/spark/blob/bdeae92b9ce886166a0c339306fa6d3a8a922ca5/core/src/test/resources/log4j.properties#L23)):

```
$ wget https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123572/artifact/core/target/unit-tests.log
$ grep "POSSIBLE THREAD LEAK.* migrate-shuffle-to-BlockManagerId" unit-tests.log
= POSSIBLE THREAD LEAK IN SUITE o.a.s.storage.BlockManagerSuite, thread names: shuffle-boss-7296-1, rpc-boss-7278-1, migrate-shuffle-to-BlockManagerId(exec2, localhost, 42830, None), shuffle-boss-7281-1, shuffle-boss-7242-1, shuffle-boss-7284-1, shuffle-boss-7287-1, rpc-boss-7260-1, shuffle-boss-7302-1, rpc-boss-7263-1, shuffle-boss-7269-1, rpc-boss-7290-1, rpc-boss-7240-1, shuffle-boss-7305-1, shuffle-boss-7253-1, shuffle-boss-7249-1, shuffle-boss-7275-1, rpc-boss-7257-1, shuffle-boss-7245-1, block-migration-thread, rpc-boss-7299-1, shuffle-boss-7272-1, rpc-boss-7266-1, shuffle-boss-7293-1 =
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
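For readers unfamiliar with this kind of log line: a per-suite thread audit can be approximated by comparing the names of live threads before and after the suite runs. The following is only a minimal sketch under that assumption; `ThreadLeakAudit` and its methods are hypothetical names for illustration and are not Spark's actual test code.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

public class ThreadLeakAudit {
    // Names of all threads currently alive in this JVM.
    static Set<String> liveThreadNames() {
        Set<String> names = new TreeSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        return names;
    }

    // Threads present after the suite that were not present before it.
    static Set<String> leaked(Set<String> before, Set<String> after) {
        Set<String> diff = new TreeSet<>(after);
        diff.removeAll(before);
        return diff;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> before = liveThreadNames();

        // Simulate a suite that starts a thread and forgets to stop it.
        CountDownLatch stop = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                stop.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "block-migration-thread");
        worker.setDaemon(true);
        worker.start();

        Set<String> leaks = leaked(before, liveThreadNames());
        if (!leaks.isEmpty()) {
            System.out.println("POSSIBLE THREAD LEAK, thread names: " + String.join(", ", leaks));
        }

        // Clean up so the demo itself does not leak.
        stop.countDown();
        worker.join();
    }
}
```

A name-based diff like this flags any thread that outlives the suite, which is why long-lived pool threads such as the `shuffle-boss-*` and `rpc-boss-*` entries show up next to the migration threads in the output above.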
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639987633 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123586/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639987630 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639963007 **[Test build #123586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123586/testReport)** for PR 28593 at commit [`3d1819d`](https://github.com/apache/spark/commit/3d1819d3e1b19d750906580b8b1ba98a1501faa5).
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639987630
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639987583 **[Test build #123586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123586/testReport)** for PR 28593 at commit [`3d1819d`](https://github.com/apache/spark/commit/3d1819d3e1b19d750906580b8b1ba98a1501faa5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r436235957 ## File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/HybridStore.java ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.kvstore; + +import org.apache.spark.annotation.Private; + +import java.io.IOException; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentLinkedQueue; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.Collection; + +/** + * Implementation of KVStore that writes data to InMemoryStore at first and uses + * a background thread to dump data to LevelDB once the writing to InMemoryStore + * is completed. + */ +@Private +public class HybridStore implements KVStore { + + private InMemoryStore inMemoryStore = new InMemoryStore(); + private LevelDB levelDB = null; + + // Flag to indicate if we should use inMemoryStore Or levelDB. 
+ private AtomicBoolean shouldUseInMemoryStore = new AtomicBoolean(true); + + // A background thread that dumps data in inMemoryStore to levelDB + private Thread backgroundThread = null; + + // A hash map that stores all class types (except CachedQuantile) that had been writen + // to inMemoryStore. + private ConcurrentHashMap, Boolean> klassMap = new ConcurrentHashMap<>(); + + // CachedQuantile can be written to kvstore after rebuildAppStore(), so we need Review comment: Given that we still have to deal with an exception, how about generalizing a bit more: allow writes to the in-memory DB, but capture all of the write operations between the time the replay is done and the time we shift to LevelDB? This may sound like going back and forth; sorry, I didn't realize there are additional writes after replaying. But it only applies during the migration, so it would still be simpler than before, and we could get rid of the assertion on being "read-only" with an exception. I'm also OK with the current state; I can try to deal with that afterwards.
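The suggestion in the review comment above (keep accepting writes in memory during migration, but record them and replay them onto the on-disk store before switching) can be sketched roughly as follows. This is a hedged illustration only: `WriteCapturingStore` is a hypothetical class, plain `HashMap`s stand in for `InMemoryStore` and `LevelDB`, and none of this is the `HybridStore` code from the PR.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class WriteCapturingStore {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> disk = new HashMap<>(); // stand-in for LevelDB
    private final Queue<String[]> captured = new ArrayDeque<>();
    private boolean migrating = false;
    private boolean useDisk = false;

    public synchronized void write(String key, String value) {
        if (useDisk) {
            disk.put(key, value);
            return;
        }
        memory.put(key, value);
        if (migrating) {
            // Remember writes that arrive while the snapshot is being copied.
            captured.add(new String[] {key, value});
        }
    }

    public synchronized String read(String key) {
        return useDisk ? disk.get(key) : memory.get(key);
    }

    public synchronized boolean usesDisk() {
        return useDisk;
    }

    // Called once the replay is done: freeze a snapshot and start capturing writes.
    public synchronized Map<String, String> beginMigration() {
        migrating = true;
        return new HashMap<>(memory);
    }

    // Called by the migrating thread: copy the snapshot, replay captured writes, switch.
    public void finishMigration(Map<String, String> snapshot) {
        // Slow copy done without the lock; nothing else touches 'disk' before the switch.
        disk.putAll(snapshot);
        synchronized (this) {
            String[] w;
            while ((w = captured.poll()) != null) {
                disk.put(w[0], w[1]); // replay in arrival order so later writes win
            }
            migrating = false;
            useDisk = true;
            memory.clear();
        }
    }

    public static void main(String[] args) {
        WriteCapturingStore store = new WriteCapturingStore();
        store.write("a", "1");
        Map<String, String> snapshot = store.beginMigration();
        store.write("b", "2"); // arrives during migration
        store.write("a", "3"); // overwrites a snapshotted key during migration
        store.finishMigration(snapshot);
        System.out.println(store.read("a")); // prints 3
        System.out.println(store.read("b")); // prints 2
    }
}
```

In this sketch the final replay and the switch happen under the same lock as `write`, so no write can slip in between draining the queue and flipping to the on-disk store.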
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r436233082 ## File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/HybridStore.java ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.kvstore; + +import org.apache.spark.annotation.Private; + +import java.io.IOException; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentLinkedQueue; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.Collection; + +/** + * Implementation of KVStore that writes data to InMemoryStore at first and uses + * a background thread to dump data to LevelDB once the writing to InMemoryStore + * is completed. + */ +@Private +public class HybridStore implements KVStore { + + private InMemoryStore inMemoryStore = new InMemoryStore(); + private LevelDB levelDB = null; + + // Flag to indicate if we should use inMemoryStore Or levelDB. 
+ private AtomicBoolean shouldUseInMemoryStore = new AtomicBoolean(true); + + // A background thread that dumps data in inMemoryStore to levelDB + private Thread backgroundThread = null; + + // A hash map that stores all class types (except CachedQuantile) that had been writen + // to inMemoryStore. + private ConcurrentHashMap, Boolean> klassMap = new ConcurrentHashMap<>(); + + // CachedQuantile can be written to kvstore after rebuildAppStore(), so we need + // to handle it specially to avoid conflicts. We will use a queue store CachedQuantile + // objects when the underlying store is inMemoryStore, and dump these objects to levelDB + // before the switch completes. + private Class cachedQuantileKlass = null; + private ConcurrentLinkedQueue cachedQuantileQueue = new ConcurrentLinkedQueue<>(); + + + @Override + public T getMetadata(Class klass) throws Exception { +KVStore store = getStore(); +T metaData = store.getMetadata(klass); +return metaData; + } + + @Override + public void setMetadata(Object value) throws Exception { +KVStore store = getStore(); Review comment: Same here. ## File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/HybridStore.java ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.kvstore; + +import org.apache.spark.annotation.Private; + +import java.io.IOException; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentLinkedQueue; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.Collection; + +/** + * Implementation of KVStore that writes data to InMemoryStore at first and uses + * a background thread to dump data to LevelDB once the writing to InMemoryStore + * is completed. + */ +@Private +public class HybridStore implements KVStore { + + private InMemoryStore inMemoryStore = new InMemoryStore(); + private LevelDB levelDB = null; + + // Flag to indicate if we should use inMemoryStore Or levelDB. + private AtomicBoolean shouldUseInMemoryStore = new AtomicBoolean(true); + + // A background thread that dumps data in inMemoryStore to levelDB + private Thread backgroundThread = null; + + // A hash map that stores all class types (except CachedQuantile) that had been writen + // to inMemoryStore. + private ConcurrentHashMap, Boolean> klassMap = new ConcurrentHashMap<>(); + +
[GitHub] [spark] viirya commented on a change in pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
viirya commented on a change in pull request #28733: URL: https://github.com/apache/spark/pull/28733#discussion_r436234167 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushCNFPredicateThroughJoin.scala ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions.{And, Expression, Not, Or, PredicateHelper} +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.internal.SQLConf + +/** + * Try converting join condition to conjunctive normal form expression so that more predicates may + * be able to be pushed down. + * To avoid expanding the join condition, the join condition will be kept in the original form even + * when predicate pushdown happens. + */ +object PushCNFPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { + /** + * Convert an expression into conjunctive normal form. 
+ * Definition and algorithm: https://en.wikipedia.org/wiki/Conjunctive_normal_form + * CNF can explode exponentially in the size of the input expression when converting Or clauses. + * Use a configuration MAX_CNF_NODE_COUNT to prevent such cases. + * + * @param condition to be conversed into CNF. + * @return If the number of expressions exceeds threshold on converting Or, return Seq.empty. + * If the conversion repeatedly expands nondeterministic expressions, return Seq.empty. + * Otherwise, return the converted result as sequence of disjunctive expressions. + */ + protected def conjunctiveNormalForm(condition: Expression): Seq[Expression] = { Review comment: Could you add tests for this method? We should have particular tests to verify the CNF conversion.
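As an illustration of what the quoted doc comment describes, here is a minimal sketch of a CNF conversion with a size cap. It assumes a negation-free expression tree of leaves, `AND` and `OR`; `CnfSketch`, `Expr` and `maxCount` are hypothetical names, and the code is not the Catalyst implementation (it ignores `Not` and nondeterministic expressions entirely).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CnfSketch {
    // Tiny expression tree: a leaf has a name, an inner node has an operator and two children.
    static final class Expr {
        final String op;   // "AND", "OR", or null for a leaf
        final String name; // only set for leaves
        final Expr left;
        final Expr right;

        private Expr(String op, String name, Expr left, Expr right) {
            this.op = op;
            this.name = name;
            this.left = left;
            this.right = right;
        }

        static Expr leaf(String name) { return new Expr(null, name, null, null); }
        static Expr and(Expr l, Expr r) { return new Expr("AND", null, l, r); }
        static Expr or(Expr l, Expr r) { return new Expr("OR", null, l, r); }

        @Override
        public String toString() {
            return op == null ? name : "(" + left + " " + op + " " + right + ")";
        }
    }

    // Returns the conjuncts of the CNF form (each one a disjunction),
    // or an empty list if expanding an OR would produce more than maxCount conjuncts.
    static List<Expr> cnf(Expr e, int maxCount) {
        if (e.op == null) {
            return Collections.singletonList(e);
        }
        List<Expr> l = cnf(e.left, maxCount);
        List<Expr> r = cnf(e.right, maxCount);
        if (l.isEmpty() || r.isEmpty()) {
            return Collections.emptyList(); // a child already exceeded the cap
        }
        List<Expr> out = new ArrayList<>();
        if (e.op.equals("AND")) {
            out.addAll(l);
            out.addAll(r);
            return out;
        }
        // OR distributes over AND: (a1 AND a2) OR (b1 AND b2) yields every pairwise (ai OR bj).
        if ((long) l.size() * r.size() > maxCount) {
            return Collections.emptyList();
        }
        for (Expr x : l) {
            for (Expr y : r) {
                out.add(Expr.or(x, y));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Expr a = Expr.leaf("a"), b = Expr.leaf("b"), c = Expr.leaf("c");
        System.out.println(cnf(Expr.or(Expr.and(a, b), c), 10)); // prints [(a OR c), (b OR c)]
        System.out.println(cnf(Expr.or(Expr.and(a, b), c), 1));  // prints []
    }
}
```

Tests of the kind requested in the review could assert exactly these two cases: a small disjunction of conjunctions expands into the expected clauses, and the conversion gives up with an empty result once the cap is exceeded.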
[GitHub] [spark] viirya commented on a change in pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
viirya commented on a change in pull request #28733: URL: https://github.com/apache/spark/pull/28733#discussion_r436234167 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushCNFPredicateThroughJoin.scala ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions.{And, Expression, Not, Or, PredicateHelper} +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.internal.SQLConf + +/** + * Try converting join condition to conjunctive normal form expression so that more predicates may + * be able to be pushed down. + * To avoid expanding the join condition, the join condition will be kept in the original form even + * when predicate pushdown happens. + */ +object PushCNFPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { + /** + * Convert an expression into conjunctive normal form. 
+ * Definition and algorithm: https://en.wikipedia.org/wiki/Conjunctive_normal_form + * CNF can explode exponentially in the size of the input expression when converting Or clauses. + * Use a configuration MAX_CNF_NODE_COUNT to prevent such cases. + * + * @param condition to be conversed into CNF. + * @return If the number of expressions exceeds threshold on converting Or, return Seq.empty. + * If the conversion repeatedly expands nondeterministic expressions, return Seq.empty. + * Otherwise, return the converted result as sequence of disjunctive expressions. + */ + protected def conjunctiveNormalForm(condition: Expression): Seq[Expression] = { Review comment: Could you add tests for this optimization rule? We should have particular tests to verify the CNF conversion.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639963197
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639963197
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639963007 **[Test build #123586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123586/testReport)** for PR 28593 at commit [`3d1819d`](https://github.com/apache/spark/commit/3d1819d3e1b19d750906580b8b1ba98a1501faa5).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639961662 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123584/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639935264 **[Test build #123584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123584/testReport)** for PR 28593 at commit [`96c197c`](https://github.com/apache/spark/commit/96c197cdac39d6b91f81ef6bee82f4a03dcb3979).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639961658 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639961627 **[Test build #123584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123584/testReport)** for PR 28593 at commit [`96c197c`](https://github.com/apache/spark/commit/96c197cdac39d6b91f81ef6bee82f4a03dcb3979). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639961658
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639960955
[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639960955
[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639960706 **[Test build #123585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123585/testReport)** for PR 28733 at commit [`cbc1220`](https://github.com/apache/spark/commit/cbc1220c4598a1a24e552cb5815d8404ec2af308).
[GitHub] [spark] cloud-fan commented on pull request #28700: [SPARK-31860][BUILD] only push release tags on succes
cloud-fan commented on pull request #28700: URL: https://github.com/apache/spark/pull/28700#issuecomment-639959102 With this patch, does it mean we have to redo all the release steps if one step fails?
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436227550 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala ## @@ -204,20 +199,37 @@ private[sql] class ExternalAppendOnlyUnsafeRowArray( } } - private[this] class SpillableArrayIterator( + private[this] class MergerIterator( Review comment: Done. Thank you
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226844 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala ## @@ -168,7 +159,11 @@ private[sql] class ExternalAppendOnlyUnsafeRowArray( if (spillableArray == null) { new InMemoryBufferIterator(startIndex) } else { - new SpillableArrayIterator(spillableArray.getIterator(startIndex), numFieldsPerRow) + new MergerIterator(spillableArray.getIterator(if (startIndex > numRowBufferedInMemory) { +startIndex - numRowBufferedInMemory +} else 0), + numFieldsPerRow, + startIndex) Review comment: Makes sense. I will do it. Thank you.
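The start-index arithmetic in the diff above can be read as translating a global row index into an offset within the spilled part. The sketch below is one hedged reading of that idea, assuming rows are iterated from the in-memory buffer first and from the spill afterwards; `MergedRows`, `spillStart` and `rowsFrom` are hypothetical names, with plain lists standing in for the real buffer and spill reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergedRows {
    // Offset into the spilled rows for a global start index,
    // mirroring the conditional expression shown in the diff.
    static int spillStart(int startIndex, int numRowsInMemory) {
        return startIndex > numRowsInMemory ? startIndex - numRowsInMemory : 0;
    }

    // All rows from the global startIndex onward: in-memory rows first, then spilled rows.
    static <T> List<T> rowsFrom(List<T> inMemory, List<T> spilled, int startIndex) {
        List<T> out = new ArrayList<>();
        for (int i = startIndex; i < inMemory.size(); i++) {
            out.add(inMemory.get(i));
        }
        for (int i = spillStart(startIndex, inMemory.size()); i < spilled.size(); i++) {
            out.add(spilled.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mem = Arrays.asList("r0", "r1", "r2");
        List<String> spill = Arrays.asList("r3", "r4");
        System.out.println(rowsFrom(mem, spill, 1)); // prints [r1, r2, r3, r4]
        System.out.println(rowsFrom(mem, spill, 4)); // prints [r4]
    }
}
```

When `startIndex` equals the number of in-memory rows, both branches of the conditional give offset 0, so using `>` rather than `>=` makes no difference to the result.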
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226766 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala ## @@ -124,29 +129,15 @@ private[sql] class ExternalAppendOnlyUnsafeRowArray( numRowsSpillThreshold, false) -// populate with existing in-memory buffered rows -if (inMemoryBuffer != null) { - inMemoryBuffer.foreach(existingUnsafeRow => -spillableArray.insertRecord( - existingUnsafeRow.getBaseObject, - existingUnsafeRow.getBaseOffset, - existingUnsafeRow.getSizeInBytes, - 0, - false) - ) - inMemoryBuffer.clear() -} numFieldsPerRow = unsafeRow.numFields() } - Review comment: Sure. Thank you
[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639950400
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639950400
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226636 ## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ## @@ -148,4 +141,34 @@ public void close() throws IOException { } } } + + private void readFile() throws IOException { Review comment: I will rename the method to readSpilledFile. Thank you.
[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639878972 **[Test build #123582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123582/testReport)** for PR 27694 at commit [`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226544 ## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ## @@ -148,4 +141,34 @@ public void close() throws IOException { } } } + + private void readFile() throws IOException { +assert (dataFile.length() > 0); +final ConfigEntry bufferSizeConfigEntry = +package$.MODULE$.UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE(); Review comment: Yes. My IntelliJ was configured with 8-space tabs. I will fix it. Thank you.
[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639949573 **[Test build #123582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123582/testReport)** for PR 27694 at commit [`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226351 ## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ## @@ -47,55 +47,48 @@ private int numRecords; private int numRecordsRemaining; - private byte[] arr = new byte[1024 * 1024]; + private byte[] arr = new byte[1024]; Review comment: Yes. I am looking into this. Thank you
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226277 ## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ## @@ -720,5 +720,14 @@ public void loadNext() throws IOException { @Override public long getKeyPrefix() { return current.getKeyPrefix(); } + +private void initializeNumRecords() throws IOException { + if (numRecords == 0) { +for (UnsafeSorterIterator iter: iterators) { + numRecords += iter.getNumRecords(); +} +numRecords += current.getNumRecords(); + } +} Review comment: Thank you for the suggestion. It makes sense. I will do it and test it.
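The `initializeNumRecords()` helper in the diff above computes the total record count lazily, the first time it is needed, by summing the counts of the pending iterators and the current one. The sketch below illustrates that pattern in isolation, under stated assumptions: `Counted` and `LazyRecordCountSketch` are hypothetical stand-ins, not Spark classes, and the counts are fixed integers rather than values read from spill files.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyRecordCountSketch {
    /** Hypothetical stand-in for an iterator that knows how many records it holds. */
    public interface Counted {
        int getNumRecords();
    }

    private final List<Counted> iterators = new ArrayList<>();
    private final Counted current;
    private int numRecords = 0; // 0 doubles as the "not yet computed" marker, as in the diff

    public LazyRecordCountSketch(int currentCount, int... pendingCounts) {
        this.current = () -> currentCount;
        for (int count : pendingCounts) {
            iterators.add(() -> count);
        }
    }

    /** Sums the counts on first use and returns the cached total afterwards. */
    public int getNumRecords() {
        if (numRecords == 0) {
            for (Counted iter : iterators) {
                numRecords += iter.getNumRecords();
            }
            numRecords += current.getNumRecords();
        }
        return numRecords;
    }

    public static void main(String[] args) {
        LazyRecordCountSketch sketch = new LazyRecordCountSketch(4, 2, 3);
        System.out.println(sketch.getNumRecords());
    }
}
```

One consequence of using 0 as the marker is that a genuinely empty input is re-summed on every call; the result is still correct, only the caching is lost in that case.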
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436226152 ## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ## @@ -703,6 +702,7 @@ public boolean hasNext() { @Override public void loadNext() throws IOException { Review comment: Thank you for the suggestion. Let me start with my understanding of it. Your idea is to reuse code: you propose replacing the body of loadNext() with a call to hasNext(). Could you please look at loadNext() again? Perhaps you missed that the last call inside loadNext() is “current.loadNext()”, whereas the last call inside hasNext() is “current.hasNext()”. If we followed your recommendation, we would change the behavior of loadNext(). As I understand it, loadNext() moves to the next element, while hasNext() only checks whether a next element exists and does not move to it. I therefore believe we cannot reuse the code here. Could you please let me know whether my reasoning makes sense? Thank you for your time.
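The distinction drawn in the comment above, that hasNext() only checks while loadNext() advances, can be illustrated with a small chained cursor. This is a minimal sketch under stated assumptions: `ChainedCursorSketch` and its fields are hypothetical, it works on plain `int` arrays rather than spill files, and it is not the `UnsafeExternalSorter` implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.Queue;

public class ChainedCursorSketch {
    private final Queue<Iterator<Integer>> pending = new ArrayDeque<>();
    private Iterator<Integer> current = Collections.emptyIterator();
    private int value;

    public ChainedCursorSketch(int[]... parts) {
        for (int[] part : parts) {
            pending.add(Arrays.stream(part).iterator());
        }
    }

    // Shared step: skip exhausted parts until one with data is found.
    private void advanceToNonEmptyPart() {
        while (!current.hasNext() && !pending.isEmpty()) {
            current = pending.remove();
        }
    }

    /** Checks for a next element; calling it repeatedly consumes nothing. */
    public boolean hasNext() {
        advanceToNonEmptyPart();
        return current.hasNext();
    }

    /** Moves to the next element; every call consumes exactly one element. */
    public void loadNext() {
        advanceToNonEmptyPart();
        value = current.next();
    }

    public int getValue() {
        return value;
    }

    public static void main(String[] args) {
        ChainedCursorSketch cursor =
            new ChainedCursorSketch(new int[] {1, 2}, new int[] {}, new int[] {3});
        while (cursor.hasNext()) {
            cursor.loadNext();
            System.out.println(cursor.getValue());
        }
    }
}
```

In this sketch the two methods share the part-skipping step but must end differently: replacing the last line of `loadNext()` with a call to `hasNext()` would leave `value` unchanged, which is the behavior change the comment warns about.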
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639935264 **[Test build #123584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123584/testReport)** for PR 28593 at commit [`96c197c`](https://github.com/apache/spark/commit/96c197cdac39d6b91f81ef6bee82f4a03dcb3979).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639932650
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639932650
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639927936 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123583/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639927931 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639913015 **[Test build #123583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123583/testReport)** for PR 28593 at commit [`8ed4fc4`](https://github.com/apache/spark/commit/8ed4fc41a43442864cdb0fa2e0160fa5e80f374e).
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639927931
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639927850 **[Test build #123583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123583/testReport)** for PR 28593 at commit [`8ed4fc4`](https://github.com/apache/spark/commit/8ed4fc41a43442864cdb0fa2e0160fa5e80f374e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] gengliangwang commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
gengliangwang commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639916084 Test failure from TPCDSQuerySuite: > 18884 was not less than or equal to 8000 too long generated codes found in the WholeStageCodegenExec subtree (id=375762) and JIT optimization might not work: I will update the PR to simplify the pushed-down predicates.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639913428
[GitHub] [spark] github-actions[bot] closed pull request #27172: [WIP] [SPARK-29644][SQL] Fixed ByteType JDBCUtils to map to TinyInt at write read and ShortType on read
github-actions[bot] closed pull request #27172: URL: https://github.com/apache/spark/pull/27172
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639913428
[GitHub] [spark] github-actions[bot] commented on pull request #27215: [SPARK-30514][K8S] add python environment support for JavaMainAppResource
github-actions[bot] commented on pull request #27215: URL: https://github.com/apache/spark/pull/27215#issuecomment-639913381 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-639913015 **[Test build #123583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123583/testReport)** for PR 28593 at commit [`8ed4fc4`](https://github.com/apache/spark/commit/8ed4fc41a43442864cdb0fa2e0160fa5e80f374e).
[GitHub] [spark] therealJacobWu commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script
therealJacobWu commented on a change in pull request #28731: URL: https://github.com/apache/spark/pull/28731#discussion_r436210074 ## File path: bin/beeline ## @@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then source "$(dirname "$0")"/find-spark-home fi +. "${SPARK_HOME}"/bin/load-spark-env.sh + CLASS="org.apache.hive.beeline.BeeLine" -exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@" +exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@" Review comment: Hi Hyukjin, there was an earlier discussion about the krb5.conf location at https://issues.apache.org/jira/browse/SPARK-12050, but in this case, this does not work. In that discussion, people say we should use SPARK_SUBMIT_OPTS to pass the non-standard krb5.conf location via `-Djava.security.krb5.conf=/etc/krb5.conf-custom`.
[GitHub] [spark] therealJacobWu commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script
therealJacobWu commented on a change in pull request #28731: URL: https://github.com/apache/spark/pull/28731#discussion_r436210074 ## File path: bin/beeline ## @@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then source "$(dirname "$0")"/find-spark-home fi +. "${SPARK_HOME}"/bin/load-spark-env.sh + CLASS="org.apache.hive.beeline.BeeLine" -exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@" +exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@" Review comment: Hi Hyukjin, there was an earlier discussion about the krb5.conf location at https://issues.apache.org/jira/browse/SPARK-12050, but in this case, this does not work. In that discussion, people say we can use SPARK_SUBMIT_OPTS to pass the non-standard krb5.conf location via `-Djava.security.krb5.conf=/etc/krb5.conf-custom`.
[GitHub] [spark] therealJacobWu commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script
therealJacobWu commented on a change in pull request #28731: URL: https://github.com/apache/spark/pull/28731#discussion_r436210074 ## File path: bin/beeline ## @@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then source "$(dirname "$0")"/find-spark-home fi +. "${SPARK_HOME}"/bin/load-spark-env.sh + CLASS="org.apache.hive.beeline.BeeLine" -exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@" +exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@" Review comment: Hi Hyukjin, there was an earlier discussion about the krb5.conf location at https://issues.apache.org/jira/browse/SPARK-12050, but in this case, this does not work. In that discussion, people say we can use SPARK_SUBMIT_OPTS to include the location via `-Djava.security.krb5.conf=/etc/krb5.conf-custom`.
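The `-Djava.security.krb5.conf=...` flag mentioned in the comments above is an ordinary JVM system property, so whatever reaches the `java` command line is visible to the launched program through `System.getProperty`. The sketch below shows only that general JVM behavior; the class name, the helper method, and the fallback path are hypothetical, and nothing here describes how `spark-class` or Beeline actually handles `SPARK_SUBMIT_OPTS`.

```java
public class Krb5ConfPropertySketch {
    // Standard JVM system property naming the Kerberos configuration file.
    public static final String KEY = "java.security.krb5.conf";

    /** Returns the configured krb5.conf path, or the given fallback if the flag was not set. */
    public static String configuredKrb5Conf(String fallback) {
        return System.getProperty(KEY, fallback);
    }

    public static void main(String[] args) {
        // The fallback path is an illustrative assumption, not a value taken from Spark.
        System.out.println(configuredKrb5Conf("/etc/krb5.conf"));
    }
}
```

Running this class with `java -Djava.security.krb5.conf=/etc/krb5.conf-custom Krb5ConfPropertySketch` prints the custom path; without the flag it prints the fallback.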
[GitHub] [spark] karuppayya commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
karuppayya commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-639899181 @viirya @cloud-fan Can you please help review this PR?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639881226 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123578/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639881216 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639723997 **[Test build #123578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123578/testReport)** for PR 28733 at commit [`a9a5c0b`](https://github.com/apache/spark/commit/a9a5c0bc88ec08b2c1645a0b6758519f5ada83b1).
[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639881216
[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639880795 **[Test build #123578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123578/testReport)** for PR 28733 at commit [`a9a5c0b`](https://github.com/apache/spark/commit/a9a5c0bc88ec08b2c1645a0b6758519f5ada83b1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639879503
[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639879503
[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639878972 **[Test build #123582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123582/testReport)** for PR 27694 at commit [`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).
[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
HeartSaVioR commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-639878377 retest this, please
[GitHub] [spark] SparkQA removed a comment on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
SparkQA removed a comment on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639865749 **[Test build #123581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123581/testReport)** for PR 27598 at commit [`20a7a9c`](https://github.com/apache/spark/commit/20a7a9c82510d1f26953b3884a4824c5dfa65e47).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
AmplabJenkins removed a comment on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639874811
[GitHub] [spark] AmplabJenkins commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
AmplabJenkins commented on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639874811
[GitHub] [spark] SparkQA commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
SparkQA commented on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639874682 **[Test build #123581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123581/testReport)** for PR 27598 at commit [`20a7a9c`](https://github.com/apache/spark/commit/20a7a9c82510d1f26953b3884a4824c5dfa65e47). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
HeartSaVioR edited a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447 Sorry, my comment was edited so you may have missed the content, but it is also a sort of pointing out for "pinpointing" - do you think your approach works with other state store providers as well? The root cause isn't bound to the implementation of the state store provider, but this patch is only addressing the HDFS state store provider. I guess you're trying to find how it can be done less frequently, the first time the state is loaded from the file, which is optimal. While I think it can even be done without binding to the state store provider implementation if we really need it (check only once when the provider instance is created), have we measured the actual overhead? If the overhead turns out to be trivial then it won't matter if we run the validation check for each batch. It sounds sub-optimal, but the overhead would be trivial.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
HeartSaVioR edited a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447 Sorry, my comment was edited so you may have missed the content, but it is also a sort of pointing out for "pinpointing" - do you think your approach works with other state store providers as well? The root cause isn't bound to the implementation of the state store provider, but this patch is only addressing the HDFS state store provider. I guess you're trying to find how it can be done less frequently, the first time the state is loaded from the file, which is optimal. While I think it can even be done without binding to the state store provider implementation if we really need it, have we measured the actual overhead? If the overhead turns out to be trivial then it won't matter if we run the validation check for each batch. It sounds sub-optimal, but the overhead would be trivial.
[GitHub] [spark] maropu commented on a change in pull request #28726: [SPARK-31906][SQL][DOCS] Enhance comments in NamedExpression.qualifier
maropu commented on a change in pull request #28726: URL: https://github.com/apache/spark/pull/28726#discussion_r436195481 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ## @@ -85,6 +85,7 @@ trait NamedExpression extends Expression { *e.g. top level attributes aliased in the SELECT clause, or column from a LocalRelation. * 2. Seq with a Single element: either the table name or the alias name of the table. * 3. Seq with 2 elements: database name and table name + * 4. Seq with 3 elements: catalog name, database name and table name Review comment: namespace instead?
[GitHub] [spark] HeartSaVioR commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
HeartSaVioR commented on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447 Sorry, my comment was edited so you may have missed the content, but it is also a sort of pointing out of "pinpoint" - do you think your approach works with other state store providers as well? The root cause isn't bound to the implementation of the state store provider, but this patch is only addressing it in the HDFS state store provider. I guess you're trying to find how it can be done less frequently, the first time the state is loaded from the file, which is optimal. While I think it can even be done without binding to the state store provider implementation if we really need it, have we measured the actual overhead? If the overhead turns out to be trivial then it won't matter if we run the validation check for each batch. It sounds sub-optimal, but the overhead would be trivial.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
HeartSaVioR edited a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447 Sorry, my comment was edited so you may have missed the content, but it is also a sort of pointing out of "pinpoint" - do you think your approach works with other state store providers as well? The root cause isn't bound to the implementation of the state store provider, but this patch is only addressing the HDFS state store provider. I guess you're trying to find how it can be done less frequently, the first time the state is loaded from the file, which is optimal. While I think it can even be done without binding to the state store provider implementation if we really need it, have we measured the actual overhead? If the overhead turns out to be trivial then it won't matter if we run the validation check for each batch. It sounds sub-optimal, but the overhead would be trivial.
[GitHub] [spark] maropu closed pull request #28724: [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
maropu closed pull request #28724: URL: https://github.com/apache/spark/pull/28724
[GitHub] [spark] maropu commented on pull request #28724: [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
maropu commented on pull request #28724: URL: https://github.com/apache/spark/pull/28724#issuecomment-639871449 Thanks! Merged to master/3.0.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
AmplabJenkins removed a comment on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639866434
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.
AmplabJenkins removed a comment on pull request #28740: URL: https://github.com/apache/spark/pull/28740#issuecomment-639866391
[GitHub] [spark] AmplabJenkins commented on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.
AmplabJenkins commented on pull request #28740: URL: https://github.com/apache/spark/pull/28740#issuecomment-639866391
[GitHub] [spark] AmplabJenkins commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
AmplabJenkins commented on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639866434
[GitHub] [spark] SparkQA removed a comment on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.
SparkQA removed a comment on pull request #28740: URL: https://github.com/apache/spark/pull/28740#issuecomment-639738076 **[Test build #123579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123579/testReport)** for PR 28740 at commit [`753a3b7`](https://github.com/apache/spark/commit/753a3b7e1e0697b7eadcfe1ed3eb4b6628ac918a).
[GitHub] [spark] SparkQA commented on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.
SparkQA commented on pull request #28740: URL: https://github.com/apache/spark/pull/28740#issuecomment-639865831 **[Test build #123579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123579/testReport)** for PR 28740 at commit [`753a3b7`](https://github.com/apache/spark/commit/753a3b7e1e0697b7eadcfe1ed3eb4b6628ac918a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
SparkQA commented on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639865749 **[Test build #123581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123581/testReport)** for PR 27598 at commit [`20a7a9c`](https://github.com/apache/spark/commit/20a7a9c82510d1f26953b3884a4824c5dfa65e47).
[GitHub] [spark] shanyu commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
shanyu commented on pull request #27598: URL: https://github.com/apache/spark/pull/27598#issuecomment-639864928 Can we please test this?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins removed a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639861480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123580/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
SparkQA removed a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639812505 **[Test build #123580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123580/testReport)** for PR 28707 at commit [`7a5e09a`](https://github.com/apache/spark/commit/7a5e09a3d52cc6fa0c5aad0aa1e3c84878afe656).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins removed a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639861463 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins commented on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639861463
[GitHub] [spark] SparkQA commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
SparkQA commented on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639861273 **[Test build #123580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123580/testReport)** for PR 28707 at commit [`7a5e09a`](https://github.com/apache/spark/commit/7a5e09a3d52cc6fa0c5aad0aa1e3c84878afe656). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters
AmplabJenkins removed a comment on pull request #28709: URL: https://github.com/apache/spark/pull/28709#issuecomment-639860420
[GitHub] [spark] AmplabJenkins commented on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters
AmplabJenkins commented on pull request #28709: URL: https://github.com/apache/spark/pull/28709#issuecomment-639860420
[GitHub] [spark] SparkQA removed a comment on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters
SparkQA removed a comment on pull request #28709: URL: https://github.com/apache/spark/pull/28709#issuecomment-639662492 **[Test build #123575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123575/testReport)** for PR 28709 at commit [`6535b24`](https://github.com/apache/spark/commit/6535b2455470e1bdc646ffa2090b86c40fb155a1).
[GitHub] [spark] SparkQA commented on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters
SparkQA commented on pull request #28709: URL: https://github.com/apache/spark/pull/28709#issuecomment-639859333 **[Test build #123575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123575/testReport)** for PR 28709 at commit [`6535b24`](https://github.com/apache/spark/commit/6535b2455470e1bdc646ffa2090b86c40fb155a1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639849864 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123577/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639849856 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639695938 **[Test build #123577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123577/testReport)** for PR 28733 at commit [`a216cf8`](https://github.com/apache/spark/commit/a216cf8c0e8761e7316bee5681f9ee731fc0ff59).
[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639849856
[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-639849494 **[Test build #123577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123577/testReport)** for PR 28733 at commit [`a216cf8`](https://github.com/apache/spark/commit/a216cf8c0e8761e7316bee5681f9ee731fc0ff59). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins commented on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639813262
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins removed a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639813262
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins removed a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-638180171 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123475/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
SparkQA commented on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639812505 **[Test build #123580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123580/testReport)** for PR 28707 at commit [`7a5e09a`](https://github.com/apache/spark/commit/7a5e09a3d52cc6fa0c5aad0aa1e3c84878afe656).
[GitHub] [spark] xuanyuanking commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
xuanyuanking commented on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639811979 All the comments addressed in 1f71563. Thanks for the review! It also includes my alternative of adding the invalidation for all state store operations in StateStoreProvider, PTAL.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store
AmplabJenkins removed a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639809051