[GitHub] [spark] attilapiros commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-05 Thread GitBox


attilapiros commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639990452


   It is also in the `unit-tests.log` on Jenkins (this is the appender target; see [log4j.properties](https://github.com/apache/spark/blob/bdeae92b9ce886166a0c339306fa6d3a8a922ca5/core/src/test/resources/log4j.properties#L23)):
   
   ```
   $ wget https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123572/artifact/core/target/unit-tests.log
   $ grep "POSSIBLE THREAD LEAK.* migrate-shuffle-to-BlockManagerId" unit-tests.log
   130 ↵
   = POSSIBLE THREAD LEAK IN SUITE o.a.s.storage.BlockManagerSuite, thread names: shuffle-boss-7296-1, rpc-boss-7278-1, migrate-shuffle-to-BlockManagerId(exec2, localhost, 42830, None), shuffle-boss-7281-1, shuffle-boss-7242-1, shuffle-boss-7284-1, shuffle-boss-7287-1, rpc-boss-7260-1, shuffle-boss-7302-1, rpc-boss-7263-1, shuffle-boss-7269-1, rpc-boss-7290-1, rpc-boss-7240-1, shuffle-boss-7305-1, shuffle-boss-7253-1, shuffle-boss-7249-1, shuffle-boss-7275-1, rpc-boss-7257-1, shuffle-boss-7245-1, block-migration-thread, rpc-boss-7299-1, shuffle-boss-7272-1, rpc-boss-7266-1, shuffle-boss-7293-1 =
   ```
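
To pull the leaked thread names out of such a log line programmatically, a minimal Java sketch could look like the following. The class `ThreadLeakLine` and the method `threadNames` are hypothetical names, and the pattern only assumes the line format shown in the grep output above; none of this is part of Spark's test tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: extracts thread names from a
// "POSSIBLE THREAD LEAK IN SUITE" line of unit-tests.log.
final class ThreadLeakLine {
    // Group 1: suite name. Group 2: the comma-separated thread names,
    // without the trailing '=' decoration.
    private static final Pattern LEAK = Pattern.compile(
        "POSSIBLE THREAD LEAK IN SUITE (\\S+), thread names: (.*?)\\s*=*\\s*$");

    // Splits only on commas outside parentheses, because names such as
    // migrate-shuffle-to-BlockManagerId(exec2, localhost, 42830, None)
    // contain commas themselves.
    static List<String> threadNames(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = LEAK.matcher(line);
        if (!m.find()) {
            return names; // not a thread-leak line
        }
        int depth = 0;
        StringBuilder current = new StringBuilder();
        for (char c : m.group(2).toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ',' && depth == 0) {
                names.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) {
            names.add(last);
        }
        return names;
    }
}
```

Filtering the returned list for names starting with `migrate-shuffle-to-` or equal to `block-migration-thread` then isolates the threads relevant to this PR.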



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639987633


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123586/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639987630


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639963007


   **[Test build #123586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123586/testReport)** for PR 28593 at commit [`3d1819d`](https://github.com/apache/spark/commit/3d1819d3e1b19d750906580b8b1ba98a1501faa5).






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639987630










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639987583


   **[Test build #123586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123586/testReport)** for PR 28593 at commit [`3d1819d`](https://github.com/apache/spark/commit/3d1819d3e1b19d750906580b8b1ba98a1501faa5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-05 Thread GitBox


HeartSaVioR commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r436235957



##
File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/HybridStore.java
##
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.kvstore;
+
+import org.apache.spark.annotation.Private;
+
+import java.io.IOException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.Collection;
+
+/**
+ * Implementation of KVStore that writes data to InMemoryStore at first and uses
+ * a background thread to dump data to LevelDB once the writing to InMemoryStore
+ * is completed.
+ */
+@Private
+public class HybridStore implements KVStore {
+
+  private InMemoryStore inMemoryStore = new InMemoryStore();
+  private LevelDB levelDB = null;
+
+  // Flag to indicate if we should use inMemoryStore Or levelDB.
+  private AtomicBoolean shouldUseInMemoryStore = new AtomicBoolean(true);
+
+  // A background thread that dumps data in inMemoryStore to levelDB
+  private Thread backgroundThread = null;
+
+  // A hash map that stores all class types (except CachedQuantile) that had been writen
+  // to inMemoryStore.
+  private ConcurrentHashMap<Class<?>, Boolean> klassMap = new ConcurrentHashMap<>();
+
+  // CachedQuantile can be written to kvstore after rebuildAppStore(), so we need

Review comment:
   Given that we still have to deal with the exception, how about generalizing a bit more: allow writes to the in-memory store, but capture all write operations between the time the replay is done and the time we switch to LevelDB?
   
   You may feel this sounds like going back and forth. Sorry, I didn't realize there are additional writes after replaying. But it only applies during migration, so it would still be simpler than before, and we could get rid of the "read-only" assertion along with its exception.
   
   I'm also OK with the current state. I'll eventually try to deal with that afterwards.
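
The idea in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `BufferedSwitchStore`, `startMigration`, and `finishMigration` are hypothetical names, plain maps stand in for InMemoryStore and LevelDB, and the caller is assumed to copy the snapshot to disk on a single background thread between the two migration calls.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: writes stay allowed during migration and are
// captured so they can be replayed onto the disk store before the switch.
final class BufferedSwitchStore {
    private final Map<String, String> inMemory = new HashMap<>();
    private final Map<String, String> disk; // stand-in for LevelDB
    private final Queue<String[]> captured = new ArrayDeque<>();
    private boolean migrating = false;
    private boolean switched = false;

    BufferedSwitchStore(Map<String, String> disk) {
        this.disk = disk;
    }

    synchronized void write(String key, String value) {
        if (switched) {
            disk.put(key, value);
            return;
        }
        inMemory.put(key, value);
        if (migrating) {
            // Remember writes that arrive after the snapshot was taken.
            captured.add(new String[] {key, value});
        }
    }

    synchronized String read(String key) {
        return switched ? disk.get(key) : inMemory.get(key);
    }

    // Called once the replay is done. The caller copies the returned
    // snapshot into the disk store without holding the lock.
    synchronized Map<String, String> startMigration() {
        migrating = true;
        return new HashMap<>(inMemory);
    }

    // Replays captured writes in arrival order, then switches reads and
    // writes over to the disk store.
    synchronized void finishMigration() {
        String[] w;
        while ((w = captured.poll()) != null) {
            disk.put(w[0], w[1]);
        }
        switched = true;
        migrating = false;
        inMemory.clear();
    }
}
```

Because captured writes are replayed after the snapshot copy, a key updated during migration ends up with its latest value, and no "read-only" check is needed while the copy runs.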








[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-05 Thread GitBox


HeartSaVioR commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r436233082



##
File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/HybridStore.java
##
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.kvstore;
+
+import org.apache.spark.annotation.Private;
+
+import java.io.IOException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.Collection;
+
+/**
+ * Implementation of KVStore that writes data to InMemoryStore at first and uses
+ * a background thread to dump data to LevelDB once the writing to InMemoryStore
+ * is completed.
+ */
+@Private
+public class HybridStore implements KVStore {
+
+  private InMemoryStore inMemoryStore = new InMemoryStore();
+  private LevelDB levelDB = null;
+
+  // Flag to indicate if we should use inMemoryStore Or levelDB.
+  private AtomicBoolean shouldUseInMemoryStore = new AtomicBoolean(true);
+
+  // A background thread that dumps data in inMemoryStore to levelDB
+  private Thread backgroundThread = null;
+
+  // A hash map that stores all class types (except CachedQuantile) that had been writen
+  // to inMemoryStore.
+  private ConcurrentHashMap<Class<?>, Boolean> klassMap = new ConcurrentHashMap<>();
+
+  // CachedQuantile can be written to kvstore after rebuildAppStore(), so we need
+  // to handle it specially to avoid conflicts. We will use a queue store CachedQuantile
+  // objects when the underlying store is inMemoryStore, and dump these objects to levelDB
+  // before the switch completes.
+  private Class cachedQuantileKlass = null;
+  private ConcurrentLinkedQueue cachedQuantileQueue = new ConcurrentLinkedQueue<>();
+
+
+  @Override
+  public <T> T getMetadata(Class<T> klass) throws Exception {
+KVStore store = getStore();
+T metaData = store.getMetadata(klass);
+return metaData;
+  }
+
+  @Override
+  public void setMetadata(Object value) throws Exception {
+KVStore store = getStore();

Review comment:
   Same here.

##
File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/HybridStore.java
##
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.kvstore;
+
+import org.apache.spark.annotation.Private;
+
+import java.io.IOException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.Collection;
+
+/**
+ * Implementation of KVStore that writes data to InMemoryStore at first and uses
+ * a background thread to dump data to LevelDB once the writing to InMemoryStore
+ * is completed.
+ */
+@Private
+public class HybridStore implements KVStore {
+
+  private InMemoryStore inMemoryStore = new InMemoryStore();
+  private LevelDB levelDB = null;
+
+  // Flag to indicate if we should use inMemoryStore Or levelDB.
+  private AtomicBoolean shouldUseInMemoryStore = new AtomicBoolean(true);
+
+  // A background thread that dumps data in inMemoryStore to levelDB
+  private Thread backgroundThread = null;
+
+  // A hash map that stores all class types (except CachedQuantile) that had been writen
+  // to inMemoryStore.
+  private ConcurrentHashMap<Class<?>, Boolean> klassMap = new ConcurrentHashMap<>();
+
+  

[GitHub] [spark] viirya commented on a change in pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


viirya commented on a change in pull request #28733:
URL: https://github.com/apache/spark/pull/28733#discussion_r436234167



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushCNFPredicateThroughJoin.scala
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{And, Expression, Not, Or, PredicateHelper}
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Try converting join condition to conjunctive normal form expression so that more predicates may
+ * be able to be pushed down.
+ * To avoid expanding the join condition, the join condition will be kept in the original form even
+ * when predicate pushdown happens.
+ */
+object PushCNFPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Convert an expression into conjunctive normal form.
+   * Definition and algorithm: https://en.wikipedia.org/wiki/Conjunctive_normal_form
+   * CNF can explode exponentially in the size of the input expression when converting Or clauses.
+   * Use a configuration MAX_CNF_NODE_COUNT to prevent such cases.
+   *
+   * @param condition to be conversed into CNF.
+   * @return If the number of expressions exceeds threshold on converting Or, return Seq.empty.
+   * If the conversion repeatedly expands nondeterministic expressions, return Seq.empty.
+   * Otherwise, return the converted result as sequence of disjunctive expressions.
+   */
+  protected def conjunctiveNormalForm(condition: Expression): Seq[Expression] = {

Review comment:
   Could you add tests for this method? We should have dedicated tests that verify the CNF conversion.
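
As an illustration of what such tests could check, here is a small self-contained sketch of CNF conversion with a clause limit. It is not the implementation in this PR: `Cnf`, `Expr`, `toCnf`, and `maxClauses` are hypothetical names (`maxClauses` only stands in for the MAX_CNF_NODE_COUNT setting named in the diff), and negation and nondeterministic expressions are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of CNF conversion; not Spark's conjunctiveNormalForm.
final class Cnf {
    /** Minimal expression tree: a named leaf predicate, or a binary AND / OR node. */
    static final class Expr {
        final String op;   // "leaf", "and", or "or"
        final String name; // set only for leaves
        final Expr left;
        final Expr right;

        Expr(String op, String name, Expr left, Expr right) {
            this.op = op;
            this.name = name;
            this.left = left;
            this.right = right;
        }
    }

    static Expr leaf(String name) { return new Expr("leaf", name, null, null); }
    static Expr and(Expr l, Expr r) { return new Expr("and", null, l, r); }
    static Expr or(Expr l, Expr r) { return new Expr("or", null, l, r); }

    // Returns the disjunctive clauses whose conjunction is equivalent to e,
    // or an empty list when the result would exceed maxClauses.
    static List<Expr> toCnf(Expr e, int maxClauses) {
        if (e.op.equals("leaf")) {
            return Collections.singletonList(e);
        }
        List<Expr> l = toCnf(e.left, maxClauses);
        List<Expr> r = toCnf(e.right, maxClauses);
        if (l.isEmpty() || r.isEmpty()) {
            return Collections.emptyList(); // a child already hit the limit
        }
        List<Expr> out = new ArrayList<>();
        if (e.op.equals("and")) {
            // CNF(a AND b) = CNF(a) ++ CNF(b)
            out.addAll(l);
            out.addAll(r);
        } else {
            // CNF(a OR b) distributes OR over every pair of clauses.
            if ((long) l.size() * r.size() > maxClauses) {
                return Collections.emptyList();
            }
            for (Expr x : l) {
                for (Expr y : r) {
                    out.add(or(x, y));
                }
            }
        }
        return out.size() > maxClauses ? Collections.<Expr>emptyList() : out;
    }

    static String show(Expr e) {
        if (e.op.equals("leaf")) {
            return e.name;
        }
        String sep = e.op.equals("and") ? " AND " : " OR ";
        return "(" + show(e.left) + sep + show(e.right) + ")";
    }
}
```

A dedicated test would then assert, for example, that `(a AND b) OR c` becomes the two clauses `(a OR c)` and `(b OR c)`, and that an input whose expansion exceeds the limit yields an empty result.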








[GitHub] [spark] viirya commented on a change in pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


viirya commented on a change in pull request #28733:
URL: https://github.com/apache/spark/pull/28733#discussion_r436234167



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushCNFPredicateThroughJoin.scala
##
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{And, Expression, Not, Or, PredicateHelper}
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Try converting join condition to conjunctive normal form expression so that more predicates may
+ * be able to be pushed down.
+ * To avoid expanding the join condition, the join condition will be kept in the original form even
+ * when predicate pushdown happens.
+ */
+object PushCNFPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Convert an expression into conjunctive normal form.
+   * Definition and algorithm: https://en.wikipedia.org/wiki/Conjunctive_normal_form
+   * CNF can explode exponentially in the size of the input expression when converting Or clauses.
+   * Use a configuration MAX_CNF_NODE_COUNT to prevent such cases.
+   *
+   * @param condition to be conversed into CNF.
+   * @return If the number of expressions exceeds threshold on converting Or, return Seq.empty.
+   * If the conversion repeatedly expands nondeterministic expressions, return Seq.empty.
+   * Otherwise, return the converted result as sequence of disjunctive expressions.
+   */
+  protected def conjunctiveNormalForm(condition: Expression): Seq[Expression] = {

Review comment:
   Could you add tests for this optimization rule? We should have dedicated tests that verify the CNF conversion.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639963197










[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639963197










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639963007


   **[Test build #123586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123586/testReport)** for PR 28593 at commit [`3d1819d`](https://github.com/apache/spark/commit/3d1819d3e1b19d750906580b8b1ba98a1501faa5).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639961662


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123584/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639935264


   **[Test build #123584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123584/testReport)** for PR 28593 at commit [`96c197c`](https://github.com/apache/spark/commit/96c197cdac39d6b91f81ef6bee82f4a03dcb3979).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639961658


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639961627


   **[Test build #123584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123584/testReport)** for PR 28593 at commit [`96c197c`](https://github.com/apache/spark/commit/96c197cdac39d6b91f81ef6bee82f4a03dcb3979).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639961658










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639960955










[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639960955










[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


SparkQA commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639960706


   **[Test build #123585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123585/testReport)** for PR 28733 at commit [`cbc1220`](https://github.com/apache/spark/commit/cbc1220c4598a1a24e552cb5815d8404ec2af308).






[GitHub] [spark] cloud-fan commented on pull request #28700: [SPARK-31860][BUILD] only push release tags on succes

2020-06-05 Thread GitBox


cloud-fan commented on pull request #28700:
URL: https://github.com/apache/spark/pull/28700#issuecomment-639959102


   With this patch, do we have to redo all the release steps if one step fails?






[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436227550



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala
##
@@ -204,20 +199,37 @@ private[sql] class ExternalAppendOnlyUnsafeRowArray(
 }
   }
 
-  private[this] class SpillableArrayIterator(
+  private[this] class MergerIterator(

Review comment:
   Done. Thank you








[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226844



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala
##
@@ -168,7 +159,11 @@ private[sql] class ExternalAppendOnlyUnsafeRowArray(
 if (spillableArray == null) {
   new InMemoryBufferIterator(startIndex)
 } else {
-  new SpillableArrayIterator(spillableArray.getIterator(startIndex), numFieldsPerRow)
+  new MergerIterator(spillableArray.getIterator(if (startIndex > numRowBufferedInMemory) {
+startIndex - numRowBufferedInMemory
+} else 0),
+  numFieldsPerRow,
+  startIndex)

Review comment:
   Makes sense. I will do it. Thank you.








[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226766



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala
##
@@ -124,29 +129,15 @@ private[sql] class ExternalAppendOnlyUnsafeRowArray(
   numRowsSpillThreshold,
   false)
 
-// populate with existing in-memory buffered rows
-if (inMemoryBuffer != null) {
-  inMemoryBuffer.foreach(existingUnsafeRow =>
-spillableArray.insertRecord(
-  existingUnsafeRow.getBaseObject,
-  existingUnsafeRow.getBaseOffset,
-  existingUnsafeRow.getSizeInBytes,
-  0,
-  false)
-  )
-  inMemoryBuffer.clear()
-}
 numFieldsPerRow = unsafeRow.numFields()
   }
-

Review comment:
   Sure. Thank you








[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639950400










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639950400










[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226636



##
File path: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
##
@@ -148,4 +141,34 @@ public void close() throws IOException {
  }
}
   }
+
+  private void readFile() throws IOException {

Review comment:
   I will rename the method to `readSpilledFile`. Thank you.








[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639878972


   **[Test build #123582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123582/testReport)**
 for PR 27694 at commit 
[`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).






[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226544



##
File path: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
##
@@ -148,4 +141,34 @@ public void close() throws IOException {
  }
}
   }
+
+  private void readFile() throws IOException {
+assert (dataFile.length() > 0);
+final ConfigEntry bufferSizeConfigEntry =
+package$.MODULE$.UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE();

Review comment:
   Yes. My IntelliJ was configured with 8-space tabs. I will fix it. Thank you.








[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639949573


   **[Test build #123582 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123582/testReport)**
 for PR 27694 at commit 
[`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226351



##
File path: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
##
@@ -47,55 +47,48 @@
   private int numRecords;
   private int numRecordsRemaining;
 
-  private byte[] arr = new byte[1024 * 1024];
+  private byte[] arr = new byte[1024];

Review comment:
   Yes. I am looking into this. Thank you








[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226277



##
File path: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
##
@@ -720,5 +720,14 @@ public void loadNext() throws IOException {
 
 @Override
 public long getKeyPrefix() { return current.getKeyPrefix(); }
+
+private void initializeNumRecords() throws IOException {
+  if (numRecords == 0) {
+for (UnsafeSorterIterator iter: iterators) {
+  numRecords += iter.getNumRecords();
+}
+numRecords += current.getNumRecords();
+  }
+}

Review comment:
   Thank you for the suggestion. It makes sense. I will do it and test it.








[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-05 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436226152



##
File path: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
##
@@ -703,6 +702,7 @@ public boolean hasNext() {
 
 @Override
 public void loadNext() throws IOException {

Review comment:
   Thank you for the suggestion. Let me start with my understanding of it: 
your idea is to reuse code by replacing the body of `loadNext()` with a call 
to `hasNext()`.
   
   Could you please take another look at `loadNext()`? Its last call is 
`current.loadNext()`, whereas the last call in `hasNext()` is 
`current.hasNext()`. If we followed your recommendation, we would change the 
behavior of `loadNext()`: `loadNext()` moves to the next element, while 
`hasNext()` only checks whether a next element exists and does not advance. 
For that reason, I believe we cannot reuse the code here.
   
   Could you please let me know whether my reasoning makes sense? Thank you 
for your time.
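
To make the distinction above concrete, here is a minimal sketch. It assumes a simplified cursor-style iterator; `CursorIterator` and `LoadNextDemo` are hypothetical names used only for illustration and are not Spark's `UnsafeSorterIterator` or the code in this PR. It shows that calling `hasNext()` any number of times leaves the cursor where it is, while `loadNext()` advances it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a sorter iterator; not Spark's actual class. */
final class CursorIterator {
  private final Iterator<Integer> source;
  private int current = -1; // value the cursor currently points at
  private int advances = 0; // how many times the cursor has moved

  CursorIterator(List<Integer> values) {
    this.source = values.iterator();
  }

  /** Pure check: reports whether another element exists; the cursor stays put. */
  boolean hasNext() {
    return source.hasNext();
  }

  /** Mutating step: moves the cursor to the next element. */
  void loadNext() {
    current = source.next();
    advances++;
  }

  int current() {
    return current;
  }

  int advances() {
    return advances;
  }
}

final class LoadNextDemo {
  /** Drains the iterator: check with hasNext(), then advance with loadNext(). */
  static int sum(CursorIterator it) {
    int total = 0;
    while (it.hasNext()) {
      it.loadNext();
      total += it.current();
    }
    return total;
  }

  public static void main(String[] args) {
    CursorIterator it = new CursorIterator(Arrays.asList(10, 20, 30));
    it.hasNext();
    it.hasNext(); // repeated checks do not move the cursor
    System.out.println("advances after two hasNext() calls: " + it.advances());
    System.out.println("sum: " + sum(it));
  }
}
```

If `loadNext()` simply delegated to `hasNext()` in this sketch, the loop in `sum` would never advance, which is the behavior change the comment above is concerned about.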








[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639935264


   **[Test build #123584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123584/testReport)**
 for PR 28593 at commit 
[`96c197c`](https://github.com/apache/spark/commit/96c197cdac39d6b91f81ef6bee82f4a03dcb3979).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639932650










[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639932650










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639927936


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123583/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639927931


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639913015


   **[Test build #123583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123583/testReport)**
 for PR 28593 at commit 
[`8ed4fc4`](https://github.com/apache/spark/commit/8ed4fc41a43442864cdb0fa2e0160fa5e80f374e).






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639927931










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639927850


   **[Test build #123583 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123583/testReport)**
 for PR 28593 at commit 
[`8ed4fc4`](https://github.com/apache/spark/commit/8ed4fc41a43442864cdb0fa2e0160fa5e80f374e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] gengliangwang commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


gengliangwang commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639916084


   Test failure from TPCDSQuerySuite:
   > 18884 was not less than or equal to 8000 too long generated codes found in 
the WholeStageCodegenExec subtree (id=375762) and JIT optimization might not 
work:
   
   I will update the PR to simplify the pushed-down predicates.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639913428










[GitHub] [spark] github-actions[bot] closed pull request #27172: [WIP] [SPARK-29644][SQL] Fixed ByteType JDBCUtils to map to TinyInt at write read and ShortType on read

2020-06-05 Thread GitBox


github-actions[bot] closed pull request #27172:
URL: https://github.com/apache/spark/pull/27172


   






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639913428










[GitHub] [spark] github-actions[bot] commented on pull request #27215: [SPARK-30514][K8S] add python environment support for JavaMainAppResource

2020-06-05 Thread GitBox


github-actions[bot] commented on pull request #27215:
URL: https://github.com/apache/spark/pull/27215#issuecomment-639913381


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!






[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-05 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-639913015


   **[Test build #123583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123583/testReport)**
 for PR 28593 at commit 
[`8ed4fc4`](https://github.com/apache/spark/commit/8ed4fc41a43442864cdb0fa2e0160fa5e80f374e).






[GitHub] [spark] therealJacobWu commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-05 Thread GitBox


therealJacobWu commented on a change in pull request #28731:
URL: https://github.com/apache/spark/pull/28731#discussion_r436210074



##
File path: bin/beeline
##
@@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then
   source "$(dirname "$0")"/find-spark-home
 fi
 
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+
 CLASS="org.apache.hive.beeline.BeeLine"
-exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@"
+exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@"

Review comment:
   Hi Hyukjin, there is an earlier discussion about the krb5.conf location at 
https://issues.apache.org/jira/browse/SPARK-12050, but that does not work in 
this case. In that discussion, people say we should use SPARK_SUBMIT_OPTS to 
pass the non-standard krb5.conf location via 
`-Djava.security.krb5.conf=/etc/krb5.conf-custom`. 
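
As a rough illustration of what such an option string carries, the sketch below pulls `-Dkey=value` tokens out of a whitespace-separated options string and shows the system properties they would define when passed to a JVM. `JvmOptsSketch` and `systemPropertiesIn` are hypothetical names, the parsing is deliberately simplified (no quoting or escaping), and it is not how `spark-class` or the Spark launcher processes `SPARK_SUBMIT_OPTS`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Spark. */
final class JvmOptsSketch {
  /** Collects -Dkey=value tokens from a whitespace-separated options string. */
  static Map<String, String> systemPropertiesIn(String opts) {
    Map<String, String> props = new LinkedHashMap<>();
    for (String token : opts.trim().split("\\s+")) {
      // Only -Dkey=value tokens define system properties; others (e.g. -Xmx1g) are skipped.
      if (token.startsWith("-D") && token.contains("=")) {
        int eq = token.indexOf('=');
        props.put(token.substring(2, eq), token.substring(eq + 1));
      }
    }
    return props;
  }

  public static void main(String[] args) {
    // Example value a user might export as SPARK_SUBMIT_OPTS (assumed for illustration).
    String opts = "-Djava.security.krb5.conf=/etc/krb5.conf-custom -Xmx1g";
    System.out.println(systemPropertiesIn(opts));
  }
}
```

The JDK's Kerberos support reads the `java.security.krb5.conf` system property to locate the configuration file, which is why the option only helps if it actually reaches the JVM command line.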








[GitHub] [spark] therealJacobWu commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-05 Thread GitBox


therealJacobWu commented on a change in pull request #28731:
URL: https://github.com/apache/spark/pull/28731#discussion_r436210074



##
File path: bin/beeline
##
@@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then
   source "$(dirname "$0")"/find-spark-home
 fi
 
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+
 CLASS="org.apache.hive.beeline.BeeLine"
-exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@"
+exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@"

Review comment:
   Hi Hyukjin, there is an earlier discussion about the krb5.conf location at 
https://issues.apache.org/jira/browse/SPARK-12050, but that does not work in 
this case. In that discussion, people say we can use SPARK_SUBMIT_OPTS to 
pass the non-standard krb5.conf location via 
`-Djava.security.krb5.conf=/etc/krb5.conf-custom`. 








[GitHub] [spark] therealJacobWu commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-05 Thread GitBox


therealJacobWu commented on a change in pull request #28731:
URL: https://github.com/apache/spark/pull/28731#discussion_r436210074



##
File path: bin/beeline
##
@@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then
   source "$(dirname "$0")"/find-spark-home
 fi
 
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+
 CLASS="org.apache.hive.beeline.BeeLine"
-exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@"
+exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@"

Review comment:
   Hi Hyukjin, there is an earlier discussion about the krb5.conf location at 
https://issues.apache.org/jira/browse/SPARK-12050, but that does not work in 
this case. In that discussion, people say we can use SPARK_SUBMIT_OPTS to 
include the location via `-Djava.security.krb5.conf=/etc/krb5.conf-custom`. 








[GitHub] [spark] karuppayya commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec

2020-06-05 Thread GitBox


karuppayya commented on pull request #28715:
URL: https://github.com/apache/spark/pull/28715#issuecomment-639899181


   @viirya @cloud-fan Can you please help review this PR?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639881226


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123578/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639881216


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639723997


   **[Test build #123578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123578/testReport)**
 for PR 28733 at commit 
[`a9a5c0b`](https://github.com/apache/spark/commit/a9a5c0bc88ec08b2c1645a0b6758519f5ada83b1).






[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639881216










[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


SparkQA commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639880795


   **[Test build #123578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123578/testReport)**
 for PR 28733 at commit 
[`a9a5c0b`](https://github.com/apache/spark/commit/a9a5c0bc88ec08b2c1645a0b6758519f5ada83b1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639879503










[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639879503










[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639878972


   **[Test build #123582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123582/testReport)**
 for PR 27694 at commit 
[`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).






[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-05 Thread GitBox


HeartSaVioR commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-639878377


   retest this, please






[GitHub] [spark] SparkQA removed a comment on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639865749


   **[Test build #123581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123581/testReport)**
 for PR 27598 at commit 
[`20a7a9c`](https://github.com/apache/spark/commit/20a7a9c82510d1f26953b3884a4824c5dfa65e47).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639874811










[GitHub] [spark] AmplabJenkins commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639874811










[GitHub] [spark] SparkQA commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


SparkQA commented on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639874682


   **[Test build #123581 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123581/testReport)**
 for PR 27598 at commit 
[`20a7a9c`](https://github.com/apache/spark/commit/20a7a9c82510d1f26953b3884a4824c5dfa65e47).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447


   Sorry, my comment was edited, so you may have missed the content, but it is
also a way of raising the "pinpointing" concern - do you think your approach
works with other state store providers as well? The root cause isn't bound to
the implementation of the state store provider, but this patch only addresses
the HDFS state store provider.
   
   I guess you're trying to find how it can be done less frequently, i.e. only
the first time the state is loaded from the file, which is optimal. While I
think it can even be done without binding to the state store provider
implementation if we really need it (check only once when the provider instance
is created), have we measured the actual overhead? If the overhead turns out to
be trivial, then it won't matter if we run the validation check for each batch.
That sounds sub-optimal, but the overhead would be trivial.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447


   Sorry, my comment was edited, so you may have missed the content, but it is
also a way of raising the "pinpointing" concern - do you think your approach
works with other state store providers as well? The root cause isn't bound to
the implementation of the state store provider, but this patch only addresses
the HDFS state store provider.
   
   I guess you're trying to find how it can be done less frequently, i.e. only
the first time the state is loaded from the file, which is optimal. While I
think it can even be done without binding to the state store provider
implementation if we really need it, have we measured the actual overhead? If
the overhead turns out to be trivial, then it won't matter if we run the
validation check for each batch. That sounds sub-optimal, but the overhead
would be trivial.






[GitHub] [spark] maropu commented on a change in pull request #28726: [SPARK-31906][SQL][DOCS] Enhance comments in NamedExpression.qualifier

2020-06-05 Thread GitBox


maropu commented on a change in pull request #28726:
URL: https://github.com/apache/spark/pull/28726#discussion_r436195481



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
##
@@ -85,6 +85,7 @@ trait NamedExpression extends Expression {
*e.g. top level attributes aliased in the SELECT clause, or column from a LocalRelation.
* 2. Seq with a Single element: either the table name or the alias name of the table.
* 3. Seq with 2 elements: database name and table name
+   * 4. Seq with 3 elements: catalog name, database name and table name

Review comment:
   namespace instead?








[GitHub] [spark] HeartSaVioR commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


HeartSaVioR commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447


   Sorry, my comment was edited, so you may have missed the content, but it is
also a way of raising the "pinpoint" concern - do you think your approach
works with other state store providers as well? The root cause isn't bound to
the implementation of the state store provider, but this patch only addresses
the HDFS state store provider.
   
   I guess you're trying to find how it can be done less frequently, i.e. only
the first time the state is loaded from the file, which is optimal. While I
think it can even be done without binding to the state store provider
implementation if we really need it, have we measured the actual overhead? If
the overhead turns out to be trivial, then it won't matter if we run the
validation check for each batch. That sounds sub-optimal, but the overhead
would be trivial.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447


   Sorry, my comment was edited, so you may have missed the content, but it is
also a way of raising the "pinpoint" concern - do you think your approach
works with other state store providers as well? The root cause isn't bound to
the implementation of the state store provider, but this patch only addresses
the HDFS state store provider.
   
   I guess you're trying to find how it can be done less frequently, i.e. only
the first time the state is loaded from the file, which is optimal. While I
think it can even be done without binding to the state store provider
implementation if we really need it, have we measured the actual overhead? If
the overhead turns out to be trivial, then it won't matter if we run the
validation check for each batch. That sounds sub-optimal, but the overhead
would be trivial.






[GitHub] [spark] maropu closed pull request #28724: [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns

2020-06-05 Thread GitBox


maropu closed pull request #28724:
URL: https://github.com/apache/spark/pull/28724


   






[GitHub] [spark] maropu commented on pull request #28724: [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns

2020-06-05 Thread GitBox


maropu commented on pull request #28724:
URL: https://github.com/apache/spark/pull/28724#issuecomment-639871449


   Thanks! Merged to master/3.0.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639866434










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28740:
URL: https://github.com/apache/spark/pull/28740#issuecomment-639866391










[GitHub] [spark] AmplabJenkins commented on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28740:
URL: https://github.com/apache/spark/pull/28740#issuecomment-639866391










[GitHub] [spark] AmplabJenkins commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639866434










[GitHub] [spark] SparkQA removed a comment on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28740:
URL: https://github.com/apache/spark/pull/28740#issuecomment-639738076


   **[Test build #123579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123579/testReport)**
 for PR 28740 at commit 
[`753a3b7`](https://github.com/apache/spark/commit/753a3b7e1e0697b7eadcfe1ed3eb4b6628ac918a).






[GitHub] [spark] SparkQA commented on pull request #28740: [SPARK-31903][SQL][PYSPARK][2.4] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-05 Thread GitBox


SparkQA commented on pull request #28740:
URL: https://github.com/apache/spark/pull/28740#issuecomment-639865831


   **[Test build #123579 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123579/testReport)**
 for PR 28740 at commit 
[`753a3b7`](https://github.com/apache/spark/commit/753a3b7e1e0697b7eadcfe1ed3eb4b6628ac918a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


SparkQA commented on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639865749


   **[Test build #123581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123581/testReport)**
 for PR 27598 at commit 
[`20a7a9c`](https://github.com/apache/spark/commit/20a7a9c82510d1f26953b3884a4824c5dfa65e47).






[GitHub] [spark] shanyu commented on pull request #27598: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-05 Thread GitBox


shanyu commented on pull request #27598:
URL: https://github.com/apache/spark/pull/27598#issuecomment-639864928


   Can we please test this?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639861480


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123580/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639812505


   **[Test build #123580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123580/testReport)**
 for PR 28707 at commit 
[`7a5e09a`](https://github.com/apache/spark/commit/7a5e09a3d52cc6fa0c5aad0aa1e3c84878afe656).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639861463


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639861463










[GitHub] [spark] SparkQA commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


SparkQA commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639861273


   **[Test build #123580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123580/testReport)**
 for PR 28707 at commit 
[`7a5e09a`](https://github.com/apache/spark/commit/7a5e09a3d52cc6fa0c5aad0aa1e3c84878afe656).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28709:
URL: https://github.com/apache/spark/pull/28709#issuecomment-639860420










[GitHub] [spark] AmplabJenkins commented on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28709:
URL: https://github.com/apache/spark/pull/28709#issuecomment-639860420










[GitHub] [spark] SparkQA removed a comment on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28709:
URL: https://github.com/apache/spark/pull/28709#issuecomment-639662492


   **[Test build #123575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123575/testReport)**
 for PR 28709 at commit 
[`6535b24`](https://github.com/apache/spark/commit/6535b2455470e1bdc646ffa2090b86c40fb155a1).






[GitHub] [spark] SparkQA commented on pull request #28709: [SPARK-31901][SQL] Use the session time zone in legacy date formatters

2020-06-05 Thread GitBox


SparkQA commented on pull request #28709:
URL: https://github.com/apache/spark/pull/28709#issuecomment-639859333


   **[Test build #123575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123575/testReport)**
 for PR 28709 at commit 
[`6535b24`](https://github.com/apache/spark/commit/6535b2455470e1bdc646ffa2090b86c40fb155a1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639849864


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123577/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639849856


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


SparkQA removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639695938


   **[Test build #123577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123577/testReport)**
 for PR 28733 at commit 
[`a216cf8`](https://github.com/apache/spark/commit/a216cf8c0e8761e7316bee5681f9ee731fc0ff59).






[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639849856










[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-05 Thread GitBox


SparkQA commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-639849494


   **[Test build #123577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123577/testReport)** for PR 28733 at commit [`a216cf8`](https://github.com/apache/spark/commit/a216cf8c0e8761e7316bee5681f9ee731fc0ff59).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639813262










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639813262










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-638180171


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123475/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


SparkQA commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639812505


   **[Test build #123580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123580/testReport)** for PR 28707 at commit [`7a5e09a`](https://github.com/apache/spark/commit/7a5e09a3d52cc6fa0c5aad0aa1e3c84878afe656).






[GitHub] [spark] xuanyuanking commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


xuanyuanking commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639811979


   All the comments are addressed in 1f71563. Thanks for the review!
   It also includes my alternative of adding the invalidation for all state store operations in StateStoreProvider. PTAL.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639809051









