[GitHub] [spark] AmplabJenkins removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641736976










[GitHub] [spark] AmplabJenkins commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641736976










[GitHub] [spark] SparkQA removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641721923


   **[Test build #123725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123725/testReport)**
 for PR 28743 at commit 
[`403f579`](https://github.com/apache/spark/commit/403f5796fdb7decf7c174b28efc6aa6bf2367186).






[GitHub] [spark] moskvax commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


moskvax commented on a change in pull request #28743:
URL: https://github.com/apache/spark/pull/28743#discussion_r437872692



##
File path: python/pyspark/sql/pandas/serializers.py
##
@@ -150,15 +151,22 @@ def _create_batch(self, series):
        series = ((s, None) if not isinstance(s, (list, tuple)) else s for s in series)

        def create_array(s, t):
-            mask = s.isnull()
+            # Create with __arrow_array__ if the series' backing array implements it
+            series_array = getattr(s, 'array', s._values)
+            if hasattr(series_array, "__arrow_array__"):
+                return series_array.__arrow_array__(type=t)
+
            # Ensure timestamp series are in expected form for Spark internal representation
            if t is not None and pa.types.is_timestamp(t):
                s = _check_series_convert_timestamps_internal(s, self._timezone)
-            elif type(s.dtype) == pd.CategoricalDtype:
+            elif is_categorical_dtype(s.dtype):

Review comment:
   By the way, this change was made because `CategoricalDtype` is only imported into the root pandas namespace from pandas 0.24.0 onwards, which was causing `AttributeError` when testing with earlier versions.
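
   A minimal sketch of the version difference (assuming pandas is installed; the root-namespace behaviour described is for pandas < 0.24.0):
   ```
   # `is_categorical_dtype` has long been available from pandas.api.types,
   # while `pd.CategoricalDtype` is only re-exported at the package root
   # from pandas 0.24.0 onwards.
   import pandas as pd
   from pandas.api.types import is_categorical_dtype

   s = pd.Series(["a", "b", "a"], dtype="category")
   print(is_categorical_dtype(s.dtype))       # True on old and new pandas
   # type(s.dtype) == pd.CategoricalDtype     # AttributeError on pandas < 0.24.0
   ```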








[GitHub] [spark] SparkQA commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


SparkQA commented on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641736450


   **[Test build #123725 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123725/testReport)**
 for PR 28743 at commit 
[`403f579`](https://github.com/apache/spark/commit/403f5796fdb7decf7c174b28efc6aa6bf2367186).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] siknezevic commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-09 Thread GitBox


siknezevic commented on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-641736377


   > Also, could you add some benchmark classes in 
https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark
 ?
   
   Hello @maropu, 
   I checked the available benchmarks and I can see that there is already a benchmark that can be utilized.
   
   https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
   
   I am using Databricks' toolkit for all my testing (10TB, 100TB datasets). TPCDSQueryBenchmark is based on Databricks' toolkit. I was able to test spilling with the TPCDSQueryBenchmark benchmark. I executed the benchmark in the following way:
   
   /opt/spark/bin/spark-submit --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --conf 'spark.sql.sortMergeJoinExec.buffer.spill.threshold=6000' --conf 'spark.sql.sortMergeJoinExec.buffer.in.memory.threshold=1000' '/tmp/spark-sql_2.11-2.4.6-SNAPSHOT-tests.jar' --data-location '/user/testusera1/tpcds/datasets-1g/sf1-parquet/useDecimal=true,useDate=true,filterNull=false' --query-filter 'q14a'
   
   It runs fine and I am able to pass Spark config parameters to it to trigger spilling. I believe that we do not need a new benchmark. Do you agree? 






[GitHub] [spark] gengliangwang commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


gengliangwang commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641735784


   @zhli1142015 sorry, I left comments in the code before I read the discussion in the PR.
   
   So, before you update the related code, could you describe an end-to-end use case that can reproduce the error page in the PR description? Does it only happen when LevelDB is evicted or the UI server is shut down?






[GitHub] [spark] yaooqinn commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing

2020-06-09 Thread GitBox


yaooqinn commented on a change in pull request #28766:
URL: https://github.com/apache/spark/pull/28766#discussion_r437871769



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
##
@@ -433,4 +433,35 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite {
       assert(formatter.format(date(1970, 4, 10)) == "100")
     }
   }
+
+  test("SPARK-31939: Fix Parsing day of year when year field pattern is missing") {
+    // resolved to queryable LocalDate or fail directly
+    val f0 = TimestampFormatter("-dd-DD", UTC, isParsing = true)

Review comment:
   Sounds good.








[GitHub] [spark] yaooqinn commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing

2020-06-09 Thread GitBox


yaooqinn commented on a change in pull request #28766:
URL: https://github.com/apache/spark/pull/28766#discussion_r437871990



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##
@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }
 
+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {

Review comment:
   For the time being, yes. I can remove this condition

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##
@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }
 
+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {
+      val actual = accessor.get(field)
+      val expected = candidate.get(field)
+      if (actual != expected) {
+        throw new DateTimeException(s"Conflict found: Field $field $actual differs from" +
+          s" $field $expected derived from $candidate")

Review comment:
   OK








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641732806










[GitHub] [spark] gengliangwang commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


gengliangwang commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r437870387



##
File path: 
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java
##
@@ -276,6 +276,41 @@ public void testNegativeIndexValues() throws Exception {
     assertEquals(expected, results);
   }
 
+  @Test
+  public void testCloseLevelDBIterator() throws Exception {
+    // SPARK-31929: test when LevelDB.close() is called, related LevelDBIterators
+    // are closed. And files opened by iterators are also closed.
+    File dbpathForCloseTest = File
+      .createTempFile(

Review comment:
   please change the indents to two spaces in this file








[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641732806










[GitHub] [spark] gengliangwang commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


gengliangwang commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r437869878



##
File path: 
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
##
@@ -189,7 +198,12 @@ public void delete(Class<?> type, Object naturalKey) throws Exception {
   @Override
   public Iterator<T> iterator() {
     try {
-      return new LevelDBIterator<>(type, LevelDB.this, this);
+      LevelDBIterator<T> iterator = new LevelDBIterator<>(
+          type,

Review comment:
   Nit: put all the parameters on one line?
   ```
   LevelDBIterator<T> iterator = new LevelDBIterator<>(type, LevelDB.this, this);
   ```








[GitHub] [spark] dilipbiswal commented on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

2020-06-09 Thread GitBox


dilipbiswal commented on pull request #28773:
URL: https://github.com/apache/spark/pull/28773#issuecomment-641732487


   LGTM






[GitHub] [spark] SparkQA removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641689848


   **[Test build #123719 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123719/testReport)**
 for PR 28412 at commit 
[`1e514b9`](https://github.com/apache/spark/commit/1e514b910a56b719a08d6f7a7689a2a53dcc06a5).






[GitHub] [spark] gengliangwang commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


gengliangwang commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r437869753



##
File path: 
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
##
@@ -256,6 +275,7 @@ void closeIterator(LevelDBIterator<?> it) throws IOException {
     DB _db = this._db.get();
     if (_db != null) {
       it.close();
+      iteratorTracker.remove(it);

Review comment:
   shall we remove it even when `_db` is null?
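
   A sketch of the suggested shape (hypothetical, not the merged code): move the tracker removal out of the null check so the iterator is forgotten either way.
   ```
   void closeIterator(LevelDBIterator<?> it) throws IOException {
     DB _db = this._db.get();
     if (_db != null) {
       it.close();
     }
     // Remove from the tracker even when _db is already null, so a closed
     // store does not keep references to stale iterators.
     iteratorTracker.remove(it);
   }
   ```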








[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641732044


   **[Test build #123719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123719/testReport)**
 for PR 28412 at commit 
[`1e514b9`](https://github.com/apache/spark/commit/1e514b910a56b719a08d6f7a7689a2a53dcc06a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28773:
URL: https://github.com/apache/spark/pull/28773#issuecomment-641725711










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28773:
URL: https://github.com/apache/spark/pull/28773#issuecomment-641725711










[GitHub] [spark] SparkQA removed a comment on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #28773:
URL: https://github.com/apache/spark/pull/28773#issuecomment-641651098


   **[Test build #123711 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123711/testReport)**
 for PR 28773 at commit 
[`1013ac8`](https://github.com/apache/spark/commit/1013ac8064c1e380f4be4c297c165fae1a20602e).






[GitHub] [spark] SparkQA commented on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

2020-06-09 Thread GitBox


SparkQA commented on pull request #28773:
URL: https://github.com/apache/spark/pull/28773#issuecomment-641724830


   **[Test build #123711 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123711/testReport)**
 for PR 28773 at commit 
[`1013ac8`](https://github.com/apache/spark/commit/1013ac8064c1e380f4be4c297c165fae1a20602e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641722284


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/28349/
   Test PASSed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641722278


   Merged build finished. Test PASSed.






[GitHub] [spark] AmplabJenkins commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641722278










[GitHub] [spark] SparkQA commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


SparkQA commented on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641721923


   **[Test build #123725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123725/testReport)**
 for PR 28743 at commit 
[`403f579`](https://github.com/apache/spark/commit/403f5796fdb7decf7c174b28efc6aa6bf2367186).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #27507:
URL: https://github.com/apache/spark/pull/27507#issuecomment-641719947


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123716/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641720102










[GitHub] [spark] AmplabJenkins commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641720102










[GitHub] [spark] SparkQA removed a comment on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #27507:
URL: https://github.com/apache/spark/pull/27507#issuecomment-641679815


   **[Test build #123716 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123716/testReport)**
 for PR 27507 at commit 
[`ca6c1c5`](https://github.com/apache/spark/commit/ca6c1c5eef73ae1a3d33f17acebcfcc3d77d9d63).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #27507:
URL: https://github.com/apache/spark/pull/27507#issuecomment-641719937


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


SparkQA commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641719773


   **[Test build #123724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123724/testReport)**
 for PR 28769 at commit 
[`84e9012`](https://github.com/apache/spark/commit/84e9012b49af708ca1b4e5f22f495d8ef38f3122).






[GitHub] [spark] AmplabJenkins commented on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #27507:
URL: https://github.com/apache/spark/pull/27507#issuecomment-641719937










[GitHub] [spark] SparkQA commented on pull request #28776: [3.0][SPARK-31935][SQL] Hadoop file system config should be effective in data source options

2020-06-09 Thread GitBox


SparkQA commented on pull request #28776:
URL: https://github.com/apache/spark/pull/28776#issuecomment-641719758


   **[Test build #123723 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123723/testReport)**
 for PR 28776 at commit 
[`f6cca6b`](https://github.com/apache/spark/commit/f6cca6b5163acef655d0c0e3d6cd4848b00314e0).






[GitHub] [spark] SparkQA commented on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all

2020-06-09 Thread GitBox


SparkQA commented on pull request #27507:
URL: https://github.com/apache/spark/pull/27507#issuecomment-641719783


   **[Test build #123716 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123716/testReport)**
 for PR 27507 at commit 
[`ca6c1c5`](https://github.com/apache/spark/commit/ca6c1c5eef73ae1a3d33f17acebcfcc3d77d9d63).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28776: [3.0][SPARK-31935][SQL] Hadoop file system config should be effective in data source options

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28776:
URL: https://github.com/apache/spark/pull/28776#issuecomment-641717889










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641186609


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


cloud-fan commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641719161


   ok to test






[GitHub] [spark] moskvax commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


moskvax commented on a change in pull request #28743:
URL: https://github.com/apache/spark/pull/28743#discussion_r437858625



##
File path: python/pyspark/sql/tests/test_arrow.py
##
@@ -30,10 +30,14 @@
     pandas_requirement_message, pyarrow_requirement_message
 from pyspark.testing.utils import QuietTest
 from pyspark.util import _exception_message
+from distutils.version import LooseVersion
 
 if have_pandas:
     import pandas as pd
     from pandas.util.testing import assert_frame_equal
+    pandas_version = LooseVersion(pd.__version__)
+else:
+    pandas_version = LooseVersion("0")

Review comment:
   Nice, will update
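
   A hypothetical sketch of how such a module-level guard can gate version-dependent tests (the decorator, threshold, and test name here are assumptions, not the PR's actual test):
   ```
   import unittest
   from distutils.version import LooseVersion

   pandas_version = LooseVersion("0.23.4")  # stand-in for the value computed above

   class ArrowArrayTests(unittest.TestCase):
       @unittest.skipIf(pandas_version < LooseVersion("0.24.0"),
                        "__arrow_array__ requires pandas 0.24.0+")
       def test_arrow_array_column(self):
           pass  # placeholder body

   if __name__ == "__main__":
       unittest.main()
   ```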








[GitHub] [spark] cloud-fan commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


cloud-fan commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641718840


   cc @gengliangwang 






[GitHub] [spark] moskvax commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


moskvax commented on a change in pull request #28743:
URL: https://github.com/apache/spark/pull/28743#discussion_r437858389



##
File path: python/pyspark/sql/pandas/conversion.py
##
@@ -394,10 +394,11 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
 
         # Create the Spark schema from list of names passed in with Arrow types
         if isinstance(schema, (list, tuple)):
-            arrow_schema = pa.Schema.from_pandas(pdf, preserve_index=False)
+            inferred_types = [pa.infer_type(s, mask=s.isna(), from_pandas=True)
+                              for s in (pdf[c] for c in pdf)]
             struct = StructType()
-            for name, field in zip(schema, arrow_schema):
-                struct.add(name, from_arrow_type(field.type), nullable=field.nullable)
+            for name, t in zip(schema, inferred_types):
+                struct.add(name, from_arrow_type(t), nullable=True)

Review comment:
   `infer_type` only returns a type, not a `field`, which would supposedly have nullability information. But it appears that in the implementation of `Schema.from_pandas` ([link](https://github.com/apache/arrow/blob/b058cf0d1c26ad7984c104bb84322cc7dcc66f00/python/pyarrow/types.pxi#L1328)), nullability is not actually inferred and the default `nullable=True` is always returned. So this change just follows the existing behaviour of `Schema.from_pandas`.
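
   A small illustration of the difference (assuming pandas and pyarrow are installed):
   ```
   import pandas as pd
   import pyarrow as pa

   s = pd.Series([1, 2, None])
   t = pa.infer_type(s, mask=s.isna(), from_pandas=True)
   print(t)  # int64 -- a DataType only, with no nullability attached,
             # so the caller has to choose nullable=True itself
   ```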








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28774:
URL: https://github.com/apache/spark/pull/28774#issuecomment-641718511










[GitHub] [spark] AmplabJenkins commented on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28774:
URL: https://github.com/apache/spark/pull/28774#issuecomment-641718511










[GitHub] [spark] AmplabJenkins commented on pull request #28776: [3.0][SPARK-31935][SQL] Hadoop file system config should be effective in data source options

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28776:
URL: https://github.com/apache/spark/pull/28776#issuecomment-641717889










[GitHub] [spark] SparkQA removed a comment on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #28774:
URL: https://github.com/apache/spark/pull/28774#issuecomment-641673972


   **[Test build #123714 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123714/testReport)**
 for PR 28774 at commit 
[`c2b6b86`](https://github.com/apache/spark/commit/c2b6b86d2c450d35d9451929eab71eaeed9801c1).






[GitHub] [spark] gengliangwang opened a new pull request #28776: [SPARK-31935][SQL] Hadoop file system config should be effective in data source options

2020-06-09 Thread GitBox


gengliangwang opened a new pull request #28776:
URL: https://github.com/apache/spark/pull/28776


   
   
   ### What changes were proposed in this pull request?
   
   Make Hadoop file system config effective in data source options.
   
   From `org.apache.hadoop.fs.FileSystem.java`:
   ```
  public static FileSystem get(URI uri, Configuration conf) throws IOException {
    String scheme = uri.getScheme();
    String authority = uri.getAuthority();

    if (scheme == null && authority == null) {     // use default FS
      return get(conf);
    }

    if (scheme != null && authority == null) {     // no authority
      URI defaultUri = getDefaultUri(conf);
      if (scheme.equals(defaultUri.getScheme())    // if scheme matches default
          && defaultUri.getAuthority() != null) {  // & default has authority
        return get(defaultUri, conf);              // return default
      }
    }

    String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
    if (conf.getBoolean(disableCacheName, false)) {
      return createFileSystem(uri, conf);
    }

    return CACHE.get(uri, conf);
  }
   ```
   Before the changes, the file system configurations in data source options were not propagated in `DataSource.scala`.
   After the changes, we can specify authority and URI scheme related configurations for scanning file systems.
   
   This problem only exists in data source V1. In V2, we already use `sparkSession.sessionState.newHadoopConfWithOptions(options)` in `FileTable`.
   ### Why are the changes needed?
   
   Allow users to specify authority and URI scheme related Hadoop configurations for file source reading.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the file system related Hadoop configuration in data source options will be effective on reading.
   
   ### How was this patch tested?
   
   Unit test
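
   A hypothetical usage sketch of the new behaviour (the option key and paths are illustrative, not from this PR):
   ```
   // With this change, Hadoop FS settings passed as data source options are
   // picked up when the V1 file source resolves paths in DataSource.scala.
   val df = spark.read
     .option("fs.defaultFS", "hdfs://namenode:8020")  // assumed example config
     .parquet("/path/to/data")
   ```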






[GitHub] [spark] gengliangwang commented on pull request #28776: [SPARK-31935][SQL] Hadoop file system config should be effective in data source options

2020-06-09 Thread GitBox


gengliangwang commented on pull request #28776:
URL: https://github.com/apache/spark/pull/28776#issuecomment-641717785


   This PR backports https://github.com/apache/spark/pull/28760 to branch-3.0






[GitHub] [spark] SparkQA commented on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.

2020-06-09 Thread GitBox


SparkQA commented on pull request #28774:
URL: https://github.com/apache/spark/pull/28774#issuecomment-641717749


   **[Test build #123714 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123714/testReport)**
 for PR 28774 at commit 
[`c2b6b86`](https://github.com/apache/spark/commit/c2b6b86d2c450d35d9451929eab71eaeed9801c1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] xccui commented on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

2020-06-09 Thread GitBox


xccui commented on pull request #28768:
URL: https://github.com/apache/spark/pull/28768#issuecomment-641714353


   Sorry that I didn't realize the potential impact of using `SparkException` 
or `NoSuchElementException`.
   
   +1 to this change.






[GitHub] [spark] zhli1142015 edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641708063


   > Of course relying on finalize is wrong, but I don't think the intent was 
to rely on finalize. Not closing these iterators is a bug. I see one case it 
clearly isn't; there may be others but haven't spotted them. It'd be nice to 
fix them all instead of the change in this patch but we may want to fix what we 
can see and also make the change in this patch for now.
   
   Thanks for your comments, I get your point.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641710484










[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641710484










[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641710231


   **[Test build #123722 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123722/testReport)**
 for PR 28412 at commit 
[`0d36074`](https://github.com/apache/spark/commit/0d360741d6c8cf2a1f4b597a860636e512226875).






[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437849260



##
File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.IOException
+import java.util.Collection
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicBoolean
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.kvstore._
+
+/**
+ * An implementation of KVStore that accelerates event logs loading.
+ *
+ * When rebuilding the application state from event logs, HybridStore will
+ * write data to InMemoryStore at first and use a background thread to dump
+ * data to LevelDB once the writing to InMemoryStore is completed.
+ */
+
+private[history] class HybridStore extends KVStore {
+
+  private val inMemoryStore = new InMemoryStore()
+
+  private var levelDB: LevelDB = null
+
+  // Flag to indicate whether we should use inMemoryStore or levelDB
+  private[history] val shouldUseInMemoryStore = new AtomicBoolean(true)
+
+  // A background thread that dumps data from inMemoryStore to levelDB
+  private var backgroundThread: Thread = null
+
+  // A hash map that stores all classes that had been written to inMemoryStore
+  private val klassMap = new ConcurrentHashMap[Class[_], Boolean]
+
+  override def getMetadata[T](klass: Class[T]): T = {
+    getStore().getMetadata(klass)
+  }
+
+  override def setMetadata(value: Object): Unit = {
+    getStore().setMetadata(value)
+  }
+
+  override def read[T](klass: Class[T], naturalKey: Object): T = {
+    getStore().read(klass, naturalKey)
+  }
+
+  override def write(value: Object): Unit = {
+    val store = getStore()
+    store.write(value)
+
+    if (backgroundThread == null) {
+      // New classes won't be dumped once the background thread is started
+      klassMap.putIfAbsent(value.getClass(), true)
+    }
+  }
+
+  override def delete(klass: Class[_], naturalKey: Object): Unit = {
+    if (backgroundThread != null) {
+      throw new RuntimeException("delete() shouldn't be called after " +
+        "the hybrid store begins switching to levelDB")
+    }
+
+    getStore().delete(klass, naturalKey)
+  }
+
+  override def view[T](klass: Class[T]): KVStoreView[T] = {
+    getStore().view(klass)
+  }
+
+  override def count(klass: Class[_]): Long = {
+    getStore().count(klass)
+  }
+
+  override def count(klass: Class[_], index: String, indexedValue: Object): Long = {
+    getStore().count(klass, index, indexedValue)
+  }
+
+  override def close(): Unit = {
+    try {
+      if (backgroundThread != null && backgroundThread.isAlive()) {
+        // The background thread is still running, wait for it to finish
+        backgroundThread.join()
+      }
+      if (levelDB != null) {

Review comment:
   Sorry I missed this comment. Yeah, you are right, join() can throw 
InterruptedException. I will update the code.
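
   A minimal sketch of one possible fix (an assumption, not the committed code): catch the `InterruptedException` and still close `levelDB`.
   ```
   override def close(): Unit = {
     try {
       if (backgroundThread != null && backgroundThread.isAlive()) {
         // join() can throw InterruptedException; don't let it skip levelDB.close()
         backgroundThread.join()
       }
     } catch {
       case _: InterruptedException =>
         Thread.currentThread().interrupt()  // restore the interrupt status
     } finally {
       if (levelDB != null) {
         levelDB.close()
       }
     }
   }
   ```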








[GitHub] [spark] zhli1142015 commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


zhli1142015 commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641708172


   > So looks like there're two different issues - 1) resource leaks in general 
2) concurrent usage on KV store.
   > 
   > @srowen and me have been talking about the issue 1), while @zhli1142015 is 
pointing out 2) - race condition - as store can be closed while the other 
thread "already" obtained the iterator and not closed yet. It should help to 
avoid 2) if we fix the issue 1), but it doesn't completely prevent the chance 
of 2).
   
   Thanks for your summary.






[GitHub] [spark] zhli1142015 commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


zhli1142015 commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641708063


   > Of course relying on finalize is wrong, but I don't think the intent was 
to rely on finalize. Not closing these iterators is a bug. I see one case it 
clearly isn't; there may be others but haven't spotted them. It'd be nice to 
fix them all instead of the change in this patch but we may want to fix what we 
can see and also make the change in this patch for now.
   
   Thanks for your comments, I get your point.






[GitHub] [spark] Ngone51 edited a comment on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

2020-06-09 Thread GitBox


Ngone51 edited a comment on pull request #28768:
URL: https://github.com/apache/spark/pull/28768#issuecomment-641707231


   cc @xccui @jiangxb1987 






[GitHub] [spark] Ngone51 commented on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

2020-06-09 Thread GitBox


Ngone51 commented on pull request #28768:
URL: https://github.com/apache/spark/pull/28768#issuecomment-641707231


   cc @xccui 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28775:
URL: https://github.com/apache/spark/pull/28775#issuecomment-641706855







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28775:
URL: https://github.com/apache/spark/pull/28775#issuecomment-641706855







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration

2020-06-09 Thread GitBox


SparkQA commented on pull request #28775:
URL: https://github.com/apache/spark/pull/28775#issuecomment-641706549


   **[Test build #123721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123721/testReport)** for PR 28775 at commit [`3d55ef8`](https://github.com/apache/spark/commit/3d55ef8a63f1d6a698a63882d5421f4eb385240b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration

2020-06-09 Thread GitBox


Ngone51 commented on pull request #28775:
URL: https://github.com/apache/spark/pull/28775#issuecomment-641706359


   Synced with @cloud-fan; it's still better to use `ConfigEntry`.
   
   cc @akshatb1 @srowen @jiangxb1987 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 opened a new pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration

2020-06-09 Thread GitBox


Ngone51 opened a new pull request #28775:
URL: https://github.com/apache/spark/pull/28775


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR replaces the constant config key with its `ConfigEntry`.
   
   
   ### Why are the changes needed?
   
   
   For better code maintenance.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   Tested manually.
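
   For context, a minimal sketch of the pattern this PR moves to; the entry
   name, doc and default below are hypothetical, purely illustrative, and not
   the key touched by this PR:
   
   ```scala
   import org.apache.spark.internal.config.ConfigBuilder
   
   object IllustrativeConfig {
     // A ConfigEntry carries the key's type, documentation and default in one place
     val MAX_APP_ATTEMPTS = ConfigBuilder("spark.app.maxAttempts")
       .doc("Illustrative description only.")
       .intConf
       .createWithDefault(2)
   }
   
   // Call sites then read a typed value instead of a raw string constant:
   // conf.get(IllustrativeConfig.MAX_APP_ATTEMPTS)  // Int, default applied
   ```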
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437846179



##
File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala
##
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.IOException
+import java.util.Collection
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicBoolean
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.kvstore._
+
+/**
+ * An implementation of KVStore that accelerates event logs loading.
+ *
+ * When rebuilding the application state from event logs, HybridStore will
+ * write data to InMemoryStore at first and use a background thread to dump
+ * data to LevelDB once the app store is restored. We don't expect write
+ * operations (except the case for caching) after calling switch to level DB.
+ */
+
+private[history] class HybridStore extends KVStore {
+
+  private val inMemoryStore = new InMemoryStore()
+
+  private var levelDB: LevelDB = null
+
+  // Flag to indicate whether we should use inMemoryStore or levelDB
+  private val shouldUseInMemoryStore = new AtomicBoolean(true)
+
+  // Flag to indicate whether this hybrid store is closed, use this flag
+  // to avoid starting background thread after the store is closed
+  private val closed = new AtomicBoolean(false)
+
+  // A background thread that dumps data from inMemoryStore to levelDB
+  private var backgroundThread: Thread = null
+
+  // A hash map that stores all classes that had been written to inMemoryStore
+  private val klassMap = new ConcurrentHashMap[Class[_], Boolean]
+
+  override def getMetadata[T](klass: Class[T]): T = {
+    getStore().getMetadata(klass)
+  }
+
+  override def setMetadata(value: Object): Unit = {
+    getStore().setMetadata(value)
+  }
+
+  override def read[T](klass: Class[T], naturalKey: Object): T = {
+    getStore().read(klass, naturalKey)
+  }
+
+  override def write(value: Object): Unit = {
+    getStore().write(value)
+
+    if (backgroundThread == null) {
+      // New classes won't be dumped once the background thread is started
+      klassMap.putIfAbsent(value.getClass(), true)
+    }
+  }
+
+  override def delete(klass: Class[_], naturalKey: Object): Unit = {
+    if (backgroundThread != null) {
+      throw new IllegalStateException("delete() shouldn't be called after " +
+        "the hybrid store begins switching to levelDB")
+    }
+
+    getStore().delete(klass, naturalKey)
+  }
+
+  override def view[T](klass: Class[T]): KVStoreView[T] = {
+    getStore().view(klass)
+  }
+
+  override def count(klass: Class[_]): Long = {
+    getStore().count(klass)
+  }
+
+  override def count(klass: Class[_], index: String, indexedValue: Object): Long = {
+    getStore().count(klass, index, indexedValue)
+  }
+
+  override def close(): Unit = {
+    closed.set(true)
+
+    if (backgroundThread != null && backgroundThread.isAlive()) {
+      // The background thread is still running, wait for it to finish
+      backgroundThread.join()
+    }
+
+    try {
+      if (levelDB != null) {
+        levelDB.close()
+      }
+    } catch {
+      case ioe: IOException => throw ioe
+    } finally {
+      inMemoryStore.close()
+    }
+  }
+
+  override def removeAllByIndexValues[T](
+      klass: Class[T],
+      index: String,
+      indexValues: Collection[_]): Boolean = {
+    if (backgroundThread != null) {
+      throw new IllegalStateException("removeAllByIndexValues() shouldn't be " +
+        "called after the hybrid store begins switching to levelDB")
+    }
+
+    getStore().removeAllByIndexValues(klass, index, indexValues)
+  }
+
+  def setLevelDB(levelDB: LevelDB): Unit = {
+    this.levelDB = levelDB
+  }
+
+  /**
+   * This method is called when the writing is done for inMemoryStore. A
+   * background thread will be created and be started to dump data in inMemoryStore
+   * to levelDB. Once the dumping is completed, the underlying kvstore will be
+   * switched to levelDB.
+   */
+  def switchToLevelDB(listener: HybridStore.SwitchToLevelDBListener): Unit = {
+    if (closed.get) {
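
   To make the intended lifecycle concrete, a rough usage sketch follows. The
   listener callback names are assumptions (the trait is truncated out of the
   diff above), and `events`, `levelDB` and `logInfo` are placeholders:
   
   ```scala
   // Sketch under assumptions: SwitchToLevelDBListener callbacks are hypothetical.
   val store = new HybridStore()
   // 1. Replay event logs into the fast in-memory backing store
   events.foreach(store.write)
   // 2. Hand over a LevelDB instance and dump to it in the background
   store.setLevelDB(levelDB)
   store.switchToLevelDB(new HybridStore.SwitchToLevelDBListener {
     override def onSwitchToLevelDBSuccess(): Unit = logInfo("switch complete")
     override def onSwitchToLevelDBFail(e: Exception): Unit = store.close()
   })
   ```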

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437845789



##
File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.IOException
+import java.util.Collection
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicBoolean
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.kvstore._
+
+/**
+ * An implementation of KVStore that accelerates event logs loading.
+ *
+ * When rebuilding the application state from event logs, HybridStore will
+ * write data to InMemoryStore at first and use a background thread to dump
+ * data to LevelDB once the writing to InMemoryStore is completed.
+ */
+
+private[history] class HybridStore extends KVStore {
+
+  private val inMemoryStore = new InMemoryStore()
+
+  private var levelDB: LevelDB = null
+
+  // Flag to indicate whether we should use inMemoryStore or levelDB
+  private[history] val shouldUseInMemoryStore = new AtomicBoolean(true)
+
+  // A background thread that dumps data from inMemoryStore to levelDB
+  private var backgroundThread: Thread = null
+
+  // A hash map that stores all classes that had been written to inMemoryStore
+  private val klassMap = new ConcurrentHashMap[Class[_], Boolean]
+
+  override def getMetadata[T](klass: Class[T]): T = {
+    getStore().getMetadata(klass)
+  }
+
+  override def setMetadata(value: Object): Unit = {
+    getStore().setMetadata(value)
+  }
+
+  override def read[T](klass: Class[T], naturalKey: Object): T = {
+    getStore().read(klass, naturalKey)
+  }
+
+  override def write(value: Object): Unit = {
+    val store = getStore()
+    store.write(value)
+
+    if (backgroundThread == null) {
+      // New classes won't be dumped once the background thread is started
+      klassMap.putIfAbsent(value.getClass(), true)
+    }
+  }
+
+  override def delete(klass: Class[_], naturalKey: Object): Unit = {
+    if (backgroundThread != null) {
+      throw new RuntimeException("delete() shouldn't be called after " +
+        "the hybrid store begins switching to levelDB")
+    }
+
+    getStore().delete(klass, naturalKey)
+  }
+
+  override def view[T](klass: Class[T]): KVStoreView[T] = {
+    getStore().view(klass)
+  }
+
+  override def count(klass: Class[_]): Long = {
+    getStore().count(klass)
+  }
+
+  override def count(klass: Class[_], index: String, indexedValue: Object): Long = {
+    getStore().count(klass, index, indexedValue)
+  }
+
+  override def close(): Unit = {
+    try {
+      if (backgroundThread != null && backgroundThread.isAlive()) {
+        // The background thread is still running, wait for it to finish
+        backgroundThread.join()
+      }
+      if (levelDB != null) {

Review comment:
   Friendly reminder about the comment here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #27648: [SPARK-24994][SQL] : Support filter pushdown for short and byte without explicit casting

2020-06-09 Thread GitBox


HyukjinKwon closed pull request #27648:
URL: https://github.com/apache/spark/pull/27648


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


HyukjinKwon commented on a change in pull request #28743:
URL: https://github.com/apache/spark/pull/28743#discussion_r437844741



##
File path: python/pyspark/sql/tests/test_arrow.py
##
@@ -30,10 +30,14 @@
     pandas_requirement_message, pyarrow_requirement_message
 from pyspark.testing.utils import QuietTest
 from pyspark.util import _exception_message
+from distutils.version import LooseVersion
 
 if have_pandas:
     import pandas as pd
     from pandas.util.testing import assert_frame_equal
+    pandas_version = LooseVersion(pd.__version__)
+else:
+    pandas_version = LooseVersion("0")

Review comment:
   I would do something like:
   
   ```python
   pandas_version = None
   if have_pandas:
       import pandas as pd
       from pandas.util.testing import assert_frame_equal
       pandas_version = LooseVersion(pd.__version__)
   ```
   
   if the linter is not happy with it.
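
   A hedged sketch of how such a guard is typically consumed in the test module
   (the version bound and test name are illustrative, not from this PR):
   
   ```python
   import unittest
   from distutils.version import LooseVersion
   
   class ArrowArrayTests(unittest.TestCase):
       # have_pandas / pandas_version are assumed to be module-level names,
       # set as sketched above; 0.24.0 is an illustrative bound.
       @unittest.skipIf(
           not have_pandas or pandas_version < LooseVersion("0.24.0"),
           "pandas >= 0.24.0 required")
       def test_arrow_array_protocol(self):
           ...
   ```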





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


HyukjinKwon commented on a change in pull request #28743:
URL: https://github.com/apache/spark/pull/28743#discussion_r437843397



##
File path: python/pyspark/sql/pandas/conversion.py
##
@@ -394,10 +394,11 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
 
         # Create the Spark schema from list of names passed in with Arrow types
         if isinstance(schema, (list, tuple)):
-            arrow_schema = pa.Schema.from_pandas(pdf, preserve_index=False)
+            inferred_types = [pa.infer_type(s, mask=s.isna(), from_pandas=True)
+                              for s in (pdf[c] for c in pdf)]
             struct = StructType()
-            for name, field in zip(schema, arrow_schema):
-                struct.add(name, from_arrow_type(field.type), nullable=field.nullable)
+            for name, t in zip(schema, inferred_types):
+                struct.add(name, from_arrow_type(t), nullable=True)

Review comment:
   Why don't we follow nullability anymore?
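
   For reference, unlike `pa.Schema.from_pandas`, `pa.infer_type` returns only
   a `DataType` with no nullability flag, so the flag has to be chosen here. A
   standalone sketch of the per-column inference (the data is illustrative):
   
   ```python
   import pandas as pd
   import pyarrow as pa
   
   pdf = pd.DataFrame({"v": [1.0, None, 3.0]})
   s = pdf["v"]
   # Infer the Arrow type of one column; the mask marks nulls so they
   # don't affect the inferred type.
   t = pa.infer_type(s, mask=s.isna(), from_pandas=True)
   print(t)  # double
   ```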





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


HyukjinKwon commented on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641702673


   cc @BryanCutler FYI



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


HeartSaVioR edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641702431


   So looks like there are two different issues - 1) resource leaks in general,
2) concurrent usage of the KV store.
   
   @srowen and I have been talking about issue 1), while @zhli1142015 is
pointing out 2) - a race condition - as the store can be closed while another
thread has "already" obtained the iterator and not closed it yet. Fixing issue
1) should help to avoid 2), but it doesn't completely prevent the chance of 2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


HeartSaVioR commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641702431


   So looks like there are two different issues - 1) resource leaks in general,
2) concurrent usage of the KV store.
   
   @srowen and I have been talking about issue 1), while @zhli1142015 is
pointing out 2) - a race condition - as the store can be closed while another
thread has obtained the iterator. Fixing issue 1) should help to avoid 2), but
it doesn't completely prevent the chance of 2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641701924







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641701924







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641701635


   **[Test build #123720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123720/testReport)** for PR 28412 at commit [`d6c7d98`](https://github.com/apache/spark/commit/d6c7d988bd9e39caebb9a33f8c01ee230b6c2a39).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


srowen commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641700733


   Of course relying on finalize is wrong, but I don't think the intent was to
rely on finalize. Not closing these iterators is a bug. I see one case where it
clearly isn't; there may be others I haven't spotted. It'd be nice to fix them
all instead of the change in this patch, but we may want to fix what we can see
and also make the change in this patch for now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


HeartSaVioR commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641700315


   Oh OK, sorry I missed that. At least we close it properly when calling
`closeableIterator()`, but it's still in question when KVStoreView is used
directly. If we were confident that the iterator is closed in every case, the
`finalize` method would just be redundant; instead, the finalize method is
there as a fallback.
   
   
https://github.com/apache/spark/blob/f3771c6b47d0b3aef10b86586289a1f675c7cfe2/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java#L194-L203
   
   Here "eventually" matters, and in reality finalize is not even guaranteed to
be called. We know why it was deprecated in Java 9. If the method needs to be
there to do something, then we are missing something.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


HeartSaVioR edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641700315


   Oh OK, sorry I missed that. At least we close it properly when calling
`closeableIterator()`, but it's still in question when KVStoreView is used
directly. If we were confident that the iterator is closed in every case, the
`finalize` method would just be redundant; instead, the finalize method is
there as a fallback.
   
   
https://github.com/apache/spark/blob/f3771c6b47d0b3aef10b86586289a1f675c7cfe2/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java#L194-L203
   
   Here "eventually" matters (it could be after the DB is closed, which is the
semantics of "eventually"), and in reality finalize is not even guaranteed to
be called. We know why it was deprecated in Java 9. If the method needs to be
there to do something, then we are missing something.
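
   For illustration, the explicit-close pattern under discussion looks roughly
   like this (a sketch; `process` and the generic signature are placeholders,
   not from the PR):
   
   ```scala
   import org.apache.spark.util.kvstore.KVStore
   
   def scan[T](store: KVStore, klass: Class[T], process: T => Unit): Unit = {
     // closeableIterator() hands back a java.io.Closeable iterator over the view
     val it = store.view(klass).closeableIterator()
     try {
       while (it.hasNext()) {
         process(it.next())
       }
     } finally {
       it.close() // deterministic release instead of waiting for finalize()
     }
   }
   ```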



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SaurabhChawla100 commented on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

2020-06-09 Thread GitBox


SaurabhChawla100 commented on pull request #28768:
URL: https://github.com/apache/spark/pull/28768#issuecomment-641699117


   > The change looks good - we may need to reflect the "actual change" in the
   > PR title and description, as it's no longer the same as the initial proposal.
   
   I have updated the PR title and description in this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641696834







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641696834







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641622318


   **[Test build #123710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123710/testReport)** for PR 28733 at commit [`af33605`](https://github.com/apache/spark/commit/af3360538f28c2400b33a6c22c560b8c78d78c51).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


SparkQA commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641696287


   **[Test build #123710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123710/testReport)** for PR 28733 at commit [`af33605`](https://github.com/apache/spark/commit/af3360538f28c2400b33a6c22c560b8c78d78c51).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `trait PredicateHelper extends Logging `



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437837726



##
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             "trying again...")
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           retried = true
-
         case e: Exception =>
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           throw e
       }
     }
 
+    // Create a LevelDB and start a background thread to dump data to LevelDB
+    logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...")

Review comment:
   ok, will update it. Thanks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437837539



##
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             "trying again...")
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           retried = true
-
         case e: Exception =>
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           throw e
       }
     }
 
+    // Create a LevelDB and start a background thread to dump data to LevelDB
+    logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...")

Review comment:
   Foreground, before the switching method is called. Yeah, that's what I meant. Thanks.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437837248



##
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             "trying again...")
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           retried = true
-
         case e: Exception =>
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           throw e
       }
     }
 
+    // Create a LevelDB and start a background thread to dump data to LevelDB
+    logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...")

Review comment:
   Do you mean an exception thrown in the foreground thread or in the
background thread? If it's in the foreground thread, we can use a try-catch
block to close the store when an exception is thrown.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


AmplabJenkins removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641694556







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


AmplabJenkins commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641694556







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


SparkQA removed a comment on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641612499


   **[Test build #123707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123707/testReport)** for PR 28733 at commit [`934a412`](https://github.com/apache/spark/commit/934a41277ab764aea19fab6d1b99d0de041221e9).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion

2020-06-09 Thread GitBox


SparkQA commented on pull request #28733:
URL: https://github.com/apache/spark/pull/28733#issuecomment-641694008


   **[Test build #123707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123707/testReport)** for PR 28733 at commit [`934a412`](https://github.com/apache/spark/commit/934a41277ab764aea19fab6d1b99d0de041221e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] moskvax commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns

2020-06-09 Thread GitBox


moskvax commented on pull request #28743:
URL: https://github.com/apache/spark/pull/28743#issuecomment-641693507


   @HyukjinKwon @viirya Please review when you've got a moment. Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641693298


   cc @vanzin @squito
   Please take a look at this PR - it looks like a notable improvement in
reducing load latency in the SHS.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-09 Thread GitBox


HyukjinKwon edited a comment on pull request #28745:
URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935


   BTW, I skimmed `Aggregate` related analysis and I think we're all good with 
the current change if I didn't miss anything.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-09 Thread GitBox


HyukjinKwon commented on pull request #28745:
URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935


   BTW, I skimmed `Aggregator` related analysis and I think we're all good with 
the current change if I didn't miss anything.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r437833266



##
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             "trying again...")
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           retried = true
-
         case e: Exception =>
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           throw e
       }
     }
 
+    // Create a LevelDB and start a background thread to dump data to LevelDB
+    logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...")

Review comment:
   There are still chances that exceptions are thrown here, and then we'd leak
the HybridStore instance with its memory allocated and never released.
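
   For illustration, the kind of guard being suggested, as a rough sketch that
   reuses the cleanup calls visible in the diff above (the leasing and
   dump-thread steps are elided):
   
   ```scala
   // Sketch only: make sure an exception in the remaining setup cannot leak
   // the store or its reserved memory.
   try {
     logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...")
     // ... lease disk space, create the LevelDB, start the background dump ...
   } catch {
     case e: Exception =>
       store.close()
       memoryManager.release(appId, attempt.info.attemptId)
       throw e
   }
   ```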





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing

2020-06-09 Thread GitBox


cloud-fan commented on a change in pull request #28766:
URL: https://github.com/apache/spark/pull/28766#discussion_r437834431



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##
@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }
 
+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {
+      val actual = accessor.get(field)
+      val expected = candidate.get(field)
+      if (actual != expected) {
+        throw new DateTimeException(s"Conflict found: Field $field $actual differs from" +
+          s" $field $expected derived from $candidate")

Review comment:
   can you show an example of this error message? let's see if it looks 
good.
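
   For reference, with the `MM-DD` case from the tests (parsing `02-60`, where
   day-of-year 60 in 1970 resolves to March 1), the template above would produce
   roughly:
   
   ```
   Conflict found: Field MonthOfYear 2 differs from MonthOfYear 3 derived from 1970-03-01
   ```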





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing

2020-06-09 Thread GitBox


cloud-fan commented on a change in pull request #28766:
URL: https://github.com/apache/spark/pull/28766#discussion_r437834522



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
##
@@ -433,4 +433,35 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite {
       assert(formatter.format(date(1970, 4, 10)) == "100")
     }
   }
+
+  test("SPARK-31939: Fix Parsing day of year when year field pattern is missing") {
+    // resolved to a queryable LocalDate, or fail directly
+    val f0 = TimestampFormatter("yyyy-dd-DD", UTC, isParsing = true)
+    assert(f0.parse("2020-29-60") === date(2020, 2, 29))
+    assertParsingError(f0.parse("2020-02-60"))
+    val f1 = TimestampFormatter("yyyy-MM-DD", UTC, isParsing = true)
+    assert(f1.parse("2020-02-60") === date(2020, 2, 29))
+    assertParsingError(f1.parse("2020-03-60"))
+    val f2 = TimestampFormatter("yyyy-MM-dd-DD", UTC, isParsing = true)
+    assert(f2.parse("2020-02-29-60") === date(2020, 2, 29))
+    assertParsingError(f2.parse("2020-03-01-60"))
+    val f3 = TimestampFormatter("yyyy-DDD", UTC, isParsing = true)
+    assert(f3.parse("2020-366") === date(2020, 12, 31))
+    assertParsingError(f3.parse("2019-366"))
+
+    // unresolved, and need to be checked manually (fixed by SPARK-31939)
+    val f4 = TimestampFormatter("DDD", UTC, isParsing = true)
+    assert(f4.parse("365") === date(1970, 12, 31))
+    assertParsingError(f4.parse("366")) // 1970 is not a leap year
+    val f5 = TimestampFormatter("MM-DD", UTC, isParsing = true)
+    assert(f5.parse("03-60") === date(1970, 3, 1))
+    assertParsingError(f5.parse("02-60"))
+    val f6 = TimestampFormatter("MM-dd-DD", UTC, isParsing = true)
+    assert(f6.parse("02-28-59") === date(1970, 2, 28))
+    assertParsingError(f6.parse("02-28-60"))
+    assertParsingError(f6.parse("02-28-58"))
+    val f7 = TimestampFormatter("dd-DD", UTC, isParsing = true)
+    assert(f7.parse("28-59") === date(1970, 2, 28))
+    assertParsingError(f7.parse("27-59"))
+  }

Review comment:
   can we also add tests in the .sql file?
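
   Something along these lines, as a rough sketch (the target .sql file and the
   exact cases would be settled in the PR):
   
   ```sql
   -- day-of-year parsing when the pattern carries no year field
   select to_timestamp('365', 'DDD');
   select to_timestamp('2020-366', 'yyyy-DDD');
   select to_timestamp('02-60', 'MM-DD'); -- conflicting fields, expected to fail
   ```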





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing

2020-06-09 Thread GitBox


cloud-fan commented on a change in pull request #28766:
URL: https://github.com/apache/spark/pull/28766#discussion_r437833956



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
##
@@ -433,4 +433,35 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite {
       assert(formatter.format(date(1970, 4, 10)) == "100")
     }
   }
+
+  test("SPARK-31939: Fix Parsing day of year when year field pattern is missing") {
+    // resolved to a queryable LocalDate, or fail directly
+    val f0 = TimestampFormatter("yyyy-dd-DD", UTC, isParsing = true)

Review comment:
   shall we test both timestamp and date formatter? We can put the test in 
the base class.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-09 Thread GitBox


srowen commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641691044


   @HeartSaVioR no, there are many more; just look for `closeableIterator()`.
Of course the question is where it's missing -- line 42, among others. But the
pattern nearly works. In any event, regardless of post-hoc cleanup, it's best
to close these much more proactively where possible.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing

2020-06-09 Thread GitBox


cloud-fan commented on a change in pull request #28766:
URL: https://github.com/apache/spark/pull/28766#discussion_r437833606



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##
@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }
 
+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {

Review comment:
   `candidate.isSupported(field)` this is always true?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] uncleGen commented on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD

2020-06-09 Thread GitBox


uncleGen commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-641690951


   @maropu I have updated the PR description, please take a look.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


