[GitHub] [spark] AmplabJenkins removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
AmplabJenkins removed a comment on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641736976 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
AmplabJenkins commented on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641736976
[GitHub] [spark] SparkQA removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
SparkQA removed a comment on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641721923

**[Test build #123725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123725/testReport)** for PR 28743 at commit [`403f579`](https://github.com/apache/spark/commit/403f5796fdb7decf7c174b28efc6aa6bf2367186).
[GitHub] [spark] moskvax commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
moskvax commented on a change in pull request #28743: URL: https://github.com/apache/spark/pull/28743#discussion_r437872692

File path: python/pyspark/sql/pandas/serializers.py

```diff
@@ -150,15 +151,22 @@ def _create_batch(self, series):
         series = ((s, None) if not isinstance(s, (list, tuple)) else s for s in series)

         def create_array(s, t):
-            mask = s.isnull()
+            # Create with __arrow_array__ if the series' backing array implements it
+            series_array = getattr(s, 'array', s._values)
+            if hasattr(series_array, "__arrow_array__"):
+                return series_array.__arrow_array__(type=t)
+
             # Ensure timestamp series are in expected form for Spark internal representation
             if t is not None and pa.types.is_timestamp(t):
                 s = _check_series_convert_timestamps_internal(s, self._timezone)
-            elif type(s.dtype) == pd.CategoricalDtype:
+            elif is_categorical_dtype(s.dtype):
```

Review comment: By the way, this change was made as `CategoricalDtype` is only imported into the root pandas namespace after pandas 0.24.0, which was causing `AttributeError` when testing with earlier versions.
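The dispatch introduced in the serializers.py change above — prefer a pandas extension array's `__arrow_array__` hook, otherwise fall back to the ordinary conversion path — can be sketched in isolation. This is a minimal stand-in, not Spark code: `FakeExtensionArray` and `to_arrow` are hypothetical names, and tagged tuples replace real `pyarrow.Array` results so the branching is visible.

```python
# Hypothetical sketch of the __arrow_array__ dispatch in the patch above.
# FakeExtensionArray and to_arrow are illustrative names, not Spark or pandas APIs.

class FakeExtensionArray:
    """Stand-in for a pandas ExtensionArray implementing the Arrow protocol."""

    def __init__(self, values):
        self.values = values

    def __arrow_array__(self, type=None):
        # A real implementation would return a pyarrow.Array; here we just
        # tag the result so the chosen code path is observable.
        return ("arrow", list(self.values), type)


def to_arrow(series_array, t=None):
    # Mirror of the patched create_array: use __arrow_array__ when the
    # backing array provides it, otherwise take a generic fallback path.
    if hasattr(series_array, "__arrow_array__"):
        return series_array.__arrow_array__(type=t)
    return ("generic", list(series_array), t)


print(to_arrow(FakeExtensionArray([1, 2])))  # dispatches via __arrow_array__
print(to_arrow([1, 2]))                      # a plain list takes the fallback
```

In real pyarrow, `pa.array()` performs this same protocol check, which is why extension dtypes such as nullable integers convert correctly once the series' backing array is consulted.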
[GitHub] [spark] SparkQA commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
SparkQA commented on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641736450

**[Test build #123725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123725/testReport)** for PR 28743 at commit [`403f579`](https://github.com/apache/spark/commit/403f5796fdb7decf7c174b28efc6aa6bf2367186).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] siknezevic commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on pull request #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-641736377

> Also, could you add some benchmark classes in https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark ?

Hello @maropu, I checked the available benchmarks and there is already one that can be utilized: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala

I am using Databricks' toolkit for all my testing (10TB and 100TB datasets), and TPCDSQueryBenchmark is based on that toolkit. I was able to test spilling with the TPCDSQueryBenchmark benchmark, executing it as follows:

```
/opt/spark/bin/spark-submit \
  --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
  --conf 'spark.sql.sortMergeJoinExec.buffer.spill.threshold=6000' \
  --conf 'spark.sql.sortMergeJoinExec.buffer.in.memory.threshold=1000' \
  '/tmp/spark-sql_2.11-2.4.6-SNAPSHOT-tests.jar' \
  --data-location '/user/testusera1/tpcds/datasets-1g/sf1-parquet/useDecimal=true,useDate=true,filterNull=false' \
  --query-filter 'q14a'
```

It runs fine and I am able to pass Spark config parameters to it to trigger spilling. I believe we do not need a new benchmark. Do you agree?
[GitHub] [spark] gengliangwang commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
gengliangwang commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641735784

@zhli1142015 Sorry, I left comments on the code before I read the discussion in the PR. So, before you update the related code, could you describe an end-to-end use case that reproduces the error page in the PR description? Does it only happen when the LevelDB store is evicted or the UI server is shut down?
[GitHub] [spark] yaooqinn commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing
yaooqinn commented on a change in pull request #28766: URL: https://github.com/apache/spark/pull/28766#discussion_r437871769

File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala

```diff
@@ -433,4 +433,35 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite {
     assert(formatter.format(date(1970, 4, 10)) == "100")
   }
 }
+
+  test("SPARK-31939: Fix Parsing day of year when year field pattern is missing") {
+    // resolved to queryable LocaleDate or fail directly
+    val f0 = TimestampFormatter("-dd-DD", UTC, isParsing = true)
```

Review comment: Sounds good.
[GitHub] [spark] yaooqinn commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing
yaooqinn commented on a change in pull request #28766: URL: https://github.com/apache/spark/pull/28766#discussion_r437871990

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

```diff
@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }

+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {
```

Review comment: For the time being, yes. I can remove this condition.

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

```diff
@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }

+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {
+      val actual = accessor.get(field)
+      val expected = candidate.get(field)
+      if (actual != expected) {
+        throw new DateTimeException(s"Conflict found: Field $field $actual differs from" +
+          s" $field $expected derived from $candidate")
```

Review comment: OK
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins removed a comment on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641732806
[GitHub] [spark] gengliangwang commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
gengliangwang commented on a change in pull request #28769: URL: https://github.com/apache/spark/pull/28769#discussion_r437870387

File path: common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java

```diff
@@ -276,6 +276,41 @@ public void testNegativeIndexValues() throws Exception {
     assertEquals(expected, results);
   }

+  @Test
+  public void testCloseLevelDBIterator() throws Exception {
+    // SPARK-31929: test when LevelDB.close() is called, related LevelDBIterators
+    // are closed. And files opened by iterators are also closed.
+    File dbpathForCloseTest = File
+      .createTempFile(
```

Review comment: please change the indents to two spaces in this file
[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641732806
[GitHub] [spark] gengliangwang commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
gengliangwang commented on a change in pull request #28769: URL: https://github.com/apache/spark/pull/28769#discussion_r437869878

File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java

```diff
@@ -189,7 +198,12 @@ public void delete(Class type, Object naturalKey) throws Exception {
   @Override
   public Iterator iterator() {
     try {
-      return new LevelDBIterator<>(type, LevelDB.this, this);
+      LevelDBIterator iterator = new LevelDBIterator<>(
+        type,
```

Review comment: Nit: put all the parameters on one line?

```java
LevelDBIterator iterator = new LevelDBIterator<>(type, LevelDB.this, this);
```
[GitHub] [spark] dilipbiswal commented on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
dilipbiswal commented on pull request #28773: URL: https://github.com/apache/spark/pull/28773#issuecomment-641732487

LGTM
[GitHub] [spark] SparkQA removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
SparkQA removed a comment on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641689848

**[Test build #123719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123719/testReport)** for PR 28412 at commit [`1e514b9`](https://github.com/apache/spark/commit/1e514b910a56b719a08d6f7a7689a2a53dcc06a5).
[GitHub] [spark] gengliangwang commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
gengliangwang commented on a change in pull request #28769: URL: https://github.com/apache/spark/pull/28769#discussion_r437869753

File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java

```diff
@@ -256,6 +275,7 @@ void closeIterator(LevelDBIterator it) throws IOException {
   DB _db = this._db.get();
   if (_db != null) {
     it.close();
+    iteratorTracker.remove(it);
```

Review comment: shall we remove it even when `_db` is null?
[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
SparkQA commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641732044

**[Test build #123719 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123719/testReport)** for PR 28412 at commit [`1e514b9`](https://github.com/apache/spark/commit/1e514b910a56b719a08d6f7a7689a2a53dcc06a5).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
AmplabJenkins commented on pull request #28773: URL: https://github.com/apache/spark/pull/28773#issuecomment-641725711
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
AmplabJenkins removed a comment on pull request #28773: URL: https://github.com/apache/spark/pull/28773#issuecomment-641725711
[GitHub] [spark] SparkQA removed a comment on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
SparkQA removed a comment on pull request #28773: URL: https://github.com/apache/spark/pull/28773#issuecomment-641651098

**[Test build #123711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123711/testReport)** for PR 28773 at commit [`1013ac8`](https://github.com/apache/spark/commit/1013ac8064c1e380f4be4c297c165fae1a20602e).
[GitHub] [spark] SparkQA commented on pull request #28773: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
SparkQA commented on pull request #28773: URL: https://github.com/apache/spark/pull/28773#issuecomment-641724830

**[Test build #123711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123711/testReport)** for PR 28773 at commit [`1013ac8`](https://github.com/apache/spark/commit/1013ac8064c1e380f4be4c297c165fae1a20602e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
AmplabJenkins removed a comment on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641722284

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/28349/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
AmplabJenkins removed a comment on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641722278

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
AmplabJenkins commented on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641722278
[GitHub] [spark] SparkQA commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
SparkQA commented on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641721923

**[Test build #123725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123725/testReport)** for PR 28743 at commit [`403f579`](https://github.com/apache/spark/commit/403f5796fdb7decf7c174b28efc6aa6bf2367186).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all
AmplabJenkins removed a comment on pull request #27507: URL: https://github.com/apache/spark/pull/27507#issuecomment-641719947

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123716/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641720102
[GitHub] [spark] AmplabJenkins commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641720102
[GitHub] [spark] SparkQA removed a comment on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all
SparkQA removed a comment on pull request #27507: URL: https://github.com/apache/spark/pull/27507#issuecomment-641679815

**[Test build #123716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123716/testReport)** for PR 27507 at commit [`ca6c1c5`](https://github.com/apache/spark/commit/ca6c1c5eef73ae1a3d33f17acebcfcc3d77d9d63).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all
AmplabJenkins removed a comment on pull request #27507: URL: https://github.com/apache/spark/pull/27507#issuecomment-641719937

Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
SparkQA commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641719773

**[Test build #123724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123724/testReport)** for PR 28769 at commit [`84e9012`](https://github.com/apache/spark/commit/84e9012b49af708ca1b4e5f22f495d8ef38f3122).
[GitHub] [spark] AmplabJenkins commented on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all
AmplabJenkins commented on pull request #27507: URL: https://github.com/apache/spark/pull/27507#issuecomment-641719937
[GitHub] [spark] SparkQA commented on pull request #28776: [3.0][SPARK-31935][SQL] Hadoop file system config should be effective in data source options
SparkQA commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-641719758

**[Test build #123723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123723/testReport)** for PR 28776 at commit [`f6cca6b`](https://github.com/apache/spark/commit/f6cca6b5163acef655d0c0e3d6cd4848b00314e0).
[GitHub] [spark] SparkQA commented on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all
SparkQA commented on pull request #27507: URL: https://github.com/apache/spark/pull/27507#issuecomment-641719783

**[Test build #123716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123716/testReport)** for PR 27507 at commit [`ca6c1c5`](https://github.com/apache/spark/commit/ca6c1c5eef73ae1a3d33f17acebcfcc3d77d9d63).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28776: [3.0][SPARK-31935][SQL] Hadoop file system config should be effective in data source options
AmplabJenkins removed a comment on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-641717889
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641186609

Can one of the admins verify this patch?
[GitHub] [spark] cloud-fan commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
cloud-fan commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641719161

ok to test
[GitHub] [spark] moskvax commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
moskvax commented on a change in pull request #28743: URL: https://github.com/apache/spark/pull/28743#discussion_r437858625

## File path: python/pyspark/sql/tests/test_arrow.py

```diff
@@ -30,10 +30,14 @@
     pandas_requirement_message, pyarrow_requirement_message
 from pyspark.testing.utils import QuietTest
 from pyspark.util import _exception_message
+from distutils.version import LooseVersion

 if have_pandas:
     import pandas as pd
     from pandas.util.testing import assert_frame_equal
+    pandas_version = LooseVersion(pd.__version__)
+else:
+    pandas_version = LooseVersion("0")
```

Review comment: Nice, will update
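The diff above gates tests on the installed pandas version, falling back to a version that sorts below any real release when pandas is absent. A minimal sketch of the same guard pattern follows; a simple tuple parse stands in for `distutils.version.LooseVersion` (which is removed in newer Pythons), and the `0.24.2` threshold is only an illustrative cutoff, not taken from the PR:

```python
# Sketch of the version-guard pattern from the diff above.  parse_version
# is a stand-in for distutils.version.LooseVersion; it turns a dotted
# version string into a tuple so versions compare in the expected order.
def parse_version(v):
    """Turn '1.0.3' into (1, 0, 3); stop at the first non-numeric part."""
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break
    return tuple(parts)

try:
    import pandas as pd
    pandas_version = parse_version(pd.__version__)
except ImportError:
    pandas_version = parse_version("0")  # sorts below any real release

# A test can then be skipped when pandas is too old (threshold illustrative):
skip_arrow_array_tests = pandas_version < parse_version("0.24.2")
```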
[GitHub] [spark] cloud-fan commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
cloud-fan commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641718840 cc @gengliangwang
[GitHub] [spark] moskvax commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
moskvax commented on a change in pull request #28743: URL: https://github.com/apache/spark/pull/28743#discussion_r437858389

## File path: python/pyspark/sql/pandas/conversion.py

```diff
@@ -394,10 +394,11 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
         # Create the Spark schema from list of names passed in with Arrow types
         if isinstance(schema, (list, tuple)):
-            arrow_schema = pa.Schema.from_pandas(pdf, preserve_index=False)
+            inferred_types = [pa.infer_type(s, mask=s.isna(), from_pandas=True)
+                              for s in (pdf[c] for c in pdf)]
             struct = StructType()
-            for name, field in zip(schema, arrow_schema):
-                struct.add(name, from_arrow_type(field.type), nullable=field.nullable)
+            for name, t in zip(schema, inferred_types):
+                struct.add(name, from_arrow_type(t), nullable=True)
```

Review comment: `infer_type` only returns a type, not a `field`, which would supposedly carry nullability information. But it appears that the implementation of `Schema.from_pandas` ([link](https://github.com/apache/arrow/blob/b058cf0d1c26ad7984c104bb84322cc7dcc66f00/python/pyarrow/types.pxi#L1328)) never actually inferred nullability, so the default `nullable=True` was always returned. This change therefore just follows the existing behaviour of `Schema.from_pandas`.
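The behaviour discussed above — inferring a type per column while always recording the field as `nullable=True` — can be sketched without pyarrow. The names below are illustrative stand-ins for `pa.infer_type` and `StructType.add`, not the real APIs:

```python
# Illustrative stand-ins for the pyarrow/Spark calls discussed above:
# infer an element type per column (skipping nulls) and always record the
# field as nullable=True, matching Schema.from_pandas behaviour.
def infer_column_type(values):
    """Return the Python type of the first non-null value, or None."""
    for v in values:
        if v is not None:
            return type(v)
    return None

def build_struct(names, columns):
    # Each tuple mirrors StructType.add(name, type, nullable=True).
    return [(name, infer_column_type(col), True)
            for name, col in zip(names, columns)]

struct = build_struct(["a", "b"], [[1, None, 3], [None, "x"]])
```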
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.
AmplabJenkins removed a comment on pull request #28774: URL: https://github.com/apache/spark/pull/28774#issuecomment-641718511
[GitHub] [spark] AmplabJenkins commented on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.
AmplabJenkins commented on pull request #28774: URL: https://github.com/apache/spark/pull/28774#issuecomment-641718511
[GitHub] [spark] AmplabJenkins commented on pull request #28776: [3.0][SPARK-31935][SQL] Hadoop file system config should be effective in data source options
AmplabJenkins commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-641717889
[GitHub] [spark] SparkQA removed a comment on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.
SparkQA removed a comment on pull request #28774: URL: https://github.com/apache/spark/pull/28774#issuecomment-641673972 **[Test build #123714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123714/testReport)** for PR 28774 at commit [`c2b6b86`](https://github.com/apache/spark/commit/c2b6b86d2c450d35d9451929eab71eaeed9801c1).
[GitHub] [spark] gengliangwang opened a new pull request #28776: [SPARK-31935][SQL] Hadoop file system config should be effective in data source options
gengliangwang opened a new pull request #28776: URL: https://github.com/apache/spark/pull/28776

### What changes were proposed in this pull request?
Make the Hadoop file system config effective in data source options. From `org.apache.hadoop.fs.FileSystem.java`:
```
public static FileSystem get(URI uri, Configuration conf) throws IOException {
  String scheme = uri.getScheme();
  String authority = uri.getAuthority();
  if (scheme == null && authority == null) {     // use default FS
    return get(conf);
  }
  if (scheme != null && authority == null) {     // no authority
    URI defaultUri = getDefaultUri(conf);
    if (scheme.equals(defaultUri.getScheme())    // if scheme matches default
        && defaultUri.getAuthority() != null) {  // & default has authority
      return get(defaultUri, conf);              // return default
    }
  }
  String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
  if (conf.getBoolean(disableCacheName, false)) {
    return createFileSystem(uri, conf);
  }
  return CACHE.get(uri, conf);
}
```
Before these changes, the file system configurations in data source options were not propagated in `DataSource.scala`. After the changes, we can specify authority- and URI-scheme-related configurations for scanning file systems. This problem only exists in data source V1; in V2, we already use `sparkSession.sessionState.newHadoopConfWithOptions(options)` in `FileTable`.

### Why are the changes needed?
Allow users to specify authority- and URI-scheme-related Hadoop configurations for file source reading.

### Does this PR introduce _any_ user-facing change?
Yes, the file-system-related Hadoop configuration in data source options will be effective on reading.

### How was this patch tested?
Unit test
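The `FileSystem.get` logic quoted in the PR description above can be paraphrased as a small decision function. This is only a sketch of the branching, not Hadoop's API; the return strings and the `default_uri` parameter are illustrative:

```python
# Sketch of the FileSystem.get branches quoted above: fall back to the
# default FS when possible, then honour fs.<scheme>.impl.disable.cache
# before consulting the shared cache.  Names are illustrative.
def resolve_filesystem(scheme, authority, conf, default_uri=("hdfs", "namenode")):
    if scheme is None and authority is None:
        return "default-fs"                      # use default FS
    if scheme is not None and authority is None:
        default_scheme, default_authority = default_uri
        if scheme == default_scheme and default_authority is not None:
            return "default-fs"                  # scheme matches default
    if conf.get("fs.%s.impl.disable.cache" % scheme, False):
        return "fresh-fs"                        # caching disabled per scheme
    return "cached-fs"                           # shared, cached instance
```

This makes the user-facing point of the PR concrete: options such as `fs.s3a.impl.disable.cache` only matter if they reach the `conf` that this decision sees, which is what propagating data source options into the Hadoop configuration achieves.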
[GitHub] [spark] gengliangwang commented on pull request #28776: [SPARK-31935][SQL] Hadoop file system config should be effective in data source options
gengliangwang commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-641717785 This PR backports https://github.com/apache/spark/pull/28760 to branch-3.0
[GitHub] [spark] SparkQA commented on pull request #28774: [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function.
SparkQA commented on pull request #28774: URL: https://github.com/apache/spark/pull/28774#issuecomment-641717749

**[Test build #123714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123714/testReport)** for PR 28774 at commit [`c2b6b86`](https://github.com/apache/spark/commit/c2b6b86d2c450d35d9451929eab71eaeed9801c1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] xccui commented on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore
xccui commented on pull request #28768: URL: https://github.com/apache/spark/pull/28768#issuecomment-641714353 Sorry that I didn't realize the potential impact of using `SparkException` or `NoSuchElementException`. +1 to this change.
[GitHub] [spark] zhli1142015 edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
zhli1142015 edited a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641708063

> Of course relying on finalize is wrong, but I don't think the intent was to rely on finalize. Not closing these iterators is a bug. I see one case it clearly isn't; there may be others but haven't spotted them. It'd be nice to fix them all instead of the change in this patch but we may want to fix what we can see and also make the change in this patch for now.

Thanks for your comments, I get your point.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins removed a comment on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641710484
[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641710484
[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
SparkQA commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641710231 **[Test build #123722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123722/testReport)** for PR 28412 at commit [`0d36074`](https://github.com/apache/spark/commit/0d360741d6c8cf2a1f4b597a860636e512226875).
[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
baohe-zhang commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437849260 ## File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala ## @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.history + +import java.io.IOException +import java.util.Collection +import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.atomic.AtomicBoolean + +import scala.collection.JavaConverters._ + +import org.apache.spark.util.kvstore._ + +/** + * A implementation of KVStore that accelerates event logs loading. + * + * When rebuilding the application state from event logs, HybridStore will + * write data to InMemoryStore at first and use a background thread to dump + * data to LevelDB once the writing to InMemoryStore is completed. 
+ */ + +private[history] class HybridStore extends KVStore { + + private val inMemoryStore = new InMemoryStore() + + private var levelDB: LevelDB = null + + // Flag to indicate whether we should use inMemoryStore or levelDB + private[history] val shouldUseInMemoryStore = new AtomicBoolean(true) + + // A background thread that dumps data from inMemoryStore to levelDB + private var backgroundThread: Thread = null + + // A hash map that stores all classes that had been writen to inMemoryStore + private val klassMap = new ConcurrentHashMap[Class[_], Boolean] + + override def getMetadata[T](klass: Class[T]): T = { +getStore().getMetadata(klass) + } + + override def setMetadata(value: Object): Unit = { +getStore().setMetadata(value) + } + + override def read[T](klass: Class[T], naturalKey: Object): T = { +getStore().read(klass, naturalKey) + } + + override def write(value: Object): Unit = { +val store = getStore() +store.write(value) + +if (backgroundThread == null) { + // New classes won't be dumped once the background thread is started + klassMap.putIfAbsent(value.getClass(), true) +} + } + + override def delete(klass: Class[_], naturalKey: Object): Unit = { +if (backgroundThread != null) { + throw new RuntimeException("delete() shouldn't be called after " + +"the hybrid store begins switching to levelDB") +} + +getStore().delete(klass, naturalKey) + } + + override def view[T](klass: Class[T]): KVStoreView[T] = { +getStore().view(klass) + } + + override def count(klass: Class[_]): Long = { +getStore().count(klass) + } + + override def count(klass: Class[_], index: String, indexedValue: Object): Long = { +getStore().count(klass, index, indexedValue) + } + + override def close(): Unit = { +try { + if (backgroundThread != null && backgroundThread.isAlive()) { +// The background thread is still running, wait for it to finish +backgroundThread.join() + } + if (levelDB != null) { Review comment: Sorry I missed this comment. 
Yeah, you are right, join() can throw InterruptedException. I will update the code.
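The HybridStore pattern under review — write to a fast in-memory store first, then have a background thread dump everything to LevelDB and flip reads over — can be sketched in simplified form. This is an illustration only, with plain dicts standing in for `InMemoryStore`/`LevelDB` and none of the real class's metadata or view handling:

```python
import threading

class HybridStoreSketch:
    """Simplified illustration of the HybridStore switching pattern:
    writes land in an in-memory dict; switch() starts a background thread
    that copies the data to a 'disk' dict and flips reads over to it."""

    def __init__(self):
        self.memory, self.disk = {}, {}
        self.use_memory = True      # which store read() consults
        self.background = None      # dump thread, None until switch()

    def write(self, key, value):
        if self.background is not None:
            # Mirrors the review's point: no writes once switching began.
            raise RuntimeError("write() forbidden after switching has begun")
        self.memory[key] = value

    def read(self, key):
        store = self.memory if self.use_memory else self.disk
        return store[key]

    def switch(self):
        def dump():
            self.disk.update(self.memory)   # copy everything across
            self.use_memory = False         # flip reads to the disk store
        self.background = threading.Thread(target=dump)
        self.background.start()

    def close(self):
        # As in the close() under review: wait for the dump to finish
        # before tearing the stores down.
        if self.background is not None and self.background.is_alive():
            self.background.join()
```

Note that `join()` here, like `Thread.join()` in the real code, can be interrupted, which is exactly the concern raised in the comment above.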
[GitHub] [spark] zhli1142015 commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
zhli1142015 commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641708172

> So looks like there're two different issues - 1) resource leaks in general 2) concurrent usage on KV store.
>
> @srowen and me have been talking about the issue 1), while @zhli1142015 is pointing out 2) - race condition - as store can be closed while the other thread "already" obtained the iterator and not closed yet. It should help to avoid 2) if we fix the issue 1), but it doesn't completely prevent the chance of 2).

Thanks for your summary.
[GitHub] [spark] zhli1142015 commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
zhli1142015 commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641708063

> Of course relying on finalize is wrong, but I don't think the intent was to rely on finalize. Not closing these iterators is a bug. I see one case it clearly isn't; there may be others but haven't spotted them. It'd be nice to fix them all instead of the change in this patch but we may want to fix what we can see and also make the change in this patch for now.

Thanks for your comments, I get your point.
[GitHub] [spark] Ngone51 edited a comment on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore
Ngone51 edited a comment on pull request #28768: URL: https://github.com/apache/spark/pull/28768#issuecomment-641707231 cc @xccui @jiangxb1987
[GitHub] [spark] Ngone51 commented on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore
Ngone51 commented on pull request #28768: URL: https://github.com/apache/spark/pull/28768#issuecomment-641707231 cc @xccui
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration
AmplabJenkins removed a comment on pull request #28775: URL: https://github.com/apache/spark/pull/28775#issuecomment-641706855
[GitHub] [spark] AmplabJenkins commented on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration
AmplabJenkins commented on pull request #28775: URL: https://github.com/apache/spark/pull/28775#issuecomment-641706855
[GitHub] [spark] SparkQA commented on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration
SparkQA commented on pull request #28775: URL: https://github.com/apache/spark/pull/28775#issuecomment-641706549 **[Test build #123721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123721/testReport)** for PR 28775 at commit [`3d55ef8`](https://github.com/apache/spark/commit/3d55ef8a63f1d6a698a63882d5421f4eb385240b).
[GitHub] [spark] Ngone51 commented on pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration
Ngone51 commented on pull request #28775: URL: https://github.com/apache/spark/pull/28775#issuecomment-641706359 Synced with @cloud-fan, it's still better to use `ConfigEntry`. cc @akshatb1 @srowen @jiangxb1987
[GitHub] [spark] Ngone51 opened a new pull request #28775: [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry instead of the constant for the configuration
Ngone51 opened a new pull request #28775: URL: https://github.com/apache/spark/pull/28775

### What changes were proposed in this pull request?
This PR replaces the constant config with a `ConfigEntry`.

### Why are the changes needed?
For better code maintenance.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested manually.
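The motivation for a `ConfigEntry` over a bare string constant is that one definition bundles the key with its default and type conversion, so every call site reads the config the same way. Spark's actual `ConfigEntry` is Scala with a builder API; the sketch below only illustrates the idea, and the config key shown is hypothetical:

```python
# Illustrative sketch of the ConfigEntry idea: a key, its default, and a
# type converter live in one definition instead of being repeated at
# every call site as a raw string plus ad-hoc parsing.
class ConfigEntry:
    def __init__(self, key, default, convert=str):
        self.key, self.default, self.convert = key, default, convert

    def read_from(self, conf):
        """Return the typed value from a dict-like conf, or the default."""
        raw = conf.get(self.key)
        return self.default if raw is None else self.convert(raw)

# One typed definition (key is hypothetical, not a real Spark config):
MAX_RETRIES = ConfigEntry("spark.example.maxRetries", 3, int)
```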
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437846179 ## File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.history + +import java.io.IOException +import java.util.Collection +import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.atomic.AtomicBoolean + +import scala.collection.JavaConverters._ + +import org.apache.spark.util.kvstore._ + +/** + * An implementation of KVStore that accelerates event logs loading. + * + * When rebuilding the application state from event logs, HybridStore will + * write data to InMemoryStore at first and use a background thread to dump + * data to LevelDB once the app store is restored. We don't expect write + * operations (except the case for caching) after calling switch to level DB. 
+ */ + +private[history] class HybridStore extends KVStore { + + private val inMemoryStore = new InMemoryStore() + + private var levelDB: LevelDB = null + + // Flag to indicate whether we should use inMemoryStore or levelDB + private val shouldUseInMemoryStore = new AtomicBoolean(true) + + // Flag to indicate whether this hybrid store is closed, use this flag + // to avoid starting background thread after the store is closed + private val closed = new AtomicBoolean(false) + + // A background thread that dumps data from inMemoryStore to levelDB + private var backgroundThread: Thread = null + + // A hash map that stores all classes that had been writen to inMemoryStore + private val klassMap = new ConcurrentHashMap[Class[_], Boolean] + + override def getMetadata[T](klass: Class[T]): T = { +getStore().getMetadata(klass) + } + + override def setMetadata(value: Object): Unit = { +getStore().setMetadata(value) + } + + override def read[T](klass: Class[T], naturalKey: Object): T = { +getStore().read(klass, naturalKey) + } + + override def write(value: Object): Unit = { +getStore().write(value) + +if (backgroundThread == null) { + // New classes won't be dumped once the background thread is started + klassMap.putIfAbsent(value.getClass(), true) +} + } + + override def delete(klass: Class[_], naturalKey: Object): Unit = { +if (backgroundThread != null) { + throw new IllegalStateException("delete() shouldn't be called after " + +"the hybrid store begins switching to levelDB") +} + +getStore().delete(klass, naturalKey) + } + + override def view[T](klass: Class[T]): KVStoreView[T] = { +getStore().view(klass) + } + + override def count(klass: Class[_]): Long = { +getStore().count(klass) + } + + override def count(klass: Class[_], index: String, indexedValue: Object): Long = { +getStore().count(klass, index, indexedValue) + } + + override def close(): Unit = { +closed.set(true) + +if (backgroundThread != null && backgroundThread.isAlive()) { + // The background thread is still 
running, wait for it to finish + backgroundThread.join() +} + +try { + if (levelDB != null) { +levelDB.close() + } +} catch { + case ioe: IOException => throw ioe +} finally { + inMemoryStore.close() +} + } + + override def removeAllByIndexValues[T]( + klass: Class[T], + index: String, + indexValues: Collection[_]): Boolean = { +if (backgroundThread != null) { + throw new IllegalStateException("removeAllByIndexValues() shouldn't be " + +"called after the hybrid store begins switching to levelDB") +} + +getStore().removeAllByIndexValues(klass, index, indexValues) + } + + def setLevelDB(levelDB: LevelDB): Unit = { +this.levelDB = levelDB + } + + /** + * This method is called when the writing is done for inMemoryStore. A + * background thread will be created and be started to dump data in inMemoryStore + * to levelDB. Once the dumping is completed, the underlying kvstore will be + * switched to levelDB. + */ + def switchToLevelDB(listener: HybridStore.SwitchToLevelDBListener): Unit = { +if (closed.get) { +
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437846179 ## File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.history + +import java.io.IOException +import java.util.Collection +import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.atomic.AtomicBoolean + +import scala.collection.JavaConverters._ + +import org.apache.spark.util.kvstore._ + +/** + * An implementation of KVStore that accelerates event logs loading. + * + * When rebuilding the application state from event logs, HybridStore will + * write data to InMemoryStore at first and use a background thread to dump + * data to LevelDB once the app store is restored. We don't expect write + * operations (except the case for caching) after calling switch to level DB. 
+ */ + +private[history] class HybridStore extends KVStore { + + private val inMemoryStore = new InMemoryStore() + + private var levelDB: LevelDB = null + + // Flag to indicate whether we should use inMemoryStore or levelDB + private val shouldUseInMemoryStore = new AtomicBoolean(true) + + // Flag to indicate whether this hybrid store is closed, use this flag + // to avoid starting background thread after the store is closed + private val closed = new AtomicBoolean(false) + + // A background thread that dumps data from inMemoryStore to levelDB + private var backgroundThread: Thread = null + + // A hash map that stores all classes that had been written to inMemoryStore + private val klassMap = new ConcurrentHashMap[Class[_], Boolean] + + override def getMetadata[T](klass: Class[T]): T = { +getStore().getMetadata(klass) + } + + override def setMetadata(value: Object): Unit = { +getStore().setMetadata(value) + } + + override def read[T](klass: Class[T], naturalKey: Object): T = { +getStore().read(klass, naturalKey) + } + + override def write(value: Object): Unit = { +getStore().write(value) + +if (backgroundThread == null) { + // New classes won't be dumped once the background thread is started + klassMap.putIfAbsent(value.getClass(), true) +} + } + + override def delete(klass: Class[_], naturalKey: Object): Unit = { +if (backgroundThread != null) { + throw new IllegalStateException("delete() shouldn't be called after " + +"the hybrid store begins switching to levelDB") +} + +getStore().delete(klass, naturalKey) + } + + override def view[T](klass: Class[T]): KVStoreView[T] = { +getStore().view(klass) + } + + override def count(klass: Class[_]): Long = { +getStore().count(klass) + } + + override def count(klass: Class[_], index: String, indexedValue: Object): Long = { +getStore().count(klass, index, indexedValue) + } + + override def close(): Unit = { +closed.set(true) + +if (backgroundThread != null && backgroundThread.isAlive()) { + // The background thread is still
running, wait for it to finish + backgroundThread.join() +} + +try { + if (levelDB != null) { +levelDB.close() + } +} catch { + case ioe: IOException => throw ioe +} finally { + inMemoryStore.close() +} + } + + override def removeAllByIndexValues[T]( + klass: Class[T], + index: String, + indexValues: Collection[_]): Boolean = { +if (backgroundThread != null) { + throw new IllegalStateException("removeAllByIndexValues() shouldn't be " + +"called after the hybrid store begins switching to levelDB") +} + +getStore().removeAllByIndexValues(klass, index, indexValues) + } + + def setLevelDB(levelDB: LevelDB): Unit = { +this.levelDB = levelDB + } + + /** + * This method is called when the writing is done for inMemoryStore. A + * background thread will be created and be started to dump data in inMemoryStore + * to levelDB. Once the dumping is completed, the underlying kvstore will be + * switched to levelDB. + */ + def switchToLevelDB(listener: HybridStore.SwitchToLevelDBListener): Unit = { +if (closed.get) { +
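The design quoted above (write to a fast in-memory store first, dump to a slower on-disk store in the background, then switch reads over once the dump completes) can be modeled in a few lines. The following is a minimal illustrative sketch, not the actual Spark code; the class and method names are made up for this example.

```python
import threading


class HybridStoreSketch:
    """Toy model of the HybridStore idea: writes land in an in-memory dict,
    a background thread dumps the data to a slower backing store, and reads
    switch over once the dump completes (cf. shouldUseInMemoryStore and
    backgroundThread in the diff)."""

    def __init__(self, backing_store):
        self._memory = {}
        self._backing = backing_store
        self._use_memory = True   # analogous to shouldUseInMemoryStore
        self._dump_thread = None  # analogous to backgroundThread

    def write(self, key, value):
        # Mirrors the guard in delete()/removeAllByIndexValues(): no more
        # writes once switching has begun.
        if self._dump_thread is not None:
            raise RuntimeError("write() shouldn't be called after switching begins")
        self._memory[key] = value

    def read(self, key):
        store = self._memory if self._use_memory else self._backing
        return store[key]

    def switch(self):
        """Start the background dump; the read path flips only after the
        dump has finished, mirroring switchToLevelDB()."""
        def dump():
            self._backing.update(self._memory)
            self._use_memory = False  # flip only after the dump completes
            self._memory.clear()
        self._dump_thread = threading.Thread(target=dump)
        self._dump_thread.start()

    def join(self):
        # Wait for the dump, as close() does with backgroundThread.join().
        self._dump_thread.join()
```

The key ordering constraint shown here is that the switch flag flips only after the dump finishes, so a reader never observes a half-populated backing store.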
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437845789 ## File path: core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala ## @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.history + +import java.io.IOException +import java.util.Collection +import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.atomic.AtomicBoolean + +import scala.collection.JavaConverters._ + +import org.apache.spark.util.kvstore._ + +/** + * An implementation of KVStore that accelerates event logs loading. + * + * When rebuilding the application state from event logs, HybridStore will + * write data to InMemoryStore at first and use a background thread to dump + * data to LevelDB once the writing to InMemoryStore is completed. 
+ */ + +private[history] class HybridStore extends KVStore { + + private val inMemoryStore = new InMemoryStore() + + private var levelDB: LevelDB = null + + // Flag to indicate whether we should use inMemoryStore or levelDB + private[history] val shouldUseInMemoryStore = new AtomicBoolean(true) + + // A background thread that dumps data from inMemoryStore to levelDB + private var backgroundThread: Thread = null + + // A hash map that stores all classes that had been written to inMemoryStore + private val klassMap = new ConcurrentHashMap[Class[_], Boolean] + + override def getMetadata[T](klass: Class[T]): T = { +getStore().getMetadata(klass) + } + + override def setMetadata(value: Object): Unit = { +getStore().setMetadata(value) + } + + override def read[T](klass: Class[T], naturalKey: Object): T = { +getStore().read(klass, naturalKey) + } + + override def write(value: Object): Unit = { +val store = getStore() +store.write(value) + +if (backgroundThread == null) { + // New classes won't be dumped once the background thread is started + klassMap.putIfAbsent(value.getClass(), true) +} + } + + override def delete(klass: Class[_], naturalKey: Object): Unit = { +if (backgroundThread != null) { + throw new RuntimeException("delete() shouldn't be called after " + +"the hybrid store begins switching to levelDB") +} + +getStore().delete(klass, naturalKey) + } + + override def view[T](klass: Class[T]): KVStoreView[T] = { +getStore().view(klass) + } + + override def count(klass: Class[_]): Long = { +getStore().count(klass) + } + + override def count(klass: Class[_], index: String, indexedValue: Object): Long = { +getStore().count(klass, index, indexedValue) + } + + override def close(): Unit = { +try { + if (backgroundThread != null && backgroundThread.isAlive()) { +// The background thread is still running, wait for it to finish +backgroundThread.join() + } + if (levelDB != null) { Review comment: Friendly reminder about the comment here. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #27648: [SPARK-24994][SQL] : Support filter pushdown for short and byte without explicit casting
HyukjinKwon closed pull request #27648: URL: https://github.com/apache/spark/pull/27648
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
HyukjinKwon commented on a change in pull request #28743: URL: https://github.com/apache/spark/pull/28743#discussion_r437844741 ## File path: python/pyspark/sql/tests/test_arrow.py ## @@ -30,10 +30,14 @@ pandas_requirement_message, pyarrow_requirement_message from pyspark.testing.utils import QuietTest from pyspark.util import _exception_message +from distutils.version import LooseVersion if have_pandas: import pandas as pd from pandas.util.testing import assert_frame_equal +pandas_version = LooseVersion(pd.__version__) +else: +pandas_version = LooseVersion("0") Review comment: I would do something like: ```python pandas_version = None if have_pandas: import pandas as pd from pandas.util.testing import assert_frame_equal pandas_version = LooseVersion(pd.__version__) ``` if the linter is not happy with it.
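The pattern the reviewer suggests (default the version to `None` and only assign it when the optional dependency actually imports) generalizes beyond pandas. A minimal sketch, where `optional_version` is a hypothetical helper not present in the PR:

```python
import importlib


def optional_version(module_name):
    """Hypothetical helper mirroring the reviewer's suggestion: default the
    version to None and fill it in only when the optional dependency is
    importable, so later checks can guard on None before comparing."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(mod, "__version__", None)


# Callers then check availability before comparing versions, e.g.:
#   pandas_version = optional_version("pandas")
#   if pandas_version is not None and \
#           LooseVersion(pandas_version) >= LooseVersion("0.24.2"):
#       ...
```

This avoids the sentinel `LooseVersion("0")` in the original diff, which can make a missing dependency indistinguishable from a very old one.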
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
HyukjinKwon commented on a change in pull request #28743: URL: https://github.com/apache/spark/pull/28743#discussion_r437843397 ## File path: python/pyspark/sql/pandas/conversion.py ## @@ -394,10 +394,11 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone): # Create the Spark schema from list of names passed in with Arrow types if isinstance(schema, (list, tuple)): -arrow_schema = pa.Schema.from_pandas(pdf, preserve_index=False) +inferred_types = [pa.infer_type(s, mask=s.isna(), from_pandas=True) + for s in (pdf[c] for c in pdf)] struct = StructType() -for name, field in zip(schema, arrow_schema): -struct.add(name, from_arrow_type(field.type), nullable=field.nullable) +for name, t in zip(schema, inferred_types): +struct.add(name, from_arrow_type(t), nullable=True) Review comment: Why don't we follow nullability anymore?
[GitHub] [spark] HyukjinKwon commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
HyukjinKwon commented on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641702673 cc @BryanCutler FYI
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
HeartSaVioR edited a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641702431 So it looks like there are two different issues - 1) resource leaks in general 2) concurrent usage of the KV store. @srowen and I have been talking about issue 1), while @zhli1142015 is pointing out 2) - a race condition - as the store can be closed while the other thread has "already" obtained the iterator and not closed it yet. Fixing issue 1) should help to avoid 2), but it doesn't completely prevent the chance of 2).
[GitHub] [spark] HeartSaVioR commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
HeartSaVioR commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641702431 So it looks like there are two different issues - 1) resource leaks in general 2) concurrent usage of the KV store. @srowen and I have been talking about issue 1), while @zhli1142015 is pointing out 2) - a race condition - as the store can be closed while the other thread has obtained the iterator. Fixing issue 1) should help to avoid 2), but it doesn't completely prevent the chance of 2).
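One common way to address issue 2) is to track in-flight iterators under a lock so that close() can detect them instead of racing with them. The sketch below is purely illustrative (it is not the LevelDB fix, and all names are made up); it shows the bookkeeping, not a production policy for what close() should do when readers are still active.

```python
import threading


class GuardedStore:
    """Toy store that refuses to close while iterators are open and refuses
    to hand out iterators after close() - eliminating the close-vs-iterate
    race discussed above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open_iterators = 0
        self._closed = False

    def open_iterator(self):
        with self._lock:
            if self._closed:
                raise RuntimeError("store already closed")
            self._open_iterators += 1

    def close_iterator(self):
        with self._lock:
            self._open_iterators -= 1

    def close(self):
        with self._lock:
            if self._open_iterators > 0:
                raise RuntimeError("iterators still open; close them first")
            self._closed = True
```

A real implementation might instead block in close() until the count drops to zero, or force-close outstanding iterators; either way, the shared counter under a lock is what makes the decision race-free.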
[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641701924
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins removed a comment on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641701924
[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
SparkQA commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641701635 **[Test build #123720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123720/testReport)** for PR 28412 at commit [`d6c7d98`](https://github.com/apache/spark/commit/d6c7d988bd9e39caebb9a33f8c01ee230b6c2a39).
[GitHub] [spark] srowen commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
srowen commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641700733 Of course relying on finalize is wrong, but I don't think the intent was to rely on finalize. Not closing these iterators is a bug. I see one case where it clearly isn't closed; there may be others I haven't spotted. It'd be nice to fix them all instead of the change in this patch, but we may want to fix what we can see and also make the change in this patch for now.
[GitHub] [spark] HeartSaVioR commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
HeartSaVioR commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641700315 Oh OK. Sorry I missed it. At least we close it properly when calling `closeableIterator()`, but it's still in question when using KVStoreView directly. If we were confident that we close the iterator in every case, the `finalize` method would be redundant; instead, the finalize method is there for a reason. https://github.com/apache/spark/blob/f3771c6b47d0b3aef10b86586289a1f675c7cfe2/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java#L194-L203 Here "eventually" would matter, and in reality it is not even guaranteed to be called. We know why it's deprecated in Java 9. If the method needs to be there to do something, then we are missing something.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
HeartSaVioR edited a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641700315 Oh OK. Sorry I missed it. At least we close it properly when calling `closeableIterator()`, but it's still in question when using KVStoreView directly. If we were confident that we close the iterator in every case, the `finalize` method would be redundant; instead, the finalize method is there for a reason. https://github.com/apache/spark/blob/f3771c6b47d0b3aef10b86586289a1f675c7cfe2/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java#L194-L203 Here "eventually" would matter (it could be after the DB is closed, per the semantics of "eventually"), and in reality it is not even guaranteed to be called. We know why it's deprecated in Java 9. If the method needs to be there to do something, then we are missing something.
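The point about finalization being nondeterministic applies across languages: explicit, scoped cleanup is the only way to guarantee when close() runs. A minimal sketch of the idea (the `TrackedIterator` class is invented for this example, standing in for LevelDBIterator):

```python
from contextlib import closing


class TrackedIterator:
    """Stand-in for a resource-backed iterator that records whether close()
    ran. Relying on garbage collection (the finalize analogue) would make
    the close nondeterministic - exactly the concern in the thread above."""

    def __init__(self, items):
        self._it = iter(items)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def close(self):
        self.closed = True


# `closing` guarantees close() runs when the block exits, whether iteration
# finished normally, stopped early, or raised.
with closing(TrackedIterator([1, 2, 3])) as it:
    total = sum(it)
```

This is the same contract Java's try-with-resources gives a `Closeable` iterator: the close is tied to a scope, not to whenever the collector gets around to it.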
[GitHub] [spark] SaurabhChawla100 commented on pull request #28768: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore
SaurabhChawla100 commented on pull request #28768: URL: https://github.com/apache/spark/pull/28768#issuecomment-641699117 > The change looks good - we may need to reflect the "actual change" into PR title and description, as it's no longer same as initial proposal. I have updated the PR title and description in this PR.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641696834
[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641696834
[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641622318 **[Test build #123710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123710/testReport)** for PR 28733 at commit [`af33605`](https://github.com/apache/spark/commit/af3360538f28c2400b33a6c22c560b8c78d78c51).
[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641696287 **[Test build #123710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123710/testReport)** for PR 28733 at commit [`af33605`](https://github.com/apache/spark/commit/af3360538f28c2400b33a6c22c560b8c78d78c51). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait PredicateHelper extends Logging `
[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
baohe-zhang commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437837726 ## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ## @@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) "trying again...") store.close() memoryManager.release(appId, attempt.info.attemptId) - lease.rollback() retried = true - case e: Exception => store.close() memoryManager.release(appId, attempt.info.attemptId) - lease.rollback() throw e } } +// Create a LevelDB and start a background thread to dump data to LevelDB +logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...") Review comment: ok, will update it. Thanks
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437837539 ## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ## @@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) "trying again...") store.close() memoryManager.release(appId, attempt.info.attemptId) - lease.rollback() retried = true - case e: Exception => store.close() memoryManager.release(appId, attempt.info.attemptId) - lease.rollback() throw e } } +// Create a LevelDB and start a background thread to dump data to LevelDB +logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...") Review comment: Foreground, before the switching method is called. Yeah, I meant that. Thanks.
[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
baohe-zhang commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437837248 ## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ## @@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) "trying again...") store.close() memoryManager.release(appId, attempt.info.attemptId) - lease.rollback() retried = true - case e: Exception => store.close() memoryManager.release(appId, attempt.info.attemptId) - lease.rollback() throw e } } +// Create a LevelDB and start a background thread to dump data to LevelDB +logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...") Review comment: Do you mean an exception thrown in the foreground thread or in the background thread? If it's in the foreground thread, we can use a try-catch block to close the store when an exception is thrown.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641694556
[GitHub] [spark] AmplabJenkins commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
AmplabJenkins commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641694556
[GitHub] [spark] SparkQA removed a comment on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA removed a comment on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641612499 **[Test build #123707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123707/testReport)** for PR 28733 at commit [`934a412`](https://github.com/apache/spark/commit/934a41277ab764aea19fab6d1b99d0de041221e9).
[GitHub] [spark] SparkQA commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
SparkQA commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-641694008 **[Test build #123707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123707/testReport)** for PR 28733 at commit [`934a412`](https://github.com/apache/spark/commit/934a41277ab764aea19fab6d1b99d0de041221e9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] moskvax commented on pull request #28743: [SPARK-31920][PYTHON] Fix pandas conversion using Arrow with __arrow_array__ columns
moskvax commented on pull request #28743: URL: https://github.com/apache/spark/pull/28743#issuecomment-641693507 @HyukjinKwon @viirya Please review when you've got a moment. Thank you.
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641693298 cc. @vanzin @squito Please take a look at this PR - looks like this is a notable improvement on reducing load latency in SHS.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon edited a comment on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935 BTW, I skimmed `Aggregate` related analysis and I think we're all good with the current change if I didn't miss anything.
[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935 BTW, I skimmed `Aggregator` related analysis and I think we're all good with the current change if I didn't miss anything.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on a change in pull request #28412: URL: https://github.com/apache/spark/pull/28412#discussion_r437833266

## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ##

@@ -1263,17 +1241,34 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             "trying again...")
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           retried = true
-
         case e: Exception =>
           store.close()
           memoryManager.release(appId, attempt.info.attemptId)
-          lease.rollback()
           throw e
       }
     }

+    // Create a LevelDB and start a background thread to dump data to LevelDB
+    logInfo(s"Leasing disk manager space for app $appId / ${attempt.info.attemptId}...")

Review comment: There is still a chance that an exception is thrown here, in which case we leak the HybridStore instance with its memory still allocated and never released.
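The leak HeartSaVioR describes can be avoided with a catch-and-release pattern around the fallible setup steps. The sketch below is illustrative only: the `HybridStore` and `MemoryManager` names mirror the PR discussion, but this is not Spark's actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LeakSafeSetup {
    // Hypothetical stand-ins for the classes discussed in the PR.
    static class MemoryManager {
        final AtomicLong used = new AtomicLong();
        void lease(long bytes) { used.addAndGet(bytes); }
        void release(long bytes) { used.addAndGet(-bytes); }
    }

    static class HybridStore implements AutoCloseable {
        @Override public void close() { /* stop background dump thread, etc. */ }
    }

    // If any later setup step throws, close the store and release the memory
    // before rethrowing, so a failure cannot leak the allocation.
    static HybridStore createStore(MemoryManager mm, long bytes) {
        mm.lease(bytes);
        HybridStore store = new HybridStore();
        try {
            // ... set up LevelDB lease and background dump thread (may throw) ...
            return store;
        } catch (RuntimeException e) {
            store.close();
            mm.release(bytes);
            throw e;
        }
    }

    public static void main(String[] args) {
        MemoryManager mm = new MemoryManager();
        HybridStore s = createStore(mm, 1024);
        System.out.println("leased=" + mm.used.get());
        s.close();
        mm.release(1024);
        System.out.println("after release=" + mm.used.get());
    }
}
```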
[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing
cloud-fan commented on a change in pull request #28766: URL: https://github.com/apache/spark/pull/28766#discussion_r437834431

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala ##

@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }

+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {
+      val actual = accessor.get(field)
+      val expected = candidate.get(field)
+      if (actual != expected) {
+        throw new DateTimeException(s"Conflict found: Field $field $actual differs from" +
+          s" $field $expected derived from $candidate")

Review comment: can you show an example of this error message? Let's see if it looks good.
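As a rough answer to cloud-fan's question, the message format can be reproduced outside Spark. The following is a hypothetical stand-alone Java version of the check; the real method lives in Spark's `DateTimeFormatterHelper` and is written in Scala, and the candidate date here is chosen arbitrarily to force a conflict.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

public class ConflictMessageDemo {
    // Stand-alone re-creation of the verifyLocalDate check under review.
    static void verifyLocalDate(TemporalAccessor accessor, ChronoField field, LocalDate candidate) {
        if (accessor.isSupported(field) && candidate.isSupported(field)) {
            int actual = accessor.get(field);
            int expected = candidate.get(field);
            if (actual != expected) {
                throw new DateTimeException("Conflict found: Field " + field + " " + actual
                        + " differs from " + field + " " + expected + " derived from " + candidate);
            }
        }
    }

    public static void main(String[] args) {
        // "MM-dd" has no year, so java.time leaves the fields unresolved;
        // pretend the rest of the parse derived 1970-02-27 as the candidate.
        TemporalAccessor parsed = DateTimeFormatter.ofPattern("MM-dd").parse("02-28");
        try {
            verifyLocalDate(parsed, ChronoField.DAY_OF_MONTH, LocalDate.of(1970, 2, 27));
        } catch (DateTimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This prints a message of the shape `Conflict found: Field DayOfMonth 28 differs from DayOfMonth 27 derived from 1970-02-27`.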
[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing
cloud-fan commented on a change in pull request #28766: URL: https://github.com/apache/spark/pull/28766#discussion_r437834522

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala ##

@@ -433,4 +433,35 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite {
       assert(formatter.format(date(1970, 4, 10)) == "100")
     }
   }
+
+  test("SPARK-31939: Fix Parsing day of year when year field pattern is missing") {
+    // resolved to queryable LocaleDate or fail directly
+    val f0 = TimestampFormatter("-dd-DD", UTC, isParsing = true)
+    assert(f0.parse("2020-29-60") === date(2020, 2, 29))
+    assertParsingError(f0.parse("2020-02-60"))
+    val f1 = TimestampFormatter("-MM-DD", UTC, isParsing = true)
+    assert(f1.parse("2020-02-60") === date(2020, 2, 29))
+    assertParsingError(f1.parse("2020-03-60"))
+    val f2 = TimestampFormatter("-MM-dd-DD", UTC, isParsing = true)
+    assert(f2.parse("2020-02-29-60") === date(2020, 2, 29))
+    assertParsingError(f2.parse("2020-03-01-60"))
+    val f3 = TimestampFormatter("-DDD", UTC, isParsing = true)
+    assert(f3.parse("2020-366") === date(2020, 12, 31))
+    assertParsingError(f3.parse("2019-366"))
+
+    // unresolved and need to check manually (SPARK-31939 fixed)
+    val f4 = TimestampFormatter("DDD", UTC, isParsing = true)
+    assert(f4.parse("365") === date(1970, 12, 31))
+    assertParsingError(f4.parse("366")) // 1970 is not a leap year
+    val f5 = TimestampFormatter("MM-DD", UTC, isParsing = true)
+    assert(f5.parse("03-60") === date(1970, 3, 1))
+    assertParsingError(f5.parse("02-60"))
+    val f6 = TimestampFormatter("MM-dd-DD", UTC, isParsing = true)
+    assert(f6.parse("02-28-59") === date(1970, 2, 28))
+    assertParsingError(f6.parse("02-28-60"))
+    assertParsingError(f6.parse("02-28-58"))
+    val f7 = TimestampFormatter("dd-DD", UTC, isParsing = true)
+    assert(f7.parse("28-59") === date(1970, 2, 28))
+    assertParsingError(f7.parse("27-59"))
+  }

Review comment: can we also add tests in the .sql file?
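The "unresolved" cases in the tests above (e.g. `f4.parse("365") === date(1970, 12, 31)`) rely on the year defaulting to 1970 when the pattern has no year field. The same behaviour can be sketched with plain java.time, which cannot resolve a `DDD` pattern to a date without a year unless one is defaulted:

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class DayOfYearDefaultYear {
    public static void main(String[] args) {
        // Default the missing year to 1970, the epoch year the tests assume.
        DateTimeFormatter f = new DateTimeFormatterBuilder()
                .appendPattern("DDD")
                .parseDefaulting(ChronoField.YEAR, 1970)
                .toFormatter();

        // Day-of-year 365 of 1970 is December 31.
        System.out.println(LocalDate.parse("365", f));

        try {
            LocalDate.parse("366", f); // fails: 1970 is not a leap year
        } catch (DateTimeException e) {
            System.out.println("366 rejected");
        }
    }
}
```

This mirrors the `f4` assertions: `"365"` resolves to 1970-12-31, while `"366"` is rejected because 1970 is not a leap year.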
[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing
cloud-fan commented on a change in pull request #28766: URL: https://github.com/apache/spark/pull/28766#discussion_r437833956

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala ##

@@ -433,4 +433,35 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite {
       assert(formatter.format(date(1970, 4, 10)) == "100")
     }
   }
+
+  test("SPARK-31939: Fix Parsing day of year when year field pattern is missing") {
+    // resolved to queryable LocaleDate or fail directly
+    val f0 = TimestampFormatter("-dd-DD", UTC, isParsing = true)

Review comment: shall we test both timestamp and date formatter? We can put the test in the base class.
[GitHub] [spark] srowen commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
srowen commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-641691044 @HeartSaVioR no there are many more, just look for `closeableIterator()`. Of course the question is where it's missing -- line 42, others. But the pattern nearly works. In any event, regardless of post-hoc cleanup, it's best to close these much more proactively where possible.
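The proactive closing srowen suggests is the standard try-with-resources pattern around a closeable iterator. The sketch below is illustrative only and is not Spark's KVStore API; the resource being released is hypothetical.

```java
import java.util.Iterator;
import java.util.List;

public class CloseableIteratorDemo {
    // An iterator backed by a native resource (e.g. a LevelDB snapshot)
    // should expose close() so callers can release it deterministically.
    interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override void close(); // no checked exception: try-with-resources stays painless
    }

    static CloseableIterator<String> iterator(List<String> data) {
        Iterator<String> it = data.iterator();
        return new CloseableIterator<String>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
            @Override public void close() { /* release the native snapshot here */ }
        };
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() runs even if the loop throws,
        // instead of relying on post-hoc cleanup when the store itself closes.
        try (CloseableIterator<String> it = iterator(List.of("a", "b"))) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```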
[GitHub] [spark] cloud-fan commented on a change in pull request #28766: [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing
cloud-fan commented on a change in pull request #28766: URL: https://github.com/apache/spark/pull/28766#discussion_r437833606

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala ##

@@ -39,6 +39,18 @@ trait DateTimeFormatterHelper {
     }
   }

+  private def verifyLocalDate(
+      accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
+    if (accessor.isSupported(field) && candidate.isSupported(field)) {

Review comment: `candidate.isSupported(field)` this is always true?
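For context on cloud-fan's question: a `LocalDate` supports every date-based `ChronoField`, so when the helper is only ever called with date fields the check is indeed always true; it would only be false for a time-based field. A quick check:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

public class LocalDateIsSupported {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2020, 2, 29);
        // Date-based fields are always supported by LocalDate.
        System.out.println(d.isSupported(ChronoField.DAY_OF_YEAR));
        // Time-based fields are not, which is the only case the guard excludes.
        System.out.println(d.isSupported(ChronoField.HOUR_OF_DAY));
    }
}
```

This prints `true` then `false`.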
[GitHub] [spark] uncleGen commented on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD
uncleGen commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-641690951 @maropu I have updated the PR description, please take a look.