[GitHub] [spark] AmplabJenkins commented on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins commented on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642426069 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425765 Merged build finished. Test FAILed.
[GitHub] [spark] gengliangwang commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
gengliangwang commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-642425715 @wangyum @maropu @viirya @dilipbiswal @AngersZh @cloud-fan Thanks for the review. I think this PR is ready to be merged once the tests pass. Let me know if you have any more comments.
[GitHub] [spark] AmplabJenkins commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425765
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123824/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
SparkQA commented on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642425419 **[Test build #123796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123796/testReport)** for PR 28791 at commit [`dd4ab2e`](https://github.com/apache/spark/commit/dd4ab2e75db2e3a3ea136d907e71aa24f3f0bab9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
SparkQA removed a comment on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642325096 **[Test build #123796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123796/testReport)** for PR 28791 at commit [`dd4ab2e`](https://github.com/apache/spark/commit/dd4ab2e75db2e3a3ea136d907e71aa24f3f0bab9).
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642424761 **[Test build #123824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123824/testReport)** for PR 28593 at commit [`c8d5aa5`](https://github.com/apache/spark/commit/c8d5aa5cf1c0c5eaf85ad6e01b008f025e468d55).
[GitHub] [spark] SparkQA removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642323162 **[Test build #123795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123795/testReport)** for PR 28764 at commit [`cace933`](https://github.com/apache/spark/commit/cace933e23bf3af44a12e64150687bcfee350c01).
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425084
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425084 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425022 **[Test build #123795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123795/testReport)** for PR 28764 at commit [`cace933`](https://github.com/apache/spark/commit/cace933e23bf3af44a12e64150687bcfee350c01).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425069 **[Test build #123824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123824/testReport)** for PR 28593 at commit [`c8d5aa5`](https://github.com/apache/spark/commit/c8d5aa5cf1c0c5eaf85ad6e01b008f025e468d55).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642424761 **[Test build #123824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123824/testReport)** for PR 28593 at commit [`c8d5aa5`](https://github.com/apache/spark/commit/c8d5aa5cf1c0c5eaf85ad6e01b008f025e468d55).
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r438556665
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ##
@@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression)
   override val direction = Descending
 }
+/**
+ * The NthValue function returns the value of `input` at the row that is the `offset`th row of
+ * the window frame (counting from 1). Offsets start at 0, which is the current row. When the
+ * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null
+ * is returned.
+ *
+ * @param input expression to evaluate `offset`th row of the window frame.
+ * @param offset rows to jump ahead in the partition.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row
+      of the window frame (counting from 1). If the value of `input` at the `offset`th row is
+      null, null is returned. If there is no such an offset row (e.g., when the offset is 10,
+      size of the window frame less than 10), null is returned.
+  """,
+  since = "3.0.0")
Review comment: OK.
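The semantics described in the quoted usage string can be sketched in plain Scala. This is only an illustration under stated assumptions, not the PR's implementation: `NthValueSketch` and `nthValue` are hypothetical names, the window frame is modelled as a `Seq[Option[A]]` with `None` standing in for SQL NULL, and the offset is taken to be 1-based as the usage string says (the doc comment in the same diff also mentions offsets starting at 0, which this sketch does not follow).

```scala
object NthValueSketch {
  // Hypothetical helper: `frame` holds the values of `input` for the rows of the
  // window frame, in frame order; None models SQL NULL.
  def nthValue[A](frame: Seq[Option[A]], offset: Int): Option[A] = {
    // Assumption: offsets count from 1, as in the quoted usage string.
    require(offset >= 1, "offset must be a positive integer")
    // lift returns None when the frame has fewer than `offset` rows;
    // flatten collapses "row exists but value is NULL" into None as well.
    frame.lift(offset - 1).flatten
  }

  def main(args: Array[String]): Unit = {
    val frame = Seq(Some(10), None, Some(30))
    println(nthValue(frame, 1))
    println(nthValue(frame, 2))
    println(nthValue(frame, 10))
  }
}
```

Both "the value at that row is null" and "there is no such row" map to the same result here, which matches the two null cases the usage string lists.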
[GitHub] [spark] MaxGekk commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438556136
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala ##
@@ -326,20 +326,34 @@ object RebaseDateTime {
    */
   private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = {
     val instant = microsToInstant(micros)
-    var ldt = instant.atZone(zoneId).toLocalDateTime
+    val zonedDateTime = instant.atZone(zoneId)
+    var ldt = zonedDateTime.toLocalDateTime
     if (ldt.isAfter(julianEndTs) && ldt.isBefore(gregorianStartTs)) {
       ldt = LocalDateTime.of(gregorianStartDate, ldt.toLocalTime)
     }
     val cal = new Calendar.Builder()
-      // `gregory` is a hybrid calendar that supports both
-      // the Julian and Gregorian calendar systems
+      // `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems
       .setCalendarType("gregory")
       .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth)
       .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond)
-      // Local time-line can overlaps, such as at an autumn daylight savings cutover.
-      // This setting selects the original local timestamp mapped to the given `micros`.
-      .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt)
       .build()
+    // A local timestamp can have 2 instants in the cases of switching from:
+    // 1. Summer to winter time.
+    // 2. One standard time zone to another one. For example, Asia/Hong_Kong switched from JST
+    //    to HKT on 18 November, 1945 01:59:59 AM.
+    // Below we check that the original `instant` is earlier or later instant. If it is an earlier
+    // instant, we take the standard and DST offsets of the previous day otherwise of the next one.
+    val trans = zoneId.getRules.getTransition(ldt)
+    if (trans != null && trans.isOverlap) {
Review comment: I wrote the comment above
[GitHub] [spark] MaxGekk commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438556301
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala ##
@@ -409,4 +409,31 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
       }
     }
   }
+
+  test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") {
+    // The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9)
+    // to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM,
+    // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM.
+    // In this way, the overlap happened w/o Daylight Saving Time.
+    val hkZid = getZoneId("Asia/Hong_Kong")
+    withDefaultTimeZone(hkZid) {
+      val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+      val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+      val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+      assert(earlierMicros + MICROS_PER_HOUR === laterMicros)
+      val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros)
+      val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros)
+      def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString
+      val expected = "1945-11-18 01:30:00.0"
+      assert(toTsStr(rebasedEarlierMicros) === expected)
+      assert(toTsStr(rebasedLaterMicros) === expected)
+      assert(rebasedEarlierMicros + MICROS_PER_HOUR === rebasedLaterMicros)
+      // Check optimized rebasing
Review comment: via pre-calculated offsets and switch points.
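The ambiguity that the quoted test relies on can be reproduced with `java.time` alone, without any Spark helpers. The following is a minimal sketch, assuming only the standard JDK API; `OverlapInstants` and `overlapInstants` are hypothetical names introduced for illustration. Whether the Hong Kong local time is actually reported as ambiguous depends on the time zone data bundled with the JDK, so the example prints the result rather than assuming it.

```scala
import java.time.{Duration, Instant, LocalDateTime, ZoneId}

object OverlapInstants {
  // Hypothetical helper: returns (earlier, later) when the local date-time maps to
  // two instants in `zone` (an overlap), and None for ordinary or gap local times.
  def overlapInstants(zone: ZoneId, ldt: LocalDateTime): Option[(Instant, Instant)] = {
    val zdt = ldt.atZone(zone)
    val earlier = zdt.withEarlierOffsetAtOverlap().toInstant
    val later = zdt.withLaterOffsetAtOverlap().toInstant
    // Outside an overlap both calls leave the offset unchanged, so the instants agree.
    if (earlier == later) None else Some((earlier, later))
  }

  def main(args: Array[String]): Unit = {
    val hk = ZoneId.of("Asia/Hong_Kong")
    val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
    overlapInstants(hk, ldt) match {
      case Some((earlier, later)) =>
        println(s"ambiguous: $earlier and $later, apart by ${Duration.between(earlier, later)}")
      case None =>
        println("not ambiguous with this JDK's time zone data")
    }
  }
}
```

The same helper also covers ordinary summer-to-winter cutovers, which is the first of the two cases named in the code comment of the PR.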
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642422771
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642422771
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642422649
[GitHub] [spark] AmplabJenkins commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642422649
[GitHub] [spark] SparkQA commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642422303 **[Test build #123823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123823/testReport)** for PR 28764 at commit [`9e8eab2`](https://github.com/apache/spark/commit/9e8eab25eeffba0260d1a98a199f832251809c1d).
[GitHub] [spark] maropu commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
maropu commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642421193 retest this please
[GitHub] [spark] zhengruifeng commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
zhengruifeng commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-642420940 @mengxr OK, I will be more patient about reviews. Actually, I did not ping Owen in some of those PRs. I will involve more ML committers/contributors in future PRs and tickets.
[GitHub] [spark] cloud-fan commented on a change in pull request #28751: [SPARK-31926][SQL][test-hive1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
cloud-fan commented on a change in pull request #28751: URL: https://github.com/apache/spark/pull/28751#discussion_r438553842
## File path: project/SparkBuild.scala ##
@@ -480,7 +480,8 @@ object SparkParallelTestGrouping {
     "org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite",
     "org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite",
     "org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite",
-    "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite",
+    "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite",
+    "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInBinarySuite",
     "org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite"
Review comment: Can we just run these 2 test suites one by one?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642420191 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123808/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642420188 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642374887 **[Test build #123808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123808/testReport)** for PR 28764 at commit [`9e8eab2`](https://github.com/apache/spark/commit/9e8eab25eeffba0260d1a98a199f832251809c1d).
[GitHub] [spark] AmplabJenkins commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642420188
[GitHub] [spark] SparkQA commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642419817 **[Test build #123808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123808/testReport)** for PR 28764 at commit [`9e8eab2`](https://github.com/apache/spark/commit/9e8eab25eeffba0260d1a98a199f832251809c1d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
cloud-fan commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438550078
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala ##
@@ -326,20 +326,34 @@ object RebaseDateTime {
    */
   private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = {
     val instant = microsToInstant(micros)
-    var ldt = instant.atZone(zoneId).toLocalDateTime
+    val zonedDateTime = instant.atZone(zoneId)
+    var ldt = zonedDateTime.toLocalDateTime
     if (ldt.isAfter(julianEndTs) && ldt.isBefore(gregorianStartTs)) {
       ldt = LocalDateTime.of(gregorianStartDate, ldt.toLocalTime)
     }
     val cal = new Calendar.Builder()
-      // `gregory` is a hybrid calendar that supports both
-      // the Julian and Gregorian calendar systems
+      // `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems
       .setCalendarType("gregory")
       .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth)
       .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond)
-      // Local time-line can overlaps, such as at an autumn daylight savings cutover.
-      // This setting selects the original local timestamp mapped to the given `micros`.
-      .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt)
       .build()
+    // A local timestamp can have 2 instants in the cases of switching from:
+    // 1. Summer to winter time.
+    // 2. One standard time zone to another one. For example, Asia/Hong_Kong switched from JST
+    //    to HKT on 18 November, 1945 01:59:59 AM.
+    // Below we check that the original `instant` is earlier or later instant. If it is an earlier
+    // instant, we take the standard and DST offsets of the previous day otherwise of the next one.
+    val trans = zoneId.getRules.getTransition(ldt)
+    if (trans != null && trans.isOverlap) {
Review comment: when will we go into this expensive branch?
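As a rough illustration of when the `trans != null && trans.isOverlap` condition quoted above holds, the sketch below classifies an instant by whether its local date-time falls inside an overlap and, if so, whether it is the earlier or the later of the two candidates. It is not the PR's code: `OverlapBranch`, `Position` and `classify` are hypothetical names, and comparing the instant's offset with `getOffsetBefore` is one assumed way to tell the two instants apart.

```scala
import java.time.{Instant, ZoneId}

object OverlapBranch {
  sealed trait Position
  case object NoOverlap extends Position
  case object EarlierInstant extends Position
  case object LaterInstant extends Position

  // Hypothetical helper mirroring the condition in the quoted diff.
  def classify(zone: ZoneId, instant: Instant): Position = {
    val zdt = instant.atZone(zone)
    // getTransition returns null unless the local date-time lies in a gap or an overlap.
    val trans = zone.getRules.getTransition(zdt.toLocalDateTime)
    if (trans == null || !trans.isOverlap) NoOverlap
    // In an overlap the offset in force before the transition yields the earlier instant.
    else if (zdt.getOffset == trans.getOffsetBefore) EarlierInstant
    else LaterInstant
  }

  def main(args: Array[String]): Unit = {
    val berlin = ZoneId.of("Europe/Berlin")
    println(classify(berlin, Instant.parse("2019-07-01T10:00:00Z")))
    println(classify(berlin, Instant.parse("2019-10-27T00:30:00Z")))
  }
}
```

Under this reading the branch is entered only for local date-times that lie inside an overlap window, typically one hour per backward clock change, so most timestamps skip it.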
[GitHub] [spark] cloud-fan commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
cloud-fan commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438549793
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala ##
@@ -409,4 +409,31 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
       }
     }
   }
+
+  test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") {
+    // The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9)
+    // to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM,
+    // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM.
+    // In this way, the overlap happened w/o Daylight Saving Time.
+    val hkZid = getZoneId("Asia/Hong_Kong")
+    withDefaultTimeZone(hkZid) {
+      val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+      val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+      val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+      assert(earlierMicros + MICROS_PER_HOUR === laterMicros)
+      val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros)
+      val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros)
+      def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString
+      val expected = "1945-11-18 01:30:00.0"
+      assert(toTsStr(rebasedEarlierMicros) === expected)
+      assert(toTsStr(rebasedLaterMicros) === expected)
+      assert(rebasedEarlierMicros + MICROS_PER_HOUR === rebasedLaterMicros)
+      // Check optimized rebasing
Review comment: what do you mean by "optimized rebase"?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413943 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123810/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413824 **[Test build #123810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123810/testReport)** for PR 28593 at commit [`a6a9bd4`](https://github.com/apache/spark/commit/a6a9bd431fa401be36173a2866f6d56138472f2d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642378726 **[Test build #123810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123810/testReport)** for PR 28593 at commit [`a6a9bd4`](https://github.com/apache/spark/commit/a6a9bd431fa401be36173a2866f6d56138472f2d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413936 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413936
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
AmplabJenkins removed a comment on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642413591
[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
AmplabJenkins commented on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642413591
[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
SparkQA commented on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642413193 **[Test build #123822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123822/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r438545420
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ##
@@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression)
   override val direction = Descending
 }
+/**
+ * The NthValue function returns the value of `input` at the row that is the `offset`th row of
+ * the window frame (counting from 1). Offsets start at 0, which is the current row. When the
+ * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null
+ * is returned.
+ *
+ * @param input expression to evaluate `offset`th row of the window frame.
+ * @param offset rows to jump ahead in the partition.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row
+      of the window frame (counting from 1). If the value of `input` at the `offset`th row is
+      null, null is returned. If there is no such an offset row (e.g., when the offset is 10,
+      size of the window frame less than 10), null is returned.
+  """,
+  since = "3.0.0")
Review comment: Let's change it to 3.1.0
[GitHub] [spark] HyukjinKwon commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-642410273 cc @hvanhovell FYI
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r437941258
## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ##
@@ -993,6 +993,30 @@ object functions {
     Lead(e.expr, Literal(offset), Literal(defaultValue))
   }
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.0.0
+   */
+  def nth_value(columnName: String, offset: Int): Column = {
Review comment: @beliefer, how is it different from `lag`?
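One way to read the distinction raised in the review comment above is with a plain-collections model. This is only a sketch under stated assumptions, not Spark's implementation and not the PR author's reply: `NthValueVsLag`, `nthValue`, and `lag` are hypothetical helpers, `nthValue` follows the quoted Scaladoc (the `offset`th row of the frame, counting from 1), `lag` is modeled as the row `offset` positions before the current row, and the frame is assumed to span the whole partition.

```scala
object NthValueVsLag {
  // nth_value as described in the quoted Scaladoc: the `n`th row of the
  // window frame, counting from 1; None when the frame has fewer than `n` rows.
  def nthValue[A](frame: Seq[A], n: Int): Option[A] = frame.lift(n - 1)

  // lag: the row `offset` positions before the row at index `current`
  // (0-based) in the partition; None when no such row exists.
  def lag[A](partition: Seq[A], current: Int, offset: Int): Option[A] =
    partition.lift(current - offset)

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "c", "d")
    // Assumption: the frame spans the whole partition for every row.
    rows.indices.foreach { i =>
      println(s"row $i: nth_value(2) = ${nthValue(rows, 2)}, lag(2) = ${lag(rows, i, 2)}")
    }
  }
}
```

In this model `nthValue(rows, 2)` is `Some("b")` for every row, because it is anchored to the start of the frame, while `lag(rows, i, 2)` moves with the current row and is `None` for the first two rows.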
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
AmplabJenkins removed a comment on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642408668
[GitHub] [spark] AmplabJenkins commented on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
AmplabJenkins commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642408668
[GitHub] [spark] SparkQA commented on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
SparkQA commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642408240 **[Test build #123821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123821/testReport)** for PR 28776 at commit [`da8d48d`](https://github.com/apache/spark/commit/da8d48d7984dd523f44c564ead9f7d5fb9cdd4ef).
[GitHub] [spark] HyukjinKwon commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
HyukjinKwon commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642406590 Thank you @dongjoon-hyun!
[GitHub] [spark] dongjoon-hyun closed pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
dongjoon-hyun closed pull request #28798: URL: https://github.com/apache/spark/pull/28798
[GitHub] [spark] dongjoon-hyun commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
dongjoon-hyun commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642406211 Merged to master/3.0. Thanks!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
AmplabJenkins removed a comment on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642404308
[GitHub] [spark] AmplabJenkins commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
AmplabJenkins commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642404308
[GitHub] [spark] SparkQA removed a comment on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
SparkQA removed a comment on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642394190 **[Test build #123820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123820/testReport)** for PR 28798 at commit [`acafffc`](https://github.com/apache/spark/commit/acafffc4eed5f518f3dbf153e3b77379757b8968).
[GitHub] [spark] SparkQA commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
SparkQA commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642403994 **[Test build #123820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123820/testReport)** for PR 28798 at commit [`acafffc`](https://github.com/apache/spark/commit/acafffc4eed5f518f3dbf153e3b77379757b8968).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on pull request #28790: [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
dongjoon-hyun commented on pull request #28790: URL: https://github.com/apache/spark/pull/28790#issuecomment-642399726 Since this is not related to UT running, merged to master/3.0. Thank you all.
[GitHub] [spark] dongjoon-hyun closed pull request #28790: [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
dongjoon-hyun closed pull request #28790: URL: https://github.com/apache/spark/pull/28790
[GitHub] [spark] dongjoon-hyun commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
dongjoon-hyun commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642397230 Got it. Thanks, @gengliangwang .
[GitHub] [spark] dongjoon-hyun commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
dongjoon-hyun commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642396828
```
[error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/SSLOptions.scala:71: type Server is not a member of object org.eclipse.jetty.util.ssl.SslContextFactory
[error] val sslContextFactory = new SslContextFactory.Server()
[error] ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found
```
It seems to be a different commit.
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
dongjoon-hyun edited a comment on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642396828
```
[error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/SSLOptions.scala:71: type Server is not a member of object org.eclipse.jetty.util.ssl.SslContextFactory
[error] val sslContextFactory = new SslContextFactory.Server()
[error] ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found
```
It seems to be a different commit. I'll take a look~
[GitHub] [spark] gengliangwang commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
gengliangwang commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642396997 The latest run is ongoing and I don't think the error was related https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123818/
[GitHub] [spark] dongjoon-hyun commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
dongjoon-hyun commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642396556 Is it the same at the last commit, e3bb417?
[GitHub] [spark] HyukjinKwon commented on pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
HyukjinKwon commented on pull request #28795: URL: https://github.com/apache/spark/pull/28795#issuecomment-642396796 Thanks!
[GitHub] [spark] dongjoon-hyun commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
dongjoon-hyun commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642396236 Oh..
[GitHub] [spark] dongjoon-hyun closed pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
dongjoon-hyun closed pull request #28795: URL: https://github.com/apache/spark/pull/28795
[GitHub] [spark] AmplabJenkins commented on pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
AmplabJenkins commented on pull request #28795: URL: https://github.com/apache/spark/pull/28795#issuecomment-642395035
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
AmplabJenkins removed a comment on pull request #28795: URL: https://github.com/apache/spark/pull/28795#issuecomment-642395035
[GitHub] [spark] SparkQA removed a comment on pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
SparkQA removed a comment on pull request #28795: URL: https://github.com/apache/spark/pull/28795#issuecomment-642382801 **[Test build #123811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123811/testReport)** for PR 28795 at commit [`e7a9974`](https://github.com/apache/spark/commit/e7a99746deea89e98cd800c5cdee9a8cedbf6088).
[GitHub] [spark] SparkQA commented on pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
SparkQA commented on pull request #28795: URL: https://github.com/apache/spark/pull/28795#issuecomment-642394663 **[Test build #123811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123811/testReport)** for PR 28795 at commit [`e7a9974`](https://github.com/apache/spark/commit/e7a99746deea89e98cd800c5cdee9a8cedbf6088).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] gengliangwang commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
gengliangwang commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642394456 ^^^ the document generation error seems not related.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
AmplabJenkins removed a comment on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642394362
[GitHub] [spark] AmplabJenkins commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
AmplabJenkins commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642394362
[GitHub] [spark] SparkQA commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
SparkQA commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642394190 **[Test build #123820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123820/testReport)** for PR 28798 at commit [`acafffc`](https://github.com/apache/spark/commit/acafffc4eed5f518f3dbf153e3b77379757b8968).
[GitHub] [spark] attilapiros commented on pull request #26016: [SPARK-24914][SQL] Calculated table statistic to improve data size estimate for ORC which manually settable for other columnar formats
attilapiros commented on pull request #26016: URL: https://github.com/apache/spark/pull/26016#issuecomment-642393978 gentle ping @dongjoon-hyun @dbtsai
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-642392722 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123804/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins removed a comment on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642392773 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123814/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-642392713 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
SparkQA removed a comment on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642386799 **[Test build #123814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123814/testReport)** for PR 28796 at commit [`58b5159`](https://github.com/apache/spark/commit/58b51596d6e8b5d61b3b52c54d96b07384dc4498).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins removed a comment on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642392767 Merged build finished. Test FAILed.
[GitHub] [spark] HyukjinKwon commented on pull request #28790: [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
HyukjinKwon commented on pull request #28790: URL: https://github.com/apache/spark/pull/28790#issuecomment-642392791 Made a PR for https://github.com/apache/spark/pull/28790#issuecomment-642383125, https://github.com/apache/spark/pull/28798
[GitHub] [spark] HyukjinKwon opened a new pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
HyukjinKwon opened a new pull request #28798: URL: https://github.com/apache/spark/pull/28798

### What changes were proposed in this pull request?

This is similar to https://github.com/apache/spark/commit/64cb6f7066134a0b9e441291992d2da73de5d918

The test `StreamingLogisticRegressionWithSGDTests.test_training_and_prediction` also seems flaky. This PR increases its timeout to 3 minutes as well. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123787/testReport/pyspark.mllib.tests.test_streaming_algorithms/StreamingLogisticRegressionWithSGDTests/test_training_and_prediction/

```
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 330, in test_training_and_prediction
    eventually(condition, timeout=60.0)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/testing/utils.py", line 90, in eventually
    % (timeout, lastValue))
AssertionError: Test failed due to timeout after 60 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74, 0.76, 0.78, 0.7, 0.78, 0.8, 0.74, 0.77, 0.75, 0.76, 0.76, 0.75
```

### Why are the changes needed?

To make PR builds more stable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Jenkins will test them out.
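For readers unfamiliar with the pattern in the traceback above, here is a minimal sketch of a poll-until-true helper with a timeout. It is a hypothetical stand-in written for illustration, not the code of `pyspark/testing/utils.py`; the name `eventually` and the `timeout` keyword come from the traceback, while the `interval` parameter and the exact loop are assumptions.

```python
import time


def eventually(condition, timeout=30.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical sketch: `interval` and the loop structure are assumptions,
    not taken from PySpark.
    """
    deadline = time.monotonic() + timeout
    last_value = None
    while time.monotonic() < deadline:
        last_value = condition()
        if last_value is True:
            return
        # Any non-True value is kept so it can be reported on timeout.
        time.sleep(interval)
    raise AssertionError(
        "Test failed due to timeout after %g sec, with last condition returning: %s"
        % (timeout, last_value))
```

Under this sketch, a slow-converging check would simply be called with a larger budget, for example `eventually(condition, timeout=180.0)` for 3 minutes. The trade-off is that a genuinely failing test takes longer to report.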
[GitHub] [spark] AmplabJenkins commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642392767
[GitHub] [spark] SparkQA commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
SparkQA commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642392708 **[Test build #123814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123814/testReport)** for PR 28796 at commit [`58b5159`](https://github.com/apache/spark/commit/58b51596d6e8b5d61b3b52c54d96b07384dc4498). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-642392713
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
dongjoon-hyun commented on a change in pull request #28795: URL: https://github.com/apache/spark/pull/28795#discussion_r438529975 ## File path: python/pyspark/sql/tests/test_udf.py ## @@ -357,6 +359,32 @@ def test_udf_registration_returns_udf(self): df.select(add_four("id").alias("plus_four")).collect() ) +@unittest.skipIf(not test_compiled, test_not_compiled_message) +def test_register_java_function(self): +self.spark.udf.registerJavaFunction( +"javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType()) +[value] = self.spark.sql("SELECT javaStringLength('test')").first() +self.assertEqual(value, 4) + +self.spark.udf.registerJavaFunction( +"javaStringLength2", "test.org.apache.spark.sql.JavaStringLength") +[value] = self.spark.sql("SELECT javaStringLength2('test')").first() +self.assertEqual(value, 4) + +self.spark.udf.registerJavaFunction( +"javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer") +[value] = self.spark.sql("SELECT javaStringLength3('test')").first() +self.assertEqual(value, 4) + +@unittest.skipIf(not test_compiled, test_not_compiled_message) +def test_register_java_udaf(self): +self.spark.udf.registerJavaUDAF("javaUDAF", "test.org.apache.spark.sql.MyDoubleAvg") +df = self.spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "name"]) +df.createOrReplaceTempView("df") +row = self.spark.sql( +"SELECT name, javaUDAF(id) as avg from df group by name order by name desc").first() +self.assertEqual(row.asDict(), Row(name='b', avg=102.0).asDict()) Review comment: Got it~ No problem~
[GitHub] [spark] SparkQA commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
SparkQA commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-642392143 **[Test build #123804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123804/testReport)** for PR 28769 at commit [`a9983a9`](https://github.com/apache/spark/commit/a9983a9874ed07afc28d2a4400de025353ceffbd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
SparkQA removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-642344609 **[Test build #123804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123804/testReport)** for PR 28769 at commit [`a9983a9`](https://github.com/apache/spark/commit/a9983a9874ed07afc28d2a4400de025353ceffbd).
[GitHub] [spark] dongjoon-hyun commented on pull request #28797: [SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber
dongjoon-hyun commented on pull request #28797: URL: https://github.com/apache/spark/pull/28797#issuecomment-642392011 Thank you, @yaooqinn !
[GitHub] [spark] dongjoon-hyun closed pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
dongjoon-hyun closed pull request #28796: URL: https://github.com/apache/spark/pull/28796
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP][test-hadoop3.2] Fix the test case for Hadoop2/3
AmplabJenkins removed a comment on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642390807
[GitHub] [spark] dbtsai commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Yarn Hadoop classpath for no-hadoop build
dbtsai commented on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642390812 @tgravescs +1 with detailed documentation. Users can always set `spark.yarn.populateHadoopClasspath` to `true` if desired. I'll work on the documentation. Thanks!
[GitHub] [spark] MaxGekk commented on pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on pull request #28787: URL: https://github.com/apache/spark/pull/28787#issuecomment-642390800

The build https://github.com/apache/spark/pull/28787#issuecomment-642292477 failed on the assert for JDK calls:
```scala
val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
assert(earlierMicros + MICROS_PER_HOUR === laterMicros)
```
because Jenkins uses an "old" JDK:
```
JENKINS_MASTER_HOSTNAME=amp-jenkins-master
JAVA_HOME=/usr/java/jdk1.8.0_191
```
which has an outdated time zone database, see https://bugs.openjdk.java.net/browse/JDK-8228469
```
Hong Kong's 1941-06-15 spring-forward transition was at 03:00, not 03:30. Its 1945 transition from JST to HKT was on 11-18 at 02:00, not 09-15 at 00:00. In 1946 its spring-forward transition was on 04-21 at 00:00, not the previous day at 03:30. From 1946 through 1952 its fall-back transitions occurred at 04:30, not at 03:30. In 1947 its fall-back transition was on 11-30, not 12-30. (Thanks to P Chan.)
```
@cloud-fan @HyukjinKwon @dongjoon-hyun @srowen Can we upgrade the JDK on the Jenkins machines?
[GitHub] [spark] AmplabJenkins commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP][test-hadoop3.2] Fix the test case for Hadoop2/3
AmplabJenkins commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642390807
[GitHub] [spark] SparkQA commented on pull request #28782: [SPARK-31954][SQL] Delete duplicate testcase in HiveQuerySuite
SparkQA commented on pull request #28782: URL: https://github.com/apache/spark/pull/28782#issuecomment-642390533 **[Test build #123819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123819/testReport)** for PR 28782 at commit [`c7b81d2`](https://github.com/apache/spark/commit/c7b81d24f3f19de202bf82ab598efd41efd38bfa).
[GitHub] [spark] SparkQA commented on pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP][test-hadoop3.2] Fix the test case for Hadoop2/3
SparkQA commented on pull request #28796: URL: https://github.com/apache/spark/pull/28796#issuecomment-642390537 **[Test build #123818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123818/testReport)** for PR 28796 at commit [`e3bb417`](https://github.com/apache/spark/commit/e3bb417220c3d77e0cba0715fe31438f853e546b).
[GitHub] [spark] HyukjinKwon commented on pull request #28777: [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs
HyukjinKwon commented on pull request #28777: URL: https://github.com/apache/spark/pull/28777#issuecomment-642390534 Merged to branch-3.0 as well!
[GitHub] [spark] yaooqinn commented on a change in pull request #28751: [SPARK-31926][SQL][test-hive1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
yaooqinn commented on a change in pull request #28751: URL: https://github.com/apache/spark/pull/28751#discussion_r438528186 ## File path: project/SparkBuild.scala ## @@ -480,7 +480,8 @@ object SparkParallelTestGrouping { "org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite", "org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite", "org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite", -"org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite", + "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite", + "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInBinarySuite", "org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite" Review comment: Is there any way to run these tests in individual JVMs with Maven? It does not seem possible to start 2 thrift servers with different transport modes on the shared Spark session in one JVM.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28795: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
HyukjinKwon commented on a change in pull request #28795: URL: https://github.com/apache/spark/pull/28795#discussion_r438527805 ## File path: python/pyspark/sql/tests/test_udf.py ## @@ -357,6 +359,32 @@ def test_udf_registration_returns_udf(self): df.select(add_four("id").alias("plus_four")).collect() ) +@unittest.skipIf(not test_compiled, test_not_compiled_message) +def test_register_java_function(self): +self.spark.udf.registerJavaFunction( +"javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType()) +[value] = self.spark.sql("SELECT javaStringLength('test')").first() +self.assertEqual(value, 4) + +self.spark.udf.registerJavaFunction( +"javaStringLength2", "test.org.apache.spark.sql.JavaStringLength") +[value] = self.spark.sql("SELECT javaStringLength2('test')").first() +self.assertEqual(value, 4) + +self.spark.udf.registerJavaFunction( +"javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer") +[value] = self.spark.sql("SELECT javaStringLength3('test')").first() +self.assertEqual(value, 4) + +@unittest.skipIf(not test_compiled, test_not_compiled_message) +def test_register_java_udaf(self): +self.spark.udf.registerJavaUDAF("javaUDAF", "test.org.apache.spark.sql.MyDoubleAvg") +df = self.spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "name"]) +df.createOrReplaceTempView("df") +row = self.spark.sql( +"SELECT name, javaUDAF(id) as avg from df group by name order by name desc").first() +self.assertEqual(row.asDict(), Row(name='b', avg=102.0).asDict()) Review comment: I think we could compare them as they are. It's just to prevent an issue such as SPARK-29748 or similar issues in the future.
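The quoted diff guards its tests with `unittest.skipIf`, so they run only when the compiled Java test classes are available. A minimal, Spark-free sketch of that pattern follows. The names `test_compiled` and `test_not_compiled_message` are borrowed from the diff, but their values here, the class name, and the `run_suite` helper are assumptions made for illustration.

```python
import unittest

# Assumed for this sketch: in the real test suite this flag would be derived
# from whether the Java test classes were built. Here it is hard-coded.
test_compiled = False
test_not_compiled_message = "Java test classes are not compiled; skipping."


class ConditionalRegistrationTests(unittest.TestCase):
    """Hypothetical test case showing conditional skipping."""

    @unittest.skipIf(not test_compiled, test_not_compiled_message)
    def test_needs_compiled_classes(self):
        # Reached only when test_compiled is True.
        self.fail("should have been skipped in this sketch")

    def test_always_runs(self):
        self.assertEqual(len("test"), 4)


def run_suite():
    """Run the sketch's tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        ConditionalRegistrationTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` reports the guarded test as skipped with the given message rather than failed, which is why moving environment-dependent doctests into such tests keeps builds without compiled test classes green.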
[GitHub] [spark] gengliangwang commented on a change in pull request #28796: [SPARK-31935][SQL][TESTS][FOLLOWUP][test-hadoop3.2] Fix the test case for Hadoop2/3
gengliangwang commented on a change in pull request #28796: URL: https://github.com/apache/spark/pull/28796#discussion_r438527195 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceSuite.scala ## @@ -142,7 +142,8 @@ class DataSourceSuite extends SharedSparkSession with PrivateMethodTester { val message = intercept[java.io.IOException] { dataSource invokePrivate checkAndGlobPathIfNecessary(false, false) }.getMessage -assert(message.equals("No FileSystem for scheme: nonexistsFs")) +val expectMessage = "No FileSystem for scheme nonexistFS" Review comment: I see :)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
AmplabJenkins removed a comment on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642388783 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123802/ Test FAILed.
[GitHub] [spark] GuoPhilipse commented on pull request #28782: [SPARK-31954][SQL] Delete duplicate testcase in HiveQuerySuite
GuoPhilipse commented on pull request #28782: URL: https://github.com/apache/spark/pull/28782#issuecomment-642389039 I have adjusted the test case order and renamed the test result file so it looks tidier.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28797: [SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber
AmplabJenkins removed a comment on pull request #28797: URL: https://github.com/apache/spark/pull/28797#issuecomment-642388995