[spark] branch master updated: [SPARK-27252][SQL] Make current_date() independent from time zones
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 06abd06 [SPARK-27252][SQL] Make current_date() independent from time zones 06abd06 is described below commit 06abd06112965cd73417ccceacdbd94b6b3d2793 Author: Maxim Gekk AuthorDate: Thu Mar 28 18:44:08 2019 -0700 [SPARK-27252][SQL] Make current_date() independent from time zones ## What changes were proposed in this pull request? This makes the `CurrentDate` expression and `current_date` function independent from time zone settings. The new result is the number of days since the epoch in the `UTC` time zone. Previously, Spark shifted the current date (in the `UTC` time zone) according to the session time zone, which violates the definition of `DateType` - the number of days since the epoch (an absolute point in time: midnight of Jan 1 1970 in UTC). This change makes `CurrentDate` consistent with `CurrentTimestamp`, which is also independent of time zones. ## How was this patch tested? The changes were tested by existing test suites like `DateExpressionsSuite`. Closes #24185 from MaxGekk/current-date. 
Lead-authored-by: Maxim Gekk Co-authored-by: Maxim Gekk Signed-off-by: Wenchen Fan --- docs/sql-migration-guide-upgrade.md| 4 +++- .../catalyst/expressions/datetimeExpressions.scala | 18 ++-- .../sql/catalyst/optimizer/finishAnalysis.scala| 25 +++--- .../expressions/DateExpressionsSuite.scala | 6 +++--- .../execution/streaming/MicroBatchExecution.scala | 2 +- .../scala/org/apache/spark/sql/functions.scala | 2 +- .../apache/spark/sql/DataFrameAggregateSuite.scala | 2 +- .../org/apache/spark/sql/DateFunctionsSuite.scala | 19 +--- 8 files changed, 39 insertions(+), 39 deletions(-) diff --git a/docs/sql-migration-guide-upgrade.md b/docs/sql-migration-guide-upgrade.md index b8597f0..3ba89c0 100644 --- a/docs/sql-migration-guide-upgrade.md +++ b/docs/sql-migration-guide-upgrade.md @@ -103,7 +103,9 @@ displayTitle: Spark SQL Upgrading Guide - In Spark version 2.4 and earlier, the `current_timestamp` function returns a timestamp with millisecond resolution only. Since Spark 3.0, the function can return the result with microsecond resolution if the underlying clock available on the system offers such resolution. - - In Spark version 2.4 abd earlier, when reading a Hive Serde table with Spark native data sources(parquet/orc), Spark will infer the actual file schema and update the table schema in metastore. Since Spark 3.0, Spark doesn't infer the schema anymore. This should not cause any problems to end users, but if it does, please set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`. + - In Spark version 2.4 and earlier, when reading a Hive Serde table with Spark native data sources(parquet/orc), Spark will infer the actual file schema and update the table schema in metastore. Since Spark 3.0, Spark doesn't infer the schema anymore. This should not cause any problems to end users, but if it does, please set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`. 
+ + - In Spark version 2.4 and earlier, the `current_date` function returns the current date shifted according to the SQL config `spark.sql.session.timeZone`. Since Spark 3.0, the function always returns the current date in the `UTC` time zone. - Since Spark 3.0, `TIMESTAMP` literals are converted to strings using the SQL config `spark.sql.session.timeZone`, and `DATE` literals are formatted using the UTC time zone. In Spark version 2.4 and earlier, both conversions use the default time zone of the Java virtual machine. diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 7878a87..3cda989 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import java.sql.Timestamp -import java.time.{Instant, LocalDate, ZoneId} +import java.time.{Instant, LocalDate, ZoneId, ZoneOffset} import java.time.temporal.IsoFields import java.util.{Locale, TimeZone} @@ -52,30 +52,26 @@ trait TimeZoneAwareExpression extends Expression { @transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get) } +// scalastyle:off line.size.limit /** - * Returns the current date at the start of query evaluation. + * Returns the current date in the UTC time zone at the start of query evaluation. * All calls of current_date within
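The behavioral change described above can be illustrated with plain `java.time` (a sketch, not Spark's internal code; class and method names here are illustrative): near UTC midnight, the pre-3.0 session-time-zone shift and the new UTC-based date can disagree by a day.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class CurrentDateDemo {
    // Spark 3.0 behavior: DateType is the number of days since the epoch,
    // taken from the current instant in UTC.
    static long daysSinceEpochUtc(Instant now) {
        return LocalDate.ofInstant(now, ZoneOffset.UTC).toEpochDay();
    }

    // Pre-3.0 behavior (sketch): the date was first shifted into the session
    // time zone, so the resulting epoch-day could differ by one around UTC midnight.
    static long daysSinceEpochInZone(Instant now, ZoneId sessionTz) {
        return LocalDate.ofInstant(now, sessionTz).toEpochDay();
    }

    public static void main(String[] args) {
        // 02:00 UTC on 2019-03-29 is still 2019-03-28 in America/Los_Angeles.
        Instant now = Instant.parse("2019-03-29T02:00:00Z");
        System.out.println(daysSinceEpochUtc(now));
        System.out.println(daysSinceEpochInZone(now, ZoneId.of("America/Los_Angeles")));
    }
}
```

For an instant shortly after UTC midnight, the two functions return epoch-days one apart, which is exactly the session-time-zone dependence the commit removes.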
[spark] branch branch-2.3 updated: [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new 530ec52 [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4) 530ec52 is described below commit 530ec5247ccbe049cea8747195fc2011e71ad0f9 Author: Shixiong Zhu AuthorDate: Thu Mar 28 11:13:11 2019 -0700 [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4) ## What changes were proposed in this pull request? Backport https://github.com/apache/spark/pull/24211 to 2.4 ## How was this patch tested? Jenkins Closes #24229 from zsxwing/SPARK-27275-2.4. Authored-by: Shixiong Zhu Signed-off-by: Wenchen Fan (cherry picked from commit 298e4fa6f8054c54e246f91b70d62174ccdb9413) Signed-off-by: Wenchen Fan --- .../spark/network/crypto/TransportCipher.java | 53 ++ .../spark/network/crypto/AuthEngineSuite.java | 85 ++ .../spark/network/crypto/AuthIntegrationSuite.java | 47 ++-- 3 files changed, 167 insertions(+), 18 deletions(-) diff --git a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java index e04524d..608350c 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java +++ b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java @@ -44,7 +44,8 @@ public class TransportCipher { @VisibleForTesting static final String ENCRYPTION_HANDLER_NAME = "TransportEncryption"; private static final String DECRYPTION_HANDLER_NAME = "TransportDecryption"; - private static final int STREAM_BUFFER_SIZE = 1024 * 32; + @VisibleForTesting + static final int STREAM_BUFFER_SIZE = 1024 * 32; private final Properties conf; private final String cipher; @@ -84,7 +85,8 @@ public class TransportCipher { 
return outIv; } - private CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException { + @VisibleForTesting + CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException { return new CryptoOutputStream(cipher, conf, ch, key, new IvParameterSpec(outIv)); } @@ -104,7 +106,8 @@ public class TransportCipher { .addFirst(DECRYPTION_HANDLER_NAME, new DecryptionHandler(this)); } - private static class EncryptionHandler extends ChannelOutboundHandlerAdapter { + @VisibleForTesting + static class EncryptionHandler extends ChannelOutboundHandlerAdapter { private final ByteArrayWritableChannel byteChannel; private final CryptoOutputStream cos; @@ -116,7 +119,12 @@ public class TransportCipher { @Override public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception { - ctx.write(new EncryptedMessage(cos, msg, byteChannel), promise); + ctx.write(createEncryptedMessage(msg), promise); +} + +@VisibleForTesting +EncryptedMessage createEncryptedMessage(Object msg) { + return new EncryptedMessage(cos, msg, byteChannel); } @Override @@ -161,10 +169,12 @@ public class TransportCipher { } } - private static class EncryptedMessage extends AbstractFileRegion { + @VisibleForTesting + static class EncryptedMessage extends AbstractFileRegion { private final boolean isByteBuf; private final ByteBuf buf; private final FileRegion region; +private final long count; private long transferred; private CryptoOutputStream cos; @@ -186,11 +196,12 @@ public class TransportCipher { this.byteRawChannel = new ByteArrayWritableChannel(STREAM_BUFFER_SIZE); this.cos = cos; this.byteEncChannel = ch; + this.count = isByteBuf ? buf.readableBytes() : region.count(); } @Override public long count() { - return isByteBuf ? 
buf.readableBytes() : region.count(); + return count; } @Override @@ -242,22 +253,38 @@ public class TransportCipher { public long transferTo(WritableByteChannel target, long position) throws IOException { Preconditions.checkArgument(position == transfered(), "Invalid position."); + if (transferred == count) { +return 0; + } + + long totalBytesWritten = 0L; do { if (currentEncrypted == null) { encryptMore(); } -int bytesWritten = currentEncrypted.remaining(); -target.write(currentEncrypted); -bytesWritten -= currentEncrypted.remaining(); -transferred += bytesWritten; -if (!currentEncrypted.hasRemaining()) { +long remaining = currentEncrypted.remaining(); +if (remaining == 0) { + //
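The essence of the fix in the diff above can be modeled outside Netty as follows — a minimal sketch (the class and fields are illustrative, not Spark's actual `EncryptedMessage`): cache `count` once up front, return early when everything has already been transferred, and advance `transferred` only by the bytes the target channel actually accepted on each (possibly partial) write.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Illustrative model of the corrected transferTo loop.
class EncryptedTransferSketch {
    private final ByteBuffer encrypted;  // stands in for currentEncrypted
    private final long count;            // cached once, like the new `count` field
    private long transferred;

    EncryptedTransferSketch(byte[] payload) {
        this.encrypted = ByteBuffer.wrap(payload);
        this.count = payload.length;
    }

    long transferTo(WritableByteChannel target) throws IOException {
        if (transferred == count) {
            return 0;  // everything already sent; do not loop forever
        }
        long totalBytesWritten = 0L;
        while (transferred + totalBytesWritten < count) {
            // A channel write may be partial; count only what it accepted.
            int bytesWritten = target.write(encrypted);
            if (bytesWritten == 0) {
                break;  // target cannot take more right now
            }
            totalBytesWritten += bytesWritten;
        }
        transferred += totalBytesWritten;
        return totalBytesWritten;
    }
}
```

The corruption risk being fixed is exactly the partial-write case: a `WritableByteChannel` is free to accept fewer bytes than remain in the buffer, so progress must be tracked from the write's return value, never assumed.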
[spark] branch branch-2.4 updated: [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 298e4fa [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4) 298e4fa is described below commit 298e4fa6f8054c54e246f91b70d62174ccdb9413 Author: Shixiong Zhu AuthorDate: Thu Mar 28 11:13:11 2019 -0700 [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4) ## What changes were proposed in this pull request? Backport https://github.com/apache/spark/pull/24211 to 2.4 ## How was this patch tested? Jenkins Closes #24229 from zsxwing/SPARK-27275-2.4. Authored-by: Shixiong Zhu Signed-off-by: Wenchen Fan --- .../spark/network/crypto/TransportCipher.java | 53 ++ .../spark/network/crypto/AuthEngineSuite.java | 85 ++ .../spark/network/crypto/AuthIntegrationSuite.java | 47 ++-- 3 files changed, 167 insertions(+), 18 deletions(-) diff --git a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java index b64e4b7..0b674cc 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java +++ b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java @@ -44,7 +44,8 @@ public class TransportCipher { @VisibleForTesting static final String ENCRYPTION_HANDLER_NAME = "TransportEncryption"; private static final String DECRYPTION_HANDLER_NAME = "TransportDecryption"; - private static final int STREAM_BUFFER_SIZE = 1024 * 32; + @VisibleForTesting + static final int STREAM_BUFFER_SIZE = 1024 * 32; private final Properties conf; private final String cipher; @@ -84,7 +85,8 @@ public class TransportCipher { return outIv; } - private CryptoOutputStream createOutputStream(WritableByteChannel ch) throws 
IOException { + @VisibleForTesting + CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException { return new CryptoOutputStream(cipher, conf, ch, key, new IvParameterSpec(outIv)); } @@ -104,7 +106,8 @@ public class TransportCipher { .addFirst(DECRYPTION_HANDLER_NAME, new DecryptionHandler(this)); } - private static class EncryptionHandler extends ChannelOutboundHandlerAdapter { + @VisibleForTesting + static class EncryptionHandler extends ChannelOutboundHandlerAdapter { private final ByteArrayWritableChannel byteChannel; private final CryptoOutputStream cos; @@ -116,7 +119,12 @@ public class TransportCipher { @Override public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception { - ctx.write(new EncryptedMessage(cos, msg, byteChannel), promise); + ctx.write(createEncryptedMessage(msg), promise); +} + +@VisibleForTesting +EncryptedMessage createEncryptedMessage(Object msg) { + return new EncryptedMessage(cos, msg, byteChannel); } @Override @@ -161,10 +169,12 @@ public class TransportCipher { } } - private static class EncryptedMessage extends AbstractFileRegion { + @VisibleForTesting + static class EncryptedMessage extends AbstractFileRegion { private final boolean isByteBuf; private final ByteBuf buf; private final FileRegion region; +private final long count; private long transferred; private CryptoOutputStream cos; @@ -186,11 +196,12 @@ public class TransportCipher { this.byteRawChannel = new ByteArrayWritableChannel(STREAM_BUFFER_SIZE); this.cos = cos; this.byteEncChannel = ch; + this.count = isByteBuf ? buf.readableBytes() : region.count(); } @Override public long count() { - return isByteBuf ? 
buf.readableBytes() : region.count(); + return count; } @Override @@ -242,22 +253,38 @@ public class TransportCipher { public long transferTo(WritableByteChannel target, long position) throws IOException { Preconditions.checkArgument(position == transferred(), "Invalid position."); + if (transferred == count) { +return 0; + } + + long totalBytesWritten = 0L; do { if (currentEncrypted == null) { encryptMore(); } -int bytesWritten = currentEncrypted.remaining(); -target.write(currentEncrypted); -bytesWritten -= currentEncrypted.remaining(); -transferred += bytesWritten; -if (!currentEncrypted.hasRemaining()) { +long remaining = currentEncrypted.remaining(); +if (remaining == 0) { + // Just for safety to avoid endless loop. It usually won't happen, but since the + // underlying
[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes
shaneknapp commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477709115 woot! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] asfgit closed pull request #186: testing how-to for k8s changes
asfgit closed pull request #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186
[spark-website] branch asf-site updated: testing how-to for k8s changes
This is an automated email from the ASF dual-hosted git repository. shaneknapp pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 5f4985d testing how-to for k8s changes 5f4985d is described below commit 5f4985db6340efa0107ebe34bb285ab8ceb74f65 Author: shane knapp AuthorDate: Thu Mar 28 11:03:09 2019 -0700 testing how-to for k8s changes i think that this will be quite useful. :) Author: shane knapp Closes #186 from shaneknapp/add-k8s-testing-instructions. --- developer-tools.md| 48 ++ site/developer-tools.html | 49 +++ 2 files changed, 97 insertions(+) diff --git a/developer-tools.md b/developer-tools.md index 43ad445..00d57cd 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -175,6 +175,54 @@ You can check the coverage report visually by HTMLs under `/.../spark/python/tes Please check other available options via `python/run-tests[-with-coverage] --help`. +Testing K8S + +If you have made changes to the K8S bindings in Apache Spark, it would behoove you to test locally before submitting a PR. This is relatively simple to do, but it will require a local (to you) installation of [minikube](https://kubernetes.io/docs/setup/minikube/). Due to how minikube interacts with the host system, please be sure to set things up as follows: + +- minikube version v0.34.1 (or greater, but backwards-compatibility between versions is spotty) +- You must use a VM driver! Running minikube with the `--vm-driver=none` option requires that the user launching minikube/k8s have root access. Our Jenkins workers use the [kvm2](https://github.com/kubernetes/minikube/blob/master/docs/drivers.md#kvm2-driver) drivers. More details [here](https://github.com/kubernetes/minikube/blob/master/docs/drivers.md). 
+- kubernetes version v1.13.3 (can be set by executing `minikube config set kubernetes-version v1.13.3`) + +Once you have minikube properly set up, and have successfully completed the [quick start](https://kubernetes.io/docs/setup/minikube/#quickstart), you can test your changes locally. All subsequent commands should be run from your root spark/ repo directory: + +1) Build a tarball to test against: + +``` +export DATE=`date "+%Y%m%d"` +export REVISION=`git rev-parse --short HEAD` +export ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)") + +./dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \ + -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver +``` + +2) Use that tarball and run the K8S integration tests: + +``` +PVC_TMP_DIR=$(mktemp -d) +export PVC_TESTS_HOST_PATH=$PVC_TMP_DIR +export PVC_TESTS_VM_PATH=$PVC_TMP_DIR + +minikube --vm-driver=<your VM driver> start --memory 6000 --cpus 8 + +minikube mount ${PVC_TESTS_HOST_PATH}:${PVC_TESTS_VM_PATH} --9p-version=9p2000.L --gid=0 --uid=185 & + +MOUNT_PID=$(jobs -rp) + +kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts || true + +./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \ +--spark-tgz ${WORKSPACE}/spark-*.tgz + +kill -9 $MOUNT_PID +minikube stop +``` + +After the run is completed, the integration test logs are saved here: `./resource-managers/kubernetes/integration-tests/target/integration-tests.log` + +Getting logs from the pods and containers directly is an exercise left to the reader. + +Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality. If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue. 
ScalaTest Issues diff --git a/site/developer-tools.html b/site/developer-tools.html index e676d7b..17da11c 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -353,6 +353,55 @@ Generating HTML files for PySpark coverage under /.../spark/python/test_coverage Please check other available options via python/run-tests[-with-coverage] --help. +Testing K8S + +If you have made changes to the K8S bindings in Apache Spark, it would behoove you to test locally before submitting a PR. This is relatively simple to do, but it will require a local (to you) installation of https://kubernetes.io/docs/setup/minikube/;>minikube. Due to how minikube interacts with the host system, please be sure to set things up as follows: + + + minikube version v0.34.1 (or greater, but backwards-compatibility between versions is spotty) + You must use a VM driver! Running minikube with the --vm-driver=none option requires that the user launching minikube/k8s have root access. Our Jenkins workers use the
[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes
tgravescs commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477704250 ok let us know if it doesn't work
[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes
shaneknapp commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477698576 alright, i'll try and push from my end and see what happens.
[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes
tgravescs commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477697258 I'm not sure on permissions
[GitHub] [spark-website] shaneknapp edited a comment on issue #186: testing how-to for k8s changes
shaneknapp edited a comment on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477691831 > lgtm, note I didn't actually try the steps but probably will be in the next couple weeks. FWIW, these commands are pulled directly from the k8s integration test build, so things should be relatively simple to get set up on a dev's end. regarding merging -- do i have perms to run the merge script in the website dir, or does someone else do that for me? ;)
[GitHub] [spark-website] shaneknapp closed pull request #186: testing how-to for k8s changes
shaneknapp closed pull request #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186
[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes
shaneknapp commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477691831 > lgtm, note I didn't actually try the steps but probably will be in the next couple weeks. FWIW, these commands are pulled directly from the k8s integration test build, so things should be relatively simple to get set up on a dev's end. regarding merging -- do i have perms to run the merge script in the website dir, or will someone else do that for me? ;)
[GitHub] [spark-website] shaneknapp opened a new pull request #186: testing how-to for k8s changes
shaneknapp opened a new pull request #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186 i think that this will be quite useful. :)
[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes
tgravescs commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477689498 lgtm, note I didn't actually try the steps but probably will be in the next couple weeks.
[spark] branch master updated: [MINOR] Move java file to java directory
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 50cded5 [MINOR] Move java file to java directory 50cded5 is described below commit 50cded590f8c16ba3263a5ecba6805fb06dd8a64 Author: Xianyang Liu AuthorDate: Thu Mar 28 12:11:00 2019 -0500 [MINOR] Move java file to java directory ## What changes were proposed in this pull request? move ```scala org.apache.spark.sql.execution.streaming.BaseStreamingSource org.apache.spark.sql.execution.streaming.BaseStreamingSink ``` to java directory ## How was this patch tested? Existing UT. Closes #24222 from ConeyLiu/move-scala-to-java. Authored-by: Xianyang Liu Signed-off-by: Sean Owen --- .../org/apache/spark/sql/execution/streaming/BaseStreamingSink.java | 0 .../org/apache/spark/sql/execution/streaming/BaseStreamingSource.java | 0 .../{scala => java}/org/apache/spark/sql/execution/streaming/Offset.java | 0 3 files changed, 0 insertions(+), 0 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java similarity index 100% rename from sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java rename to sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java similarity index 100% rename from sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java rename to sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java similarity index 100% rename from sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java rename to sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes
shaneknapp commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477676857 > this is great, thanks for writing these up Shane. not a problem! after spending 2-3 days helping someone test their PR before finding out they were using the `--vm-driver=none` arg, i decided that this needed to be documented.
[GitHub] [spark-website] shaneknapp commented on a change in pull request #186: testing how-to for k8s changes
shaneknapp commented on a change in pull request #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#discussion_r270094568 ## File path: site/sitemap.xml ## @@ -760,19 +760,19 @@ weekly - https://spark.apache.org/news/ + https://spark.apache.org/streaming/ Review comment: done
[spark] branch master updated: [SPARK-26914][SQL] Fix scheduler pool may be unpredictable when we only want to use default pool and do not set spark.scheduler.pool for the session
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 43bf4ae [SPARK-26914][SQL] Fix scheduler pool may be unpredictable when we only want to use default pool and do not set spark.scheduler.pool for the session 43bf4ae is described below commit 43bf4ae6417fcb15d0fbc7880f14f307c164d464 Author: zhoukang AuthorDate: Thu Mar 28 09:24:16 2019 -0500 [SPARK-26914][SQL] Fix scheduler pool may be unpredictable when we only want to use default pool and do not set spark.scheduler.pool for the session ## What changes were proposed in this pull request? When using fair scheduler mode for the Thrift server, we may get unpredictable results. ``` val pool = sessionToActivePool.get(parentSession.getSessionHandle) if (pool != null) { sqlContext.sparkContext.setLocalProperty(SparkContext.SPARK_SCHEDULER_POOL, pool) } ``` The cause is that we use a thread pool to execute queries for the Thrift server, and when we call setLocalProperty we may get unpredictable behavior. ``` /** * Set a local property that affects jobs submitted from this thread, such as the Spark fair * scheduler pool. User-defined properties may also be set here. These properties are propagated * through to worker tasks and can be accessed there via * [[org.apache.spark.TaskContext#getLocalProperty]]. * * These properties are inherited by child threads spawned from this thread. This * may have unexpected consequences when working with thread pools. The standard java * implementation of thread pools have worker threads spawn other worker threads. * As a result, local properties may propagate unpredictably. */ def setLocalProperty(key: String, value: String) { if (value == null) { localProperties.get.remove(key) } else { localProperties.get.setProperty(key, value) } } ``` I posted an example at https://jira.apache.org/jira/browse/SPARK-26914. 
## How was this patch tested?

UT

Closes #23826 from caneGuy/zhoukang/fix-scheduler-error.

Authored-by: zhoukang
Signed-off-by: Sean Owen

---
 .../sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

```
diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 1772fe6..b05307e 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -226,9 +226,9 @@ private[hive] class SparkExecuteStatementOperation(
       parentSession.getUsername)
     sqlContext.sparkContext.setJobGroup(statementId, statement)
     val pool = sessionToActivePool.get(parentSession.getSessionHandle)
-    if (pool != null) {
-      sqlContext.sparkContext.setLocalProperty(SparkContext.SPARK_SCHEDULER_POOL, pool)
-    }
+    // It may have unpredictable behavior since we use thread pools to execute queries and
+    // the 'spark.scheduler.pool' may not be 'default' when we did not set its value. (SPARK-26914)
+    sqlContext.sparkContext.setLocalProperty(SparkContext.SPARK_SCHEDULER_POOL, pool)
     try {
       result = sqlContext.sql(statement)
       logDebug(result.queryExecution.toString())
```

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
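The unpredictability that the quoted `setLocalProperty` doc comment warns about can be reproduced outside Spark. Below is a minimal Java sketch (the class and method names are hypothetical, not from the patch) of the underlying mechanism: an `InheritableThreadLocal` — which is what backs Spark's local properties — is copied only when a thread is *created*, and pool worker threads are created once and then reused, so they keep whatever value the submitting thread had at spawn time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class: shows why local properties propagate
// unpredictably through thread pools.
public class LocalPropertyDemo {
    static final InheritableThreadLocal<String> SCHEDULER_POOL = new InheritableThreadLocal<>();

    static String staleValueSeenByWorker() throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            SCHEDULER_POOL.set("session-pool-A");
            // The first submit spawns the worker thread, which inherits
            // "session-pool-A" at construction time.
            exec.submit(() -> { }).get();

            // A later session clears the property in the submitting thread...
            SCHEDULER_POOL.set(null);
            // ...but the reused worker still carries the inherited copy.
            return exec.submit(SCHEDULER_POOL::get).get();
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(staleValueSeenByWorker()); // prints "session-pool-A", not null
    }
}
```

Because the worker inherits whatever value happened to be set when it was first spawned, a session that sets no pool at all can still run in a previous session's pool — which is why the patch sets the scheduler-pool property unconditionally instead of skipping the call when `pool` is null.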
[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes
tgravescs commented on issue #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#issuecomment-477592577

this is great, thanks for writing these up Shane.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [spark-website] tgravescs commented on a change in pull request #186: testing how-to for k8s changes
tgravescs commented on a change in pull request #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#discussion_r269997574

## File path: developer-tools.md

```
@@ -175,6 +175,55 @@ You can check the coverage report visually by HTMLs under `/.../spark/python/tes
 Please check other available options via `python/run-tests[-with-coverage] --help`.
+
+Testing K8S
+
+If you have made changes to the K8S bindings in Apache Spark, it would behoove you to test locally before submitting a PR. This is relatively simple to do, but it will require a local (to you) installation of [minikube](https://kubernetes.io/docs/setup/minikube/). Due to how minikube interacts with the host system, please be sure to set things up as follows:
+
```

Review comment: extra empty line here?
[spark] branch master updated: [MINOR][CORE] Remove import scala.collection.Set in TaskSchedulerImpl
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e4a968d [MINOR][CORE] Remove import scala.collection.Set in TaskSchedulerImpl e4a968d is described below

commit e4a968d82947a68d2daedc8e5fb9adf57cf9bd5b
Author: Wenchen Fan
AuthorDate: Thu Mar 28 21:12:18 2019 +0900

[MINOR][CORE] Remove import scala.collection.Set in TaskSchedulerImpl

## What changes were proposed in this pull request?

I was playing with the scheduler and found this weird thing. In `TaskSchedulerImpl` we import `scala.collection.Set` without any reason. This is bad in practice, as it silently changes the actual class when we simply type `Set`, which by default should point to the immutable set. This change only affects one method: `getExecutorsAliveOnHost`. I checked all the caller sides and none of them need the general `Set` type.

## How was this patch tested?

N/A

Closes #24231 from cloud-fan/minor.
Authored-by: Wenchen Fan
Signed-off-by: Hyukjin Kwon

---
 .../main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

```
diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index db0fbfe..1ef3566 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -22,7 +22,6 @@ import java.util.{Locale, Timer, TimerTask}
 import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
 import java.util.concurrent.atomic.AtomicLong
 
-import scala.collection.Set
 import scala.collection.mutable.{ArrayBuffer, BitSet, HashMap, HashSet}
 import scala.util.Random
 
@@ -829,8 +828,8 @@ private[spark] class TaskSchedulerImpl(
    * Get a snapshot of the currently blacklisted nodes for the entire application. This is
    * thread-safe -- it can be called without a lock on the TaskScheduler.
    */
-  def nodeBlacklist(): scala.collection.immutable.Set[String] = {
-    blacklistTrackerOpt.map(_.nodeBlacklist()).getOrElse(scala.collection.immutable.Set())
+  def nodeBlacklist(): Set[String] = {
+    blacklistTrackerOpt.map(_.nodeBlacklist()).getOrElse(Set.empty)
   }
 
   // By default, rack is unknown
```
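The "silently changes the actual class" problem the commit message describes is not unique to Scala. A comparable Java sketch (class and method names are hypothetical, for illustration only): a single-type import takes precedence over an on-demand `import java.util.*`, so the bare name `Date` quietly stops meaning `java.util.Date` — much as `import scala.collection.Set` made a bare `Set` stop meaning the immutable default.

```java
import java.sql.Date;  // single-type import: silently wins name resolution
import java.util.*;    // on-demand import: loses to the line above

public class ImportShadowing {
    // The bare name `Date` resolves to java.sql.Date here, even though
    // java.util.Date is also imported -- nothing at the use site hints at it.
    static String dateClassName() {
        return new Date(0L).getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(dateClassName()); // prints "java.sql.Date"
    }
}
```

Removing the narrower import, as the patch does for `scala.collection.Set`, restores the expected default meaning of the bare name instead of relying on readers noticing an import line far from the use site.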
[GitHub] [spark-website] felixcheung commented on a change in pull request #186: testing how-to for k8s changes
felixcheung commented on a change in pull request #186: testing how-to for k8s changes URL: https://github.com/apache/spark-website/pull/186#discussion_r269868047

## File path: site/sitemap.xml

```
@@ -760,19 +760,19 @@
 weekly
-https://spark.apache.org/news/
+https://spark.apache.org/streaming/
```

Review comment: revert this file?