[spark] branch master updated: [SPARK-27252][SQL] Make current_date() independent from time zones

2019-03-28 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 06abd06  [SPARK-27252][SQL] Make current_date() independent from time 
zones
06abd06 is described below

commit 06abd06112965cd73417ccceacdbd94b6b3d2793
Author: Maxim Gekk 
AuthorDate: Thu Mar 28 18:44:08 2019 -0700

[SPARK-27252][SQL] Make current_date() independent from time zones

## What changes were proposed in this pull request?

This makes the `CurrentDate` expression and the `current_date` function independent of time zone settings. The new result is the number of days since the epoch in the `UTC` time zone. Previously, Spark shifted the current date (in `UTC`) according to the session time zone, which violates the definition of `DateType`: the number of days since the epoch, where the epoch is an absolute point in time (midnight of Jan 1 1970, UTC).

This makes `CurrentDate` consistent with `CurrentTimestamp`, which is likewise independent of time zones.
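
As a quick illustration, a hypothetical `spark-shell` session (the time zone is an arbitrary example, and the behavior notes summarize the patch rather than quoting real output):

```scala
// Assumption: `spark` is the SparkSession provided by spark-shell.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

// Spark 2.4: current_date() was shifted into the session time zone, so just
// after midnight UTC it could still report the previous calendar day.
// Spark 3.0 (this patch): current_date() always reflects the date in UTC,
// matching DateType's definition as days since the epoch.
spark.sql("SELECT current_date()").show()
```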

## How was this patch tested?

The changes were tested by existing test suites like `DateExpressionsSuite`.

Closes #24185 from MaxGekk/current-date.

Lead-authored-by: Maxim Gekk 
Co-authored-by: Maxim Gekk 
Signed-off-by: Wenchen Fan 
---
 docs/sql-migration-guide-upgrade.md|  4 +++-
 .../catalyst/expressions/datetimeExpressions.scala | 18 ++--
 .../sql/catalyst/optimizer/finishAnalysis.scala| 25 +++---
 .../expressions/DateExpressionsSuite.scala |  6 +++---
 .../execution/streaming/MicroBatchExecution.scala  |  2 +-
 .../scala/org/apache/spark/sql/functions.scala |  2 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala |  2 +-
 .../org/apache/spark/sql/DateFunctionsSuite.scala  | 19 +---
 8 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/docs/sql-migration-guide-upgrade.md 
b/docs/sql-migration-guide-upgrade.md
index b8597f0..3ba89c0 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -103,7 +103,9 @@ displayTitle: Spark SQL Upgrading Guide
 
  - In Spark version 2.4 and earlier, the `current_timestamp` function returns a timestamp with millisecond resolution only. Since Spark 3.0, the function can return the result with microsecond resolution if the underlying clock available on the system offers such resolution.

-  - In Spark version 2.4 abd earlier, when reading a Hive Serde table with Spark native data sources(parquet/orc), Spark will infer the actual file schema and update the table schema in metastore. Since Spark 3.0, Spark doesn't infer the schema anymore. This should not cause any problems to end users, but if it does, please set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.
+  - In Spark version 2.4 and earlier, when reading a Hive Serde table with Spark native data sources(parquet/orc), Spark will infer the actual file schema and update the table schema in metastore. Since Spark 3.0, Spark doesn't infer the schema anymore. This should not cause any problems to end users, but if it does, please set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.
+
+  - In Spark version 2.4 and earlier, the `current_date` function returns the current date shifted according to the SQL config `spark.sql.session.timeZone`. Since Spark 3.0, the function always returns the current date in the `UTC` time zone.

  - Since Spark 3.0, `TIMESTAMP` literals are converted to strings using the SQL config `spark.sql.session.timeZone`, and `DATE` literals are formatted using the UTC time zone. In Spark version 2.4 and earlier, both conversions use the default time zone of the Java virtual machine.
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 7878a87..3cda989 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions
 
 import java.sql.Timestamp
-import java.time.{Instant, LocalDate, ZoneId}
+import java.time.{Instant, LocalDate, ZoneId, ZoneOffset}
 import java.time.temporal.IsoFields
 import java.util.{Locale, TimeZone}
 
@@ -52,30 +52,26 @@ trait TimeZoneAwareExpression extends Expression {
   @transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get)
 }
 
+// scalastyle:off line.size.limit
 /**
- * Returns the current date at the start of query evaluation.
+ * Returns the current date in the UTC time zone at the start of query evaluation.
  * All calls of current_date within 

[spark] branch branch-2.3 updated: [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4)

2019-03-28 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 530ec52  [SPARK-27275][CORE] Fix potential corruption in 
EncryptedMessage.transferTo (2.4)
530ec52 is described below

commit 530ec5247ccbe049cea8747195fc2011e71ad0f9
Author: Shixiong Zhu 
AuthorDate: Thu Mar 28 11:13:11 2019 -0700

[SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo 
(2.4)

## What changes were proposed in this pull request?

Backport https://github.com/apache/spark/pull/24211 to 2.4

## How was this patch tested?

Jenkins

Closes #24229 from zsxwing/SPARK-27275-2.4.

Authored-by: Shixiong Zhu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 298e4fa6f8054c54e246f91b70d62174ccdb9413)
Signed-off-by: Wenchen Fan 
---
 .../spark/network/crypto/TransportCipher.java  | 53 ++
 .../spark/network/crypto/AuthEngineSuite.java  | 85 ++
 .../spark/network/crypto/AuthIntegrationSuite.java | 47 ++--
 3 files changed, 167 insertions(+), 18 deletions(-)

diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
 
b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
index e04524d..608350c 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
@@ -44,7 +44,8 @@ public class TransportCipher {
   @VisibleForTesting
   static final String ENCRYPTION_HANDLER_NAME = "TransportEncryption";
   private static final String DECRYPTION_HANDLER_NAME = "TransportDecryption";
-  private static final int STREAM_BUFFER_SIZE = 1024 * 32;
+  @VisibleForTesting
+  static final int STREAM_BUFFER_SIZE = 1024 * 32;
 
   private final Properties conf;
   private final String cipher;
@@ -84,7 +85,8 @@ public class TransportCipher {
 return outIv;
   }
 
-  private CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException {
+  @VisibleForTesting
+  CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException {
  return new CryptoOutputStream(cipher, conf, ch, key, new IvParameterSpec(outIv));
   }
 
@@ -104,7 +106,8 @@ public class TransportCipher {
   .addFirst(DECRYPTION_HANDLER_NAME, new DecryptionHandler(this));
   }
 
-  private static class EncryptionHandler extends ChannelOutboundHandlerAdapter {
+  @VisibleForTesting
+  static class EncryptionHandler extends ChannelOutboundHandlerAdapter {
 private final ByteArrayWritableChannel byteChannel;
 private final CryptoOutputStream cos;
 
@@ -116,7 +119,12 @@ public class TransportCipher {
 @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
   throws Exception {
-  ctx.write(new EncryptedMessage(cos, msg, byteChannel), promise);
+  ctx.write(createEncryptedMessage(msg), promise);
+}
+
+@VisibleForTesting
+EncryptedMessage createEncryptedMessage(Object msg) {
+  return new EncryptedMessage(cos, msg, byteChannel);
 }
 
 @Override
@@ -161,10 +169,12 @@ public class TransportCipher {
 }
   }
 
-  private static class EncryptedMessage extends AbstractFileRegion {
+  @VisibleForTesting
+  static class EncryptedMessage extends AbstractFileRegion {
 private final boolean isByteBuf;
 private final ByteBuf buf;
 private final FileRegion region;
+private final long count;
 private long transferred;
 private CryptoOutputStream cos;
 
@@ -186,11 +196,12 @@ public class TransportCipher {
   this.byteRawChannel = new ByteArrayWritableChannel(STREAM_BUFFER_SIZE);
   this.cos = cos;
   this.byteEncChannel = ch;
+  this.count = isByteBuf ? buf.readableBytes() : region.count();
 }
 
 @Override
 public long count() {
-  return isByteBuf ? buf.readableBytes() : region.count();
+  return count;
 }
 
 @Override
@@ -242,22 +253,38 @@ public class TransportCipher {
    public long transferTo(WritableByteChannel target, long position) throws IOException {
      Preconditions.checkArgument(position == transfered(), "Invalid position.");
 
+  if (transferred == count) {
+return 0;
+  }
+
+  long totalBytesWritten = 0L;
   do {
 if (currentEncrypted == null) {
   encryptMore();
 }
 
-int bytesWritten = currentEncrypted.remaining();
-target.write(currentEncrypted);
-bytesWritten -= currentEncrypted.remaining();
-transferred += bytesWritten;
-if (!currentEncrypted.hasRemaining()) {
+long remaining = currentEncrypted.remaining();
+if (remaining == 0)  {
+  // 
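
The essence of the fix, as a simplified and hypothetical Scala model (the real change is in `TransportCipher.java`): `count()` used to re-read `buf.readableBytes()`, which shrinks as data is encrypted, so the transfer bookkeeping compared `transferred` against a moving target. The patch snapshots the total size once at construction and returns 0 once the message is fully written:

```scala
import scala.collection.mutable.Queue

// Hypothetical model of EncryptedMessage; a Queue stands in for the source buffer.
class EncryptedMessageModel(source: Queue[Byte]) {
  private val count: Long = source.size.toLong // fix: snapshot the size up front
  private var transferred: Long = 0L

  /** "Encrypts" and writes up to `max` bytes, returning how many were written. */
  def transferTo(max: Int): Long = {
    if (transferred == count) return 0L // fully written: report no further progress
    var written = 0L
    while (written < max && source.nonEmpty) {
      source.dequeue() // stands in for encrypt-and-write of one byte
      written += 1
    }
    transferred += written
    written
  }
}
```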

[spark] branch branch-2.4 updated: [SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo (2.4)

2019-03-28 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 298e4fa  [SPARK-27275][CORE] Fix potential corruption in 
EncryptedMessage.transferTo (2.4)
298e4fa is described below

commit 298e4fa6f8054c54e246f91b70d62174ccdb9413
Author: Shixiong Zhu 
AuthorDate: Thu Mar 28 11:13:11 2019 -0700

[SPARK-27275][CORE] Fix potential corruption in EncryptedMessage.transferTo 
(2.4)

## What changes were proposed in this pull request?

Backport https://github.com/apache/spark/pull/24211 to 2.4

## How was this patch tested?

Jenkins

Closes #24229 from zsxwing/SPARK-27275-2.4.

Authored-by: Shixiong Zhu 
Signed-off-by: Wenchen Fan 
---
 .../spark/network/crypto/TransportCipher.java  | 53 ++
 .../spark/network/crypto/AuthEngineSuite.java  | 85 ++
 .../spark/network/crypto/AuthIntegrationSuite.java | 47 ++--
 3 files changed, 167 insertions(+), 18 deletions(-)

diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
 
b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
index b64e4b7..0b674cc 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java
@@ -44,7 +44,8 @@ public class TransportCipher {
   @VisibleForTesting
   static final String ENCRYPTION_HANDLER_NAME = "TransportEncryption";
   private static final String DECRYPTION_HANDLER_NAME = "TransportDecryption";
-  private static final int STREAM_BUFFER_SIZE = 1024 * 32;
+  @VisibleForTesting
+  static final int STREAM_BUFFER_SIZE = 1024 * 32;
 
   private final Properties conf;
   private final String cipher;
@@ -84,7 +85,8 @@ public class TransportCipher {
 return outIv;
   }
 
-  private CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException {
+  @VisibleForTesting
+  CryptoOutputStream createOutputStream(WritableByteChannel ch) throws IOException {
  return new CryptoOutputStream(cipher, conf, ch, key, new IvParameterSpec(outIv));
   }
 
@@ -104,7 +106,8 @@ public class TransportCipher {
   .addFirst(DECRYPTION_HANDLER_NAME, new DecryptionHandler(this));
   }
 
-  private static class EncryptionHandler extends ChannelOutboundHandlerAdapter {
+  @VisibleForTesting
+  static class EncryptionHandler extends ChannelOutboundHandlerAdapter {
 private final ByteArrayWritableChannel byteChannel;
 private final CryptoOutputStream cos;
 
@@ -116,7 +119,12 @@ public class TransportCipher {
 @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
   throws Exception {
-  ctx.write(new EncryptedMessage(cos, msg, byteChannel), promise);
+  ctx.write(createEncryptedMessage(msg), promise);
+}
+
+@VisibleForTesting
+EncryptedMessage createEncryptedMessage(Object msg) {
+  return new EncryptedMessage(cos, msg, byteChannel);
 }
 
 @Override
@@ -161,10 +169,12 @@ public class TransportCipher {
 }
   }
 
-  private static class EncryptedMessage extends AbstractFileRegion {
+  @VisibleForTesting
+  static class EncryptedMessage extends AbstractFileRegion {
 private final boolean isByteBuf;
 private final ByteBuf buf;
 private final FileRegion region;
+private final long count;
 private long transferred;
 private CryptoOutputStream cos;
 
@@ -186,11 +196,12 @@ public class TransportCipher {
   this.byteRawChannel = new ByteArrayWritableChannel(STREAM_BUFFER_SIZE);
   this.cos = cos;
   this.byteEncChannel = ch;
+  this.count = isByteBuf ? buf.readableBytes() : region.count();
 }
 
 @Override
 public long count() {
-  return isByteBuf ? buf.readableBytes() : region.count();
+  return count;
 }
 
 @Override
@@ -242,22 +253,38 @@ public class TransportCipher {
    public long transferTo(WritableByteChannel target, long position) throws IOException {
      Preconditions.checkArgument(position == transferred(), "Invalid position.");
 
+  if (transferred == count) {
+return 0;
+  }
+
+  long totalBytesWritten = 0L;
   do {
 if (currentEncrypted == null) {
   encryptMore();
 }
 
-int bytesWritten = currentEncrypted.remaining();
-target.write(currentEncrypted);
-bytesWritten -= currentEncrypted.remaining();
-transferred += bytesWritten;
-if (!currentEncrypted.hasRemaining()) {
+long remaining = currentEncrypted.remaining();
+if (remaining == 0)  {
+  // Just for safety to avoid endless loop. It usually won't happen, but since the
+  // underlying 

[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477709115
 
 
   woot!





[GitHub] [spark-website] asfgit closed pull request #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
asfgit closed pull request #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186
 
 
   





[spark-website] branch asf-site updated: testing how-to for k8s changes

2019-03-28 Thread shaneknapp
This is an automated email from the ASF dual-hosted git repository.

shaneknapp pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 5f4985d  testing how-to for k8s changes
5f4985d is described below

commit 5f4985db6340efa0107ebe34bb285ab8ceb74f65
Author: shane knapp 
AuthorDate: Thu Mar 28 11:03:09 2019 -0700

testing how-to for k8s changes

i think that this will be quite useful.  :)

Author: shane knapp 

Closes #186 from shaneknapp/add-k8s-testing-instructions.
---
 developer-tools.md| 48 ++
 site/developer-tools.html | 49 +++
 2 files changed, 97 insertions(+)

diff --git a/developer-tools.md b/developer-tools.md
index 43ad445..00d57cd 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -175,6 +175,54 @@ You can check the coverage report visually by HTMLs under 
`/.../spark/python/tes
 
 Please check other available options via `python/run-tests[-with-coverage] 
--help`.
 
+<h3>Testing K8S</h3>
+
+If you have made changes to the K8S bindings in Apache Spark, it would behoove you to test locally before submitting a PR.  This is relatively simple to do, but it will require a local (to you) installation of [minikube](https://kubernetes.io/docs/setup/minikube/).  Due to how minikube interacts with the host system, please be sure to set things up as follows:
+
+- minikube version v0.34.1 (or greater, but backwards-compatibility between versions is spotty)
+- You must use a VM driver!  Running minikube with the `--vm-driver=none` option requires that the user launching minikube/k8s have root access.  Our Jenkins workers use the [kvm2](https://github.com/kubernetes/minikube/blob/master/docs/drivers.md#kvm2-driver) drivers.  More details [here](https://github.com/kubernetes/minikube/blob/master/docs/drivers.md).
+- kubernetes version v1.13.3 (can be set by executing `minikube config set kubernetes-version v1.13.3`)
+
+Once you have minikube properly set up, and have successfully completed the [quick start](https://kubernetes.io/docs/setup/minikube/#quickstart), you can test your changes locally.  All subsequent commands should be run from your root spark/ repo directory:
+
+1) Build a tarball to test against:
+
+```
+export DATE=`date "+%Y%m%d"`
+export REVISION=`git rev-parse --short HEAD`
+export ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")
+
+./dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
+ -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
+```
+
+2) Use that tarball and run the K8S integration tests:
+
+```
+PVC_TMP_DIR=$(mktemp -d)
+export PVC_TESTS_HOST_PATH=$PVC_TMP_DIR
+export PVC_TESTS_VM_PATH=$PVC_TMP_DIR
+
+minikube --vm-driver=<your preferred driver> start --memory 6000 --cpus 8
+
+minikube mount ${PVC_TESTS_HOST_PATH}:${PVC_TESTS_VM_PATH} --9p-version=9p2000.L --gid=0 --uid=185 &
+
+MOUNT_PID=$(jobs -rp)
+
+kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts || true
+
+./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \
+--spark-tgz ${WORKSPACE}/spark-*.tgz
+
+kill -9 $MOUNT_PID
+minikube stop
+```
+
+After the run is completed, the integration test logs are saved here:  `./resource-managers/kubernetes/integration-tests/target/integration-tests.log`
+
+Getting logs from the pods and containers directly is an exercise left to the reader.
+
+Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality.  If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue.
 
 ScalaTest Issues
 
diff --git a/site/developer-tools.html b/site/developer-tools.html
index e676d7b..17da11c 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -353,6 +353,55 @@ Generating HTML files for PySpark coverage under 
/.../spark/python/test_coverage
 
 Please check other available options via 
python/run-tests[-with-coverage] --help.
 
+<h3>Testing K8S</h3>
+
+<p>If you have made changes to the K8S bindings in Apache Spark, it would behoove you to test locally before submitting a PR.  This is relatively simple to do, but it will require a local (to you) installation of <a href="https://kubernetes.io/docs/setup/minikube/">minikube</a>.  Due to how minikube interacts with the host system, please be sure to set things up as follows:</p>
+
+<ul>
+  <li>minikube version v0.34.1 (or greater, but backwards-compatibility between versions is spotty)</li>
+  <li>You must use a VM driver!  Running minikube with the <code>--vm-driver=none</code> option requires that the user launching minikube/k8s have root access.  Our Jenkins workers use the 

[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
tgravescs commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477704250
 
 
   ok let us know if it doesn't work





[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477698576
 
 
   alright, i'll try and push from my end and see what happens.





[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
tgravescs commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477697258
 
 
   I'm not sure on permissions





[GitHub] [spark-website] shaneknapp edited a comment on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp edited a comment on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477691831
 
 
   > lgtm, note I didn't actually try the steps but probably will be in the next couple weeks.
   
   FWIW, these commands are pulled directly from the k8s integration test 
build, so things should be relatively simple to get set up on a dev's end.
   
   regarding merging -- do i have perms to run the merge script in the website 
dir, or does someone else do that for me?  ;)





[GitHub] [spark-website] shaneknapp closed pull request #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp closed pull request #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186
 
 
   





[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477691831
 
 
   > lgtm, note I didn't actually try the steps but probably will be in the next couple weeks.
   
   FWIW, these commands are pulled directly from the k8s integration test 
build, so things should be relatively simple to get set up on a dev's end.
   
   regarding merging -- do i have perms to run the merge script in the website 
dir, or will someone else do that for me?  ;)





[GitHub] [spark-website] shaneknapp opened a new pull request #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp opened a new pull request #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186
 
 
   i think that this will be quite useful.  :)
   





[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
tgravescs commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477689498
 
 
   lgtm, note I didn't actually try the steps but probably will be in the next couple weeks.





[spark] branch master updated: [MINOR] Move java file to java directory

2019-03-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 50cded5  [MINOR] Move java file to java directory
50cded5 is described below

commit 50cded590f8c16ba3263a5ecba6805fb06dd8a64
Author: Xianyang Liu 
AuthorDate: Thu Mar 28 12:11:00 2019 -0500

[MINOR] Move java file to java directory

## What changes were proposed in this pull request?

move
```scala
org.apache.spark.sql.execution.streaming.BaseStreamingSource
org.apache.spark.sql.execution.streaming.BaseStreamingSink
```
to java directory

## How was this patch tested?

Existing UT.

Closes #24222 from ConeyLiu/move-scala-to-java.

Authored-by: Xianyang Liu 
Signed-off-by: Sean Owen 
---
 .../org/apache/spark/sql/execution/streaming/BaseStreamingSink.java   | 0
 .../org/apache/spark/sql/execution/streaming/BaseStreamingSource.java | 0
 .../{scala => java}/org/apache/spark/sql/execution/streaming/Offset.java  | 0
 3 files changed, 0 insertions(+), 0 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java
similarity index 100%
rename from 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java
rename to 
sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSink.java
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java
similarity index 100%
rename from 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java
rename to 
sql/core/src/main/java/org/apache/spark/sql/execution/streaming/BaseStreamingSource.java
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java 
b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
similarity index 100%
rename from 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java
rename to 
sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java





[GitHub] [spark-website] shaneknapp commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477676857
 
 
   > this is great, thanks for writing these up Shane.
   
   not a problem!  after spending 2-3 days helping someone test their PR before 
finding out they were using the `--vm-driver=none` arg, i decided that this 
needed to be documented.





[GitHub] [spark-website] shaneknapp commented on a change in pull request #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
shaneknapp commented on a change in pull request #186: testing how-to for k8s 
changes
URL: https://github.com/apache/spark-website/pull/186#discussion_r270094568
 
 

 ##
 File path: site/sitemap.xml
 ##
 @@ -760,19 +760,19 @@
   weekly
 
 
-  https://spark.apache.org/news/
+  https://spark.apache.org/streaming/
 
 Review comment:
   done





[spark] branch master updated: [SPARK-26914][SQL] Fix scheduler pool may be unpredictable when we only want to use default pool and do not set spark.scheduler.pool for the session

2019-03-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 43bf4ae  [SPARK-26914][SQL] Fix scheduler pool may be unpredictable 
when we only want to use default pool and do not set spark.scheduler.pool for 
the session
43bf4ae is described below

commit 43bf4ae6417fcb15d0fbc7880f14f307c164d464
Author: zhoukang 
AuthorDate: Thu Mar 28 09:24:16 2019 -0500

[SPARK-26914][SQL] Fix scheduler pool may be unpredictable when we only 
want to use default pool and do not set spark.scheduler.pool for the session

## What changes were proposed in this pull request?

When using fair scheduler mode for the Thrift server, we may get unpredictable results.
```
val pool = sessionToActivePool.get(parentSession.getSessionHandle)
if (pool != null) {
  sqlContext.sparkContext.setLocalProperty(SparkContext.SPARK_SCHEDULER_POOL, pool)
}
```
The cause is that we use a thread pool to execute queries for the Thrift server, and when we call setLocalProperty we may get unpredictable behavior.

```
/**
   * Set a local property that affects jobs submitted from this thread, 
such as the Spark fair
   * scheduler pool. User-defined properties may also be set here. These 
properties are propagated
   * through to worker tasks and can be accessed there via
   * [[org.apache.spark.TaskContext#getLocalProperty]].
   *
   * These properties are inherited by child threads spawned from this 
thread. This
   * may have unexpected consequences when working with thread pools. The 
standard java
   * implementation of thread pools have worker threads spawn other worker 
threads.
   * As a result, local properties may propagate unpredictably.
   */
  def setLocalProperty(key: String, value: String) {
if (value == null) {
  localProperties.get.remove(key)
} else {
  localProperties.get.setProperty(key, value)
}
  }
```

I post an example on https://jira.apache.org/jira/browse/SPARK-26914 .
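
A minimal sketch (hypothetical, not part of the patch) of why thread pools make inherited local properties unpredictable: Spark's local properties live in an `InheritableThreadLocal`, which is copied when a worker thread is created, not when a task is submitted:

```scala
import java.util.concurrent.Executors

object LocalPropertyDemo extends App {
  val prop = new InheritableThreadLocal[String]
  val pool = Executors.newFixedThreadPool(1)

  prop.set("pool-A")
  pool.submit(new Runnable {
    def run(): Unit = println(s"task 1 sees: ${prop.get}") // pool-A
  }).get()

  prop.set("pool-B") // the session switches pools later...
  pool.submit(new Runnable {
    // ...but the cached worker thread keeps the value it inherited at creation.
    def run(): Unit = println(s"task 2 sees: ${prop.get}") // still pool-A
  }).get()

  pool.shutdown()
}
```

This is why the fix sets the property unconditionally instead of skipping the call when `pool` is null.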

## How was this patch tested?
UT

Closes #23826 from caneGuy/zhoukang/fix-scheduler-error.

Authored-by: zhoukang 
Signed-off-by: Sean Owen 
---
 .../sql/hive/thriftserver/SparkExecuteStatementOperation.scala  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 1772fe6..b05307e 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -226,9 +226,9 @@ private[hive] class SparkExecuteStatementOperation(
   parentSession.getUsername)
 sqlContext.sparkContext.setJobGroup(statementId, statement)
 val pool = sessionToActivePool.get(parentSession.getSessionHandle)
-if (pool != null) {
-  sqlContext.sparkContext.setLocalProperty(SparkContext.SPARK_SCHEDULER_POOL, pool)
-}
+// It may have unpredictable behavior since we use thread pools to execute queries and
+// the 'spark.scheduler.pool' may not be 'default' when we did not set its value. (SPARK-26914)
+sqlContext.sparkContext.setLocalProperty(SparkContext.SPARK_SCHEDULER_POOL, pool)
 try {
   result = sqlContext.sql(statement)
   logDebug(result.queryExecution.toString())





[GitHub] [spark-website] tgravescs commented on issue #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
tgravescs commented on issue #186: testing how-to for k8s changes
URL: https://github.com/apache/spark-website/pull/186#issuecomment-477592577
 
 
   this is great, thanks for writing these up Shane.





[GitHub] [spark-website] tgravescs commented on a change in pull request #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
tgravescs commented on a change in pull request #186: testing how-to for k8s 
changes
URL: https://github.com/apache/spark-website/pull/186#discussion_r269997574
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -175,6 +175,55 @@ You can check the coverage report visually by HTMLs under 
`/.../spark/python/tes
 
 Please check other available options via `python/run-tests[-with-coverage] 
--help`.
 
+Testing K8S
+
+If you have made changes to the K8S bindings in Apache Spark, it would behoove 
you to test locally before submitting a PR.  This is relatively simple to do, 
but it will require a local (to you) installation of 
[minikube](https://kubernetes.io/docs/setup/minikube/).  Due to how minikube 
interacts with the host system, please be sure to set things up as follows:
+
+- 
 
 Review comment:
   extra empty line here?





[spark] branch master updated: [MINOR][CORE] Remove import scala.collection.Set in TaskSchedulerImpl

2019-03-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e4a968d  [MINOR][CORE] Remove import scala.collection.Set in 
TaskSchedulerImpl
e4a968d is described below

commit e4a968d82947a68d2daedc8e5fb9adf57cf9bd5b
Author: Wenchen Fan 
AuthorDate: Thu Mar 28 21:12:18 2019 +0900

[MINOR][CORE] Remove import scala.collection.Set in TaskSchedulerImpl

## What changes were proposed in this pull request?

I was playing with the scheduler and found this weird thing. In `TaskSchedulerImpl` we import `scala.collection.Set` without any reason. This is bad in practice, as it silently changes what the bare name `Set` refers to; by default it should point to the immutable set.

This change only affects one method: `getExecutorsAliveOnHost`. I checked all the call sites and none of them need a general `Set` type.
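
A small illustration (not from the patch) of the shadowing described above:

```scala
// Hypothetical snippet: importing scala.collection.Set changes what the bare
// name `Set` resolves to for the rest of the scope.
object SetShadowingDemo extends App {
  val a: Set[Int] = Set(1, 2) // Predef.Set, i.e. scala.collection.immutable.Set

  locally {
    import scala.collection.Set
    // Compiles only because `Set` is now the general supertype, which also
    // admits mutable implementations:
    val b: Set[Int] = scala.collection.mutable.Set(1, 2)
    println(b.getClass.getName) // a mutable implementation class
  }
}
```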

## How was this patch tested?

N/A

Closes #24231 from cloud-fan/minor.

Authored-by: Wenchen Fan 
Signed-off-by: Hyukjin Kwon 
---
 .../main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala| 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index db0fbfe..1ef3566 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -22,7 +22,6 @@ import java.util.{Locale, Timer, TimerTask}
 import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
 import java.util.concurrent.atomic.AtomicLong
 
-import scala.collection.Set
 import scala.collection.mutable.{ArrayBuffer, BitSet, HashMap, HashSet}
 import scala.util.Random
 
@@ -829,8 +828,8 @@ private[spark] class TaskSchedulerImpl(
* Get a snapshot of the currently blacklisted nodes for the entire 
application.  This is
* thread-safe -- it can be called without a lock on the TaskScheduler.
*/
-  def nodeBlacklist(): scala.collection.immutable.Set[String] = {
-    blacklistTrackerOpt.map(_.nodeBlacklist()).getOrElse(scala.collection.immutable.Set())
+  def nodeBlacklist(): Set[String] = {
+blacklistTrackerOpt.map(_.nodeBlacklist()).getOrElse(Set.empty)
   }
 
   // By default, rack is unknown





[GitHub] [spark-website] felixcheung commented on a change in pull request #186: testing how-to for k8s changes

2019-03-28 Thread GitBox
felixcheung commented on a change in pull request #186: testing how-to for k8s 
changes
URL: https://github.com/apache/spark-website/pull/186#discussion_r269868047
 
 

 ##
 File path: site/sitemap.xml
 ##
 @@ -760,19 +760,19 @@
   weekly
 
 
-  https://spark.apache.org/news/
+  https://spark.apache.org/streaming/
 
 Review comment:
   revert this file?

