[GitHub] [spark] dongjoon-hyun commented on issue #28090: [SPARK-31321][SQL] Remove SaveMode check in v2 FileWriteBuilder
dongjoon-hyun commented on issue #28090: [SPARK-31321][SQL] Remove SaveMode check in v2 FileWriteBuilder URL: https://github.com/apache/spark/pull/28090#issuecomment-607965303 Thank you, @cloud-fan and @yaooqinn . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#discussion_r402460915

## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriter.scala

## @@ -45,20 +48,19 @@
   * Overrides the couple of methods responsible for generating the output streams / files so
   * that the data can be correctly partitioned
   */
-  private val recordWriter: RecordWriter[AvroKey[GenericRecord], NullWritable] =
-    new AvroKeyOutputFormat[GenericRecord]() {
-
+  private val recordWriter: RecordWriter[AvroKey[GenericRecord], NullWritable] = {
+    val sparkVersion = Map(SPARK_VERSION_METADATA_KEY -> SPARK_VERSION_SHORT).asJava

Review comment: +1 for @MaxGekk 's comment.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
AmplabJenkins removed a comment on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied URL: https://github.com/apache/spark/pull/28100#issuecomment-607961900 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
AmplabJenkins removed a comment on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied URL: https://github.com/apache/spark/pull/28100#issuecomment-607961915 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120724/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
AmplabJenkins commented on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied URL: https://github.com/apache/spark/pull/28100#issuecomment-607961900 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
AmplabJenkins commented on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied URL: https://github.com/apache/spark/pull/28100#issuecomment-607961915 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120724/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#discussion_r402457869

## File path: external/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java

## @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.avro;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.CodecFactory;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.AvroKey;
+import org.apache.avro.mapreduce.AvroKeyOutputFormat;
+import org.apache.avro.mapreduce.Syncable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.RecordWriter;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+
+// A variant of `AvroKeyOutputFormat`, which is used to inject the custom `RecordWriterFactory` so
+// that we can set avro file metadata.
+public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord> {
+  public SparkAvroKeyOutputFormat(Map<String, String> metadata) {
+    super(new SparkRecordWriterFactory(metadata));
+  }
+
+  static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord> {
+    private final Map<String, String> metadata;
+
+    SparkRecordWriterFactory(Map<String, String> metadata) {
+      this.metadata = metadata;
+    }
+
+    protected RecordWriter<AvroKey<GenericRecord>, NullWritable> create(
+        Schema writerSchema,
+        GenericData dataModel,
+        CodecFactory compressionCodec,
+        OutputStream outputStream,
+        int syncInterval) throws IOException {
+      return new SparkAvroKeyRecordWriter<>(
+        writerSchema, dataModel, compressionCodec, outputStream, syncInterval, metadata);
+    }
+  }
+}
+
+// This is a fork of org.apache.avro.mapreduce.AvroKeyRecordWriter, in order to set file metadata.
+class SparkAvroKeyRecordWriter<T> extends RecordWriter<AvroKey<T>, NullWritable>
+    implements Syncable {
+
+  private final DataFileWriter<T> mAvroFileWriter;
+
+  public SparkAvroKeyRecordWriter(
+      Schema writerSchema,
+      GenericData dataModel,
+      CodecFactory compressionCodec,
+      OutputStream outputStream,
+      int syncInterval,
+      Map<String, String> metadata) throws IOException {

Review comment: This looks good to generalize later. Do you have another candidate in mind for adding later?
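The pattern under review, reduced to its essentials: `AvroKeyOutputFormat` builds its own record writer through a factory hook with a fixed argument list, so the only way to thread extra state (the metadata map) into the writer is to capture it in a custom factory. A minimal, self-contained sketch of that injection pattern follows; the names `HeaderWriter` and `HeaderWriterFactory` are hypothetical stand-ins, not part of the Avro or Spark APIs.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a record writer that must emit file-level metadata
// before any records are written (hypothetical class, not an Avro API).
class HeaderWriter {
    private final StringWriter out = new StringWriter();

    HeaderWriter(Map<String, String> metadata) {
        // Analogous to DataFileWriter.setMeta(): metadata lands in the
        // file header, ahead of the data blocks.
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            out.write(e.getKey() + "=" + e.getValue() + "\n");
        }
    }

    void writeRecord(String record) {
        out.write(record + "\n");
    }

    String contents() {
        return out.toString();
    }
}

// The injection point: create() cannot take extra arguments, so the
// metadata is captured in the factory's state, the same way
// SparkRecordWriterFactory holds it in the patch above.
class HeaderWriterFactory {
    private final Map<String, String> metadata;

    HeaderWriterFactory(Map<String, String> metadata) {
        this.metadata = metadata;
    }

    HeaderWriter create() {
        return new HeaderWriter(metadata);
    }
}

public class FactoryInjectionSketch {
    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("org.apache.spark.version", "3.0.0");
        HeaderWriter w = new HeaderWriterFactory(meta).create();
        w.writeRecord("record-1");
        System.out.println(w.contents());
    }
}
```

The same shape appears in the real patch: the output format's constructor passes the factory to `super(...)`, and the factory's `create(...)` forwards the captured map into the forked record writer.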
[GitHub] [spark] SparkQA removed a comment on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
SparkQA removed a comment on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied URL: https://github.com/apache/spark/pull/28100#issuecomment-607854256 **[Test build #120724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120724/testReport)** for PR 28100 at commit [`e32cc30`](https://github.com/apache/spark/commit/e32cc30e470edc149427e0988a3c20c2220303c2).
[GitHub] [spark] SparkQA commented on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
SparkQA commented on issue #28100: [SPARK-31249][CORE]Fix flaky CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied URL: https://github.com/apache/spark/pull/28100#issuecomment-607960654 **[Test build #120724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120724/testReport)** for PR 28100 at commit [`e32cc30`](https://github.com/apache/spark/commit/e32cc30e470edc149427e0988a3c20c2220303c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607958830 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins removed a comment on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607958185 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120727/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607958837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25429/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607958830 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607958837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25429/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins removed a comment on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607958176 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607958176 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607958185 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120727/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
SparkQA commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607958209 **[Test build #120731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120731/testReport)** for PR 28101 at commit [`29d4599`](https://github.com/apache/spark/commit/29d4599bc542d3c87a1489933e679a31080801a6).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607956174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120728/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
SparkQA commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607957637 **[Test build #120727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120727/testReport)** for PR 27864 at commit [`076dd67`](https://github.com/apache/spark/commit/076dd671d9a9ac3c8d5926279f54517b8b39426a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#discussion_r402453349

## File path: external/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java

## @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.avro;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.CodecFactory;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.AvroKey;
+import org.apache.avro.mapreduce.AvroKeyOutputFormat;
+import org.apache.avro.mapreduce.Syncable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.RecordWriter;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+
+// A variant of `AvroKeyOutputFormat`, which is used to inject the custom `RecordWriterFactory` so
+// that we can set avro file metadata.
+public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord> {
+  public SparkAvroKeyOutputFormat(Map<String, String> metadata) {
+    super(new SparkRecordWriterFactory(metadata));
+  }
+
+  static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord> {
+    private final Map<String, String> metadata;
+
+    SparkRecordWriterFactory(Map<String, String> metadata) {
+      this.metadata = metadata;
+    }
+
+    protected RecordWriter<AvroKey<GenericRecord>, NullWritable> create(
+        Schema writerSchema,
+        GenericData dataModel,
+        CodecFactory compressionCodec,
+        OutputStream outputStream,
+        int syncInterval) throws IOException {
+      return new SparkAvroKeyRecordWriter<>(
+        writerSchema, dataModel, compressionCodec, outputStream, syncInterval, metadata);
+    }
+  }
+}
+
+// This is a fork of org.apache.avro.mapreduce.AvroKeyRecordWriter, in order to set file metadata.
+class SparkAvroKeyRecordWriter<T> extends RecordWriter<AvroKey<T>, NullWritable>
+    implements Syncable {
+
+  private final DataFileWriter<T> mAvroFileWriter;
+
+  public SparkAvroKeyRecordWriter(
+      Schema writerSchema,
+      GenericData dataModel,
+      CodecFactory compressionCodec,
+      OutputStream outputStream,
+      int syncInterval,
+      Map<String, String> metadata) throws IOException {
+    this.mAvroFileWriter = new DataFileWriter<>(dataModel.createDatumWriter(writerSchema));
+    for (Map.Entry<String, String> entry : metadata.entrySet()) {
+      this.mAvroFileWriter.setMeta(entry.getKey(), entry.getValue());
+    }
+    this.mAvroFileWriter.setCodec(compressionCodec);
+    this.mAvroFileWriter.setSyncInterval(syncInterval);
+    this.mAvroFileWriter.create(writerSchema, outputStream);
+  }
+
+  public void write(AvroKey<T> record, NullWritable ignore) throws IOException {

Review comment: @cloud-fan . ~Do we need to check the effect in terms of performance because this is an additional wrapper technically?~ Oh, never mind.
[GitHub] [spark] SparkQA removed a comment on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
SparkQA removed a comment on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607871644 **[Test build #120727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120727/testReport)** for PR 27864 at commit [`076dd67`](https://github.com/apache/spark/commit/076dd671d9a9ac3c8d5926279f54517b8b39426a).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
dongjoon-hyun commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#discussion_r402453349

## File path: external/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java

## @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.avro;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.CodecFactory;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.AvroKey;
+import org.apache.avro.mapreduce.AvroKeyOutputFormat;
+import org.apache.avro.mapreduce.Syncable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.RecordWriter;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+
+// A variant of `AvroKeyOutputFormat`, which is used to inject the custom `RecordWriterFactory` so
+// that we can set avro file metadata.
+public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord> {
+  public SparkAvroKeyOutputFormat(Map<String, String> metadata) {
+    super(new SparkRecordWriterFactory(metadata));
+  }
+
+  static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord> {
+    private final Map<String, String> metadata;
+
+    SparkRecordWriterFactory(Map<String, String> metadata) {
+      this.metadata = metadata;
+    }
+
+    protected RecordWriter<AvroKey<GenericRecord>, NullWritable> create(
+        Schema writerSchema,
+        GenericData dataModel,
+        CodecFactory compressionCodec,
+        OutputStream outputStream,
+        int syncInterval) throws IOException {
+      return new SparkAvroKeyRecordWriter<>(
+        writerSchema, dataModel, compressionCodec, outputStream, syncInterval, metadata);
+    }
+  }
+}
+
+// This is a fork of org.apache.avro.mapreduce.AvroKeyRecordWriter, in order to set file metadata.
+class SparkAvroKeyRecordWriter<T> extends RecordWriter<AvroKey<T>, NullWritable>
+    implements Syncable {
+
+  private final DataFileWriter<T> mAvroFileWriter;
+
+  public SparkAvroKeyRecordWriter(
+      Schema writerSchema,
+      GenericData dataModel,
+      CodecFactory compressionCodec,
+      OutputStream outputStream,
+      int syncInterval,
+      Map<String, String> metadata) throws IOException {
+    this.mAvroFileWriter = new DataFileWriter<>(dataModel.createDatumWriter(writerSchema));
+    for (Map.Entry<String, String> entry : metadata.entrySet()) {
+      this.mAvroFileWriter.setMeta(entry.getKey(), entry.getValue());
+    }
+    this.mAvroFileWriter.setCodec(compressionCodec);
+    this.mAvroFileWriter.setSyncInterval(syncInterval);
+    this.mAvroFileWriter.create(writerSchema, outputStream);
+  }
+
+  public void write(AvroKey<T> record, NullWritable ignore) throws IOException {

Review comment: @cloud-fan . Do we need to check the effect in terms of performance because this is an additional wrapper technically?
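The point of the patch being reviewed is that a key/value pair written into the file header at create time can be read back cheaply later, e.g. to learn which Spark version produced a file without scanning its data blocks. A sketch of that round trip, using `java.util.Properties` purely as a stand-in for the Avro container header; the metadata key mirrors Spark's version key, but this code is illustrative, not the Spark or Avro implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class VersionMetadataRoundTrip {
    // Writer side: the metadata is fixed before any data is written,
    // just as DataFileWriter.setMeta(...) must precede create(...).
    static byte[] writeWithVersion(String version) throws IOException {
        Properties header = new Properties();
        header.setProperty("org.apache.spark.version", version);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        header.store(out, null);
        return out.toByteArray();
    }

    // Reader side: recover the writer's version from the header alone.
    static String readVersion(byte[] file) throws IOException {
        Properties header = new Properties();
        header.load(new ByteArrayInputStream(file));
        return header.getProperty("org.apache.spark.version");
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeWithVersion("3.0.0");
        System.out.println(readVersion(file)); // prints 3.0.0
    }
}
```

In the real patch the same write-once constraint is why the metadata map must reach the record writer's constructor: once `DataFileWriter.create(...)` has emitted the header, the metadata can no longer be changed.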
[GitHub] [spark] AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607956166 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607956166 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607956174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120728/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607955698 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607955709 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120726/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
SparkQA removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607884832 **[Test build #120728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120728/testReport)** for PR 27207 at commit [`8d35ea1`](https://github.com/apache/spark/commit/8d35ea14edfb8bd7cc276020784c13caf030532a).
[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607955698 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607955709 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120726/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607867530 **[Test build #120726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120726/testReport)** for PR 27724 at commit [`99010e0`](https://github.com/apache/spark/commit/99010e0a020c7e72cfc0ac89e50b8a5b819a65ba).
[GitHub] [spark] SparkQA commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
SparkQA commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607955673 **[Test build #120728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120728/testReport)** for PR 27207 at commit [`8d35ea1`](https://github.com/apache/spark/commit/8d35ea14edfb8bd7cc276020784c13caf030532a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607954854 **[Test build #120726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120726/testReport)** for PR 27724 at commit [`99010e0`](https://github.com/apache/spark/commit/99010e0a020c7e72cfc0ac89e50b8a5b819a65ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
dongjoon-hyun commented on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#issuecomment-607954215 Thank you for pinging me, @cloud-fan .
[GitHub] [spark] MaxGekk commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
MaxGekk commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#discussion_r402442002

## File path: external/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java ##

@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.avro;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.CodecFactory;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.AvroKey;
+import org.apache.avro.mapreduce.AvroKeyOutputFormat;
+import org.apache.avro.mapreduce.Syncable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.RecordWriter;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+
+// A variant of `AvroKeyOutputFormat`, which is used to inject the custom `RecordWriterFactory` so
+// that we can set avro file metadata.
+public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord> {
+  public SparkAvroKeyOutputFormat(Map<String, String> metadata) {
+    super(new SparkRecordWriterFactory(metadata));
+  }
+
+  static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord> {
+    private final Map<String, String> metadata;
+
+    SparkRecordWriterFactory(Map<String, String> metadata) {
+      this.metadata = metadata;
+    }
+
+    protected RecordWriter<AvroKey<GenericRecord>, NullWritable> create(
+        Schema writerSchema,
+        GenericData dataModel,
+        CodecFactory compressionCodec,
+        OutputStream outputStream,
+        int syncInterval) throws IOException {
+      return new SparkAvroKeyRecordWriter(
+        writerSchema, dataModel, compressionCodec, outputStream, syncInterval, metadata);
+    }
+  }
+}
+
+// This is a fork of org.apache.avro.mapreduce.AvroKeyRecordWriter, in order to set file metadata.
+class SparkAvroKeyRecordWriter extends RecordWriter<AvroKey<GenericRecord>, NullWritable>
+    implements Syncable {
+  private final DataFileWriter<GenericRecord> mAvroFileWriter;
+
+  public SparkAvroKeyRecordWriter(
+      Schema writerSchema,
+      GenericData dataModel,
+      CodecFactory compressionCodec,
+      OutputStream outputStream,
+      int syncInterval,
+      Map<String, String> metadata) throws IOException {
+    this.mAvroFileWriter = new DataFileWriter<>(dataModel.createDatumWriter(writerSchema));
+    for (Map.Entry<String, String> entry : metadata.entrySet()) {
+      this.mAvroFileWriter.setMeta(entry.getKey(), entry.getValue());
+    }

Review comment: this is the diff, right?
[GitHub] [spark] MaxGekk commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
MaxGekk commented on a change in pull request #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#discussion_r402443422

## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriter.scala ##

@@ -45,20 +48,19 @@
   * Overrides the couple of methods responsible for generating the output streams / files so
   * that the data can be correctly partitioned
   */
-  private val recordWriter: RecordWriter[AvroKey[GenericRecord], NullWritable] =
-    new AvroKeyOutputFormat[GenericRecord]() {
-
+  private val recordWriter: RecordWriter[AvroKey[GenericRecord], NullWritable] = {
+    val sparkVersion = Map(SPARK_VERSION_METADATA_KEY -> SPARK_VERSION_SHORT).asJava

Review comment: Could you update the comment https://github.com/apache/spark/blob/86cc907448f0102ad0c185e87fcc897d0a32707f/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L50-L51
[GitHub] [spark] AmplabJenkins commented on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf
AmplabJenkins commented on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf URL: https://github.com/apache/spark/pull/28097#issuecomment-607949595 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120722/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf
AmplabJenkins removed a comment on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf URL: https://github.com/apache/spark/pull/28097#issuecomment-607949595 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120722/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf
AmplabJenkins removed a comment on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf URL: https://github.com/apache/spark/pull/28097#issuecomment-607949582 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf
AmplabJenkins commented on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf URL: https://github.com/apache/spark/pull/28097#issuecomment-607949582 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf
SparkQA removed a comment on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf URL: https://github.com/apache/spark/pull/28097#issuecomment-607800148 **[Test build #120722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120722/testReport)** for PR 28097 at commit [`6ba5d18`](https://github.com/apache/spark/commit/6ba5d187436ea0d307ff8bfd5780539a57084548).
[GitHub] [spark] SparkQA commented on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf
SparkQA commented on issue #28097: [SPARK-31325][SQL][Web UI] Control a plan explain mode in the events of SQL listeners via SQLConf URL: https://github.com/apache/spark/pull/28097#issuecomment-607948487 **[Test build #120722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120722/testReport)** for PR 28097 at commit [`6ba5d18`](https://github.com/apache/spark/commit/6ba5d187436ea0d307ff8bfd5780539a57084548).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] huaxingao commented on a change in pull request #28099: [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference
huaxingao commented on a change in pull request #28099: [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference URL: https://github.com/apache/spark/pull/28099#discussion_r402423891 ## File path: docs/sql-ref-functions-builtin.md ## @@ -1,25 +1,26 @@ --- layout: global -title: Reference -displayTitle: Reference +title: Built-in Functions +displayTitle: Built-in Functions license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --- -Spark SQL is Apache Spark's module for working with structured data. -This guide is a reference for Structured Query Language (SQL) for Apache -Spark. This document describes the SQL constructs supported by Spark in detail -along with usage examples when applicable. +Spark SQL defines built-in functions to use, a complete list of which can be found [here](api/sql/). Among them, Spark SQL has several special categories of built-in functions: [Aggregate Functions](sql-ref-functions-builtin-aggregate.html) to operate on a group of rows, [Array Functions](sql-ref-functions-builtin-array.html) to operate on Array columns, and [Date and Time Functions](sql-ref-functions-builtin-date-time.html) to operate on Date and Time. 
Review comment: I initially wanted scalar built-in functions and aggregate built-in functions to parallel the sections in UDFs, but I now think it's not worthwhile to document scalar built-in functions separately. It makes more sense to document some special categories, such as array functions, date and time functions, and maybe window functions and string functions as well. I also don't want to document too many here, because I need to finish quickly.
[GitHub] [spark] nchammas commented on issue #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies
nchammas commented on issue #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies URL: https://github.com/apache/spark/pull/27928#issuecomment-607930617 Following a common pattern in Python projects (e.g. Warehouse, Trio, my own Flintrock), I'm specifying broad development dependencies and then using pip-tools to compile them down to a pinned list of requirements. The broad requirements are for contributors to use locally, and the pinned requirements are for use by CI and our release tooling. I believe this should make everyone happy.
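The workflow described above — broad, human-edited constraints compiled down to a pinned lockfile — can be sketched with shell commands. This is a hedged illustration of the general pip-tools convention; the file names and packages here are examples, not taken from the PR itself:

```shell
# Broad, human-edited development dependencies live in requirements.in
# (pip-tools convention). Contributors install from this loose list.
cat > requirements.in <<'EOF'
# Loose constraints for local development.
flake8
sphinx
EOF

# pip-compile resolves the loose constraints into a fully pinned
# requirements.txt for CI and release tooling. Guarded so the sketch
# still runs on machines where pip-tools is not installed.
if command -v pip-compile >/dev/null 2>&1; then
  pip-compile requirements.in --output-file requirements.txt
else
  echo "pip-tools not installed; skipping compilation"
fi
```

The split keeps intent (what we depend on) separate from resolution (exactly which versions CI uses), so reproducible builds and easy local setup don't fight each other.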
[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP) URL: https://github.com/apache/spark/pull/28026#issuecomment-607927357

> Sorry if this is answered above but why do we handle v2 APIs here together?

Create statement plans are already converted to v2 because the DataSource create syntax is supported. If the conversion to v2 plans didn't handle the new statement fields, then the new data would be ignored, and that's a correctness error.
[GitHub] [spark] AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607917767 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25428/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607917750 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607917750 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607917767 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25428/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
SparkQA commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607916880 **[Test build #120730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120730/testReport)** for PR 28101 at commit [`1654629`](https://github.com/apache/spark/commit/1654629e6d744fb6207de8d7c91ecd4758b342c5).
[GitHub] [spark] MaxGekk commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
MaxGekk commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#discussion_r402401376

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ##

@@ -821,4 +821,20 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
       days += 1
     }
   }
+
+  test("rebasing overlapped timestamps during daylight saving time") {

Review comment: done
[GitHub] [spark] MaxGekk commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
MaxGekk commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#discussion_r402401225

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ##

@@ -1030,7 +1033,13 @@ object DateTimeUtils {
       cal.get(Calendar.SECOND),
       (Math.floorMod(micros, MICROS_PER_SECOND) * NANOS_PER_MICROS).toInt)
       .plusDays(cal.get(Calendar.DAY_OF_MONTH) - 1)
-    instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
+    val zonedDateTime = localDateTime.atZone(ZoneId.systemDefault)
+    val adjustedZdt = if (cal.get(Calendar.DST_OFFSET) == 0) {

Review comment: added
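The fix above deals with local timestamps that occur twice when clocks fall back. A small, self-contained `java.time` sketch (not the Spark code itself; the zone and date are illustrative) shows the ambiguity and how an explicit offset choice resolves it:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstOverlapDemo {
    public static void main(String[] args) {
        // 2019-11-03 01:30 happened twice in America/Los_Angeles:
        // once at UTC-7 (PDT) and again at UTC-8 (PST) after clocks fell back.
        LocalDateTime ambiguous = LocalDateTime.of(2019, 11, 3, 1, 30);
        ZoneId la = ZoneId.of("America/Los_Angeles");

        // atZone() alone silently picks one reading; the *AtOverlap methods
        // make the choice explicit, which is what a rebasing routine needs.
        ZonedDateTime earlier = ambiguous.atZone(la).withEarlierOffsetAtOverlap();
        ZonedDateTime later = ambiguous.atZone(la).withLaterOffsetAtOverlap();

        System.out.println(earlier.getOffset()); // -07:00
        System.out.println(later.getOffset());   // -08:00

        // The two readings of the same wall-clock time are one hour apart.
        System.out.println(
            Duration.between(earlier.toInstant(), later.toInstant()).getSeconds());
    }
}
```

Without an explicit choice, converting such a local timestamp to an instant can land an hour off, which is exactly the class of bug the PR's `Calendar.DST_OFFSET` check guards against.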
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID URL: https://github.com/apache/spark/pull/27664#discussion_r402369581 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala ## @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession { )) } + test("getLatestBatchId") { +withCountOpenLocalFileSystemAsLocalFileSystem { + val scheme = CountOpenLocalFileSystem.scheme + withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") { +withTempDir { file => + val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, +s"$scheme:///${file.getCanonicalPath}") + for (batchId <- 0 to 2) { +sinkLog.add( + batchId, + Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION))) + } + + def getCountForOpenOnMetadataFile(batchId: Long): Long = { +val path = sinkLog.batchIdToPath(batchId).toUri.getPath +CountOpenLocalFileSystem.pathToNumOpenCalled + .get(path).map(_.get()).getOrElse(0) + } + + val curCount = getCountForOpenOnMetadataFile(2) + + assert(sinkLog.getLatestBatchId() === Some(2)) Review comment: Nit: s/2/2L/
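The PR under review avoids opening the newest metadata log file just to learn its batch ID, since the ID is already encoded in the file name. A hedged sketch of that idea in plain Java — the naming convention (`<batchId>` with an optional `.compact` suffix) and helper names are assumptions for illustration, not Spark's actual implementation:

```java
import java.util.Arrays;
import java.util.Optional;

public class LatestBatchId {
    // Given metadata log file names like "0", "1", "2.compact", return the
    // highest batch ID by parsing the names only -- no file contents are read.
    static Optional<Long> getLatestBatchId(String[] fileNames) {
        return Arrays.stream(fileNames)
                .map(name -> name.endsWith(".compact")
                        ? name.substring(0, name.length() - ".compact".length())
                        : name)
                .filter(s -> !s.isEmpty() && s.chars().allMatch(Character::isDigit))
                .map(Long::parseLong)
                .max(Long::compare);
    }

    public static void main(String[] args) {
        // Hidden or non-numeric entries are skipped; the max numeric name wins.
        System.out.println(getLatestBatchId(
                new String[]{"0", "1", "2.compact", "3", ".hidden"}));
    }
}
```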
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID URL: https://github.com/apache/spark/pull/27664#discussion_r402362153 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala ## @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession { )) } + test("getLatestBatchId") { +withCountOpenLocalFileSystemAsLocalFileSystem { + val scheme = CountOpenLocalFileSystem.scheme + withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") { +withTempDir { file => + val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, +s"$scheme:///${file.getCanonicalPath}") + for (batchId <- 0 to 2) { +sinkLog.add( + batchId, + Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION))) + } + + def getCountForOpenOnMetadataFile(batchId: Long): Long = { +val path = sinkLog.batchIdToPath(batchId).toUri.getPath +CountOpenLocalFileSystem.pathToNumOpenCalled + .get(path).map(_.get()).getOrElse(0) Review comment: Nit: no linebreak needed.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID URL: https://github.com/apache/spark/pull/27664#discussion_r402388241 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala ## @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession { )) } + test("getLatestBatchId") { +withCountOpenLocalFileSystemAsLocalFileSystem { + val scheme = CountOpenLocalFileSystem.scheme + withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") { +withTempDir { file => + val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, +s"$scheme:///${file.getCanonicalPath}") + for (batchId <- 0 to 2) { +sinkLog.add( + batchId, + Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION))) + } + + def getCountForOpenOnMetadataFile(batchId: Long): Long = { +val path = sinkLog.batchIdToPath(batchId).toUri.getPath +CountOpenLocalFileSystem.pathToNumOpenCalled + .get(path).map(_.get()).getOrElse(0) + } + + val curCount = getCountForOpenOnMetadataFile(2) + + assert(sinkLog.getLatestBatchId() === Some(2)) + // getLatestBatchId doesn't open the latest metadata log file + assert(getCountForOpenOnMetadataFile(2L) === curCount) Review comment: Maybe worth checking the other batches as well.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID URL: https://github.com/apache/spark/pull/27664#discussion_r402386522 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala ## @@ -267,4 +308,38 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession { val log = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, input.toString) log.allFiles() } + + private def withCountOpenLocalFileSystemAsLocalFileSystem(body: => Unit): Unit = { +val optionKey = s"fs.${CountOpenLocalFileSystem.scheme}.impl" +val originClassForLocalFileSystem = spark.conf.getOption(optionKey) +try { + spark.conf.set(optionKey, classOf[CountOpenLocalFileSystem].getName) + body +} finally { + originClassForLocalFileSystem match { +case Some(fsClazz) => spark.conf.set(optionKey, fsClazz) +case _ => spark.conf.unset(optionKey) + } +} + } +} + +class CountOpenLocalFileSystem extends RawLocalFileSystem { + import CountOpenLocalFileSystem._ + + override def getUri: URI = { +URI.create(s"$scheme:///") + } + + override def open(f: Path, bufferSize: Int): FSDataInputStream = { +val path = f.toUri.getPath +val curVal = pathToNumOpenCalled.getOrElseUpdate(path, new AtomicLong(0)) +curVal.incrementAndGet() +super.open(f, bufferSize) + } +} + +object CountOpenLocalFileSystem { + val scheme = s"FileStreamSinkLogSuite${math.abs(Random.nextInt)}fs" + val pathToNumOpenCalled = new mutable.HashMap[String, AtomicLong] Review comment: Some reset functionality would be good to make it re-usable. This would also make `curCount` disappear.
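The reset suggested in the review above could look something like the following, sketched here in plain Java against a bare counter map rather than the suite's `CountOpenLocalFileSystem` (the class and method names are hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// A minimal per-path open-call counter with the reset hook suggested in the
// review: clearing the map between tests removes the need to snapshot a
// baseline count (the `curCount` variable in the diff above).
public class OpenCallCounter {
    private final ConcurrentHashMap<String, AtomicLong> pathToNumOpenCalled =
            new ConcurrentHashMap<>();

    // Called from the filesystem's open() override.
    public void recordOpen(String path) {
        pathToNumOpenCalled.computeIfAbsent(path, p -> new AtomicLong()).incrementAndGet();
    }

    // 0 for paths that were never opened, so tests need no Option handling.
    public long countFor(String path) {
        AtomicLong c = pathToNumOpenCalled.get(path);
        return c == null ? 0L : c.get();
    }

    // Reset between tests so each test starts from a clean slate.
    public void reset() {
        pathToNumOpenCalled.clear();
    }
}
```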
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID URL: https://github.com/apache/spark/pull/27664#discussion_r402362508 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala ## @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession { )) } + test("getLatestBatchId") { +withCountOpenLocalFileSystemAsLocalFileSystem { + val scheme = CountOpenLocalFileSystem.scheme + withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") { +withTempDir { file => Review comment: If it's a dir maybe we can call it `dir` or `path`.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID URL: https://github.com/apache/spark/pull/27664#discussion_r402369944 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala ## @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession { )) } + test("getLatestBatchId") { +withCountOpenLocalFileSystemAsLocalFileSystem { + val scheme = CountOpenLocalFileSystem.scheme + withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") { +withTempDir { file => + val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, +s"$scheme:///${file.getCanonicalPath}") + for (batchId <- 0 to 2) { +sinkLog.add( + batchId, + Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION))) + } + + def getCountForOpenOnMetadataFile(batchId: Long): Long = { +val path = sinkLog.batchIdToPath(batchId).toUri.getPath +CountOpenLocalFileSystem.pathToNumOpenCalled + .get(path).map(_.get()).getOrElse(0) + } + + val curCount = getCountForOpenOnMetadataFile(2) Review comment: Nit: s/2/2L/
[GitHub] [spark] tgravescs commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607902037 @bmarcott sorry for my delay on this. If you are able to update this, I think it's really close and it would be nice to get it into 3.0.
[GitHub] [spark] tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#discussion_r402383913 ## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ## @@ -196,6 +196,241 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + def setupTaskScheduler(clock: ManualClock): TaskSchedulerImpl = { +val conf = new SparkConf() +sc = new SparkContext("local", "TaskSchedulerImplSuite", conf) +val taskScheduler = new TaskSchedulerImpl(sc, + sc.conf.get(config.TASK_MAX_FAILURES), + clock = clock) { + override def createTaskSetManager(taskSet: TaskSet, maxTaskFailures: Int): TaskSetManager = { +new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt, clock) + } + override def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = { +// Don't shuffle the offers around for this test. Instead, we'll just pass in all +// the permutations we care about directly. +offers + } +} +// Need to initialize a DAGScheduler for the taskScheduler to use for callbacks. +new DAGScheduler(sc, taskScheduler) { + override def taskStarted(task: Task[_], taskInfo: TaskInfo): Unit = {} + + override def executorAdded(execId: String, host: String): Unit = {} +} +taskScheduler.initialize(new FakeSchedulerBackend) +val taskSet = FakeTask.createTaskSet(8, 1, 1, + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")) +) + +// Offer resources first so that when the taskset is submitted it can initialize +// with proper locality level. Otherwise, ANY would be the only locality level. 
+// See TaskSetManager.computeValidLocalityLevels() +// This begins the task set as PROCESS_LOCAL locality level +taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("exec1", "host1", 1))) +taskScheduler.submitTasks(taskSet) +taskScheduler + } + + test("SPARK-18886 - partial offers (isAllFreeResources = false) reset timer before " + +"any resources have been rejected") { +val clock = new ManualClock() +// All tasks created here are local to exec1, host1. +// Locality level starts at PROCESS_LOCAL. +val taskScheduler = setupTaskScheduler(clock) +// Locality levels increase at 3000 ms. +val advanceAmount = 3000 + +// Advancing clock increases locality level to NODE_LOCAL. +clock.advance(advanceAmount) + +// If there hasn't yet been any full resource offers, +// partial resource (isAllFreeResources = false) offers reset delay scheduling +// if this and previous offers were accepted. +// This line resets the timer and locality level is reset to PROCESS_LOCAL. +assert(taskScheduler + .resourceOffers( +IndexedSeq(WorkerOffer("exec1", "host1", 1)), +isAllFreeResources = false) + .flatten.length === 1) + +// This NODE_LOCAL task should not be accepted. +assert(taskScheduler + .resourceOffers( +IndexedSeq(WorkerOffer("exec2", "host1", 1)), +isAllFreeResources = false) + .flatten.isEmpty) + } + + test("SPARK-18886 - delay scheduling timer is reset when it accepts all resources offered when" + +"isAllFreeResources = true") { +val clock = new ManualClock() +// All tasks created here are local to exec1, host1. +// Locality level starts at PROCESS_LOCAL. +val taskScheduler = setupTaskScheduler(clock) +// Locality levels increase at 3000 ms. +val advanceAmount = 3000 + +// Advancing clock increases locality level to NODE_LOCAL. +clock.advance(advanceAmount) + +// If there are no rejects on an all resource offer, delay scheduling is reset. +// This line resets the timer and locality level is reset to PROCESS_LOCAL. 
+assert(taskScheduler + .resourceOffers( +IndexedSeq(WorkerOffer("exec1", "host1", 1)), +isAllFreeResources = true) + .flatten.length === 1) + +// This NODE_LOCAL task should not be accepted. +assert(taskScheduler + .resourceOffers( +IndexedSeq(WorkerOffer("exec2", "host1", 1)), +isAllFreeResources = false) + .flatten.isEmpty) + } + + test("SPARK-18886 - partial resource offers (isAllFreeResources = false) reset " + +"time if last full resource offer (isAllResources = true) was accepted as well as any " + +"following partial resource offers") { +val clock = new ManualClock() +// All tasks created
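The tests quoted above drive locality-level escalation with a `ManualClock` and a 3000 ms wait. A toy sketch of the delay-scheduling idea those tests exercise — this is not Spark's `TaskSetManager`; the class, the two-step escalation, and the thresholds are made up for illustration:

```java
// Toy model of delay scheduling: a task set prefers PROCESS_LOCAL placement,
// escalates to less-local levels as a wait timer expires, and resets the
// timer when an offer at the preferred locality is accepted -- the behaviour
// the SPARK-18886 tests above check with a ManualClock.
public class DelayScheduler {
    enum Level { PROCESS_LOCAL, NODE_LOCAL, ANY }

    private final long waitMs;   // e.g. 3000 ms, as in the tests above
    private long lastResetMs;    // manual-clock time of the last accepted offer
    private long nowMs;          // manual-clock "now"

    DelayScheduler(long waitMs) { this.waitMs = waitMs; }

    void advanceClock(long ms) { nowMs += ms; }

    // Each full wait interval without an accepted offer relaxes locality
    // by one level.
    Level currentLevel() {
        long waited = nowMs - lastResetMs;
        if (waited < waitMs) return Level.PROCESS_LOCAL;
        if (waited < 2 * waitMs) return Level.NODE_LOCAL;
        return Level.ANY;
    }

    // An accepted offer at the preferred locality resets the wait timer,
    // dropping the level back to PROCESS_LOCAL.
    void offerAccepted() { lastResetMs = nowMs; }
}
```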
[GitHub] [spark] tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#discussion_r402380407 ## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ## @@ -196,6 +196,241 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + def setupTaskScheduler(clock: ManualClock): TaskSchedulerImpl = { +val conf = new SparkConf() +sc = new SparkContext("local", "TaskSchedulerImplSuite", conf) +val taskScheduler = new TaskSchedulerImpl(sc, + sc.conf.get(config.TASK_MAX_FAILURES), + clock = clock) { + override def createTaskSetManager(taskSet: TaskSet, maxTaskFailures: Int): TaskSetManager = { +new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt, clock) + } + override def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = { +// Don't shuffle the offers around for this test. Instead, we'll just pass in all +// the permutations we care about directly. +offers + } +} +// Need to initialize a DAGScheduler for the taskScheduler to use for callbacks. +new DAGScheduler(sc, taskScheduler) { + override def taskStarted(task: Task[_], taskInfo: TaskInfo): Unit = {} + + override def executorAdded(execId: String, host: String): Unit = {} +} +taskScheduler.initialize(new FakeSchedulerBackend) +val taskSet = FakeTask.createTaskSet(8, 1, 1, + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")), + Seq(TaskLocation("host1", "exec1")) +) + +// Offer resources first so that when the taskset is submitted it can initialize +// with proper locality level. Otherwise, ANY would be the only locality level. 
+// See TaskSetManager.computeValidLocalityLevels() +// This begins the task set as PROCESS_LOCAL locality level +taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("exec1", "host1", 1))) +taskScheduler.submitTasks(taskSet) +taskScheduler + } + + test("SPARK-18886 - partial offers (isAllFreeResources = false) reset timer before " + +"any resources have been rejected") { +val clock = new ManualClock() +// All tasks created here are local to exec1, host1. +// Locality level starts at PROCESS_LOCAL. +val taskScheduler = setupTaskScheduler(clock) +// Locality levels increase at 3000 ms. +val advanceAmount = 3000 + +// Advancing clock increases locality level to NODE_LOCAL. +clock.advance(advanceAmount) + +// If there hasn't yet been any full resource offers, +// partial resource (isAllFreeResources = false) offers reset delay scheduling +// if this and previous offers were accepted. +// This line resets the timer and locality level is reset to PROCESS_LOCAL. +assert(taskScheduler + .resourceOffers( +IndexedSeq(WorkerOffer("exec1", "host1", 1)), +isAllFreeResources = false) + .flatten.length === 1) + +// This NODE_LOCAL task should not be accepted. +assert(taskScheduler + .resourceOffers( +IndexedSeq(WorkerOffer("exec2", "host1", 1)), +isAllFreeResources = false) + .flatten.isEmpty) + } + + test("SPARK-18886 - delay scheduling timer is reset when it accepts all resources offered when" + +"isAllFreeResources = true") { Review comment: Nit: need a space before `isAllFreeResources`.
[GitHub] [spark] tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#discussion_r402373537 ## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ## @@ -196,6 +196,241 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + def setupTaskScheduler(clock: ManualClock): TaskSchedulerImpl = { Review comment: there is already a `setupScheduler` function; perhaps make this name more specific, e.g. `setupTaskSchedulerForLocalityTests`.
[GitHub] [spark] AmplabJenkins commented on issue #27897: [SPARK-31113][SQL] Add SHOW VIEWS command
AmplabJenkins commented on issue #27897: [SPARK-31113][SQL] Add SHOW VIEWS command URL: https://github.com/apache/spark/pull/27897#issuecomment-607890248 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27897: [SPARK-31113][SQL] Add SHOW VIEWS command
AmplabJenkins commented on issue #27897: [SPARK-31113][SQL] Add SHOW VIEWS command URL: https://github.com/apache/spark/pull/27897#issuecomment-607890263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25427/ Test PASSed.
[GitHub] [spark] Eric5553 commented on a change in pull request #27897: [SPARK-31113][SQL] Add SHOW VIEWS command
Eric5553 commented on a change in pull request #27897: [SPARK-31113][SQL] Add SHOW VIEWS command URL: https://github.com/apache/spark/pull/27897#discussion_r402369650 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ## @@ -295,6 +295,41 @@ case class AlterViewAsCommand( } } +/** + * A command for users to get views in the given database. + * If a databaseName is not given, the current database will be used. + * The syntax of using this command in SQL is: + * {{{ + * SHOW VIEWS [(IN|FROM) database_name] [[LIKE] 'identifier_with_wildcards']; + * }}} + */ +case class ShowViewsCommand( +databaseName: Option[String], +tableIdentifierPattern: Option[String]) extends RunnableCommand { + + // The result of SHOW VIEWS has three basic columns: namespace, viewName and isTemporary. + override val output: Seq[Attribute] = Seq( +AttributeReference("namespace", StringType, nullable = false)(), +AttributeReference("viewName", StringType, nullable = false)(), +AttributeReference("isTemporary", BooleanType, nullable = false)()) + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val db = databaseName.getOrElse(catalog.getCurrentDatabase) + +// Show the information of views. +val views = tableIdentifierPattern.map(catalog.listViews(db, _)) + .getOrElse(catalog.listViews(db, "*")) +views.map { tableIdent => + val namespace = tableIdent.database.getOrElse("") Review comment: Yea, I see. updated in 6a4348ed9e289c6ca3cc557f9a575e1829f53fc0. Thanks
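The `SHOW VIEWS [(IN|FROM) database_name] [[LIKE] 'identifier_with_wildcards']` command discussed above relies on wildcard filtering of catalog names. A rough sketch of that kind of matching in plain Java, assuming Spark-style patterns where `*` matches any characters and `|` separates alternatives (a simplification for illustration, not the exact catalog implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ViewPatternFilter {
    // Keep the names matching a Spark-style wildcard pattern: '*' matches any
    // characters and '|' separates alternative patterns, case-insensitively.
    static List<String> filterPattern(List<String> names, String pattern) {
        List<Pattern> regexes = Arrays.stream(pattern.split("\\|"))
                .map(p -> Pattern.compile("(?i)" + p.trim().replace("*", ".*")))
                .collect(Collectors.toList());
        return names.stream()
                .filter(n -> regexes.stream().anyMatch(r -> r.matcher(n).matches()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> views = Arrays.asList("sales_view", "temp_view1", "audit");
        // Matches anything containing "view", plus the literal name "audit".
        System.out.println(filterPattern(views, "*view*|audit"));
    }
}
```

With no `LIKE` clause the command falls back to the pattern `"*"`, which under this scheme matches every view in the database.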
[GitHub] [spark] SparkQA commented on issue #27897: [SPARK-31113][SQL] Add SHOW VIEWS command
SparkQA commented on issue #27897: [SPARK-31113][SQL] Add SHOW VIEWS command URL: https://github.com/apache/spark/pull/27897#issuecomment-607889396 **[Test build #120729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120729/testReport)** for PR 27897 at commit [`6a4348e`](https://github.com/apache/spark/commit/6a4348ed9e289c6ca3cc557f9a575e1829f53fc0).
[GitHub] [spark] AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607885531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25426/
[GitHub] [spark] AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607885520 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
SparkQA commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607884832 **[Test build #120728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120728/testReport)** for PR 27207 at commit [`8d35ea1`](https://github.com/apache/spark/commit/8d35ea14edfb8bd7cc276020784c13caf030532a).
[GitHub] [spark] tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#discussion_r402320307 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -319,20 +336,38 @@ private[spark] class TaskSchedulerImpl( taskSetsByStageIdAndAttempt -= manager.taskSet.stageId } } +resetOnPreviousOffer -= manager.taskSet manager.parent.removeSchedulable(manager) logInfo(s"Removed TaskSet ${manager.taskSet.id}, whose tasks have all completed, from pool" + s" ${manager.parent.name}") } + /** + * Offers resources to a single [[TaskSetManager]] at a given max allowed [[TaskLocality]]. + * + * @param taskSet task set to offer resources to Review comment: this is actually task set manager
[GitHub] [spark] tgravescs commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-607884003 test this please
[GitHub] [spark] tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#discussion_r402322826 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -466,12 +503,28 @@ private[spark] class TaskSchedulerImpl( }.sum } + def minTaskLocality( Review comment: make private
[GitHub] [spark] tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
tgravescs commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#discussion_r402360614 ## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ## @@ -898,18 +1083,17 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B } // Here is the main check of this test -- we have the same offers again, and we schedule it -// successfully. Because the scheduler first tries to schedule with locality in mind, at first -// it won't schedule anything on executor1. But despite that, we don't abort the job. Then the -// scheduler tries for ANY locality, and successfully schedules tasks on executor1. +// successfully. Because the scheduler tries to schedule with locality in mind, at first +// it won't schedule anything on executor1. But despite that, we don't abort the job. val secondTaskAttempts = taskScheduler.resourceOffers(offers).flatten -assert(secondTaskAttempts.size == 2) -secondTaskAttempts.foreach { taskAttempt => assert("executor1" === taskAttempt.executorId) } Review comment: I see we are still on node local locality since we didn't reject any so we have to wait the timeout here so these are rejected at this point.
[GitHub] [spark] AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607878852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120720/
[GitHub] [spark] AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
AmplabJenkins commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607878845 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
SparkQA removed a comment on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607746808 **[Test build #120720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120720/testReport)** for PR 28101 at commit [`b761ac1`](https://github.com/apache/spark/commit/b761ac141b3d92f92785a2b7601760a9cec59ed7).
[GitHub] [spark] SparkQA commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
SparkQA commented on issue #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#issuecomment-607877641 **[Test build #120720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120720/testReport)** for PR 28101 at commit [`b761ac1`](https://github.com/apache/spark/commit/b761ac141b3d92f92785a2b7601760a9cec59ed7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607872448 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25425/
[GitHub] [spark] AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607872436 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
SparkQA commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#issuecomment-607871644 **[Test build #120727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120727/testReport)** for PR 27864 at commit [`076dd67`](https://github.com/apache/spark/commit/076dd671d9a9ac3c8d5926279f54517b8b39426a).
[GitHub] [spark] prakharjain09 commented on a change in pull request #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
prakharjain09 commented on a change in pull request #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned URL: https://github.com/apache/spark/pull/27864#discussion_r402343206 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ## @@ -1761,6 +1776,57 @@ private[spark] class BlockManager( blocksToRemove.size } + def decommissionBlockManager(): Unit = { +if (!blockManagerDecommissioning) { + logInfo("Starting block manager decommissioning process") + blockManagerDecommissioning = true + decommissionManager = Some(new BlockManagerDecommissionManager) + decommissionManager.foreach(_.start()) +} + } + + /** + * Tries to offload all cached RDD blocks from this BlockManager to peer BlockManagers + * Visible for testing + */ + def offloadRddCacheBlocks(): Unit = { +val replicateBlocksInfo = master.getReplicateInfoForRDDBlocks(blockManagerId) + +if (replicateBlocksInfo.nonEmpty) { + logInfo(s"Need to replicate ${replicateBlocksInfo.size} blocks " + +s"for block manager decommissioning") + // Refresh peer list once before starting replication + getPeers(true) +} + +// Maximum number of storage replication failure which replicateBlock can handle +// before giving up for one block Review comment: @dongjoon-hyun Renamed to "spark.storage.decommission.maxReplicationFailuresPerBlock". Any suggestions on 2) ?
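The config discussed in the review, a per-block cap on replication failures, bounds how long decommissioning can stall on a single unreachable block. A minimal sketch of that retry policy, with `BlockOffloader` and its shape entirely hypothetical (the real logic lives in `BlockManager.offloadRddCacheBlocks`):

```scala
// Hypothetical sketch: offload each block, retrying a block until it either
// replicates successfully or exhausts its per-block failure budget, then
// report which blocks made it and which were given up on.
object BlockOffloader {
  def offload(blocks: Seq[String],
              replicate: String => Boolean,
              maxFailuresPerBlock: Int): (Seq[String], Seq[String]) = {
    blocks.partition { block =>
      // At most maxFailuresPerBlock failures are tolerated, i.e.
      // maxFailuresPerBlock + 1 attempts; stop early on the first success.
      Iterator.continually(replicate(block))
        .take(maxFailuresPerBlock + 1)
        .contains(true)
    }
  }
}
```

The cap trades completeness for progress: a block whose peers keep failing is abandoned after the budget is spent rather than blocking the rest of the decommission.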
[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … URL: https://github.com/apache/spark/pull/27724#issuecomment-607867530 **[Test build #120726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120726/testReport)** for PR 27724 at commit [`99010e0`](https://github.com/apache/spark/commit/99010e0a020c7e72cfc0ac89e50b8a5b819a65ba).
[GitHub] [spark] HyukjinKwon commented on issue #28089: [SPARK-30921][PySpark] Predicates on python udf should not be pushdown through Aggregate
HyukjinKwon commented on issue #28089: [SPARK-30921][PySpark] Predicates on python udf should not be pushdown through Aggregate URL: https://github.com/apache/spark/pull/28089#issuecomment-607866353 Will take a look tomorrow
[GitHub] [spark] AmplabJenkins removed a comment on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
AmplabJenkins removed a comment on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#issuecomment-607864523 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120725/
[GitHub] [spark] cloud-fan commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
cloud-fan commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#discussion_r402337523 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ## @@ -821,4 +821,20 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper { days += 1 } } + + test("rebasing overlapped timestamps during daylight saving time") { Review comment: can we add JIRA ID?
[GitHub] [spark] cloud-fan commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time
cloud-fan commented on a change in pull request #28101: [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time URL: https://github.com/apache/spark/pull/28101#discussion_r402336635 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -1030,7 +1033,13 @@ object DateTimeUtils { cal.get(Calendar.SECOND), (Math.floorMod(micros, MICROS_PER_SECOND) * NANOS_PER_MICROS).toInt) .plusDays(cal.get(Calendar.DAY_OF_MONTH) - 1) -instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant) +val zonedDateTime = localDateTime.atZone(ZoneId.systemDefault) +val adjustedZdt = if (cal.get(Calendar.DST_OFFSET) == 0) { Review comment: can we add some comment here? people may not understand what `cal.get(Calendar.DST_OFFSET) == 0` means
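The ambiguity behind the `cal.get(Calendar.DST_OFFSET) == 0` check can be shown with plain `java.time` (this is a standalone illustration, not the patched Spark code): during a fall-back transition one wall-clock time maps to two instants, and a zero DST offset in the `Calendar` reading corresponds to the later, post-transition instant.

```scala
import java.time._

// During the America/Los_Angeles fall-back on 2019-11-03, the local time
// 01:30 is overlapped: it occurs once at offset -07:00 (still DST) and once
// at -08:00 (standard time). java.time resolves the overlap to the earlier
// offset by default; withLaterOffsetAtOverlap picks the standard-time reading.
object DstOverlap {
  val zone: ZoneId = ZoneId.of("America/Los_Angeles")
  val ambiguous: LocalDateTime = LocalDateTime.of(2019, 11, 3, 1, 30)

  def earlier: ZonedDateTime = ambiguous.atZone(zone).withEarlierOffsetAtOverlap
  def later: ZonedDateTime = ambiguous.atZone(zone).withLaterOffsetAtOverlap
}
```

The two resolutions differ by exactly one hour of physical time, which is the discrepancy the rebasing fix has to disambiguate.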
[GitHub] [spark] SparkQA removed a comment on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
SparkQA removed a comment on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#issuecomment-607858705 **[Test build #120725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120725/testReport)** for PR 28102 at commit [`5aaf2db`](https://github.com/apache/spark/commit/5aaf2db549e3b6d441ea01fed6bb0440f5eff02d).
[GitHub] [spark] AmplabJenkins commented on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata
AmplabJenkins commented on issue #28102: [SPARK-31327][SQL] Write Spark version into Avro file metadata URL: https://github.com/apache/spark/pull/28102#issuecomment-607864513 Merged build finished. Test FAILed.