[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r212163122 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -183,34 +178,106 @@ private[sql] object ArrowConverters { } /** - * Convert a byte array to an ArrowRecordBatch. + * Load a serialized ArrowRecordBatch. */ - private[arrow] def byteArrayToBatch( + private[arrow] def loadBatch( batchBytes: Array[Byte], allocator: BufferAllocator): ArrowRecordBatch = { -val in = new ByteArrayReadableSeekableByteChannel(batchBytes) -val reader = new ArrowFileReader(in, allocator) - -// Read a batch from a byte stream, ensure the reader is closed -Utils.tryWithSafeFinally { - val root = reader.getVectorSchemaRoot // throws IOException - val unloader = new VectorUnloader(root) - reader.loadNextBatch() // throws IOException - unloader.getRecordBatch -} { - reader.close() -} +val in = new ByteArrayInputStream(batchBytes) +MessageSerializer.deserializeRecordBatch( + new ReadChannel(Channels.newChannel(in)), allocator) // throws IOException } + /** + * Create a DataFrame from a JavaRDD of serialized ArrowRecordBatches. + */ private[sql] def toDataFrame( - payloadRDD: JavaRDD[Array[Byte]], + arrowBatchRDD: JavaRDD[Array[Byte]], schemaString: String, sqlContext: SQLContext): DataFrame = { -val rdd = payloadRDD.rdd.mapPartitions { iter => +val schema = DataType.fromJson(schemaString).asInstanceOf[StructType] +val timeZoneId = sqlContext.sessionState.conf.sessionLocalTimeZone +val rdd = arrowBatchRDD.rdd.mapPartitions { iter => val context = TaskContext.get() - ArrowConverters.fromPayloadIterator(iter.map(new ArrowPayload(_)), context) + ArrowConverters.fromBatchIterator(iter, schema, timeZoneId, context) } -val schema = DataType.fromJson(schemaString).asInstanceOf[StructType] sqlContext.internalCreateDataFrame(rdd, schema) } + + /** + * Read a file as an Arrow stream and parallelize as an RDD of serialized ArrowRecordBatches. + */ + private[sql] def readArrowStreamFromFile( + sqlContext: SQLContext, + filename: String): JavaRDD[Array[Byte]] = { +val fileStream = new FileInputStream(filename) +try { + // Create array so that we can safely close the file + val batches = getBatchesFromStream(fileStream.getChannel).toArray + // Parallelize the record batches to create an RDD + JavaRDD.fromRDD(sqlContext.sparkContext.parallelize(batches, batches.length)) --- End diff -- @BryanCutler, why did this parallelize with the length of batches size? I thought the data size is usually small and wondering if it necessarily speeds up in general. Did I misread? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r212162307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -183,34 +178,106 @@ private[sql] object ArrowConverters { } /** - * Convert a byte array to an ArrowRecordBatch. + * Load a serialized ArrowRecordBatch. */ - private[arrow] def byteArrayToBatch( + private[arrow] def loadBatch( batchBytes: Array[Byte], allocator: BufferAllocator): ArrowRecordBatch = { -val in = new ByteArrayReadableSeekableByteChannel(batchBytes) -val reader = new ArrowFileReader(in, allocator) - -// Read a batch from a byte stream, ensure the reader is closed -Utils.tryWithSafeFinally { - val root = reader.getVectorSchemaRoot // throws IOException - val unloader = new VectorUnloader(root) - reader.loadNextBatch() // throws IOException - unloader.getRecordBatch -} { - reader.close() -} +val in = new ByteArrayInputStream(batchBytes) +MessageSerializer.deserializeRecordBatch( + new ReadChannel(Channels.newChannel(in)), allocator) // throws IOException } + /** + * Create a DataFrame from a JavaRDD of serialized ArrowRecordBatches. + */ private[sql] def toDataFrame( - payloadRDD: JavaRDD[Array[Byte]], + arrowBatchRDD: JavaRDD[Array[Byte]], schemaString: String, sqlContext: SQLContext): DataFrame = { -val rdd = payloadRDD.rdd.mapPartitions { iter => +val schema = DataType.fromJson(schemaString).asInstanceOf[StructType] +val timeZoneId = sqlContext.sessionState.conf.sessionLocalTimeZone +val rdd = arrowBatchRDD.rdd.mapPartitions { iter => val context = TaskContext.get() - ArrowConverters.fromPayloadIterator(iter.map(new ArrowPayload(_)), context) + ArrowConverters.fromBatchIterator(iter, schema, timeZoneId, context) } -val schema = DataType.fromJson(schemaString).asInstanceOf[StructType] sqlContext.internalCreateDataFrame(rdd, schema) } + + /** + * Read a file as an Arrow stream and parallelize as an RDD of serialized ArrowRecordBatches. + */ + private[sql] def readArrowStreamFromFile( + sqlContext: SQLContext, + filename: String): JavaRDD[Array[Byte]] = { +val fileStream = new FileInputStream(filename) --- End diff -- nit: i would do `tryWithResource` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
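For readers following the thread, a minimal sketch of what the suggested `tryWithResource` change might look like. It assumes `Utils.tryWithResource` from `org.apache.spark.util.Utils` and the `getBatchesFromStream` helper introduced in this diff; otherwise the body follows the quoted code.

```scala
import java.io.FileInputStream

import org.apache.spark.api.java.JavaRDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.util.Utils

// Sketch only: getBatchesFromStream comes from the diff above.
private[sql] def readArrowStreamFromFile(
    sqlContext: SQLContext,
    filename: String): JavaRDD[Array[Byte]] = {
  Utils.tryWithResource(new FileInputStream(filename)) { fileStream =>
    // Materialize the batches so the file can be closed safely.
    val batches = getBatchesFromStream(fileStream.getChannel).toArray
    // Parallelize the record batches to create an RDD.
    JavaRDD.fromRDD(sqlContext.sparkContext.parallelize(batches, batches.length))
  }
}
```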
[GitHub] spark pull request #22163: [SPARK-25166][CORE]Reduce the number of write ope...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/22163#discussion_r212162206 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -206,14 +211,21 @@ private void writeSortedFile(boolean isLastFile) { long recordReadPosition = recordOffsetInPage + uaoSize; // skip over record length while (dataRemaining > 0) { final int toTransfer = Math.min(diskWriteBufferSize, dataRemaining); -Platform.copyMemory( - recordPage, recordReadPosition, writeBuffer, Platform.BYTE_ARRAY_OFFSET, toTransfer); -writer.write(writeBuffer, 0, toTransfer); +if (bufferOffset > 0 && bufferOffset + toTransfer > DISK_WRITE_BUFFER_SIZE) { --- End diff -- Actually, it doesn't need to change. The result `numRecordsWritten` is not affected; the data was only written to `diskWriteBuffer` before, but now it's written to `writeBuffer`. The bytes that have been written will be updated in `commitAndGet`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22180: [SPARK-25174][YARN]Limit the size of diagnostic message ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22180 **[Test build #95132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95132/testReport)** for PR 22180 at commit [`3271c3f`](https://github.com/apache/spark/commit/3271c3f4100e0d69fe30400a42ab35aaab1c7c48). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22180: [SPARK-25174][YARN]Limit the size of diagnostic message ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22180 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22180: [SPARK-25174][YARN]Limit the size of diagnostic message ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2468/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22180: [SPARK-25174][YARN]Limit the size of diagnostic m...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/22180#discussion_r212161599 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -368,7 +369,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends } logInfo(s"Final app status: $finalStatus, exitCode: $exitCode" + Option(msg).map(msg => s", (reason: $msg)").getOrElse("")) -finalMsg = msg +finalMsg = if (msg == null || msg.length <= finalMsgLimitSize) { --- End diff -- that's better, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r212161411 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3268,13 +3268,49 @@ class Dataset[T] private[sql]( } /** - * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark. + * Collect a Dataset as Arrow batches and serve stream to PySpark. */ private[sql] def collectAsArrowToPython(): Array[Any] = { +val timeZoneId = sparkSession.sessionState.conf.sessionLocalTimeZone + withAction("collectAsArrowToPython", queryExecution) { plan => - val iter: Iterator[Array[Byte]] = -toArrowPayload(plan).collect().iterator.map(_.asPythonSerializable) - PythonRDD.serveIterator(iter, "serve-Arrow") + PythonRDD.serveToStream("serve-Arrow") { out => +val batchWriter = new ArrowBatchStreamWriter(schema, out, timeZoneId) +val arrowBatchRdd = toArrowBatchRdd(plan) +val numPartitions = arrowBatchRdd.partitions.length + +// Store collection results for worst case of 1 to N-1 partitions --- End diff -- Is it better `0 to N-1 partitions`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r212158131 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -111,65 +113,58 @@ private[sql] object ArrowConverters { rowCount += 1 } arrowWriter.finish() - writer.writeBatch() + val batch = unloader.getRecordBatch() + MessageSerializer.serialize(writeChannel, batch) + batch.close() --- End diff -- Should we `tryWithResource` here too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
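One possible shape for that suggestion, with variable names following the quoted diff. Since `ArrowRecordBatch` is `AutoCloseable` rather than `java.io.Closeable` (which `Utils.tryWithResource` expects), `Utils.tryWithSafeFinally` may be the closer fit — this is a sketch, not the merged change.

```scala
val batch = unloader.getRecordBatch()
Utils.tryWithSafeFinally {
  // Serialize the record batch into the output channel.
  MessageSerializer.serialize(writeChannel, batch)
} {
  // Release the batch's buffers even if serialization throws.
  batch.close()
}
```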
[GitHub] spark issue #22187: [SPARK-25178][SQL] Directly ship the StructType objects ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22187 Thanks, updated both titles --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22163: [SPARK-25166][CORE]Reduce the number of write ope...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/22163#discussion_r212160161 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -206,14 +211,21 @@ private void writeSortedFile(boolean isLastFile) { long recordReadPosition = recordOffsetInPage + uaoSize; // skip over record length while (dataRemaining > 0) { final int toTransfer = Math.min(diskWriteBufferSize, dataRemaining); -Platform.copyMemory( - recordPage, recordReadPosition, writeBuffer, Platform.BYTE_ARRAY_OFFSET, toTransfer); -writer.write(writeBuffer, 0, toTransfer); +if (bufferOffset > 0 && bufferOffset + toTransfer > DISK_WRITE_BUFFER_SIZE) { --- End diff -- Oh, I see. If so, I'm afraid you may have to change `writer.recordWritten()`'s behaviour, which just counts records one by one right now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/22188 OK, I reran the tests for the lower column count cases, and the runs with the patch consistently show a tiny (1-3%) improvement compared to the master branch. So even the lower column count cases benefit a little. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #95131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95131/testReport)** for PR 20345 at commit [`39462fb`](https://github.com/apache/spark/commit/39462fbee952ec574b4c04d7718fd73bb5f56d9d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2467/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #95130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95130/testReport)** for PR 22163 at commit [`f91e18c`](https://github.com/apache/spark/commit/f91e18c7d4b8eab53c4983320a0eab0403c37a48). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2466/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20345 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22163 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22163: [SPARK-25166][CORE]Reduce the number of write ope...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/22163#discussion_r212154100 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -206,14 +211,21 @@ private void writeSortedFile(boolean isLastFile) { long recordReadPosition = recordOffsetInPage + uaoSize; // skip over record length while (dataRemaining > 0) { final int toTransfer = Math.min(diskWriteBufferSize, dataRemaining); -Platform.copyMemory( - recordPage, recordReadPosition, writeBuffer, Platform.BYTE_ARRAY_OFFSET, toTransfer); -writer.write(writeBuffer, 0, toTransfer); +if (bufferOffset > 0 && bufferOffset + toTransfer > DISK_WRITE_BUFFER_SIZE) { --- End diff -- Thanks @Ngone51. If we get `n` records whose combined size n*X < diskWriteBufferSize (the same as DISK_WRITE_BUFFER_SIZE), then we will only call writer.write() once. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
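To make the behaviour being discussed concrete, here is a self-contained Scala rendering of the buffering idea (the real code is Java in ShuffleExternalSorter; the flush condition mirrors the line shown in the diff, while the rest of the loop is an illustrative assumption):

```scala
// Copy one record into writeBuffer, flushing via `flush` only when the pending
// bytes plus the next chunk would overflow the buffer, so several small records
// can share a single write() call. Returns the new buffer offset; the caller
// flushes whatever remains after the last record.
def copyRecord(
    recordPage: Array[Byte],
    recordOffset: Int,
    recordLength: Int,
    writeBuffer: Array[Byte],
    startOffset: Int,
    flush: (Array[Byte], Int) => Unit): Int = {
  var bufferOffset = startOffset
  var readPosition = recordOffset
  var dataRemaining = recordLength
  while (dataRemaining > 0) {
    val toTransfer = math.min(writeBuffer.length, dataRemaining)
    if (bufferOffset > 0 && bufferOffset + toTransfer > writeBuffer.length) {
      flush(writeBuffer, bufferOffset)
      bufferOffset = 0
    }
    System.arraycopy(recordPage, readPosition, writeBuffer, bufferOffset, toTransfer)
    bufferOffset += toTransfer
    readPosition += toTransfer
    dataRemaining -= toTransfer
  }
  bufferOffset
}
```

Under this scheme, `n` small records only trigger a write once their combined size exceeds the buffer, which matches the description above; the open question in the thread is how `writer.recordWritten()` accounting interacts with the deferred writes.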
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #95129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95129/testReport)** for PR 22112 at commit [`93f37fa`](https://github.com/apache/spark/commit/93f37fa585462b9ee2fb9e179eab736fbc416d3e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2465/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #95128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95128/testReport)** for PR 22112 at commit [`097092b`](https://github.com/apache/spark/commit/097092be4b2967689082af62715ecc4f78086c30). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2464/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21332: [SPARK-24236][SS] Continuous replacement for ShuffleExch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21332 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95122/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21332: [SPARK-24236][SS] Continuous replacement for ShuffleExch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21332 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21332: [SPARK-24236][SS] Continuous replacement for ShuffleExch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21332 **[Test build #95122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95122/testReport)** for PR 21332 at commit [`06c6cfb`](https://github.com/apache/spark/commit/06c6cfb5e88ffcb742830dc6f38c8bdc9057552d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112 BTW, I think a cleaner fix is to make shuffle files reliable (e.g. put them on HDFS), so that Spark will never retry a task from a finished shuffle map stage. Then all the problems go away: the randomness is materialized in the shuffle files and we will not hit correctness issues. This is a big project and maybe we can consider it in Spark 3.0. For now (2.4) I think failing and asking users to checkpoint is better than just documenting that `repartition`/`zip` may return wrong results. We also have a plan to reduce the possibility of failing later, by marking RDD actions as "repeatable". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
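As a rough illustration of the "ask users to checkpoint" workaround mentioned above (the checkpoint directory, input RDD, and transformation are placeholders, not part of this PR):

```scala
// Checkpointing materializes the parent RDD, so if a later stage is retried it
// re-reads the same rows instead of recomputing output in a possibly different
// order, which is what makes round-robin repartition nondeterministic.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints") // example location
val parent = data.map(orderSensitiveTransform)       // placeholder transformation
parent.checkpoint()
parent.count()                                       // force materialization
val repartitioned = parent.repartition(100)
```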
[GitHub] spark pull request #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22185 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22185 LGTM, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22193 **[Test build #95127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95127/testReport)** for PR 22193 at commit [`394c9d7`](https://github.com/apache/spark/commit/394c9d71ea46ce9d90d5c1c5a0309f29d0f6f5e6). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95127/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22193 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21756 From what I discussed with @shrutig offline, the use case her team has is: when `UserGroupInformation.getCurrentUser.getAuthenticationMethod == AuthenticationMethod.KERBEROS`, they want to set the remote ugi's authentication method to `AuthMethod.KERBEROS`, since the default remote ugi uses `AuthMethod.SIMPLE`. I'd suggest adding ```scala ugi.setAuthenticationMethod( UserGroupInformation.getCurrentUser.getAuthenticationMethod.getAuthMethod) ``` into `def createSparkUser(): UserGroupInformation` instead of customizing `SparkHadoopUtil`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
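A hedged sketch of how that suggestion could land. The surrounding method body is a paraphrase of `SparkHadoopUtil.createSparkUser`, not the exact source, and the public `AuthenticationMethod` overload of `setAuthenticationMethod` is used here for illustration:

```scala
import org.apache.hadoop.security.UserGroupInformation

def createSparkUser(): UserGroupInformation = {
  val user = Utils.getCurrentUserName()
  val ugi = UserGroupInformation.createRemoteUser(user)
  // Carry over the current user's authentication method (e.g. KERBEROS) so the
  // remote UGI does not default to SIMPLE.
  ugi.setAuthenticationMethod(
    UserGroupInformation.getCurrentUser.getAuthenticationMethod)
  transferCredentials(UserGroupInformation.getCurrentUser, ugi)
  ugi
}
```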
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22193 **[Test build #95127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95127/testReport)** for PR 22193 at commit [`394c9d7`](https://github.com/apache/spark/commit/394c9d71ea46ce9d90d5c1c5a0309f29d0f6f5e6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2463/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22193 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22193: [SPARK-25186][SQL] Remove v2 save mode.
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/22193 [SPARK-25186][SQL] Remove v2 save mode. ## What changes were proposed in this pull request? This removes `SaveMode` from the v2 write API. Overwrite is temporarily implemented by deleting path-based and name-based tables and appending. These appends implicitly create tables and should be replaced with CTAS and RTAS. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rdblue/spark remove-v2-save-mode Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22193.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22193 commit e3fcc83a4a55576821573ceb9a3a56b89218a187 Author: Ryan Blue Date: 2018-08-22T21:17:11Z SPARK-25188: Add WriteConfig to v2 write API. commit 6dbc6e09b5a50764b29050ac30eb9c4f9f8970b8 Author: Ryan Blue Date: 2018-08-22T21:45:17Z SPARK-25188: BatchPartitionOverwriteSupport extends BatchWriteSupport. commit 394c9d71ea46ce9d90d5c1c5a0309f29d0f6f5e6 Author: Ryan Blue Date: 2018-08-22T23:51:24Z Remove SaveMode from the v2 write API. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22187 Yeah, I agreed with @rednaxelafx to directly ship the StructType objects looks like a better solution. +1 for that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95121/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #95121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95121/testReport)** for PR 22063 at commit [`09c3a3b`](https://github.com/apache/spark/commit/09c3a3b5c45fb2a356b191d5699c6ba497694031). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918] Executor Plugin API
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22192 Please remove all of the PR template text from the description. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918] Executor Plugin API
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22192 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22192: [SPARK-24918] Executor Plugin API
GitHub user NiharS opened a pull request: https://github.com/apache/spark/pull/22192 [SPARK-24918] Executor Plugin API A continuation of @squito's executor plugin task. By his request I took his code and added testing and moved the plugin initialization to a separate thread. ## What changes were proposed in this pull request? Executor plugins now run on one separate thread, so the executor does not wait on them. Added testing. ## How was this patch tested? Added test cases that test using a sample plugin. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/NiharS/spark executorPlugin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22192.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22192 commit 44454dd586e35bdf16492c4a8969494bd3b7f8f5 Author: Nihar Sheth Date: 2018-08-20T17:53:37Z [SPARK-24918] Executor Plugin API This commit adds testing and moves the plugin initialization to a separate thread. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22185 +1, LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22188 That does seem counter intuitive, but no idea what could explain that since the new code seems like a straight-forward better version. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22190 This is related to #21308, which adds `DeleteSupport`. Both `BatchOverwriteSupport` and `DeleteSupport` use the same input to remove data (`Filter[]`) and can reject deletes that don't align with partition boundaries. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
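For readers who have not seen #21308, a purely illustrative Scala rendering of the shared contract the comment describes — the actual proposals define Java interfaces, and the name and signature below are not the merged Spark API:

```scala
import org.apache.spark.sql.sources.Filter

// Hypothetical: both delete and overwrite take the same Filter[] input and may
// reject filters that do not align with partition boundaries.
trait DeleteByFilter {
  def deleteWhere(filters: Array[Filter]): Unit
}
```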
[GitHub] spark pull request #21749: [SPARK-24785] [SHELL] Making sure REPL prints Spa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21749 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/22188 Thanks @vanzin. In my benchmark tests, the tiny degradation (0.5%) in the lower column count cases is pretty consistent, which concerns me a little. I am going to re-run those tests in a different environment and see what happens. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21749: [SPARK-24785] [SHELL] Making sure REPL prints Spark UI i...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21749 @vanzin Thanks for the review. I'll merge it as is, and have another PR that tries to merge the 2.11 and 2.12 branches. I had a positive conversation with the Scala community about adding a proper hook for us, and they agreed to it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22187 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95119/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22187 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22187 **[Test build #95119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95119/testReport)** for PR 22187 at commit [`86c9c5b`](https://github.com/apache/spark/commit/86c9c5b31f8a7f8ec722ceeedc8e4812ac7159ef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21749: [SPARK-24785] [SHELL] Making sure REPL prints Spark UI i...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21749 The patch LGTM. I'd add a comment where Spark-specific code was inserted, so it's easy to find later. But ok as is too. Not sure why it seems like a big deal to have the Scala libraries add an initialization hook to that class... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22191 **[Test build #95126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95126/testReport)** for PR 22191 at commit [`ade656a`](https://github.com/apache/spark/commit/ade656a3cf3edb32d80e66d03276584c0c86c4d0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22177: [SPARK-25119][Web UI] stages in wrong order within job p...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22177 You're sorting on stage ID but your description says task ID. Just a small inconsistency. This is also against branch-2.3, it should be against master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22177: [SPARK-25119][Web UI] stages in wrong order withi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22177#discussion_r212134537 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala --- @@ -337,7 +337,9 @@ private[ui] class JobPage(parent: JobsTab, store: AppStatusStore) extends WebUIP store.executorList(false), appStartTime) val operationGraphContent = store.asOption(store.operationGraphForJob(jobId)) match { - case Some(operationGraph) => UIUtils.showDagVizForJob(jobId, operationGraph) + case Some(operationGraph) => UIUtils.showDagVizForJob(jobId, operationGraph.sortWith( + _.rootCluster.id.replaceAll(RDDOperationGraph.STAGE_CLUSTER_PREFIX, "").toInt --- End diff -- +1. `replaceAll` also seems like the wrong API to use, `substring` seems more correct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
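To illustrate the point, assuming the cluster ids look like `stage_<id>` and `RDDOperationGraph.STAGE_CLUSTER_PREFIX` is "stage_" (an assumption for this example):

```scala
val stageClusterPrefix = "stage_" // assumed value of STAGE_CLUSTER_PREFIX
val clusterId = "stage_42"

// replaceAll treats the prefix as a regex and scans the whole string.
val viaReplaceAll = clusterId.replaceAll(stageClusterPrefix, "").toInt

// substring just drops the known-length prefix: cheaper, and it cannot
// accidentally match the pattern elsewhere in the id.
val viaSubstring = clusterId.substring(stageClusterPrefix.length).toInt
// both yield 42
```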
[GitHub] spark pull request #22177: [SPARK-25119][Web UI] stages in wrong order withi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22177#discussion_r212134229 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala --- @@ -18,18 +18,18 @@ package org.apache.spark.ui.jobs import java.util.Locale + import javax.servlet.http.HttpServletRequest import scala.collection.mutable.{Buffer, ListBuffer} import scala.xml.{Node, NodeSeq, Unparsed, Utility} - --- End diff -- +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22188 LGTM. Will leave here for a bit to see if anyone else comments... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22191 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95125/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22191 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22191 **[Test build #95125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95125/testReport)** for PR 22191 at commit [`eec0ad0`](https://github.com/apache/spark/commit/eec0ad08e390831717203f6d002e3b1218de6d36). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22191 **[Test build #95125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95125/testReport)** for PR 22191 at commit [`eec0ad0`](https://github.com/apache/spark/commit/eec0ad08e390831717203f6d002e3b1218de6d36). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22191 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22191: [SPARK-25204][SS] Fix race in rate source test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22191 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22191: [SPARK-25204][SS] Fix race in rate source test.
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/22191 [SPARK-25204][SS] Fix race in rate source test. ## What changes were proposed in this pull request? Fix a race in the rate source tests. We need a better way of testing restart behavior. ## How was this patch tested? unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/jose-torres/spark racetest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22191.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22191 commit eec0ad08e390831717203f6d002e3b1218de6d36 Author: Jose Torres Date: 2018-08-22T22:10:32Z fix test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95118/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22188 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22188 **[Test build #95118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95118/testReport)** for PR 22188 at commit [`697de21`](https://github.com/apache/spark/commit/697de21501acbda3dbcd8ccc13a35ad3723a652e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212126595 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala --- @@ -51,7 +57,13 @@ private[spark] class KubernetesDriverBuilder( provideJavaStep: ( KubernetesConf[KubernetesDriverSpecificConf] => JavaDriverFeatureStep) = -new JavaDriverFeatureStep(_)) { +new JavaDriverFeatureStep(_), +provideTemplateVolumeStep: (KubernetesConf[_ <: KubernetesRoleSpecificConf] + => TemplateVolumeStep) = +new TemplateVolumeStep(_), +provideInitialSpec: KubernetesConf[KubernetesDriverSpecificConf] --- End diff -- You can make the initial pod and wrap it in the `KubernetesDriverSpec` object. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22187 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95117/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22187 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22187 **[Test build #95117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95117/testReport)** for PR 22187 at commit [`0626de7`](https://github.com/apache/spark/commit/0626de74f726622ac3eb251fc9f66aaa3de002d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22190 **[Test build #95124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95124/testReport)** for PR 22190 at commit [`6dbc6e0`](https://github.com/apache/spark/commit/6dbc6e09b5a50764b29050ac30eb9c4f9f8970b8). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95124/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22157: [SPARK-25126][SQL] Avoid creating Reader for all orc fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22157 **[Test build #4285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4285/testReport)** for PR 22157 at commit [`9afdac6`](https://github.com/apache/spark/commit/9afdac6da180b9c8959696941701c734e9a3fe8e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22190 **[Test build #95124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95124/testReport)** for PR 22190 at commit [`6dbc6e0`](https://github.com/apache/spark/commit/6dbc6e09b5a50764b29050ac30eb9c4f9f8970b8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212121224 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/MicroBatchWriteSupport.scala --- @@ -18,27 +18,38 @@ package org.apache.spark.sql.execution.streaming.sources import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.sources.v2.writer.{BatchWriteSupport, DataWriter, DataWriterFactory, WriterCommitMessage} -import org.apache.spark.sql.sources.v2.writer.streaming.{StreamingDataWriterFactory, StreamingWriteSupport} +import org.apache.spark.sql.sources.v2.DataSourceOptions +import org.apache.spark.sql.sources.v2.writer.{BatchWriteSupport, DataWriter, DataWriterFactory, WriteConfig, WriterCommitMessage} +import org.apache.spark.sql.sources.v2.writer.streaming.{StreamingDataWriterFactory, StreamingWriteConfig, StreamingWriteSupport} +import org.apache.spark.sql.types.StructType /** * A [[BatchWriteSupport]] used to hook V2 stream writers into a microbatch plan. It implements * the non-streaming interface, forwarding the epoch ID determined at construction to a wrapped * streaming write support. */ -class MicroBatchWritSupport(eppchId: Long, val writeSupport: StreamingWriteSupport) +class MicroBatchWriteSupport(eppchId: Long, val writeSupport: StreamingWriteSupport) --- End diff -- This fixed a typo in the class name. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2462/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212120878 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchPartitionOverwriteSupport.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.writer; + +import org.apache.spark.sql.sources.v2.DataSourceOptions; +import org.apache.spark.sql.types.StructType; + +/** + * An interface that adds support to {@link BatchWriteSupport} for a replace data operation that + * replaces partitions dynamically with the output of a write operation. + * + * Data source implementations can implement this interface in addition to {@link BatchWriteSupport} + * to support write operations that replace all partitions in the output table that are present + * in the write's output data. + * + * This is used to implement INSERT OVERWRITE ... PARTITIONS. + */ +public interface BatchPartitionOverwriteSupport extends BatchWriteSupport { --- End diff -- This class will be used to create a `WriteConfig` that instructs the data source to replace partitions in the existing data with partitions of a dataframe. The logical plan would be `DynamicPartitionOverwrite`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95123/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22190 **[Test build #95123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95123/testReport)** for PR 22190 at commit [`e3fcc83`](https://github.com/apache/spark/commit/e3fcc83a4a55576821573ceb9a3a56b89218a187). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212120411 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchPartitionOverwriteSupport.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.writer; + +import org.apache.spark.sql.sources.v2.DataSourceOptions; +import org.apache.spark.sql.types.StructType; + +/** + * An interface that adds support to {@link BatchWriteSupport} for a replace data operation that + * replaces partitions dynamically with the output of a write operation. + * + * Data source implementations can implement this interface in addition to {@link BatchWriteSupport} + * to support write operations that replace all partitions in the output table that are present + * in the write's output data. + * + * This is used to implement INSERT OVERWRITE ... PARTITIONS. + */ +public interface BatchPartitionOverwriteSupport { --- End diff -- This class will be used to create a `WriteConfig` that instructs the data source to replace partitions in the existing data with partitions of a dataframe. The logical plan would be `DynamicPartitionOverwrite`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
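A rough sketch of how a planner rule or exec node might pick the right config once a `DynamicPartitionOverwrite` plan is involved; again, every name below is an illustrative stand-in rather than the PR's API, and the capability check is the interesting part.

```scala
// Sketch only: an exec node planning a dynamic partition overwrite would check for the
// capability and fail fast if the source cannot replace partitions dynamically.
import org.apache.spark.sql.types.StructType

trait WriteConfig
trait ExampleBatchWriteSupport {
  def createWriteConfig(schema: StructType, options: Map[String, String]): WriteConfig
}
trait ExamplePartitionOverwriteSupport extends ExampleBatchWriteSupport {
  def createDynamicOverwriteConfig(schema: StructType, options: Map[String, String]): WriteConfig
}

object ExamplePlanner {
  def configFor(
      support: ExampleBatchWriteSupport,
      schema: StructType,
      options: Map[String, String],
      dynamicOverwrite: Boolean): WriteConfig = support match {
    case s: ExamplePartitionOverwriteSupport if dynamicOverwrite =>
      s.createDynamicOverwriteConfig(schema, options)
    case _ if dynamicOverwrite =>
      throw new UnsupportedOperationException(
        "Source does not support dynamic partition overwrite")
    case s =>
      s.createWriteConfig(schema, options)
  }
}
```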
[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212119716 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchOverwriteSupport.java --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.writer; + +import org.apache.spark.sql.catalyst.plans.logical.Filter; +import org.apache.spark.sql.sources.v2.DataSourceOptions; +import org.apache.spark.sql.types.StructType; + +/** + * An interface that adds support to {@link BatchWriteSupport} for a replace data operation that + * replaces a subset of the output table with the output of a write operation. The subset removed is + * determined by a set of filter expressions. + * + * Data source implementations can implement this interface in addition to {@link BatchWriteSupport} + * to support idempotent write operations that replace data matched by a set of delete filters with + * the result of the write operation. + * + * This is used to build idempotent writes. For example, a query that produces a daily summary + * may be run several times as new data arrives. Each run should replace the output of the last + * run for a particular day in the partitioned output table. Such a job would write using this + * WriteSupport and would pass a filter matching the previous job's output, like + * $"day" === '2018-08-22', to remove that data and commit the replacement data at + * the same time. + */ +public interface BatchOverwriteSupport extends BatchWriteSupport { --- End diff -- This class will be used to create the `WriteConfig` for idempotent overwrite operations. This would be triggered by an overwrite like this (the API could be different). ``` df.writeTo("table").overwrite($"day" === "2018-08-22") ``` That would produce an `OverwriteData(source, deleteFilter, query)` logical plan, which would result in the exec node calling this to create the write config. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
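The filter-based variant could be sketched as follows. The stand-in below expresses the delete condition with the public `org.apache.spark.sql.sources.Filter` API purely for illustration; the PR itself may settle on a different filter representation, and the names are not its actual signatures.

```scala
// Sketch only: idempotent "delete rows matching the filters, then commit the new data" write.
import org.apache.spark.sql.sources.{EqualTo, Filter}
import org.apache.spark.sql.types.StructType

trait WriteConfig

trait ExampleOverwriteSupport {
  def createOverwriteConfig(
      schema: StructType,
      options: Map[String, String],
      deleteFilters: Array[Filter]): WriteConfig
}

object ExampleOverwriteUsage {
  // A daily-summary job replacing one day's output would pass a filter for that day,
  // so reruns overwrite the previous run's data instead of appending to it.
  def dailyOverwriteFilters(day: String): Array[Filter] = Array[Filter](EqualTo("day", day))
}
```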
[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212118021 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -279,10 +277,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister // We convert the options argument from V2 -> Java map -> scala mutable -> scala immutable. val producerParams = kafkaParamsForProducer(options.asMap.asScala.toMap) -KafkaWriter.validateQuery( --- End diff -- This query validation happens in `KafkaStreamingWriteSupport`. It was duplicated here and in that class. Now it happens just once, when the write config is created. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
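The validate-once pattern referred to here looks roughly like the following; the names and the specific check are simplified stand-ins for the Kafka writer's actual validation, shown only to illustrate where the single validation call now lives.

```scala
// Minimal sketch of "validate once, where the per-write config is created".
import org.apache.spark.sql.types.StructType

case class ExampleWriteConfig(schema: StructType, options: Map[String, String])

object ExampleKafkaStyleWriteSupport {
  // Stand-in for the query validation (e.g. the Kafka writer requires a 'value' column).
  private def validateQuery(schema: StructType): Unit =
    require(schema.fieldNames.contains("value"), "Required attribute 'value' not found")

  // Validation runs exactly once, here, instead of being duplicated in the provider
  // and in the write-support class.
  def createWriteConfig(schema: StructType, options: Map[String, String]): ExampleWriteConfig = {
    validateQuery(schema)
    ExampleWriteConfig(schema, options)
  }
}
```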
[GitHub] spark issue #22190: SPARK-25188: Add WriteConfig to v2 write API.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22190 **[Test build #95123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95123/testReport)** for PR 22190 at commit [`e3fcc83`](https://github.com/apache/spark/commit/e3fcc83a4a55576821573ceb9a3a56b89218a187). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: SPARK-25188: Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22190: SPARK-25188: Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2461/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22190: SPARK-25188: Add WriteConfig to v2 write API.
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/22190 SPARK-25188: Add WriteConfig to v2 write API. ## What changes were proposed in this pull request? This updates the v2 write path to a structure similar to the v2 read path. Individual writes are configured and tracked using `WriteConfig` (analogous to `ScanConfig`) and this config is passed to the methods of `WriteSupport` that are specific to a single write, like `commit` and `abort`. This new config will be used to communicate overwrite options to data sources that implement new support classes, `BatchOverwriteSupport` and `BatchPartitionOverwriteSupport`. The new config could also be used by implementations to get and hold locks to make operations atomic. Streaming is also updated to use a `StreamingWriteConfig`, which carries the options that are specific to a write, like the schema, output mode, and write options. ## How was this patch tested? This is primarily an API change and should pass existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rdblue/spark SPARK-25188-add-write-config Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22190.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22190 commit e3fcc83a4a55576821573ceb9a3a56b89218a187 Author: Ryan Blue Date: 2018-08-22T21:17:11Z SPARK-25188: Add WriteConfig to v2 write API. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
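As a rough picture of the lifecycle this description outlines, here is a self-contained Scala sketch with stand-in names rather than the PR's actual interfaces: the per-write config is created once and then threaded through commit and abort, which is what lets a source carry overwrite filters, locks, or transaction ids for a single write.

```scala
// Sketch only: stand-in traits, not the proposed Java interfaces or their signatures.
import org.apache.spark.sql.types.StructType

trait WriteConfig
trait CommitMessage

trait ExampleBatchWriteSupport {
  def createWriteConfig(schema: StructType, options: Map[String, String]): WriteConfig
  def commit(config: WriteConfig, messages: Array[CommitMessage]): Unit
  def abort(config: WriteConfig, messages: Array[CommitMessage]): Unit
}

object ExampleWriteDriver {
  // Driver-side flow, roughly: configure once, run the writer tasks, then commit or abort
  // with the same config.
  def runBatchWrite(
      support: ExampleBatchWriteSupport,
      schema: StructType,
      options: Map[String, String])(
      runTasks: WriteConfig => Array[CommitMessage]): Unit = {
    val config = support.createWriteConfig(schema, options)
    try {
      support.commit(config, runTasks(config))
    } catch {
      case e: Throwable =>
        support.abort(config, Array.empty[CommitMessage])
        throw e
    }
  }
}
```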
[GitHub] spark issue #22187: [SPARK-25178][SQL] change the generated code of the keyS...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/22187 So the new solution is to ship the `StructType` object directly as a reference object. Why not? ;-) I'm +1 on shipping the object directly instead of generating code to recreate it on the executor. If other reviewers are also on board with this approach, let's update the PR title and the JIRA ticket title to reflect that (since we're no longer generating any "names" at all now). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
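For context, the reference-object approach being endorsed looks roughly like the sketch below. It assumes `CodegenContext.addReferenceObj` behaves as it does on current master (returning a code fragment that reads the object back from the generated class's references array); the exact accessor string is an implementation detail, and the schema here is made up for illustration.

```scala
// Sketch of "ship the StructType as a reference object" vs. regenerating it in codegen.
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReferenceObjSketch {
  def main(args: Array[String]): Unit = {
    val ctx = new CodegenContext
    val keySchema = StructType(Seq(StructField("k1", StringType), StructField("k2", StringType)))

    // Ship the object: the generated code only contains an accessor into the references
    // array, and the StructType instance itself travels with the generated class's references.
    val schemaRef: String = ctx.addReferenceObj("keySchema", keySchema)
    println(schemaRef)

    // The alternative -- emitting Java source that rebuilds the StructType field by field --
    // would inline every field name into the generated code, which is what the PR avoids.
  }
}
```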