Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83085871
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala
---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83074741
--- Diff:
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala
---
@@ -264,6 +266,44 @@ class KafkaSourceSuite
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83074489
--- Diff:
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala
---
@@ -264,6 +266,44 @@ class KafkaSourceSuite
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83084266
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala
---
@@ -259,15 +260,37 @@ class
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83061090
--- Diff:
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala
---
@@ -264,6 +266,44 @@ class KafkaSourceSuite
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83077567
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83082403
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83082993
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -516,12 +568,127 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83083859
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
---
@@ -17,22 +17,78 @@
package
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83079156
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -278,8 +315,14 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r83060899
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -86,7 +93,13 @@ case class StateStoreSaveExec
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r83057793
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r83056944
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r83056792
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/test/JavaStringLength.java ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r83056607
--- Diff: python/pyspark/sql/context.py ---
@@ -202,6 +202,32 @@ def registerFunction(self, name, f,
returnType=StringType
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r83057132
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r82876529
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r82876442
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r82876419
--- Diff: python/pyspark/sql/context.py ---
@@ -202,6 +202,26 @@ def registerFunction(self, name, f,
returnType=StringType
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r82876509
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -17,9 +17,15 @@
package org.apache.spark.sql
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15422
I agree that the data that was already read is probably good. I also think
that this is a pretty big behavior change where there are legitimate cases
(i.e. tons of data and it is fine to miss
Repository: spark
Updated Branches:
refs/heads/master 4bafacaa5 -> 689de9200
[SPARK-17830] Annotate spark.sql package with InterfaceStability
## What changes were proposed in this pull request?
This patch annotates the InterfaceStability level for top level classes in
o.a.spark.sql and o.a.sp
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15392
LGTM, merging to master
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/9766
+1 to this functionality, but also to the request to add more tests and
documentation. It would also to be good to comment on the idea of using SQL as
a more general way to implement this
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9766#discussion_r82450939
--- Diff: python/pyspark/sql/context.py ---
@@ -202,6 +202,10 @@ def registerFunction(self, name, f,
returnType=StringType
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15354#discussion_r82443073
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
---
@@ -343,4 +343,23 @@ class
Repository: spark
Updated Branches:
refs/heads/master aa3a6841e -> bb1aaf28e
[SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.
## What changes were proposed in this pull request?
Adds the textFile API which exists in DataFrameReader and serves same purpose.
## How was this
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/14087
Thanks, merging to master.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15367
No, if we backport this I would plan to continue to backport changes (that
are safe) until the next release. Either way this should not affect what goes
into master.
---
If your project is set
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/14087#discussion_r82293680
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
---
@@ -311,6 +311,37 @@ final class DataStreamReader
private[sql
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/14087#discussion_r82293331
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -378,6 +378,24 @@ class FileStreamSourceSuite extends
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15367
We should definitly vet this PR carefully to make sure its safe. One thing
that is missing from that guide, that I do believe is accepted practice, is
more leeway when the feature is marked
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15380
Merged, can you close this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Repository: spark
Updated Branches:
refs/heads/branch-1.6 376545e4d -> d3890deb7
[SPARK-15062][SQL] Backport fix list type infer serializer issue
This backports
https://github.com/apache/spark/commit/733cbaa3c0ff617a630a9d6937699db37ad2943b
to Branch 1.6. It's a pretty simple patch, and would
Repository: spark
Updated Branches:
refs/heads/branch-2.0 225372adf -> a2bf09588
[SPARK-17780][SQL] Report Throwable to user in StreamExecution
## What changes were proposed in this pull request?
When using an incompatible source for structured streaming, it may throw
NoClassDefFoundError. I
Repository: spark
Updated Branches:
refs/heads/master 79accf45a -> 9a48e60e6
[SPARK-17780][SQL] Report Throwable to user in StreamExecution
## What changes were proposed in this pull request?
When using an incompatible source for structured streaming, it may throw
NoClassDefFoundError. It's
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15352
LGTM, I'm going to merge this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
en
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15352#discussion_r82269155
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -207,13 +207,18 @@ class StreamExecution
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15380
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15362
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81889941
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -525,8 +645,62 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81873789
--- Diff: project/MimaExcludes.scala ---
@@ -53,7 +53,14 @@ object MimaExcludes {
ProblemFilters.exclude[ReversedMissingMethodProblem
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81873207
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81882888
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala
---
@@ -30,8 +30,15 @@ import
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81872358
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
---
@@ -100,28 +110,138 @@ class StreamingQuerySuite extends
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81872395
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
---
@@ -100,28 +110,138 @@ class StreamingQuerySuite extends
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81873307
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -511,12 +572,71 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81874159
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -136,16 +145,30 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81872770
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala
---
@@ -30,8 +30,15 @@ import
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81875491
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -525,8 +645,62 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81873537
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -525,8 +645,62 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81872893
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/SourceStatus.scala ---
@@ -26,9 +26,13 @@ import
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81882933
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81874295
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -105,11 +111,14 @@ class StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81873108
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81844299
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -0,0 +1,231 @@
+---
+layout: global
+title: Structured Streaming + Kafka
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81840127
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -0,0 +1,231 @@
+---
+layout: global
+title: Structured Streaming + Kafka
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15333
This seems like a reasonable simplification to me. A little bit of history
(though this has diverged significantly, so don't take this authoritative): I
think this complexity all stems
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81834388
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -0,0 +1,231 @@
+---
+layout: global
+title: Structured Streaming + Kafka
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81836478
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -530,3 +530,8 @@ object StreamExecution
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81836317
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -469,29 +469,49 @@ trait StreamTest extends QueryTest with
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81833814
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -0,0 +1,231 @@
+---
+layout: global
+title: Structured Streaming + Kafka
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r81835883
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,282 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/14087#discussion_r81610590
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
---
@@ -21,13 +21,13 @@ import scala.collection.JavaConverters
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
I spent a while playing around with this today on a real cluster, and
overall it is pretty cool! I have a few suggestions we should implement in the
long run, but these can probably be done in
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15274
@HyukjinKwon absolutely. I actually changed the name from `json_parser` to
`from_json` in anticipation of adding `to_json` :)
---
If your project is set up for it, you can reply to this email
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15274
Emailed the list. Seems like a popular feature so far :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15274#discussion_r80837055
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
---
@@ -467,3 +469,26 @@ case class JsonTuple
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
FYI: #15274 adds support for parsing JSON from the key/value into a Spark
SQL `StructType`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub
GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/15274
[SPARK-17699] Support for parsing JSON string columns
Spark SQL has great support for reading text files that contain JSON data.
However, in many cases the JSON data is just one column amongst
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80599802
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80567253
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80564114
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80563908
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80568269
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80568479
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80568553
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80568624
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80564169
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80584097
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80563435
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80562234
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80567477
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80563104
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80563033
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80565318
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80568036
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/15102#discussion_r80563543
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -0,0 +1,446 @@
+/*
+ * Licensed to the
Repository: spark
Updated Branches:
refs/heads/master bde85f8b7 -> 8135e0e5e
[SPARK-17153][SQL] Should read partition data when reading new files in
filestream without globbing
## What changes were proposed in this pull request?
When reading file stream with non-globbing path, the results re
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/14803
Thanks, I'm going to merge this to master.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15195
I'm sorry, I don't understand the goal of this patch. Recovering from a
checkpoint only makes sense if your sink is stateful. What is the use case you
are trying to support?
-
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
> "I want to be able to add a topicpartition mid stream, but I don't want
to start it from the beginning."
I see, I was thinking only of new topics that appear that match
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
Comparable requirement removed in #15207.
> I think in the absence of prior information about the position in a
topicpartition, you start a new batch on topic B starting from wherever
GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/15207
[SPARK-17643] Remove comparable requirement from Offset
For some sources, it is difficult to provide a global ordering based only
on the data in the offset. Since we don't use compariso
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
For streaming you already know what the global order is, because you know
when you asked for A and B. I agree that we should probably remove the
comparable requirement from `Offset` in favor of
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15197
Thanks for working on this, it does seem like it could be useful. I'm not
sure if this should go into Spark or into a separate package. It really
depends on how many people want this fe
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15201
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/14803
Mostly looks good, I've also asked @tdas to take a look since he wrote this
initially.
A few more cases came to mind while while I was rephrasing your
documentation. Specifi
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/14803#discussion_r80120376
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -512,6 +512,10 @@ csvDF = spark \
These examples generate streaming DataFrames
Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
I asked @koeninger to clarify the specific suggestions he is referring to
above, here's my response:
> [Comments here and on JIRA relating to concerns with the `Offset`
implem
501 - 600 of 8851 matches
Mail list logo