[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83085871 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83074741 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83074489 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83084266 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala --- @@ -259,15 +260,37 @@ class

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83061090 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83077567 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83082403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83082993 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -516,12 +568,127 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83083859 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -17,22 +17,78 @@ package

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83079156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -278,8 +315,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83060899 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -86,7 +93,13 @@ case class StateStoreSaveExec

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83057793 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056792 --- Diff: sql/core/src/main/java/org/apache/spark/sql/test/JavaStringLength.java --- @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056607 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,32 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83057132 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876529 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876419 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,26 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876509 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -17,9 +17,15 @@ package org.apache.spark.sql

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15422 I agree that the data that was already read is probably good. I also think that this is a pretty big behavior change where there are legitimate cases (e.g. tons of data and it is fine to miss

spark git commit: [SPARK-17830] Annotate spark.sql package with InterfaceStability

2016-10-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4bafacaa5 -> 689de9200 [SPARK-17830] Annotate spark.sql package with InterfaceStability ## What changes were proposed in this pull request? This patch annotates the InterfaceStability level for top level classes in o.a.spark.sql and o.a.sp

[GitHub] spark issue #15392: [SPARK-17830] Annotate spark.sql package with InterfaceS...

2016-10-10 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15392 LGTM, merging to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/9766 +1 to this functionality, but also to the request to add more tests and documentation. It would also be good to comment on the idea of using SQL as a more general way to implement this

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82450939 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,10 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r82443073 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -343,4 +343,23 @@ class

spark git commit: [SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.

2016-10-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master aa3a6841e -> bb1aaf28e [SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming. ## What changes were proposed in this pull request? Adds the textFile API which exists in DataFrameReader and serves same purpose. ## How was this

[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14087 Thanks, merging to master.

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15367 No, if we backport this I would plan to continue to backport changes (that are safe) until the next release. Either way this should not affect what goes into master.

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r82293680 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -311,6 +311,37 @@ final class DataStreamReader private[sql

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r82293331 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -378,6 +378,24 @@ class FileStreamSourceSuite extends

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15367 We should definitely vet this PR carefully to make sure it's safe. One thing that is missing from that guide, that I do believe is accepted practice, is more leeway when the feature is marked

[GitHub] spark issue #15380: Backport [SPARK-15062][SQL] fix list type infer serializ...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15380 Merged, can you close this?

spark git commit: [SPARK-15062][SQL] Backport fix list type infer serializer issue

2016-10-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 376545e4d -> d3890deb7 [SPARK-15062][SQL] Backport fix list type infer serializer issue This backports https://github.com/apache/spark/commit/733cbaa3c0ff617a630a9d6937699db37ad2943b to Branch 1.6. It's a pretty simple patch, and would

spark git commit: [SPARK-17780][SQL] Report Throwable to user in StreamExecution

2016-10-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 225372adf -> a2bf09588 [SPARK-17780][SQL] Report Throwable to user in StreamExecution ## What changes were proposed in this pull request? When using an incompatible source for structured streaming, it may throw NoClassDefFoundError. I

spark git commit: [SPARK-17780][SQL] Report Throwable to user in StreamExecution

2016-10-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 79accf45a -> 9a48e60e6 [SPARK-17780][SQL] Report Throwable to user in StreamExecution ## What changes were proposed in this pull request? When using an incompatible source for structured streaming, it may throw NoClassDefFoundError. It's

[GitHub] spark issue #15352: [SPARK-17780][SQL]Report Throwable to user in StreamExec...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15352 LGTM, I'm going to merge this.

[GitHub] spark pull request #15352: [SPARK-17780][SQL]Report Throwable to user in Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15352#discussion_r82269155 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -207,13 +207,18 @@ class StreamExecution

[GitHub] spark issue #15380: Backport [SPARK-15062][SQL] fix list type infer serializ...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15380 LGTM

[GitHub] spark issue #15362: [SPARK-17643] Remove comparable requirement from Offset ...

2016-10-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15362 LGTM

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81889941 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873789 --- Diff: project/MimaExcludes.scala --- @@ -53,7 +53,14 @@ object MimaExcludes { ProblemFilters.exclude[ReversedMissingMethodProblem

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873207 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81882888 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala --- @@ -30,8 +30,15 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872358 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -100,28 +110,138 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872395 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -100,28 +110,138 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -511,12 +572,71 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81874159 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -136,16 +145,30 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872770 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala --- @@ -30,8 +30,15 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81875491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873537 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/SourceStatus.scala --- @@ -26,9 +26,13 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81882933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81874295 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -105,11 +111,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873108 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81844299 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81840127 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark issue #15333: [SPARK-17761][SQL] Remove MutableRow

2016-10-04 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15333 This seems like a reasonable simplification to me. A little bit of history (though this has diverged significantly, so don't take this as authoritative): I think this complexity all stems

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81834388 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81836478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -530,3 +530,8 @@ object StreamExecution

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81836317 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -469,29 +469,49 @@ trait StreamTest extends QueryTest with

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81833814 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81835883 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,282 @@ +/* + * Licensed to

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r81610590 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -21,13 +21,13 @@ import scala.collection.JavaConverters

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-29 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 I spent a while playing around with this today on a real cluster, and overall it is pretty cool! I have a few suggestions we should implement in the long run, but these can probably be done in

[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2016-09-29 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15274 @HyukjinKwon absolutely. I actually changed the name from `json_parser` to `from_json` in anticipation of adding `to_json` :)

[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2016-09-28 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15274 Emailed the list. Seems like a popular feature so far :)

[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15274#discussion_r80837055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -467,3 +469,26 @@ case class JsonTuple

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 FYI: #15274 adds support for parsing JSON from the key/value into a Spark SQL `StructType`

[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15274 [SPARK-17699] Support for parsing JSON string columns Spark SQL has great support for reading text files that contain JSON data. However, in many cases the JSON data is just one column amongst

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80599802 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80567253 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80564114 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563908 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568269 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568479 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568553 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568624 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80564169 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80584097 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563435 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80562234 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80567477 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563104 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563033 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80565318 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568036 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563543 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed to the

spark git commit: [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing

2016-09-26 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master bde85f8b7 -> 8135e0e5e [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing ## What changes were proposed in this pull request? When reading file stream with non-globbing path, the results re

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14803 Thanks, I'm going to merge this to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #15195: [SPARK-17632][SQL]make console sink and other sinks work...

2016-09-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15195 I'm sorry, I don't understand the goal of this patch. Recovering from a checkpoint only makes sense if your sink is stateful. What is the use case you are trying to support? -

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 > "I want to be able to add a topicpartition mid stream, but I don't want to start it from the beginning." I see, I was thinking only of new topics that appear that match

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 Comparable requirement removed in #15207. > I think in the absence of prior information about the position in a topicpartition, you start a new batch on topic B starting from wherever

[GitHub] spark pull request #15207: [SPARK-17643] Remove comparable requirement from ...

2016-09-22 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15207 [SPARK-17643] Remove comparable requirement from Offset For some sources, it is difficult to provide a global ordering based only on the data in the offset. Since we don't use compariso

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 For streaming you already know what the global order is, because you know when you asked for A and B. I agree that we should probably remove the comparable requirement from `Offset` in favor of

[GitHub] spark issue #15197: [SPARK-17631] [SQL] Add HttpStreamSink for structured st...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15197 Thanks for working on this, it does seem like it could be useful. I'm not sure if this should go into Spark or into a separate package. It really depends on how many people want this fe

[GitHub] spark issue #15201: [SPARK-17638][Streaming]Stop JVM StreamingContext when t...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15201 LGTM

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14803 Mostly looks good, I've also asked @tdas to take a look since he wrote this initially. A few more cases came to mind while I was rephrasing your documentation. Specifi

[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...

2016-09-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r80120376 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -512,6 +512,10 @@ csvDF = spark \ These examples generate streaming DataFrames

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 I asked @koeninger to clarify the specific suggestions he is referring to above, here's my response: > [Comments here and on JIRA relating to concerns with the `Offset` implem
