[GitHub] spark pull request #15710: [SPARK-18025] Use commit protocol API in structur...

2016-11-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15710#discussion_r86057467 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOutputWriter.scala --- @@ -17,125 +17,13 @@ package

[GitHub] spark issue #15723: [SPARK-18214][SQL] Simplify RuntimeReplaceable type coer...

2016-11-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15723 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] spark pull request #15710: [SPARK-18025] Use commit protocol API in structur...

2016-11-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15710#discussion_r86056877 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOutputWriter.scala --- @@ -17,125 +17,13 @@ package

[GitHub] spark pull request #15710: [SPARK-18025] Use commit protocol API in structur...

2016-11-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15710#discussion_r86057220 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -17,106 +17,16 @@ package

[GitHub] spark pull request #15710: [SPARK-18025] Use commit protocol API in structur...

2016-11-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15710#discussion_r86056587 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala --- @@ -0,0 +1,114 @@ +/* + * Licensed

[GitHub] spark issue #15724: [SPARK-18216][SQL] Make Column.expr public

2016-11-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15724 LGTM

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86053925 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed

[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-11-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15354 Thanks, I'm going to merge this to master.

[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks

2016-11-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15702 Not a dumb question! You can certainly use processing time if those are the semantics you require. I do think there is a little bit of work we need to do to ensure determinism for these

[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Event Time Watermarks

2016-10-31 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15702 @ekl - flaky test... Should we turn it off for now? retest this please

[GitHub] spark issue #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSuite by n...

2016-10-31 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15699 LGTM

[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15696#discussion_r85847969 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala --- @@ -141,15 +139,14 @@ class SimpleTextOutputWriter

[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Event Time Wate...

2016-10-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r85845880 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed to

[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Event Time Wate...

2016-10-31 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15702 [SPARK-18124] Observed-delay based Event Time Watermarks This PR adds a new method `withWatermark` to the `Dataset` API, which can be used to specify an _event time watermark_. An event time

[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-28 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15626 Thanks for working on this! Could you include examples of the various logs, since we are committing to this specific JSON.

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r85583061 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -494,3 +495,46 @@ case class JsonToStruct

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r85583042 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2936,6 +2936,51 @@ object functions { def from_json(e: Column, schema

[GitHub] spark issue #15453: [SPARK-17770] [CATALYST] making ObjectType public

2016-10-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15453 Thanks, merging to master!

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r85248697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -123,8 +122,9 @@ private[sql] class

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r85248522 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala --- @@ -29,4 +31,28 @@ object JacksonUtils { case x

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/10162 I would not bake this logic into the logical operator; I think the general approach of doing it in Dataset is better. I just think that we need to do it in a way that does not change existing

[GitHub] spark issue #15629: [SQL][DOC] updating doc for JSON source to link to jsonl...

2016-10-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15629 I don't think we have to update deprecated methods. This LGTM.

[GitHub] spark issue #15634: [SPARK-18103] [SQL] Rename *FileCatalog to *FileProvider

2016-10-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15634 This is okay, but note that "Provider" is equally overloaded in the Data Source API.

[GitHub] spark issue #15483: [SPARK-17935][SQL]Add KafkaForeachWriter in external kaf...

2016-10-24 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15483 I will try to take a look soon. Can we close this PR until we have an updated implementation?

[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15354 It would be really nice to fail in analysis rather than execution. What if it only fails after hours of computation? As a user I'd be upset. I'm also concerned they will think it

[GitHub] spark issue #15483: [SPARK-17935][SQL]Add KafkaForeachWriter in external kaf...

2016-10-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15483 It would be good to post on the design / interfaces before you get too far.

[GitHub] spark issue #15483: [SPARK-17935][SQL]Add KafkaForeachWriter in external kaf...

2016-10-20 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15483 Thanks for working on this! However, I'm not sure that this is something that we should merge into the core repository (Though I think it's an awesome example of how to use the `ForeachW

[GitHub] spark issue #15469: [SPARK-17900][SQL] Graduate a list of Spark SQL APIs to ...

2016-10-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15469 LGTM, merging to master.

[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/9766 Thanks, merging to master.

[GitHub] spark issue #12337: [SPARK-15566] Expose null checking function to Python la...

2016-10-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/12337 +1 to this feature! I think this might be the first step in a better story for people trying to use `nullable = false` as an enforcement mechanism (I'd bring this idea up on [SPARK-17939](

[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

2016-10-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/10162 Thanks for working on this! Sorry for letting this PR go stale. While I think this could be a good feature, I'm worried that as it's implemented it would be a breaking change (since w

[GitHub] spark issue #13780: [SPARK-16063][SQL] Add storageLevel to Dataset

2016-10-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/13780 Sorry for the delay. I'm going to merge this to master. I'll update the since versions while merging. Thanks for working on this!

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r82297835 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala --- @@ -92,21 +105,64 @@ class TextSocketSource(host: String, port

[GitHub] spark issue #15284: [SPARK-17368] [SQL] Add support for value class serializ...

2016-10-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15284 LGTM

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15307 This LGTM as a first cut. Thanks for working on it.

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r83110773 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -343,4 +343,23 @@ class

[GitHub] spark issue #15453: [SPARK-17770] [CATALYST] making ObjectType public

2016-10-12 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15453 ok to test

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83096825 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83074367 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83082691 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -516,12 +568,127 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83079220 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -351,25 +403,26 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83085871 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83074741 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83074489 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83084266 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala --- @@ -259,15 +260,37 @@ class

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83061090 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -264,6 +266,44 @@ class KafkaSourceSuite

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83077567 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83082403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83082993 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -516,12 +568,127 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83083859 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -17,22 +17,78 @@ package

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83079156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -278,8 +315,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83060899 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -86,7 +93,13 @@ case class StateStoreSaveExec

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83057793 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056792 --- Diff: sql/core/src/main/java/org/apache/spark/sql/test/JavaStringLength.java --- @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056607 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,32 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83057132 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876529 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876419 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,26 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876509 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -17,9 +17,15 @@ package org.apache.spark.sql

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15422 I agree that the data that was already read is probably good. I also think that this is a pretty big behavior change where there are legitimate cases (i.e. tons of data and it is fine to miss

[GitHub] spark issue #15392: [SPARK-17830] Annotate spark.sql package with InterfaceS...

2016-10-10 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15392 LGTM, merging to master

[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/9766 +1 to this functionality, but also to the request to add more tests and documentation. It would also be good to comment on the idea of using SQL as a more general way to implement this

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82450939 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,10 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r82443073 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -343,4 +343,23 @@ class

[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14087 Thanks, merging to master.

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15367 No, if we backport this I would plan to continue to backport changes (that are safe) until the next release. Either way this should not affect what goes into master.

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r82293680 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -311,6 +311,37 @@ final class DataStreamReader private[sql

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r82293331 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -378,6 +378,24 @@ class FileStreamSourceSuite extends

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15367 We should definitely vet this PR carefully to make sure it's safe. One thing that is missing from that guide, that I do believe is accepted practice, is more leeway when the feature is marked

[GitHub] spark issue #15380: Backport [SPARK-15062][SQL] fix list type infer serializ...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15380 Merged, can you close this?

[GitHub] spark issue #15352: [SPARK-17780][SQL]Report Throwable to user in StreamExec...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15352 LGTM, I'm going to merge this.

[GitHub] spark pull request #15352: [SPARK-17780][SQL]Report Throwable to user in Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15352#discussion_r82269155 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -207,13 +207,18 @@ class StreamExecution

[GitHub] spark issue #15380: Backport [SPARK-15062][SQL] fix list type infer serializ...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15380 LGTM

[GitHub] spark issue #15362: [SPARK-17643] Remove comparable requirement from Offset ...

2016-10-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15362 LGTM

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81889941 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873789 --- Diff: project/MimaExcludes.scala --- @@ -53,7 +53,14 @@ object MimaExcludes { ProblemFilters.exclude[ReversedMissingMethodProblem

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873207 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81882888 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala --- @@ -30,8 +30,15 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872358 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -100,28 +110,138 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872395 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -100,28 +110,138 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -511,12 +572,71 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81874159 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -136,16 +145,30 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872770 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala --- @@ -30,8 +30,15 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81875491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873537 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/SourceStatus.scala --- @@ -26,9 +26,13 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81882933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81874295 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -105,11 +111,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873108 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81844299 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81840127 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark issue #15333: [SPARK-17761][SQL] Remove MutableRow

2016-10-04 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15333 This seems like a reasonable simplification to me. A little bit of history (though this has diverged significantly, so don't take this as authoritative): I think this complexity all stems

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81834388 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81836478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -530,3 +530,8 @@ object StreamExecution

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81836317 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -469,29 +469,49 @@ trait StreamTest extends QueryTest with

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81833814 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81835883 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,282 @@ +/* + * Licensed to

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r81610590 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -21,13 +21,13 @@ import scala.collection.JavaConverters
