[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-25 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-766530579

[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-24 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-766568619 Hmm, I'm fine if you think we should always require a custom function to produce the output.
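
For illustration, a minimal sketch of what an API that always requires a caller-supplied output function might look like. The name `pipeWithSerializer` and its signature are hypothetical, not the PR's actual API, and the sketch covers only the batch case:

```scala
import org.apache.spark.sql.Dataset

// Hypothetical sketch only: a pipe variant that always requires the caller
// to supply the function that turns each element into the text line fed to
// the external process. Batch case only; not the PR's actual signature.
def pipeWithSerializer[T](ds: Dataset[T], command: Seq[String])
                         (serialize: T => String): Dataset[String] = {
  val spark = ds.sparkSession
  import spark.implicits._
  // Serialize explicitly, then hand the lines to the child process via RDD.pipe.
  spark.createDataset(ds.map(serialize).rdd.pipe(command))
}
```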

[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-24 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-766530579 > Is it too hard a requirement to explain the actual use case, especially since you've said you have an internal customer asking for this feature? I don't think my request requires

[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-23 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-766291831 > I understand the functionality is lacking in SS. There's a workaround (foreachBatch -> toRDD -> pipe), but streaming operations can't be added after calling pipe. So
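
For context, a rough sketch of the foreachBatch -> toRDD -> pipe workaround mentioned above, assuming a streaming `Dataset[String]` named `streamingDs` and `wc -l` as the external command. Because the pipe runs inside the batch function, nothing downstream can be a streaming operator:

```scala
import org.apache.spark.sql.Dataset

// Sketch of the workaround: drop to the RDD inside foreachBatch and call
// RDD.pipe there. The piped result exists only inside the batch function,
// so no further streaming operations can be chained after it.
val handleBatch: (Dataset[String], Long) => Unit = (batch, batchId) => {
  val piped = batch.rdd.pipe(Seq("wc", "-l")) // external process per partition
  piped.collect().foreach(println)            // sink-side handling only
}

streamingDs.writeStream
  .foreachBatch(handleBatch)
  .start()
```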

[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-22 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765879991 > Please just create an executable which prints out stdin (serialized data) and pass it to the pipe API... I think that's the easiest way to realize it. Ok, ok. I didn't
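
The suggestion boils down to piping through a pass-through executable to see exactly what Spark writes to the child process's stdin. A minimal sketch using `cat`, assuming an existing SparkSession `spark`:

```scala
// Sketch: `cat` echoes its stdin, so this shows the exact serialized form
// each element takes on the child process's standard input.
val rdd = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2)))
rdd.pipe("cat").collect().foreach(println) // prints "(a,1)" and "(b,2)"
```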

[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-22 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765878887 > It is an issue because the encoder only specifies how an object maps to the internal physical structure of the row, and by exposing this pipe API, we are exposing the
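
To illustrate the encoder point: the encoder fixes how an object maps to Spark's internal row layout, and that layout is what a text-based pipe API would implicitly surface. A small sketch inspecting the schema an encoder derives (the `Event` case class is made up for the example):

```scala
import org.apache.spark.sql.Encoders

// The encoder determines the internal physical structure of the row;
// its schema is what a pipe API's output format would end up exposing.
case class Event(id: Long, msg: String)

val enc = Encoders.product[Event]
println(enc.schema.treeString)
// root
//  |-- id: long (nullable = false)
//  |-- msg: string (nullable = true)
```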

[GitHub] [spark] viirya edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

2021-01-22 Thread GitBox
viirya edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765877154 > Yes, the question also applies to RDD.pipe, but the serialization is done via `OutputStreamWriter.println`, which is relatively "known" - `String.valueOf(T)`
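
For reference, RDD.pipe writes one element per line using the element's string form (effectively `String.valueOf`), so the child process sees each element's toString. A sketch, assuming an existing SparkSession `spark`:

```scala
// Sketch: RDD.pipe feeds each element's toString to the child, one per line.
// That is "known" for simple types but opaque for arbitrary objects.
case class Point(x: Int, y: Int)

val pts = spark.sparkContext.parallelize(Seq(Point(1, 2), Point(3, 4)))
// stdin lines are "Point(1,2)" and "Point(3,4)"; `grep` passes them through.
pts.pipe(Seq("grep", "Point")).collect().foreach(println)
```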