Hey Egor
You can use ForeachWriter, or you can write a custom sink. I personally went with a custom sink, since I get a DataFrame per batch:
https://github.com/samelamin/spark-bigquery/blob/master/src/main/scala/com/samelamin/spark/bigquery/streaming/BigQuerySink.scala

You can have a look at how I implemented something similar to the file sink that, in the event of a failure, skips batches already written.

Also have a look at Michael's reply to me a few days ago on exactly the same topic; the email subject was "Structured Streaming. Dropping Duplicates".

Regards
Sam

On Sat, 11 Feb 2017 at 07:59, Jacek Laskowski <ja...@japila.pl> wrote:

"Something like that." I've never tried it out myself, so I'm only guessing after a brief look at the API.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sat, Feb 11, 2017 at 1:31 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
> Jacek, so I create a cache in ForeachWriter, write to it in every "process()",
> and flush it on close? Something like that?
>
> 2017-02-09 12:42 GMT-08:00 Jacek Laskowski <ja...@japila.pl>:
>>
>> Hi,
>>
>> Yes, that's ForeachWriter.
>>
>> Yes, it works element by element. You're looking for mapPartitions, and
>> ForeachWriter has a partitionId that you could use to implement something
>> similar.
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Thu, Feb 9, 2017 at 3:55 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>> > Jacek, you mean
>> > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter ?
>> > I do not understand how to use it, since it passes every value
>> > separately, not every partition, and adding values to a table one by one
>> > would not work.
>> >
>> > 2017-02-07 12:10 GMT-08:00 Jacek Laskowski <ja...@japila.pl>:
>> >>
>> >> Hi,
>> >>
>> >> Have you considered the foreach sink?
>> >>
>> >> Jacek
>> >>
>> >> On 6 Feb 2017 8:39 p.m., "Egor Pahomov" <pahomov.e...@gmail.com> wrote:
>> >>>
>> >>> Hi, I'm thinking of using Structured Streaming instead of the old
>> >>> streaming, but I need to be able to save results to a Hive table. The
>> >>> documentation for the file sink
>> >>> (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks)
>> >>> says: "Supports writes to partitioned tables." But being able to write
>> >>> to partitioned directories is not enough to write to the table:
>> >>> someone needs to write to the Hive metastore. How can I use Structured
>> >>> Streaming and write to a Hive table?
>> >>>
>> >>> --
>> >>> Sincerely yours
>> >>> Egor Pakhomov
>> >
>> >
>> > --
>> > Sincerely yours
>> > Egor Pakhomov
>
>
> --
> Sincerely yours
> Egor Pakhomov
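For reference, a minimal sketch of the custom-sink approach above: a Sink's addBatch receives the whole DataFrame for a micro-batch, and remembering the last committed batchId is what lets a restarted query skip batches it already wrote. HiveSink, the table name, and the in-memory commit tracking are assumptions for illustration; see the linked BigQuerySink for a complete implementation.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical sink: appends every micro-batch to a Hive table and skips
// batches that were already committed before a failure/restart.
class HiveSink(tableName: String) extends Sink {

  // Last batch known to be committed. A real sink should persist this,
  // e.g. in a commit log, the way the built-in file sink does.
  @volatile private var latestBatchId = -1L

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    if (batchId <= latestBatchId) {
      // Already written: skip the batch to stay idempotent.
    } else {
      // Depending on the Spark version you may need to materialize the batch
      // first, e.g. data.sparkSession.createDataFrame(data.rdd, data.schema).
      data.write.mode("append").insertInto(tableName)
      latestBatchId = batchId
    }
  }
}

Note that Sink lives in an internal package, and to reference the sink from writeStream.format(...) you would also register it through a StreamSinkProvider, as the linked repository does.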
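Egor's "cache in ForeachWriter, write in process(), flush on close()" idea could look roughly like the sketch below. BufferingForeachWriter and writeBatchToHive are hypothetical names, and the actual flush call is left as a stub.

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{ForeachWriter, Row}

class BufferingForeachWriter extends ForeachWriter[Row] {
  private var buffer: ArrayBuffer[Row] = _

  override def open(partitionId: Long, version: Long): Boolean = {
    buffer = ArrayBuffer.empty[Row]
    true // returning false would skip this partition for this version
  }

  override def process(row: Row): Unit = {
    buffer += row // accumulate; nothing is written yet
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (errorOrNull == null) writeBatchToHive(buffer) // one flush per partition
  }

  private def writeBatchToHive(rows: Seq[Row]): Unit = {
    // Hypothetical stub: replace with a real bulk write (JDBC, Hive
    // Streaming API, etc.).
  }
}

You would attach it with df.writeStream.foreach(new BufferingForeachWriter).start(); the partitionId and version passed to open() are also the hooks for making the writer idempotent across retries, as mentioned above.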
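Finally, on the original question: one way to use the file-sink route is to let the built-in sink write the partitioned directories and register new partitions with the metastore as a separate step. In the sketch below, the DataFrame, paths, table name, and partition column are made up for illustration.

// Assumes a streaming DataFrame `events` with a `date` column, a SparkSession
// `spark`, and an existing external Hive table mydb.events over the same path.
val query = events.writeStream
  .format("parquet")
  .option("path", "/warehouse/mydb.db/events")
  .option("checkpointLocation", "/checkpoints/events")
  .partitionBy("date")
  .start()

// The file sink only writes the directories; the metastore still has to
// learn about them, e.g. by running this periodically:
spark.sql("MSCK REPAIR TABLE mydb.events")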