Hey Egor
You can use ForeachWriter, or you can write a custom sink. I personally went with a custom sink, since I get a DataFrame per batch:
https://github.com/samelamin/spark-bigquery/blob/master/src/main/scala/com/samelamin/spark/bigquery/streaming/BigQuerySink.scala

You can have a look at how I implemented something similar to the file sink that, in the event of a failure, skips batches already written.

Also have a look at Michael's reply to me a few days ago on exactly the same topic; the email subject was "Structured Streaming. Dropping Duplicates".

Regards
Sam

On Sat, 11 Feb 2017 at 07:59, Jacek Laskowski <ja...@japila.pl> wrote:

"Something like that." I've never tried it out myself, so I'm only guessing after a brief look at the API.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sat, Feb 11, 2017 at 1:31 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
> Jacek, so I create a cache in ForeachWriter, write to it in every "process()",
> and flush it on close? Something like that?
>
> 2017-02-09 12:42 GMT-08:00 Jacek Laskowski <ja...@japila.pl>:
>>
>> Hi,
>>
>> Yes, that's ForeachWriter.
>>
>> Yes, it works element by element. You're looking for mapPartitions, and
>> ForeachWriter has a partitionId that you could use to implement something
>> similar.
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Thu, Feb 9, 2017 at 3:55 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>> > Jacek, you mean
>> > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter ?
>> > I do not understand how to use it, since it passes every value
>> > separately, not every partition, and adding values to a table one by one
>> > would not work.
>> >
>> > 2017-02-07 12:10 GMT-08:00 Jacek Laskowski <ja...@japila.pl>:
>> >>
>> >> Hi,
>> >>
>> >> Have you considered the foreach sink?
>> >>
>> >> Jacek
>> >>
>> >> On 6 Feb 2017 8:39 p.m., "Egor Pahomov" <pahomov.e...@gmail.com> wrote:
>> >>>
>> >>> Hi, I'm thinking of using Structured Streaming instead of the old
>> >>> streaming, but I need to be able to save results to a Hive table. The
>> >>> documentation for the file sink
>> >>> (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks)
>> >>> says: "Supports writes to partitioned tables." But being able to write
>> >>> to partitioned directories is not enough to write to the table:
>> >>> someone needs to write to the Hive metastore. How can I use Structured
>> >>> Streaming and write to a Hive table?
>> >>>
>> >>> --
>> >>> Sincerely yours
>> >>> Egor Pakhomov
>> >
>> >
>> > --
>> > Sincerely yours
>> > Egor Pakhomov
>
>
> --
> Sincerely yours
> Egor Pakhomov
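For reference, a minimal sketch of the custom-sink approach above: a Sink's addBatch receives the whole DataFrame for a micro-batch, and remembering the last committed batchId is what lets a restarted query skip batches it already wrote. HiveSink, the table name, and the in-memory commit tracking are assumptions for illustration; see the linked BigQuerySink for a complete implementation.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical sink: appends every micro-batch to a Hive table and skips
// batches that were already committed before a failure/restart.
class HiveSink(tableName: String) extends Sink {

  // Last batch known to be committed. A real sink should persist this,
  // e.g. in a commit log, the way the built-in file sink does.
  @volatile private var latestBatchId = -1L

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    if (batchId <= latestBatchId) {
      // Already written: skip the batch to stay idempotent.
    } else {
      // Depending on the Spark version you may need to materialize the batch
      // first, e.g. data.sparkSession.createDataFrame(data.rdd, data.schema).
      data.write.mode("append").insertInto(tableName)
      latestBatchId = batchId
    }
  }
}

Note that Sink lives in an internal package, and to reference the sink from writeStream.format(...) you would also register it through a StreamSinkProvider, as the linked repository does.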
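Egor's "cache in ForeachWriter, write in process(), flush on close()" idea could look roughly like the sketch below. BufferingForeachWriter and writeBatchToHive are hypothetical names, and the actual flush call is left as a stub.

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{ForeachWriter, Row}

class BufferingForeachWriter extends ForeachWriter[Row] {
  private var buffer: ArrayBuffer[Row] = _

  override def open(partitionId: Long, version: Long): Boolean = {
    buffer = ArrayBuffer.empty[Row]
    true // returning false would skip this partition for this version
  }

  override def process(row: Row): Unit = {
    buffer += row // accumulate; nothing is written yet
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (errorOrNull == null) writeBatchToHive(buffer) // one flush per partition
  }

  private def writeBatchToHive(rows: Seq[Row]): Unit = {
    // Hypothetical stub: replace with a real bulk write (JDBC, Hive
    // Streaming API, etc.).
  }
}

You would attach it with df.writeStream.foreach(new BufferingForeachWriter).start(); the partitionId and version passed to open() are also the hooks for making the writer idempotent across retries, as mentioned above.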
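Finally, on the original question: one way to use the file-sink route is to let the built-in sink write the partitioned directories and register new partitions with the metastore as a separate step. In the sketch below, the DataFrame, paths, table name, and partition column are made up for illustration.

// Assumes a streaming DataFrame `events` with a `date` column, a SparkSession
// `spark`, and an existing external Hive table mydb.events over the same path.
val query = events.writeStream
  .format("parquet")
  .option("path", "/warehouse/mydb.db/events")
  .option("checkpointLocation", "/checkpoints/events")
  .partitionBy("date")
  .start()

// The file sink only writes the directories; the metastore still has to
// learn about them, e.g. by running this periodically:
spark.sql("MSCK REPAIR TABLE mydb.events")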