I think adding something like this (if it doesn't already exist) could help make Structured Streaming easier to use; foreachBatch is not the best API.
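
For context, here is a rough sketch of what the foreachBatch route suggested in the quoted reply tends to look like today. It is not an existing connector: the endpoint URL, the payload shape, and the names post_json and make_rest_writer are made up for illustration. It also assumes each micro-batch is small enough to stream through the driver and that row values are JSON-serializable.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://example.invalid/ingest"


def post_json(url, payload):
    """POST `payload` as a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def make_rest_writer(url, post=post_json, chunk_size=500):
    """Build a callback with the (batch_df, batch_id) signature foreachBatch expects."""

    def write_batch(batch_df, batch_id):
        chunk = []
        # toLocalIterator() pulls rows to the driver one partition at a time.
        for row in batch_df.toLocalIterator():
            chunk.append(row.asDict())
            if len(chunk) >= chunk_size:
                post(url, {"batch_id": batch_id, "records": chunk})
                chunk = []
        if chunk:
            # Send whatever is left over after the last full chunk.
            post(url, {"batch_id": batch_id, "records": chunk})

    return write_batch


# Usage with a streaming DataFrame `df` (needs a running Spark session):
#   df.writeStream.foreachBatch(make_rest_writer(ENDPOINT)).start()
```

The batch_id goes into every request because foreachBatch only gives at-least-once delivery by default, so a receiving service that wants to avoid duplicates needs something to deduplicate on. Method, headers, auth, and retries would all still have to be hand-rolled per service, which is the part a reusable sink could take over.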

On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> I guess the method, query parameters, headers, and payload would all be
> different for almost every use case. That makes it hard to generalize and
> requires a fairly complicated implementation to be flexible enough.
>
> I'm not aware of any custom sink implementing REST, so your best bet would
> be simply implementing your own with foreachBatch, but someone might
> jump in and provide a pointer if there is something in the Spark ecosystem.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com> wrote:
>
>> Hi All,
>>
>> We ingest a lot of RESTful APIs into our lake and I'm wondering if it is
>> at all possible to create a REST sink in Structured Streaming?
>>
>> For now I'm only focusing on RESTful services that have an incremental ID
>> so my sink can just poll for new data then ingest.
>>
>> I can't seem to find a connector that does this and my gut instinct tells
>> me it's probably because it isn't possible due to something completely
>> obvious that I am missing.
>>
>> I know some RESTful APIs obfuscate the IDs to a hash of strings and that
>> could be a problem, but since I'm planning on focusing on just numerical IDs
>> that get incremented, I think I won't be facing that issue.
>>
>> Can anyone let me know if this sounds like a daft idea? Will I need
>> something like Kafka or Kinesis as a buffer and for redundancy, or am I
>> overthinking this?
>>
>> I would love to bounce ideas off people who run Structured Streaming
>> jobs in production.
>>
>> Kind regards
>> Sam

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau