Re: Spark batch with Druid

2019-02-13 Thread Gian Merlino
I'd guess the majority of users are just using Druid itself to process Druid data, although there are a few people out there that export it into other systems using techniques like the above.

On Wed, Feb 13, 2019 at 2:00 PM Rajiv Mordani wrote:
> Am curious to know how people are generally

Re: Spark batch with Druid

2019-02-11 Thread Julian Jaffe
Spark can convert an RDD of JSON strings into an RDD/DataFrame/Dataset of objects parsed from the JSON (something like `sparkSession.read.json(jsonStringRDD)`). You could hook this up to a Druid response, but I would definitely recommend looking through the code that Gian posted instead - it reads
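A minimal PySpark sketch of the "hook this up to a Druid response" idea, not the approach from the code Gian posted. It assumes the Druid SQL HTTP endpoint returns a JSON array of row objects; the Broker URL, the helper names (`fetch_druid_rows`, `rows_to_json_lines`, `json_lines_to_dataframe`), and the sample column names are all hypothetical. The whole result passes through the driver, so this only suits small result sets.

```python
import json
import urllib.request


def fetch_druid_rows(sql, broker_url="http://localhost:8082/druid/v2/sql"):
    """POST a SQL query to Druid and return the decoded list of row objects.

    broker_url is a placeholder (assumption); point it at your own Broker.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    request = urllib.request.Request(
        broker_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def rows_to_json_lines(rows):
    """Serialize each row object as one JSON string, one record per string."""
    return [json.dumps(row, sort_keys=True) for row in rows]


def json_lines_to_dataframe(spark, json_lines):
    """Parse JSON strings into a DataFrame; Spark infers the schema."""
    rdd = spark.sparkContext.parallelize(json_lines)
    return spark.read.json(rdd)
```

With an active `SparkSession` named `spark`, usage would look like `json_lines_to_dataframe(spark, rows_to_json_lines(fetch_druid_rows("SELECT ...")))`. Only `json_lines_to_dataframe` needs Spark; the other two helpers use the standard library.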

Re: Spark batch with Druid

2019-02-09 Thread Rajiv Mordani
Thanks Julian. See some questions in-line:

On 2/6/19, 3:01 PM, "Julian Jaffe" wrote:
> I think this question is going the other way (i.e. how to read data into Spark, as opposed to into Druid). For that, the quickest and dirtiest approach is probably to use Spark's json

Re: Spark batch with Druid

2019-02-06 Thread Gian Merlino
wrote:
> > Hey Rajiv,
> >
> > There's an unofficial Druid/Spark adapter at:
> > https://github.com/metamx/druid-spark-batch. If you want to stick to
> > official things, then the best approach would be to use Spark to write data
> > to HDFS or S3 and

Re: Spark batch with Druid

2019-02-06 Thread Julian Jaffe
/SparklineData/spark-druid-olap, but I don't think there's any official guidance on this.

On Wed, Feb 6, 2019 at 2:21 PM Gian Merlino wrote:
> Hey Rajiv,
>
> There's an unofficial Druid/Spark adapter at:
> https://github.com/metamx/druid-spark-batch. If you want to stick to
> official things

Re: Spark batch with Druid

2019-02-06 Thread Gian Merlino
Hey Rajiv, There's an unofficial Druid/Spark adapter at: https://github.com/metamx/druid-spark-batch. If you want to stick to official things, then the best approach would be to use Spark to write data to HDFS or S3 and then ingest it into Druid using Druid's Hadoop-based or native batch
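The "write from Spark, then ingest with native batch" route can be sketched roughly as follows. This is an illustration under stated assumptions, not an official recipe: the helper names, the Overlord URL, the granularity settings, and the bucket path are hypothetical, and the `inputSource`/`inputFormat` spec layout follows newer Druid releases (releases from the time of this thread used a different parser/firehose layout), so check the spec against the documentation for your version.

```python
import json
import urllib.request


def write_for_druid(df, output_path):
    """Write a Spark DataFrame as newline-delimited JSON under output_path."""
    df.write.mode("overwrite").json(output_path)


def build_batch_spec(datasource, input_prefix, timestamp_column, dimensions):
    """Return a native batch (index_parallel) task spec as a plain dict.

    Granularity and rollup values are illustrative choices (assumptions).
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "s3", "prefixes": [input_prefix]},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def submit_task(spec, overlord_url="http://localhost:8090/druid/indexer/v1/task"):
    """POST the task spec to the Overlord; overlord_url is a placeholder."""
    request = urllib.request.Request(
        overlord_url,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The flow would be `write_for_druid(df, path)` from the Spark job, then `submit_task(build_batch_spec(...))` with the same location as the input prefix. For HDFS output, the `inputSource` block would need to change accordingly.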

Spark batch with Druid

2019-02-06 Thread Rajiv Mordani
Is there a best practice for loading data from Druid for use in a Spark batch job? I asked this question on the user alias but got no response, hence reposting here.

- Rajiv