Gian mentioned MSQ. The new MSQ work is exciting and powerful for Druid
ingestion. If the data needs cleaning, we would expect users to employ
something like Spark to do that task, then emit clean data to Kafka or files,
which Druid MSQ can ingest. That is:
Dirty data -> Spark -> Kafka/Files

Hi Julian,
Thank you so much for your contribution to Spark support. As an existing
committer, I would like to help get the Spark connector merged into OSS
(including PR reviews and any other development work that may be needed).
We can move the conversation regarding Spark support into a new
For Spark support, the connector I wrote remains functional, but I haven’t
updated the PR in six months or so, since there didn’t seem to be an
appetite for review. If that’s changing, I could port some more recent
changes back to the OSS PR. Even with an up-to-date patch, though, I see