Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Paul Rogers
Gian mentioned MSQ. The new MSQ work is exciting and powerful for Druid ingestion. If the data needs cleaning, we would expect users to employ something like Spark to do that task, then emit clean data to Kafka or files, which Druid MSQ can ingest. That is: Dirty data -> Spark -> Kafka/Files

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Maytas Monsereenusorn
Hi Julian, Thank you so much for your contribution to Spark support. As an existing committer, I would like to help get the Spark connector merged into OSS (including PR reviews and any other development work that may be needed). We can move the conversation regarding Spark support into a new

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Julian Jaffe
For Spark support, the connector I wrote remains functional, but I haven’t updated the PR for six months or so since it didn’t seem like there was an appetite for review. If that’s changing, I could port some more recent changes back to the OSS PR. Even with an up-to-date patch, though, I see