Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-29 Thread Will Xu
If there is a Spark ingestion option, would you be open to moving away from Hadoop, or are there other factors that might prevent a move? Regards, Will Product@Imply

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-29 Thread Will Lauer
@Abhishek, I haven't spoken with our Hadoop team recently about Hadoop 3 stability, so I can't say for sure, but I understand the need to migrate and all the dependency headaches involved in NOT migrating. At this point, I expect Druid moving to Hadoop 3 makes sense. I suspect that _we_ won't be

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Paul Rogers
Gian mentioned MSQ. The new MSQ work is exciting and powerful for Druid ingestion. If the data needs cleaning, we would expect users to employ something like Spark to do that task, then emit clean data to Kafka or files, which Druid MSQ can ingest. That is: Dirty data —> Spark —> Kafka/Files

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Maytas Monsereenusorn
Hi Julian, Thank you so much for your contribution on Spark support. As an existing committer, I would like to help get the Spark connector merged into OSS (including PR reviews and any other development work that may be needed). We can move the conversation regarding Spark support into a new

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Julian Jaffe
For Spark support, the connector I wrote remains functional but I haven’t updated the PR for six months or so since it didn’t seem like there was an appetite for review. If that’s changing I could migrate back some more recent changes to the OSS PR. Even with an up-to-date patch though I see

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-09 Thread Abhishek Agarwal
Yes. We should deprecate it first, which is similar to dropping support (no more active development), but we will still ship it for a release or two. In a way, we are already in that mode to a certain extent. Many features are being built with native ingestion as a first-class citizen. E.g.

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-08 Thread Gian Merlino
It's always good to deprecate things for some time prior to removing them, so we don't need to (nor should we) remove Hadoop 2 support right now. My vote is that in this upcoming release, we should deprecate it. The main problem in my eyes is the one Abhishek brought up: the dependency management

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-03 Thread Abhishek Agarwal
I was thinking that moving from Hadoop 2 to Hadoop 3 will be a lower-resistance path than moving from Hadoop to Spark. Even if we get that PR merged, it will take a good amount of time for the Spark integration to reach the same level of maturity as Hadoop or native ingestion. BTW I am not making an argument

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-07-26 Thread Samarth Jain
I am sure there are other companies out there who are still on Hadoop 2.x with migration to Hadoop 3.x being a no-go. If Druid were to drop support for Hadoop 2.x completely, I am afraid it would prevent users from updating to newer versions of Druid, which would be a shame. FWIW, we have found in

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-07-26 Thread Abhishek Agarwal
Reviving this conversation again. @Will - Do you still have concerns about HDFS stability? Hadoop 3 has been around for some time now and is very stable as far as I know. The dependencies coming from Hadoop 2 are also old enough that they cause dependency scans to fail. E.g. Log4j 1.x