Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive
In Spark 3.0, if you use the `with-hadoop` Spark distribution that embeds Hadoop 3.2, you can set `spark.yarn.populateHadoopClasspath=false` so that the cluster's Hadoop classpath is not added to your application. Spark will then use the Hadoop 3.2 client to connect to Hadoop 2.6, which should work fine. In fact, we have had a production deployment running this way for a while.

On Sun, Jul 19, 2020 at 8:10 PM Ashika Umanga wrote:
> Greetings,
>
> Hadoop 2.6 has been removed according to this ticket:
> https://issues.apache.org/jira/browse/SPARK-25016
>
> We run our Spark cluster on K8s in standalone mode.
> We access HDFS/Hive running on a Hadoop 2.6 cluster.
> We've been using Spark 2.4.5 and are planning to upgrade to Spark 3.0.0.
> However, we don't have any control over the Hadoop cluster, and it will remain on 2.6.
>
> Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6?
>
> Best Regards,

--
Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
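For a YARN deployment, the property from the reply above can be passed on the `spark-submit` command line. The sketch below is a small Python helper that only assembles the argument list (it launches nothing). `--master` and `--conf` are standard `spark-submit` options; the helper name `build_spark_submit` and the application `my_job.py` are hypothetical. It assumes a Spark 3.0 build that bundles its own Hadoop 3.2 client, because with `spark.yarn.populateHadoopClasspath=false` Spark no longer picks up Hadoop jars from the cluster.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit(app: str,
                       master: str = "yarn",
                       extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble (but do not run) a spark-submit command that asks YARN not to
    add the cluster's Hadoop jars, so the Hadoop 3.2 client bundled with the
    Spark distribution is used instead (assumes a with-Hadoop build)."""
    conf = {"spark.yarn.populateHadoopClasspath": "false"}
    conf.update(extra_conf or {})  # caller-supplied properties may override
    cmd = ["spark-submit", "--master", master]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # "my_job.py" is a hypothetical application file.
    print(shlex.join(build_spark_submit("my_job.py")))
```

Run as a script, this prints `spark-submit --master yarn --conf spark.yarn.populateHadoopClasspath=false my_job.py`. The same property could instead be set once in `spark-defaults.conf`.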
Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive
If it's standalone mode, it's even easier. You should be able to connect to Hadoop 2.6 HDFS using the Hadoop 3.2 client. In your K8s cluster, just don't put Hadoop 2.6 on your classpath.

On Sun, Jul 19, 2020 at 10:25 PM Ashika Umanga Umagiliya wrote:
> "spark.yarn.populateHadoopClasspath" is used in YARN mode, correct?
> However, our Spark cluster is a standalone cluster, not YARN.

--
Sincerely,

DB Tsai
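One way to act on "don't put Hadoop 2.6 on your classpath" is to check which Hadoop jars actually end up in the Spark image. The following is a heuristic sketch, not an official tool. It assumes Hadoop jars follow the usual Maven naming `hadoop-<artifact>-<version>.jar`, that Spark's bundled jars live in `$SPARK_HOME/jars`, and that the expected client version is 3.2.x; `hadoop_jars` and `unexpected_hadoop_jars` are hypothetical helper names. It inspects file names only, so a renamed or shaded jar would not be detected.

```python
import os
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed naming pattern: "hadoop-<artifact>-<version>.jar", e.g.
# "hadoop-common-3.2.0.jar" or "hadoop-hdfs-client-2.6.0-cdh5.16.2.jar".
_HADOOP_JAR = re.compile(r"^(hadoop-[a-z0-9-]+?)-(\d+\.\d+(?:\.\d+)?[\w.-]*)\.jar$")


def hadoop_jars(entries: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (file name, artifact, version) for each entry that looks like a Hadoop jar."""
    found = []
    for entry in entries:
        name = Path(entry).name  # accept bare file names or full paths
        match = _HADOOP_JAR.match(name)
        if match:
            found.append((name, match.group(1), match.group(2)))
    return found


def unexpected_hadoop_jars(entries: Iterable[str], expected: str = "3.2") -> List[str]:
    """Return Hadoop jar names whose version is neither `expected` nor `expected.<x>`."""
    return [
        name
        for name, _artifact, version in hadoop_jars(entries)
        if version != expected and not version.startswith(expected + ".")
    ]


if __name__ == "__main__":
    # Hypothetical location: the jars directory of the Spark image running on K8s.
    jar_dir = Path(os.environ.get("SPARK_HOME", ".")) / "jars"
    paths = [str(p) for p in jar_dir.glob("*.jar")] if jar_dir.is_dir() else []
    stale = unexpected_hadoop_jars(paths)
    if stale:
        print("Non-3.2 Hadoop jars found:", ", ".join(sorted(stale)))
    else:
        print("No non-3.2 Hadoop jars found in", jar_dir)
```

Returning a list rather than a dict keyed by artifact is deliberate: the case worth catching is two versions of the same artifact (for example `hadoop-common-2.6.0.jar` next to `hadoop-common-3.2.0.jar`), which a dict would silently collapse.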
Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive
Hello,

"spark.yarn.populateHadoopClasspath" is used in YARN mode, correct? However, our Spark cluster is a standalone cluster, not YARN. We only connect to HDFS/Hive to access data. Computation is done on our Spark cluster running on K8s (not YARN).

On Mon, Jul 20, 2020 at 2:04 PM DB Tsai wrote:
> In Spark 3.0, if you use the `with-hadoop` Spark distribution that has
> embedded Hadoop 3.2, you can set
> `spark.yarn.populateHadoopClasspath=false` to not populate the
> cluster's hadoop classpath.

--
Umanga
http://jp.linkedin.com/in/umanga
http://umanga.ifreepages.com
Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive
Hi Ashika,

Hadoop 2.6 is no longer supported, and since it has not been maintained for the last two years, it may have unpatched security issues. From Spark 3.0 onwards we no longer support it; in other words, we have changed our codebase in ways that break Hadoop 2.6. However, if you are determined, you can always apply a custom patch to the Spark codebase to support it. I would recommend moving to a newer Hadoop version.

Thanks,

On Mon, Jul 20, 2020 at 8:41 AM Ashika Umanga wrote:
> Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6?