Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-20 Thread DB Tsai
In Spark 3.0, if you use the `with-hadoop` Spark distribution that has
embedded Hadoop 3.2, you can set
`spark.yarn.populateHadoopClasspath=false` to not populate the
cluster's hadoop classpath. In this scenario, Spark will use hadoop
3.2 client to connect to hadoop 2.6 which should work fine. In fact,
we have production deployment using this way for a while.

On Sun, Jul 19, 2020 at 8:10 PM Ashika Umanga  wrote:
>
> Greetings,
>
> Hadoop 2.6 has been removed according to this ticket 
> https://issues.apache.org/jira/browse/SPARK-25016
>
> We run our Spark cluster on K8s in standalone mode.
> We access HDFS/Hive running on a Hadoop 2.6 cluster.
> We've been using Spark 2.4.5 and planning on upgrading to Spark 3.0.0
> However, we dont have any control over the Hadoop cluster and it will remain 
> in 2.6
>
> Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6 ?
>
> Best Regards,



-- 
Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread DB Tsai
If it's standalone mode, it's even easier. You should be able to
connect to hadoop 2.6 hdfs using 3.2 client. In your k8s cluster, just
don't put hadoop 2.6 into your classpath.

On Sun, Jul 19, 2020 at 10:25 PM Ashika Umanga Umagiliya
 wrote:
>
> Hello
>
> "spark.yarn.populateHadoopClasspath" is used in YARN mode correct?
> However our Spark cluster is standalone cluster not using YARN.
> We only connect to HDFS/Hive to access data.Computation is done on our spark 
> cluster running on K8s (not Yarn)
>
>
> On Mon, Jul 20, 2020 at 2:04 PM DB Tsai  wrote:
>>
>> In Spark 3.0, if you use the `with-hadoop` Spark distribution that has
>> embedded Hadoop 3.2, you can set
>> `spark.yarn.populateHadoopClasspath=false` to not populate the
>> cluster's hadoop classpath. In this scenario, Spark will use hadoop
>> 3.2 client to connect to hadoop 2.6 which should work fine. In fact,
>> we have production deployment using this way for a while.
>>
>> On Sun, Jul 19, 2020 at 8:10 PM Ashika Umanga  
>> wrote:
>> >
>> > Greetings,
>> >
>> > Hadoop 2.6 has been removed according to this ticket 
>> > https://issues.apache.org/jira/browse/SPARK-25016
>> >
>> > We run our Spark cluster on K8s in standalone mode.
>> > We access HDFS/Hive running on a Hadoop 2.6 cluster.
>> > We've been using Spark 2.4.5 and planning on upgrading to Spark 3.0.0
>> > However, we dont have any control over the Hadoop cluster and it will 
>> > remain in 2.6
>> >
>> > Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6 ?
>> >
>> > Best Regards,
>>
>>
>>
>> --
>> Sincerely,
>>
>> DB Tsai
>> --
>> Web: https://www.dbtsai.com
>> PGP Key ID: 42E5B25A8F7A82C1
>
>
>
> --
> Umanga
> http://jp.linkedin.com/in/umanga
> http://umanga.ifreepages.com



-- 
Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Ashika Umanga Umagiliya
Hello

"spark.yarn.populateHadoopClasspath" is used in YARN mode correct?
However our Spark cluster is standalone cluster not using YARN.
We only connect to HDFS/Hive to access data.Computation is done on our
spark cluster running on K8s (not Yarn)


On Mon, Jul 20, 2020 at 2:04 PM DB Tsai  wrote:

> In Spark 3.0, if you use the `with-hadoop` Spark distribution that has
> embedded Hadoop 3.2, you can set
> `spark.yarn.populateHadoopClasspath=false` to not populate the
> cluster's hadoop classpath. In this scenario, Spark will use hadoop
> 3.2 client to connect to hadoop 2.6 which should work fine. In fact,
> we have production deployment using this way for a while.
>
> On Sun, Jul 19, 2020 at 8:10 PM Ashika Umanga 
> wrote:
> >
> > Greetings,
> >
> > Hadoop 2.6 has been removed according to this ticket
> https://issues.apache.org/jira/browse/SPARK-25016
> >
> > We run our Spark cluster on K8s in standalone mode.
> > We access HDFS/Hive running on a Hadoop 2.6 cluster.
> > We've been using Spark 2.4.5 and planning on upgrading to Spark 3.0.0
> > However, we dont have any control over the Hadoop cluster and it will
> remain in 2.6
> >
> > Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6 ?
> >
> > Best Regards,
>
>
>
> --
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 42E5B25A8F7A82C1
>


-- 
Umanga
http://jp.linkedin.com/in/umanga
http://umanga.ifreepages.com


Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Prashant Sharma
Hi Ashika,

Hadoop 2.6 is now no longer supported, and since it has not been maintained
in the last 2 years, it means it may have some security issues unpatched.
Spark 3.0 onwards, we no longer support it, in other words, we have
modified our codebase in a way that Hadoop 2.6 won't work. However, if you
are determined, you can always apply a custom patch to spark codebase and
support it. I would recommend moving to newer Hadoop.

Thanks,

On Mon, Jul 20, 2020 at 8:41 AM Ashika Umanga 
wrote:

> Greetings,
>
> Hadoop 2.6 has been removed according to this ticket
> https://issues.apache.org/jira/browse/SPARK-25016
>
> We run our Spark cluster on K8s in standalone mode.
> We access HDFS/Hive running on a Hadoop 2.6 cluster.
> We've been using Spark 2.4.5 and planning on upgrading to Spark 3.0.0
> However, we dont have any control over the Hadoop cluster and it will
> remain in 2.6
>
> Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6 ?
>
> Best Regards,
>