Re: Removing the Mesos fine-grained mode

2015-11-20 Thread Adam McElwee
I've used fine-grained mode on our mesos spark clusters until this week,
mostly because it was the default. I started trying coarse-grained because
of the recent chatter on the mailing list about wanting to move the mesos
execution path to coarse-grained only. The odd thing is, coarse-grained vs
fine-grained seems to yield drastically different cluster utilization metrics
for any of our jobs that I've tried out this week.

If this is best as a new thread, please let me know, and I'll try not to
derail this conversation. Otherwise, details below:

We monitor our spark clusters with ganglia, and historically, we maintain
at least 90% cpu utilization across the cluster. Making a single
configuration change to use coarse-grained execution instead of
fine-grained consistently yields a cpu utilization pattern that starts
around 90% at the beginning of the job, and then it slowly decreases over
the next 1-1.5 hours to level out around 65% cpu utilization on the
cluster. Does anyone have a clue why I'd be seeing such a negative effect
of switching to coarse-grained mode? GC activity is comparable in both
cases. I've tried 1.5.2, as well as the 1.6.0 preview tag that's on github.

Thanks,
-Adam

On Fri, Nov 20, 2015 at 9:53 AM, Iulian Dragoș 
wrote:

> This is a good point. We should probably document this better in the
> migration notes. In the meantime:
>
>
> http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos
>
> Roughly, dynamic allocation lets Spark add and kill executors based on the
> scheduling delay. The min and max number of executors can be configured.
> Would this fit your use-case?
>
> iulian
>
>
> On Fri, Nov 20, 2015 at 1:55 AM, Jo Voordeckers 
> wrote:
>
>> As a recent fine-grained mode adopter I'm now confused after reading this
>> and other resources from spark-summit, the docs, ...  so can someone please
>> advise me for our use-case?
>>
>> We'll have 1 or 2 streaming jobs and will run scheduled batch jobs
>> which should take resources away from the streaming jobs and give 'em back
>> upon completion.
>>
>> Can someone point me at the docs or a guide to set this up?
>>
>> Thanks!
>>
>> - Jo Voordeckers
>>
>>
>> On Thu, Nov 19, 2015 at 5:52 AM, Heller, Chris 
>> wrote:
>>
>>> I was one that argued for fine-grain mode, and there is something I
>>> still appreciate about how fine-grain mode operates in terms of the way one
>>> would define a Mesos framework. That said, with dyn-allocation and Mesos
>>> support for resource reservation, oversubscription, and revocation, I
>>> think the direction is clear that the coarse mode is the proper way
>>> forward, and having the two code paths is just noise.
>>>
>>> -Chris
>>>
>>> From: Iulian Dragoș 
>>> Date: Thursday, November 19, 2015 at 6:42 AM
>>> To: "dev@spark.apache.org" 
>>> Subject: Removing the Mesos fine-grained mode
>>>
>>> Hi all,
>>>
>>> Mesos is the only cluster manager that has a fine-grained mode, but it's
>>> more often than not problematic, and it's a maintenance burden. I'd like to
>>> suggest removing it in the 2.0 release.
>>>
>>> A few reasons:
>>>
>>> - code/maintenance complexity. The two modes duplicate a lot of
>>> functionality (and sometimes code) that leads to subtle differences or
>>> bugs. See SPARK-10444, this thread, and MESOS-3202.
>>> - it's not widely used (Reynold's previous thread got very few responses
>>> from people relying on it)
>>> - similar functionality can be achieved with dynamic allocation +
>>> coarse-grained mode
>>>
>>> I suggest that Spark 1.6 already issues a warning if it detects
>>> fine-grained use, with removal in the 2.0 release.
>>>
>>> Thoughts?
>>>
>>> 
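
A minimal sketch (not taken from the thread) of the configuration being discussed
above: spark.mesos.coarse is the single switch between fine-grained and
coarse-grained execution, and the dynamic allocation settings from the linked docs
bound the executor count. The Mesos master URL and the executor numbers below are
placeholders.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("utilization-test")
  .setMaster("mesos://zk://zk1:2181,zk2:2181/mesos")  // placeholder Mesos master
  .set("spark.mesos.coarse", "true")                  // coarse-grained instead of fine-grained
  .set("spark.dynamicAllocation.enabled", "true")     // let Spark add and remove executors
  .set("spark.dynamicAllocation.minExecutors", "2")   // floor kept for long-running jobs
  .set("spark.dynamicAllocation.maxExecutors", "20")  // cap shared with batch jobs
  .set("spark.shuffle.service.enabled", "true")       // dynamic allocation needs the external shuffle service
val sc = new SparkContext(conf)

On Mesos, dynamic allocation also requires running the external shuffle service on
each node, as described on the docs page linked above.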

Re: Unhandled case in VectorAssembler

2015-11-20 Thread Joseph Bradley
Yes, please, could you send a JIRA (and PR)?  A custom error message would
be better.
Thank you!
Joseph

On Fri, Nov 20, 2015 at 2:39 PM, BenFradet 
wrote:

> Hey there,
>
> I noticed that there is an unhandled case in the transform method of
> VectorAssembler if one of the input columns doesn't have one of the
> supported types (DoubleType, NumericType, BooleanType, or VectorUDT).
>
> So, if you try to transform a column of StringType you get a cryptic
> "scala.MatchError: StringType".
> I was wondering if we shouldn't throw a custom exception indicating that
> this is not a supported type.
>
> I can submit a jira and pr if needed.
>
> Best regards,
> Ben.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Unhandled-case-in-VectorAssembler-tp15302.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
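
A minimal sketch of the friendlier check being proposed, replacing the raw
scala.MatchError with a descriptive error. The helper name and wiring are
hypothetical, not the actual VectorAssembler implementation:

import org.apache.spark.sql.types.{BooleanType, NumericType, StructField}

// Hypothetical validation helper: reject unsupported column types up front.
def checkInputColumn(field: StructField): Unit = field.dataType match {
  case _: NumericType | BooleanType => () // numeric (incl. DoubleType) and boolean are supported
    // (a real check would also accept VectorUDT columns)
  case other =>
    throw new IllegalArgumentException(
      s"VectorAssembler does not support column '${field.name}' of type $other. " +
        "Supported types are numeric, boolean, and vector.")
}

With a check like this in place, passing a StringType column would fail with the
message above instead of "scala.MatchError: StringType".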


Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Chester Chen
For #1-3, the answer is likely no.

  Recently we upgraded to Spark 1.5.1, with CDH5.3, CDH5.4, HDP2.2 and
others.

  We were using the CDH5.3 client to talk to CDH5.4. We were doing this to see
if we could support many different Hadoop cluster versions without changing the
build. This was OK for yarn-cluster Spark 1.3.1, but we could not get Spark
1.5.1 started. We upgraded the client to CDH5.4, and then everything worked.

  There are API changes between Apache 2.4 and 2.6; I'm not sure you can mix
and match them.

Chester


On Fri, Nov 20, 2015 at 1:59 PM, Sandy Ryza  wrote:

> To answer your fourth question from Cloudera's perspective, we would never
> support a customer running Spark 2.0 on a Hadoop version < 2.6.
>
> -Sandy
>
> On Fri, Nov 20, 2015 at 1:39 PM, Reynold Xin  wrote:
>
>> OK I'm not exactly asking for a vote here :)
>>
>> I don't think we should look at it from only a maintenance point of view --
>> because in that case the answer is clearly supporting as few versions as
>> possible (or just rm -rf spark source code and call it a day). It is a
>> tradeoff between the number of users impacted and the maintenance burden.
>>
>> So a few questions for those more familiar with Hadoop:
>>
>> 1. Can Hadoop 2.6 client read Hadoop 2.4 / 2.3?
>>
>> 2. If the answer to 1 is yes, are there known, major issues with backward
>> compatibility?
>>
>> 3. Can Hadoop 2.6+ YARN work on older versions of YARN clusters?
>>
>> 4. (for Hadoop vendors) When did/will support for Hadoop 2.4 and below
>> stop? To what extent do you care about running Spark on older Hadoop
>> clusters?
>>
>>
>>
>> On Fri, Nov 20, 2015 at 7:52 AM, Steve Loughran 
>> wrote:
>>
>>>
>>> On 20 Nov 2015, at 14:28, ches...@alpinenow.com wrote:
>>>
>>> Assuming we have 1.6 and 1.7 releases, Spark 2.0 is about 9 months away.
>>>
>>> Customers will need to upgrade their Hadoop clusters to Apache 2.6 or later
>>> to leverage Spark 2.0 within a year. I think this is possible, as the latest
>>> releases of CDH 5.x and HDP 2.x are both on Apache 2.6.0 already. Companies
>>> will have enough time to upgrade their clusters.
>>>
>>> +1 for me as well
>>>
>>> Chester
>>>
>>>
>>> Now, if you are looking that far ahead, the other big issue is "when to
>>> retire Java 7 support?"
>>>
>>> That's a tough decision for all projects. Hadoop 3.x will be Java 8
>>> only, but nobody has committed the patch to the trunk codebase to force a
>>> Java 8 build, and most of *today's* Hadoop clusters are Java 7. But as you
>>> can't even download a Java 7 JDK for the desktop from Oracle any more
>>> today, 2016 is the time to look at language support and decide what the
>>> baseline version should be.
>>>
>>> Commentary from Twitter here -as they point out, it's not just the
>>> server farm that matters, it's all the apps that talk to it
>>>
>>>
>>>
>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201503.mbox/%3ccab7mwte+kefcxsr6n46-ztcs19ed7cwc9vobtr1jqewdkye...@mail.gmail.com%3E
>>>
>>> -Steve
>>>
>>
>>
>


Unhandled case in VectorAssembler

2015-11-20 Thread BenFradet
Hey there,

I noticed that there is an unhandled case in the transform method of
VectorAssembler if one of the input columns doesn't have one of the
supported types (DoubleType, NumericType, BooleanType, or VectorUDT).

So, if you try to transform a column of StringType you get a cryptic
"scala.MatchError: StringType".
I was wondering if we shouldn't throw a custom exception indicating that
this is not a supported type.

I can submit a jira and pr if needed.

Best regards,
Ben.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Unhandled-case-in-VectorAssembler-tp15302.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Sandy Ryza
To answer your fourth question from Cloudera's perspective, we would never
support a customer running Spark 2.0 on a Hadoop version < 2.6.

-Sandy

On Fri, Nov 20, 2015 at 1:39 PM, Reynold Xin  wrote:

> OK I'm not exactly asking for a vote here :)
>
> I don't think we should look at it from only a maintenance point of view --
> because in that case the answer is clearly supporting as few versions as
> possible (or just rm -rf spark source code and call it a day). It is a
> tradeoff between the number of users impacted and the maintenance burden.
>
> So a few questions for those more familiar with Hadoop:
>
> 1. Can Hadoop 2.6 client read Hadoop 2.4 / 2.3?
>
> 2. If the answer to 1 is yes, are there known, major issues with backward
> compatibility?
>
> 3. Can Hadoop 2.6+ YARN work on older versions of YARN clusters?
>
> 4. (for Hadoop vendors) When did/will support for Hadoop 2.4 and below
> stop? To what extent do you care about running Spark on older Hadoop
> clusters?
>
>
>
> On Fri, Nov 20, 2015 at 7:52 AM, Steve Loughran 
> wrote:
>
>>
>> On 20 Nov 2015, at 14:28, ches...@alpinenow.com wrote:
>>
>> Assuming we have 1.6 and 1.7 releases, Spark 2.0 is about 9 months away.
>>
>> Customers will need to upgrade their Hadoop clusters to Apache 2.6 or later
>> to leverage Spark 2.0 within a year. I think this is possible, as the latest
>> releases of CDH 5.x and HDP 2.x are both on Apache 2.6.0 already. Companies
>> will have enough time to upgrade their clusters.
>>
>> +1 for me as well
>>
>> Chester
>>
>>
>> Now, if you are looking that far ahead, the other big issue is "when to
>> retire Java 7 support?"
>>
>> That's a tough decision for all projects. Hadoop 3.x will be Java 8 only,
>> but nobody has committed the patch to the trunk codebase to force a Java 8
>> build, and most of *today's* Hadoop clusters are Java 7. But as you can't even
>> download a Java 7 JDK for the desktop from Oracle any more today, 2016 is the
>> time to look at language support and decide what the baseline version should be.
>>
>> Commentary from Twitter here -as they point out, it's not just the server
>> farm that matters, it's all the apps that talk to it
>>
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201503.mbox/%3ccab7mwte+kefcxsr6n46-ztcs19ed7cwc9vobtr1jqewdkye...@mail.gmail.com%3E
>>
>> -Steve
>>
>
>


Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Reynold Xin
OK I'm not exactly asking for a vote here :)

I don't think we should look at it from only a maintenance point of view --
because in that case the answer is clearly supporting as few versions as
possible (or just rm -rf spark source code and call it a day). It is a
tradeoff between the number of users impacted and the maintenance burden.

So a few questions for those more familiar with Hadoop:

1. Can Hadoop 2.6 client read Hadoop 2.4 / 2.3?

2. If the answer to 1 is yes, are there known, major issues with backward
compatibility?

3. Can Hadoop 2.6+ YARN work on older versions of YARN clusters?

4. (for Hadoop vendors) When did/will support for Hadoop 2.4 and below
stop? To what extent do you care about running Spark on older Hadoop
clusters?



On Fri, Nov 20, 2015 at 7:52 AM, Steve Loughran 
wrote:

>
> On 20 Nov 2015, at 14:28, ches...@alpinenow.com wrote:
>
> Assuming we have 1.6 and 1.7 releases, Spark 2.0 is about 9 months away.
>
> Customers will need to upgrade their Hadoop clusters to Apache 2.6 or later
> to leverage Spark 2.0 within a year. I think this is possible, as the latest
> releases of CDH 5.x and HDP 2.x are both on Apache 2.6.0 already. Companies
> will have enough time to upgrade their clusters.
>
> +1 for me as well
>
> Chester
>
>
> Now, if you are looking that far ahead, the other big issue is "when to
> retire Java 7 support?"
>
> That's a tough decision for all projects. Hadoop 3.x will be Java 8 only,
> but nobody has committed the patch to the trunk codebase to force a Java 8
> build, and most of *today's* Hadoop clusters are Java 7. But as you can't even
> download a Java 7 JDK for the desktop from Oracle any more today, 2016 is the
> time to look at language support and decide what the baseline version should be.
>
> Commentary from Twitter here -as they point out, it's not just the server
> farm that matters, it's all the apps that talk to it
>
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201503.mbox/%3ccab7mwte+kefcxsr6n46-ztcs19ed7cwc9vobtr1jqewdkye...@mail.gmail.com%3E
>
> -Steve
>


Re: Removing the Mesos fine-grained mode

2015-11-20 Thread Iulian Dragoș
This is a good point. We should probably document this better in the
migration notes. In the meantime:

http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos

Roughly, dynamic allocation lets Spark add and kill executors based on the
scheduling delay. The min and max number of executors can be configured.
Would this fit your use-case?

iulian


On Fri, Nov 20, 2015 at 1:55 AM, Jo Voordeckers 
wrote:

> As a recent fine-grained mode adopter I'm now confused after reading this
> and other resources from spark-summit, the docs, ...  so can someone please
> advise me for our use-case?
>
> We'll have 1 or 2 streaming jobs and will run scheduled batch jobs
> which should take resources away from the streaming jobs and give 'em back
> upon completion.
>
> Can someone point me at the docs or a guide to set this up?
>
> Thanks!
>
> - Jo Voordeckers
>
>
> On Thu, Nov 19, 2015 at 5:52 AM, Heller, Chris  wrote:
>
>> I was one that argued for fine-grain mode, and there is something I still
>> appreciate about how fine-grain mode operates in terms of the way one would
>> define a Mesos framework. That said, with dyn-allocation and Mesos support
>> for resource reservation, oversubscription, and revocation, I think the
>> direction is clear that the coarse mode is the proper way forward, and
>> having the two code paths is just noise.
>>
>> -Chris
>>
>> From: Iulian Dragoș 
>> Date: Thursday, November 19, 2015 at 6:42 AM
>> To: "dev@spark.apache.org" 
>> Subject: Removing the Mesos fine-grained mode
>>
>> Hi all,
>>
>> Mesos is the only cluster manager that has a fine-grained mode, but it's
>> more often than not problematic, and it's a maintenance burden. I'd like to
>> suggest removing it in the 2.0 release.
>>
>> A few reasons:
>>
>> - code/maintenance complexity. The two modes duplicate a lot of
>> functionality (and sometimes code) that leads to subtle differences or
>> bugs. See SPARK-10444, this thread, and MESOS-3202.
>> - it's not widely used (Reynold's previous thread got very few responses
>> from people relying on it)
>> - similar functionality can be achieved with dynamic allocation +
>> coarse-grained mode
>>
>> I suggest that Spark 1.6 already issues a warning if it detects
>> fine-grained use, with removal in the 2.0 release.
>>
>> Thoughts?
>>
>> iulian
>>
>>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Steve Loughran

On 20 Nov 2015, at 14:28, ches...@alpinenow.com 
wrote:

Assuming we have 1.6 and 1.7 releases, Spark 2.0 is about 9 months away.

Customers will need to upgrade their Hadoop clusters to Apache 2.6 or later to
leverage Spark 2.0 within a year. I think this is possible, as the latest releases
of CDH 5.x and HDP 2.x are both on Apache 2.6.0 already. Companies will have
enough time to upgrade their clusters.

+1 for me as well

Chester


Now, if you are looking that far ahead, the other big issue is "when to retire
Java 7 support?"

That's a tough decision for all projects. Hadoop 3.x will be Java 8 only, but
nobody has committed the patch to the trunk codebase to force a Java 8 build, and
most of *today's* Hadoop clusters are Java 7. But as you can't even download a
Java 7 JDK for the desktop from Oracle any more today, 2016 is the time to look
at language support and decide what the baseline version should be.

Commentary from Twitter here -as they point out, it's not just the server farm 
that matters, it's all the apps that talk to it


http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201503.mbox/%3ccab7mwte+kefcxsr6n46-ztcs19ed7cwc9vobtr1jqewdkye...@mail.gmail.com%3E

-Steve


Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread chester
Assuming we have 1.6 and 1.7 releases, Spark 2.0 is about 9 months away.

Customers will need to upgrade their Hadoop clusters to Apache 2.6 or later to
leverage Spark 2.0 within a year. I think this is possible, as the latest releases
of CDH 5.x and HDP 2.x are both on Apache 2.6.0 already. Companies will have
enough time to upgrade their clusters.

+1 for me as well

Chester




Sent from my iPad

> On Nov 19, 2015, at 2:14 PM, Reynold Xin  wrote:
> 
> I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I 
> think everybody is for that.
> 
> https://issues.apache.org/jira/browse/SPARK-11807
> 
> Sean suggested also dropping support for Hadoop 2.2, 2.3, and 2.4. That is to 
> say, keep only Hadoop 2.6 and greater.
> 
> What are the community's thoughts on that?
> 


Re: Support for local disk columnar storage for DataFrames

2015-11-20 Thread Cristian O
Raised this for checkpointing; hopefully it gets some priority, as it's very
useful and relatively straightforward to implement.

https://issues.apache.org/jira/browse/SPARK-11879

On 18 November 2015 at 16:31, Cristian O 
wrote:

> Hi,
>
> While these OSS efforts are interesting, they're for now quite unproven.
> Personally, I would be much more interested in seeing Spark incrementally
> moving towards supporting updating DataFrames on various storage
> substrates, and first of all locally, perhaps as an extension of cached
> DataFrames.
>
> However before we get full blown update support, I would suggest two
> enhancements that are fairly straightforward with the current design. If
> they make sense please let me know and I'll add them as Jiras:
>
> 1. Checkpoint support for DataFrames - as mentioned this can be as simple
> as saving to a parquet file or some other format, but would not require
> re-reading the file to alter the lineage, and would also prune the logical
> plan. Alternatively checkpointing a cached DataFrame can delegate to
> checkpointing the underlying RDD but again needs to prune the logical plan.
>
> 2. Efficient transformation of cached DataFrames to cached DataFrames - an
> efficient copy-on-write mechanism can be used to avoid unpacking
> CachedBatches (row groups) into InternalRows when building a cached
> DataFrame out of a source cached DataFrame through transformations (like an
> outer join) that only affect a small subset of rows. Statistics and
> partitioning information can be used to determine which row groups are
> affected and which can be copied *by reference* unchanged. This would
> effectively allow performing immutable updates of cached DataFrames in
> scenarios like Streaming or other iterative use cases like ML.
>
> Thanks,
> Cristian
>
>
>
> On 16 November 2015 at 08:30, Mark Hamstra 
> wrote:
>
>> FiloDB is also closely reated.  https://github.com/tuplejump/FiloDB
>>
>> On Mon, Nov 16, 2015 at 12:24 AM, Nick Pentreath <
>> nick.pentre...@gmail.com> wrote:
>>
>>> Cloudera's Kudu also looks interesting here (getkudu.io) - Hadoop
>>> input/output format support:
>>> https://github.com/cloudera/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableInputFormat.java
>>>
>>> On Mon, Nov 16, 2015 at 7:52 AM, Reynold Xin 
>>> wrote:
>>>
 This (updates) is something we are going to think about in the next
 release or two.

 On Thu, Nov 12, 2015 at 8:57 AM, Cristian O <
 cristian.b.op...@googlemail.com> wrote:

> Sorry, apparently only replied to Reynold, meant to copy the list as
> well, so I'm self replying and taking the opportunity to illustrate with 
> an
> example.
>
> Basically I want to conceptually do this:
>
> val bigDf = sqlContext.sparkContext.parallelize((1 to 1000000)).map(i => (i, 1)).toDF("k", "v")
> val deltaDf = sqlContext.sparkContext.parallelize(Array(1, 5)).map(i => (i, 1)).toDF("k", "v")
>
> bigDf.cache()
>
> bigDf.registerTempTable("big")
> deltaDf.registerTempTable("delta")
>
> val newBigDf = sqlContext.sql("SELECT big.k, big.v + IF(delta.v is null, 0, delta.v) FROM big LEFT JOIN delta on big.k = delta.k")
>
> newBigDf.cache()
> bigDf.unpersist()
>
>
> This is essentially an update of keys "1" and "5" only, in a
> dataset of 1 million keys.
>
> This can be achieved efficiently if the join would preserve the cached
> blocks that have been unaffected, and only copy and mutate the 2 affected
> blocks corresponding to the matching join keys.
>
> Statistics can determine which blocks actually need mutating. Note
> also that shuffling is not required assuming both dataframes are
> pre-partitioned by the same key K.
>
> In SQL this could actually be expressed as an UPDATE statement or for
> a more generalized use as a MERGE UPDATE:
> https://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
>
> While this may seem like a very special case optimization, it would
> effectively implement UPDATE support for cached DataFrames, for both
> optimal and non-optimal usage.
>
> I appreciate there's quite a lot here, so thank you for taking the
> time to consider it.
>
> Cristian
>
>
>
> On 12 November 2015 at 15:49, Cristian O <
> cristian.b.op...@googlemail.com> wrote:
>
>> Hi Reynold,
>>
>> Thanks for your reply.
>>
>> Parquet may very well be used as the underlying implementation, but
>> this is about more than a particular storage representation.
>>
>> There are a few things here that are inter-related and open different
>> possibilities, so it's hard to structure, but I'll give it a try:
>>
>> 1. Checkpointing DataFrames - while a DF can be saved locally as
>> parquet, just using that as a checkpoint would currently require 
>> ex
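
As an aside on the checkpointing enhancement discussed in this thread: without a
built-in DataFrame checkpoint, the closest manual workaround is to write the
DataFrame out and read it back, which truncates the lineage and prunes the plan at
the cost of an explicit round trip. A rough sketch, with a hypothetical helper
name and path (this is not a Spark API of the time):

import org.apache.spark.sql.{DataFrame, SQLContext}

// Manual "checkpoint": materialize the current state, then re-read it so the
// returned DataFrame carries a fresh, short lineage and a simple scan plan.
def manualCheckpoint(df: DataFrame, sqlContext: SQLContext, path: String): DataFrame = {
  df.write.parquet(path)
  sqlContext.read.parquet(path)
}

// e.g. val checkpointed = manualCheckpoint(newBigDf, sqlContext, "/tmp/big_checkpoint")

This explicit round trip is exactly what the proposed checkpoint support would
avoid having to manage by hand.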

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Steve Loughran

On 19 Nov 2015, at 22:14, Reynold Xin <r...@databricks.com> wrote:

I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I think 
everybody is for that.

https://issues.apache.org/jira/browse/SPARK-11807

Sean suggested also dropping support for Hadoop 2.2, 2.3, and 2.4. That is to 
say, keep only Hadoop 2.6 and greater.

What are the community's thoughts on that?


+1

It's the common API under pretty much everything shipping (EMR, CDH & HDP), and
there are no significant API changes between it and 2.7. [There are a couple of
extra records in job submissions in 2.7 which you can get at with reflection, for
the AM failure reset window and rolling log capture patterns.] It's also getting
some ongoing maintenance (2.6.3 being planned for December).

It's not perfect; if I were to list trouble spots, to me they are: s3a isn't
ready for use, and there's better logging and tracing in later versions. But those
aren't at the API level.


Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Saisai Shao
+1.

Hadoop 2.6 would be a good choice, with many features added (like support for
long-running services and label-based scheduling). Currently there's a lot of
reflection code to support multiple versions of YARN, so upgrading to a
newer version will really ease the pain :).

Thanks
Saisai
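
For context on the reflection point: an illustrative sketch (not the actual Spark
YARN client code) of the general pattern, calling a setter via reflection when it
only exists on newer Hadoop/YARN versions so that a single build can run against
several versions. The example method name below is just that, an example.

import java.lang.reflect.Method

// Call a long-valued setter if the running YARN version provides it; skip otherwise.
def setIfAvailable(target: AnyRef, methodName: String, value: Long): Unit = {
  try {
    val m: Method = target.getClass.getMethod(methodName, classOf[Long])
    m.invoke(target, Long.box(value))
  } catch {
    case _: NoSuchMethodException =>
      // Older YARN: the API isn't there, so silently skip it.
  }
}

// e.g. setIfAvailable(appContext, "setAttemptFailuresValidityInterval", 3600000L)

Raising the minimum supported version to 2.6 lets much of this guard code become
plain method calls.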

On Fri, Nov 20, 2015 at 3:58 PM, Jean-Baptiste Onofré 
wrote:

> +1
>
> Regards
> JB
>
>
> On 11/19/2015 11:14 PM, Reynold Xin wrote:
>
>> I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I
>> think everybody is for that.
>>
>> https://issues.apache.org/jira/browse/SPARK-11807
>>
>> Sean suggested also dropping support for Hadoop 2.2, 2.3, and 2.4. That
>> is to say, keep only Hadoop 2.6 and greater.
>>
>> What are the community's thoughts on that?
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>