Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-13 Thread Patrick Wendell

Yeah so Steve, hopefully it's self-evident, but that is a perfect
example of the kind of annoying stuff we don't want to force users to
deal with by forcing an upgrade to 2.X. Compare the pain for Spark
users of trying to reason about what to do (and btw it seems like the
answer is simply that there isn't a good answer). And that will be
experienced by every Spark user who uses AWS and the Spark ec2
scripts, which are extremely popular.

Is this pain, in aggregate, more than our cost of having a few patches
to deal with runtime reflection stuff to make things work with Hadoop
1? My feeling is that it's much more efficient for us as the Spark
maintainers to pay this cost rather than to force a lot of our users
to deal with painful upgrades.
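
As a rough illustration of what "runtime reflection stuff" costs the maintainers: the pattern is to probe at run time for a method that only newer library versions provide, and fall back to the older call otherwise. The sketch below uses Python's getattr in the role that Java reflection plays in a JVM code base; the classes OldClient and NewClient and the methods get_length and get_status are hypothetical stand-ins, not Hadoop or Spark APIs.

```python
# Hypothetical stand-ins for two generations of a filesystem client.
class OldClient:
    def get_length(self, path):
        # Toy behaviour: pretend the file length equals the path length.
        return len(path)


class NewClient(OldClient):
    def get_status(self, path):
        # Newer API that returns a richer status record.
        return {"path": path, "length": len(path)}


def file_length(client, path):
    """Use the newer call when the client has it, else fall back."""
    get_status = getattr(client, "get_status", None)  # runtime probe
    if callable(get_status):
        return get_status(path)["length"]
    return client.get_length(path)
```

Each such probe is a small patch, which is the maintenance cost being weighed here against the cost of forcing users to upgrade.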

On Sat, Jun 13, 2015 at 1:39 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 12 Jun 2015, at 17:12, Patrick Wendell pwend...@gmail.com wrote:

  For instance at Databricks we use
 the FileSystem library for talking to S3... every time we've tried to
 upgrade to Hadoop 2.X there have been significant regressions in
 performance and we've had to downgrade. That's purely anecdotal, but I
 think you have people out there using the Hadoop 1 bindings for whom
 upgrade would be a pain.

 ah s3n. The unloved orphan FS, which has been fairly neglected as being 
 non-strategic to anyone but Amazon, who have a private fork.

 s3n broke in Hadoop 2.4, where the upgraded Jets3t went in with some patch 
 which swallowed exceptions (nobody should ever do that) and as a result would 
 NPE on a seek(0) of a file of length(0): HADOOP-10457. Fixed in Hadoop 2.5.

 Hadoop 2.6 has left s3n on maintenance out of fear of breaking more things; 
 future work is in s3a, which switched to the Amazon awstoolkit JAR and 
 moved the implementation to the hadoop-aws JAR. s3a promises: speed, partitioned 
 upload, better auth.

 But: it's not ready for serious use in Hadoop 2.6, so don't try. You need the 
 Hadoop 2.7 patches, which are in ASF Hadoop 2.7, will be in HDP2.3, and have 
 been picked up in CDH5.3. (HADOOP-11571). For Spark, the fact that the block 
 size is being returned as 0 in getFileStatus() could be the killer.

 Future work is going to improve performance and scale ( HADOOP-11694 )

 Now, if Spark is finding problems with s3a performance, tests for this would 
 be great, and complaints on JIRAs too. There's not enough functional testing of 
 analytics workloads against the object stores, especially S3 and Swift. If 
 someone volunteers to add some optional test module for object store testing, 
 I'll help review it and suggest some tests to generate stress.

 That can be done without the leap to Hadoop 2, though the proposed 
 HADOOP-9565 work, allowing object stores to declare that they are object 
 stores and publish some of their consistency and atomicity semantics, will be 
 Hadoop 2.8+. If you want your output committers to recognise when the 
 destination is an eventually consistent object store with O(n) directory 
 rename and delete, that's where the code will be.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-13 Thread Steve Loughran


 On 12 Jun 2015, at 17:12, Patrick Wendell pwend...@gmail.com wrote:
 
  For instance at Databricks we use
 the FileSystem library for talking to S3... every time we've tried to
 upgrade to Hadoop 2.X there have been significant regressions in
 performance and we've had to downgrade. That's purely anecdotal, but I
 think you have people out there using the Hadoop 1 bindings for whom
 upgrade would be a pain.

ah s3n. The unloved orphan FS, which has been fairly neglected as being 
non-strategic to anyone but Amazon, who have a private fork. 

s3n broke in Hadoop 2.4, where the upgraded Jets3t went in with some patch which 
swallowed exceptions (nobody should ever do that) and as a result would NPE on a 
seek(0) of a file of length(0): HADOOP-10457. Fixed in Hadoop 2.5.
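
A minimal sketch of why swallowing exceptions is so damaging, assuming a made-up backend: the original failure is discarded, and the caller later trips over a null reference far from the real cause. The function names and the simulated backend error below are hypothetical; this is not a reconstruction of the code behind HADOOP-10457.

```python
def fetch_range(key):
    # Hypothetical backend call that fails for this object.
    raise OSError("backend rejected ranged read of " + key)


def open_swallowing(fetch, key):
    """Anti-pattern: hide the failure and hand back None."""
    try:
        return fetch(key)
    except OSError:
        return None  # the real cause is lost here


def open_propagating(fetch, key):
    """Let the original error reach the caller unchanged."""
    return fetch(key)


def read_first_byte(opener):
    stream = opener(fetch_range, "empty-file")
    # With the swallowing opener, stream is None, so this line fails with
    # an AttributeError (Python's analogue of an NPE) instead of the
    # informative OSError raised by the backend.
    return stream.read(1)
```

With open_propagating the caller sees the backend's own message; with open_swallowing it sees only a null dereference.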

Hadoop 2.6 has left s3n on maintenance out of fear of breaking more things; 
future work is in s3a, which switched to the Amazon awstoolkit JAR and moved 
the implementation to the hadoop-aws JAR. s3a promises: speed, partitioned upload, 
better auth. 

But: it's not ready for serious use in Hadoop 2.6, so don't try. You need the 
Hadoop 2.7 patches, which are in ASF Hadoop 2.7, will be in HDP2.3, and have 
been picked up in CDH5.3. (HADOOP-11571). For Spark, the fact that the block 
size is being returned as 0 in getFileStatus() could be the killer.
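
A rough sketch of why a block size of 0 could hurt, assuming input splits are sized with the formula from Hadoop's classic FileInputFormat, max(minSize, min(goalSize, blockSize)). The helper count_splits is a simplification written for this illustration (it ignores the slop factor the real split loop applies), and the file and block sizes are made-up examples.

```python
def compute_split_size(goal_size, min_size, block_size):
    # Split-size formula as assumed above.
    return max(min_size, min(goal_size, block_size))


def count_splits(file_length, num_splits_hint, min_size, block_size):
    """Simplified count of splits for a single file."""
    goal_size = file_length // max(num_splits_hint, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    # Integer ceiling division; assumes min_size >= 1 so split_size > 0.
    return -(-file_length // split_size)


one_gib = 1 << 30
normal = count_splits(one_gib, 2, 1, 64 << 20)  # 64 MiB blocks -> 16 splits
broken = count_splits(one_gib, 2, 1, 0)         # block size 0 -> one split per byte
```

Under these assumptions a reported block size of 0 collapses the split size to the minimum (1 byte by default), so a 1 GiB file would produce over a billion splits.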

Future work is going to improve performance and scale ( HADOOP-11694 )

Now, if Spark is finding problems with s3a performance, tests for this would be 
great, and complaints on JIRAs too. There's not enough functional testing of 
analytics workloads against the object stores, especially S3 and Swift. If 
someone volunteers to add some optional test module for object store testing, 
I'll help review it and suggest some tests to generate stress.

That can be done without the leap to Hadoop 2, though the proposed HADOOP-9565 
work, allowing object stores to declare that they are object stores and publish 
some of their consistency and atomicity semantics, will be Hadoop 2.8+. If you 
want your output committers to recognise when the destination is an eventually 
consistent object store with O(n) directory rename and delete, that's where the 
code will be.
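
To make the "O(n) directory rename" point concrete, here is a toy model, assuming a flat key/value object store with no real directories: renaming a directory means copying and then deleting every object under the prefix. ToyObjectStore and its operation counter are invented for this sketch and do not correspond to any Hadoop class.

```python
class ToyObjectStore:
    """Flat key -> bytes store; 'directories' are only key prefixes."""

    def __init__(self):
        self.objects = {}
        self.operations = 0  # counts simulated store requests

    def put(self, key, data):
        self.objects[key] = data
        self.operations += 1

    def rename_dir(self, src, dst):
        src_prefix = src.rstrip("/") + "/"
        dst_prefix = dst.rstrip("/") + "/"
        # Snapshot the matching keys before mutating the dict.
        for key in [k for k in self.objects if k.startswith(src_prefix)]:
            new_key = dst_prefix + key[len(src_prefix):]
            self.objects[new_key] = self.objects[key]  # copy
            del self.objects[key]                      # delete
            self.operations += 2
```

On a real filesystem the same rename is a single metadata update, which is why an output committer that finishes a job by renaming a temporary directory behaves very differently against an object store.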

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Steve Loughran

+1 for 2.2+

Not only are the APIs in Hadoop 2 better, there are more people testing Hadoop 
2.x and Spark, and bugs in Hadoop itself are being fixed.

(usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)

 On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
 
 How does the idea of removing support for Hadoop 1.x for Spark 1.5
 strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
 consistent with the modern 2.x line than 2.1 or 2.0.
 
 The arguments against are simply, well, someone out there might be
 using these versions.
 
 The arguments for are just simplification -- fewer gotchas in trying
 to keep supporting older Hadoop, of which we've seen several lately.
 We get to chop out a little bit of shim code and update to use some
 non-deprecated APIs. Along with removing support for Java 6, it might
 be a reasonable time to also draw a line under older Hadoop too.
 
 I'm just gauging feeling now: for, against, indifferent?
 I favor it, but would not push hard on it if there are objections.
 

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Patrick Wendell

I feel this is quite different from the Java 6 decision and personally
I don't see sufficient cause to do it.

I would like to understand though Sean - what is the proposal exactly?
Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like
removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think
that makes much sense since so many libraries still use those APIs.
For YARN support, we already don't support Hadoop 1. So I'll assume
what you mean is to stop supporting linking against
the Hadoop 1 filesystem binaries at runtime (is that right?).

The main reason I'd push back is that I do think there are still
people running the older versions. For instance at Databricks we use
the FileSystem library for talking to S3... every time we've tried to
upgrade to Hadoop 2.X there have been significant regressions in
performance and we've had to downgrade. That's purely anecdotal, but I
think you have people out there using the Hadoop 1 bindings for whom
upgrade would be a pain.

In terms of our maintenance cost, to me the much bigger cost for us
is dealing with differences between e.g. 2.2, 2.4, and 2.6, where
major new APIs were added. In comparison the Hadoop 1 vs 2 cost seems
fairly low, with just a few bugs cropping up here and there. So unlike
Java 6, where you have a critical mass of maintenance issues, security
issues, etc, I just don't see as compelling a cost here.

To me the framework for deciding about these upgrades is the
maintenance cost vs the inconvenience for users.

- Patrick

On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 I'm personally in favor, but I don't have a sense of how many people still
 rely on Hadoop 1.

 Nick

 On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
 ste...@hortonworks.com wrote:

 +1 for 2.2+

 Not only are the APIs in Hadoop 2 better, there are more people testing
 Hadoop 2.x and Spark, and bugs in Hadoop itself are being fixed.

 (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)

  On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
 
  How does the idea of removing support for Hadoop 1.x for Spark 1.5
  strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
  consistent with the modern 2.x line than 2.1 or 2.0.
 
  The arguments against are simply, well, someone out there might be
  using these versions.
 
  The arguments for are just simplification -- fewer gotchas in trying
  to keep supporting older Hadoop, of which we've seen several lately.
  We get to chop out a little bit of shim code and update to use some
  non-deprecated APIs. Along with removing support for Java 6, it might
  be a reasonable time to also draw a line under older Hadoop too.
 
  I'm just gauging feeling now: for, against, indifferent?
  I favor it, but would not push hard on it if there are objections.
 

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Ram Sriharsha

+1 for Hadoop 2.2+

On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I'm personally in favor, but I don't have a sense of how many people still
 rely on Hadoop 1.

 Nick

 On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
 ste...@hortonworks.com wrote:

 +1 for 2.2+

 Not only are the APIs in Hadoop 2 better, there are more people testing
 Hadoop 2.x and Spark, and bugs in Hadoop itself are being fixed.

 (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)

  On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
 
  How does the idea of removing support for Hadoop 1.x for Spark 1.5
  strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
  consistent with the modern 2.x line than 2.1 or 2.0.
 
  The arguments against are simply, well, someone out there might be
  using these versions.
 
  The arguments for are just simplification -- fewer gotchas in trying
  to keep supporting older Hadoop, of which we've seen several lately.
  We get to chop out a little bit of shim code and update to use some
  non-deprecated APIs. Along with removing support for Java 6, it might
  be a reasonable time to also draw a line under older Hadoop too.
 
  I'm just gauging feeling now: for, against, indifferent?
  I favor it, but would not push hard on it if there are objections.
 

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Shivaram Venkataraman

My 2 cents: The biggest reason from my view for keeping Hadoop 1 support
was that our EC2 scripts, which launch an environment for benchmarking /
testing / research, only supported Hadoop 1 variants till very recently. We
did add Hadoop 2.4 support a few weeks back, but it is still not the
default option.

My concern is that people have higher level projects which are linked to
Hadoop 1.0.4 + Spark, because that is the default environment on EC2, and
that users will be surprised when these applications stop working in Spark
1.5. I guess we could announce more widely and write transition guides, but
if the cost of supporting Hadoop 1 is low enough, I'd vote for keeping it.

Thanks
Shivaram

On Fri, Jun 12, 2015 at 9:11 AM, Ram Sriharsha sriharsha@gmail.com
wrote:

 +1 for Hadoop 2.2+

 On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 I'm personally in favor, but I don't have a sense of how many people
 still rely on Hadoop 1.

 Nick

 On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
 ste...@hortonworks.com wrote:

 +1 for 2.2+

 Not only are the APIs in Hadoop 2 better, there are more people testing
 Hadoop 2.x and Spark, and bugs in Hadoop itself are being fixed.

 (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)

  On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
 
  How does the idea of removing support for Hadoop 1.x for Spark 1.5
  strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
  consistent with the modern 2.x line than 2.1 or 2.0.
 
  The arguments against are simply, well, someone out there might be
  using these versions.
 
  The arguments for are just simplification -- fewer gotchas in trying
  to keep supporting older Hadoop, of which we've seen several lately.
  We get to chop out a little bit of shim code and update to use some
  non-deprecated APIs. Along with removing support for Java 6, it might
  be a reasonable time to also draw a line under older Hadoop too.
 
  I'm just gauging feeling now: for, against, indifferent?
  I favor it, but would not push hard on it if there are objections.
 

Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Sean Owen

How does the idea of removing support for Hadoop 1.x for Spark 1.5
strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
consistent with the modern 2.x line than 2.1 or 2.0.

The arguments against are simply, well, someone out there might be
using these versions.

The arguments for are just simplification -- fewer gotchas in trying
to keep supporting older Hadoop, of which we've seen several lately.
We get to chop out a little bit of shim code and update to use some
non-deprecated APIs. Along with removing support for Java 6, it might
be a reasonable time to also draw a line under older Hadoop too.

I'm just gauging feeling now: for, against, indifferent?
I favor it, but would not push hard on it if there are objections.


Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Nicholas Chammas

I'm personally in favor, but I don't have a sense of how many people still
rely on Hadoop 1.

Nick

On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
ste...@hortonworks.com wrote:

+1 for 2.2+

 Not only are the APIs in Hadoop 2 better, there are more people testing
 Hadoop 2.x and Spark, and bugs in Hadoop itself are being fixed.

 (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)

  On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
 
  How does the idea of removing support for Hadoop 1.x for Spark 1.5
  strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
  consistent with the modern 2.x line than 2.1 or 2.0.
 
  The arguments against are simply, well, someone out there might be
  using these versions.
 
  The arguments for are just simplification -- fewer gotchas in trying
  to keep supporting older Hadoop, of which we've seen several lately.
  We get to chop out a little bit of shim code and update to use some
  non-deprecated APIs. Along with removing support for Java 6, it might
  be a reasonable time to also draw a line under older Hadoop too.
 
  I'm just gauging feeling now: for, against, indifferent?
  I favor it, but would not push hard on it if there are objections.
 

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Sean Owen

On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote:
 I would like to understand though Sean - what is the proposal exactly?
 Hadoop 2 itself supports all of the Hadoop 1 API's, so things like
 removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think

Not entirely; you can see some binary incompatibilities that have
bitten recently. A Hadoop 1 program does not in general work on Hadoop
2 because of this.

Part of my thinking is that I'm not sure Hadoop 1.x and 2.0.x fully
work anymore anyway. See for example SPARK-8057 recently. I recall
similar problems with Hadoop 2.0.x-era releases and the Spark build
for that, which is basically the 'cdh4' build.

So one benefit is skipping whatever work would be needed to continue
to fix this up, and the argument is that there may be less loss of
functionality than it seems. The other is being able to use later
APIs. This much is a little minor.


 The main reason I'd push back is that I do think there are still
 people running the older versions. For instance at Databricks we use
 the FileSystem library for talking to S3... every time we've tried to
 upgrade to Hadoop 2.X there have been significant regressions in
 performance and we've had to downgrade. That's purely anecdotal, but I
 think you have people out there using the Hadoop 1 bindings for whom
 upgrade would be a pain.

Yeah, that's the question. Is anyone out there using 1.x? More
anecdotes wanted. That might be the most interesting question.

No CDH customers would have been for a long while now, for example.
(Still a small number of CDH 4 customers out there though, and that's
2.0.x or so, but that's a gray area.)

Is the S3 library thing really related to Hadoop 1.x? That comes from
jets3t, and that's independent.


 In terms of our maintenance cost, to me the much bigger cost for us
 IMO is dealing with differences between e.g. 2.2, 2.4, and 2.6 where
 major new API's were added. In comparison the Hadoop 1 vs 2 seems

Really? I'd say the opposite. No APIs that are only in 2.2, let alone
only in a later version, can be in use now, right? 1.x wouldn't work
at all then. I don't know of any binary incompatibilities of the type
between 1.x and 2.x, which we have had to shim to make work.

In both cases dependencies have to be harmonized here and there, yes.
That won't change.


Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Sean Owen

I don't imagine that can be guaranteed to be supported anyway... the
0.x branch has never necessarily worked with Spark, even if it might
happen to. Is this really something you would veto for everyone
because of your deployment?

On Fri, Jun 12, 2015 at 7:18 PM, Thomas Dudziak tom...@gmail.com wrote:
 -1 to this, we use it with an old Hadoop version (well, a fork of an old
 version, 0.23). That being said, if there were a nice developer api that
 separates Spark from Hadoop (or rather, two APIs, one for scheduling and one
 for HDFS), then we'd be happy to maintain our own plugins for those.

 cheers,
 Tom



Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Matei Zaharia

I don't like the idea of removing Hadoop 1 unless it becomes a significant 
maintenance burden, which I don't think it is. You'll always be surprised how 
many people use old software, even though various companies may no longer 
support them.

With Hadoop 2 in particular, I may be misremembering, but I believe that the 
experience on Windows is considerably worse because it requires these shell 
scripts to set permissions that it won't find if you just download Spark. That 
would be one reason to keep Hadoop 1 in the default build. But I could be 
wrong, it's been a while since I tried Windows.

Matei


 On Jun 12, 2015, at 11:21 AM, Sean Owen so...@cloudera.com wrote:
 
 I don't imagine that can be guaranteed to be supported anyway... the
 0.x branch has never necessarily worked with Spark, even if it might
 happen to. Is this really something you would veto for everyone
 because of your deployment?
 
 On Fri, Jun 12, 2015 at 7:18 PM, Thomas Dudziak tom...@gmail.com wrote:
 -1 to this, we use it with an old Hadoop version (well, a fork of an old
 version, 0.23). That being said, if there were a nice developer api that
 separates Spark from Hadoop (or rather, two APIs, one for scheduling and one
 for HDFS), then we'd be happy to maintain our own plugins for those.
 
 cheers,
 Tom
 
 

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Thomas Dudziak

The 0.23 (and Hive 0.12) code base in Spark works well from our perspective, so
I'm not sure what you are referring to. As I said, I'm happy to maintain my own
plugins, but as it stands there is no sane way to do so in Spark because
there is no clear separation or developer API for these.

cheers,
Tom

On Fri, Jun 12, 2015 at 11:21 AM, Sean Owen so...@cloudera.com wrote:

 I don't imagine that can be guaranteed to be supported anyway... the
 0.x branch has never necessarily worked with Spark, even if it might
 happen to. Is this really something you would veto for everyone
 because of your deployment?

 On Fri, Jun 12, 2015 at 7:18 PM, Thomas Dudziak tom...@gmail.com wrote:
  -1 to this, we use it with an old Hadoop version (well, a fork of an old
  version, 0.23). That being said, if there were a nice developer api that
  separates Spark from Hadoop (or rather, two APIs, one for scheduling and
  one for HDFS), then we'd be happy to maintain our own plugins for those.
 
  cheers,
  Tom

Re: Remove Hadoop 1 support (Hadoop < 2.2) for Spark 1.5?

2015-06-12 Thread Thomas Dudziak

-1 to this, we use it with an old Hadoop version (well, a fork of an old
version, 0.23). That being said, if there were a nice developer api that
separates Spark from Hadoop (or rather, two APIs, one for scheduling and
one for HDFS), then we'd be happy to maintain our own plugins for those.

cheers,
Tom

On Fri, Jun 12, 2015 at 9:42 AM, Sean Owen so...@cloudera.com wrote:

 On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  I would like to understand though Sean - what is the proposal exactly?
  Hadoop 2 itself supports all of the Hadoop 1 API's, so things like
  removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think

 Not entirely; you can see some binary incompatibilities that have
 bitten recently. A Hadoop 1 program does not in general work on Hadoop
 2 because of this.

 Part of my thinking is that I'm not clear Hadoop 1.x, and 2.0.x, fully
 works anymore anyway. See for example SPARK-8057 recently. I recall
 similar problems with Hadoop 2.0.x-era releases and the Spark build
 for that which is basically the 'cdh4' build.

 So one benefit is skipping whatever work would be needed to continue
 to fix this up, and, the argument is there may be less loss of
 functionality than it seems. The other is being able to use later
 APIs. This much is a little minor.


  The main reason I'd push back is that I do think there are still
  people running the older versions. For instance at Databricks we use
  the FileSystem library for talking to S3... every time we've tried to
  upgrade to Hadoop 2.X there have been significant regressions in
  performance and we've had to downgrade. That's purely anecdotal, but I
  think you have people out there using the Hadoop 1 bindings for whom
  upgrade would be a pain.

 Yeah, that's the question. Is anyone out there using 1.x? More
 anecdotes wanted. That might be the most interesting question.

 No CDH customers would have been for a long while now, for example.
 (Still a small number of CDH 4 customers out there though, and that's
 2.0.x or so, but that's a gray area.)

 Is the S3 library thing really related to Hadoop 1.x? that comes from
 jets3t and that's independent.


  In terms of our maintenance cost, to me the much bigger cost for us
  IMO is dealing with differences between e.g. 2.2, 2.4, and 2.6 where
  major new API's were added. In comparison the Hadoop 1 vs 2 seems

 Really? I'd say the opposite. No APIs that are only in 2.2, let alone
 only in a later version, can be in use now, right? 1.x wouldn't work
 at all then. I don't know of any binary incompatibilities of the type
 between 1.x and 2.x, which we have had to shim to make work.

 In both cases dependencies have to be harmonized here and there, yes.
 That won't change.
